[jira] Commented: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block

2009-07-13 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730272#action_12730272
 ] 

Michael McCandless commented on LUCENE-1726:


{quote}
I haven't really figured out a clean way to move the reader
creation out of the reader pool synchronization. It turns out to
be somewhat tricky, unless we redesign our synchronization.
{quote}

Maybe we should simply hold off for now?

I don't think this sync is costing much in practice, now.
Ie, IndexReader.open is not concurrent when opening its segments; nor
would we expect multiple threads to be calling IndexWriter.getReader
concurrently.

There is a wee bit of concurrency we are preventing, ie for a merge or
applyDeletes to get a reader just as an NRT reader is being opened,
but realistically 1) that's not very costly, and 2) we can't gain that
concurrency back anyway because we synchronize on IW when opening the
reader.


 IndexWriter.readerPool create new segmentReader outside of sync block
 -

 Key: LUCENE-1726
 URL: https://issues.apache.org/jira/browse/LUCENE-1726
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Assignee: Michael McCandless
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, 
 LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, 
 LUCENE-1726.trunk.test.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 I think we will want to do something like what field cache does
 with CreationPlaceholder for IndexWriter.readerPool. Otherwise
 we have the (I think somewhat problematic) issue of all other
 readerPool.get* methods waiting for an SR to warm.
 It would be good to implement this for 2.9.
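 
 The CreationPlaceholder idea amounts to something like this hedged sketch 
 (hypothetical simplification borrowed from the FieldCache pattern, not 
 actual Lucene code): the pool's lock is held only to install or find a 
 placeholder, the slow open/warm runs outside it, and late arrivals wait on 
 the placeholder for that one key instead of blocking the whole pool.

{code}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hedged sketch, not actual Lucene code; error handling omitted (if
// open() throws, waiters on the placeholder would hang in this sketch).
class PlaceholderPoolSketch {
  interface Opener { Object open(Object key) throws IOException; }
  static class Placeholder { Object value; }
  private final Map readers = new HashMap();
  private final Opener opener;
  PlaceholderPoolSketch(Opener opener) { this.opener = opener; }
  Object get(Object key) throws IOException {
    Placeholder p;
    boolean creator = false;
    synchronized (this) { // lock held only to find/install the placeholder
      p = (Placeholder) readers.get(key);
      if (p == null) {
        p = new Placeholder();
        readers.put(key, p);
        creator = true;
      }
    }
    synchronized (p) {    // per-key wait, not a pool-wide one
      if (creator) {
        p.value = opener.open(key); // slow warm-up; pool stays usable
        p.notifyAll();
      }
      while (p.value == null) {
        try { p.wait(); } catch (InterruptedException e) {
          throw new RuntimeException(e);
        }
      }
      return p.value;
    }
  }
}
{code}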




[jira] Resolved: (LUCENE-1740) Lucli: Command to change the Analyzer

2009-07-13 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1740.


Resolution: Fixed

I just committed this.  Thanks Bernd!

 Lucli: Command to change the Analyzer
 -

 Key: LUCENE-1740
 URL: https://issues.apache.org/jira/browse/LUCENE-1740
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/*
Affects Versions: 2.9
Reporter: Bernd Fondermann
 Fix For: 2.9

 Attachments: analyzer_command.patch


 Currently, Lucli is hardcoded to use StandardAnalyzer. The provided patch 
 introduces an analyzer command to specify a different Analyzer class. 
 If anything fails, StandardAnalyzer is the fall-back.
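 
 The fallback behavior amounts to something like this (a hedged sketch, not 
 the actual patch):

{code}
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

// Hedged sketch of the described fallback, not the actual patch:
// instantiate the requested Analyzer class by name; if anything fails,
// fall back to StandardAnalyzer.
class AnalyzerLoaderSketch {
  static Analyzer load(String className) {
    try {
      return (Analyzer) Class.forName(className).newInstance();
    } catch (Exception e) {
      return new StandardAnalyzer(); // fall-back per the description
    }
  }
}
{code}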




Re: Relevance's scores on TopFieldCollector/FieldComparator

2009-07-13 Thread Michael McCandless
It's odd that this was necessary.

The ScoreCachingWrappingScorer simply wraps (and caches) the result
from calling score(), per hit, so that if score() is called more than
once we don't have to re-compute it.  I don't understand why you were
always seeing 0 score come back from it.
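
Concretely, the caching idea amounts to this (a minimal sketch, not the
actual Lucene source):

import java.io.IOException;

import org.apache.lucene.search.Scorer;

// Minimal sketch, not the actual Lucene source: remember the score for
// the current doc so repeated score() calls don't recompute it.
class CachingScorerSketch {
  private final Scorer in;
  private int cachedDoc = -1;
  private float cachedScore;
  CachingScorerSketch(Scorer in) { this.in = in; }
  float score(int doc) throws IOException {
    if (doc != cachedDoc) {
      cachedScore = in.score(); // compute at most once per doc
      cachedDoc = doc;
    }
    return cachedScore;
  }
}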

Mike

On Thu, Jul 9, 2009 at 9:09 AM, Raimon Bosch raimon.bo...@gmail.com wrote:


 It worked for me changing:

 public void setScorer(Scorer scorer) {
      this.scorer = new ScoreCachingWrappingScorer(scorer);
 }

 to

 public void setScorer(Scorer scorer) {
      this.scorer = scorer;
 }

 in my PseudoRandomFieldComparator.

 Regards,
 Raimon Bosch.


 Raimon Bosch wrote:

 Hi,

 I've just implemented my PseudoRandomFieldComparator (migrated from
 PseudoRandomComparatorSource) on Solr. The problem I see is that I
 don't have access to the relevance scores like in the deprecated class
 ComparatorSource.

 I saw that the TopFieldCollector is filling the scorer of my
 PseudoRandomFieldComparator, but the method scorer.score() is always
 returning 0. It's like we pass the scorer correctly, but the scorer is
 referencing the wrong scores array.

 How can I get the relevance scores in my PseudoRandomFieldComparator?
 Any ideas?


 Regards,
 Raimon Bosch.


 --
 View this message in context: 
 http://www.nabble.com/Relevance%27s-scores-on-TopFieldCollector-FieldComparator-tp24407379p24409794.html
 Sent from the Lucene - Java Developer mailing list archive at Nabble.com.








[jira] Assigned: (LUCENE-1566) Large Lucene index can hit false OOM due to Sun JRE issue

2009-07-13 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-1566:
--

Assignee: Simon Willnauer  (was: Michael McCandless)

Assigning this one back to you Simon!

 Large Lucene index can hit false OOM due to Sun JRE issue
 -

 Key: LUCENE-1566
 URL: https://issues.apache.org/jira/browse/LUCENE-1566
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.4.1
Reporter: Michael McCandless
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1566.patch, LUCENE-1566.patch


 This is not a Lucene issue, but I want to open this so future google
 diggers can more easily find it.
 There's this nasty bug in Sun's JRE:
   http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546
 The gist seems to be, if you try to read a large (eg 200 MB) number of
 bytes during a single RandomAccessFile.read call, you can incorrectly
 hit OOM.  Lucene does this, with norms, since we read in one byte per
 doc per field with norms, as a contiguous array of length maxDoc().
 The workaround was a custom patch to do large file reads as several
 smaller reads.
 Background here:
   http://www.nabble.com/problems-with-large-Lucene-index-td22347854.html
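 
 The workaround amounts to something like this (a hedged sketch, not the 
 actual patch):

{code}
import java.io.IOException;
import java.io.RandomAccessFile;

// Hedged sketch of the workaround, not the actual patch: split one huge
// read into bounded reads so the JRE never allocates an oversized
// temporary native buffer.
class ChunkedReadSketch {
  private static final int CHUNK = 8 * 1024 * 1024; // 8 MB per read (illustrative)
  static void readFully(RandomAccessFile f, byte[] b, int offset, int len)
      throws IOException {
    while (len > 0) {
      int read = f.read(b, offset, Math.min(len, CHUNK));
      if (read == -1) {
        throw new IOException("read past EOF");
      }
      offset += read;
      len -= read;
    }
  }
}
{code}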




[jira] Commented: (LUCENE-1678) Deprecate Analyzer.tokenStream

2009-07-13 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730303#action_12730303
 ] 

Uwe Schindler commented on LUCENE-1678:
---

Your solution is also cool for fixing the last problems with the core token 
streams in LUCENE-1693: if somebody overrides a deprecated method in one of the 
core token streams (which are not final), the method is never called, because 
the indexer uses incrementToken by default. The same trick can be used to fix 
this problem in TokenStream, too.
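
For illustration, a hedged sketch of the detection trick (not the actual 
indexer code):

{code}
import java.io.IOException;

import org.apache.lucene.analysis.TokenStream;

// Hedged sketch, not the actual indexer code: probe incrementToken()
// once; an old-API-only stream throws a subclass of
// UnsupportedOperationException, so the caller falls back to next().
// (The real indexer must not lose the token consumed by a successful
// probe; this sketch ignores that detail.)
class ApiProbeSketch {
  static boolean supportsNewAPI(TokenStream stream) throws IOException {
    try {
      stream.incrementToken();
      return true;
    } catch (UnsupportedOperationException e) {
      return false; // only the deprecated next()/next(Token) is overridden
    }
  }
}
{code}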

I will prepare a patch for this (I am currently preparing a new patch with some 
tests and the solution for the problem that the number of attribute instances 
may not equal the number of attributes).

 Deprecate Analyzer.tokenStream
 --

 Key: LUCENE-1678
 URL: https://issues.apache.org/jira/browse/LUCENE-1678
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1678.patch


 The addition of reusableTokenStream to the core analyzers unfortunately broke 
 back compat of external subclasses:
 
 http://www.nabble.com/Extending-StandardAnalyzer-considered-harmful-td23863822.html
 On upgrading, such subclasses would silently not be used anymore, since 
 Lucene's indexing invokes reusableTokenStream.
 I think we should at least deprecate Analyzer.tokenStream today, so 
 that users see deprecation warnings if their classes override this method.  
 But going forward, when we want to change the API of core classes that are 
 extended, I think we have to introduce entirely new classes to keep back 
 compatibility.




[jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API improvements

2009-07-13 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730306#action_12730306
 ] 

Uwe Schindler commented on LUCENE-1693:
---

Mike implemented a nice idea to solve the problems with tokenstreams overriding 
deprecated methods in LUCENE-1678.

I will try this out here and also fix the problems with the number of attribute 
instances not equaling the number of attributes, and the resulting iterator 
problems.

 AttributeSource/TokenStream API improvements
 

 Key: LUCENE-1693
 URL: https://issues.apache.org/jira/browse/LUCENE-1693
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, 
 LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, 
 LUCENE-1693.patch, lucene-1693.patch, TestCompatibility.java, 
 TestCompatibility.java, TestCompatibility.java, TestCompatibility.java


 This patch makes the following improvements to AttributeSource and
 TokenStream/Filter:
 - removes the set/getUseNewAPI() methods (including the standard
   ones). Instead by default incrementToken() throws a subclass of
   UnsupportedOperationException. The indexer tries to call
   incrementToken() initially once to see if the exception is thrown;
   if so, it falls back to the old API.
 - introduces interfaces for all Attributes. The corresponding
   implementations have the postfix 'Impl', e.g. TermAttribute and
   TermAttributeImpl. AttributeSource now has a factory for creating
   the Attribute instances; the default implementation looks for
   implementing classes with the postfix 'Impl'. Token now implements
   all 6 TokenAttribute interfaces.
 - new method added to AttributeSource:
   addAttributeImpl(AttributeImpl). Using reflection it walks up in the
   class hierarchy of the passed in object and finds all interfaces
   that the class or superclasses implement and that extend the
   Attribute interface. It then adds the interface-instance mappings
   to the attribute map for each of the found interfaces.
 - AttributeImpl now has a default implementation of toString that uses
   reflection to print out the values of the attributes in a default
   formatting. This makes it a bit easier to implement AttributeImpl,
   because toString() was declared abstract before.
 - Cloning is now done much more efficiently in
   captureState. The method figures out which unique AttributeImpl
   instances are contained as values in the attributes map, because
   those are the ones that need to be cloned. It creates a single
   linked list that supports deep cloning (in the inner class
   AttributeSource.State). AttributeSource keeps track of when this
   state changes, i.e. whenever new attributes are added to the
   AttributeSource. Only in that case will captureState recompute the
   state, otherwise it will simply clone the precomputed state and
   return the clone. restoreState(AttributeSource.State) walks the
   linked list and uses the copyTo() method of AttributeImpl to copy
   all values over into the attribute that the source stream
   (e.g. SinkTokenizer) uses. 
 Cloning performance can be greatly improved if a TokenStream does not use
 multiple AttributeImpl instances. A user can e.g. simply add a Token
 instance to the stream instead of the individual attributes. Or the user
 could implement a subclass of AttributeImpl that implements exactly the
 Attribute interfaces needed. I think this should be considered an expert
 API (addAttributeImpl), as this manual
 optimization is only needed if cloning performance is crucial. I ran
 some quick performance tests using Tee/Sink tokenizers (which do
 cloning) and the performance was roughly 20% faster with the new
 API. I'll run some more performance tests and post more numbers then.
 Note also that when we add serialization to the Attributes, e.g. for
 supporting storing serialized TokenStreams in the index, serialization
 should benefit even more from the new API than cloning does. 
 Also, the TokenStream API does not change, except for the removal 
 of the set/getUseNewAPI methods. So the patches in LUCENE-1460
 should still work.
 All core tests pass, however, I need to update all the documentation
 and also add some unit tests for the new AttributeSource
 functionality. So this patch is not ready to commit yet, but I wanted
 to post it already for some feedback. 
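 
 For readers of the description above, here is a hedged, self-contained 
 sketch of the captureState/restoreState idea (simplified, not the actual 
 AttributeSource code; it pairs attributes by position, which the real code 
 does not):

{code}
import java.util.Iterator;
import java.util.List;

// Simplified sketch of capture/restore: the captured state is a singly
// linked list of copied attribute instances; restoring copies the values
// back into the live attributes via copyTo().
class StateSketch {
  interface Attr {
    Attr copy();              // analogous to AttributeImpl.clone()
    void copyTo(Attr target); // analogous to AttributeImpl.copyTo()
  }
  static class State {
    Attr attribute;
    State next;
  }
  static State capture(List attrs) {
    State head = null, tail = null;
    for (Iterator it = attrs.iterator(); it.hasNext();) {
      State s = new State();
      s.attribute = ((Attr) it.next()).copy();
      if (head == null) head = s; else tail.next = s;
      tail = s;
    }
    return head;
  }
  static void restore(State state, List attrs) {
    Iterator it = attrs.iterator();
    for (State s = state; s != null; s = s.next) {
      s.attribute.copyTo((Attr) it.next());
    }
  }
}
{code}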


[jira] Created: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts

2009-07-13 Thread Uwe Schindler (JIRA)
Make MMapDirectory.MAX_BBUF user configureable to support chunking the index 
files in smaller parts
---

 Key: LUCENE-1741
 URL: https://issues.apache.org/jira/browse/LUCENE-1741
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.1


This is a followup to a java-user thread: 
http://www.lucidimagination.com/search/document/9ba9137bb5d8cb78/oom_with_2_9#9bf3b5b8f3b1fb9b

It is easy to implement, just add a setter method for this parameter to MMapDir.
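
For reference, the proposed change amounts to something like this hedged 
sketch (hypothetical names, not the actual MMapDirectory code): map a large 
file as several buffers of at most maxChunkSize bytes instead of one huge 
mapping.

{code}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hedged sketch with hypothetical names, not the actual MMapDirectory code.
class ChunkedMapperSketch {
  private int maxChunkSize = 256 * 1024 * 1024; // the value this issue makes settable
  void setMaxChunkSize(int maxChunkSize) { this.maxChunkSize = maxChunkSize; }
  ByteBuffer[] map(RandomAccessFile raf) throws IOException {
    FileChannel channel = raf.getChannel();
    long length = channel.size();
    int nBuffers = (int) ((length + maxChunkSize - 1) / maxChunkSize);
    ByteBuffer[] buffers = new ByteBuffer[nBuffers];
    long offset = 0;
    for (int i = 0; i < nBuffers; i++) {
      long size = Math.min(maxChunkSize, length - offset);
      buffers[i] = channel.map(FileChannel.MapMode.READ_ONLY, offset, size);
      offset += size;
    }
    return buffers;
  }
}
{code}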




[jira] Updated: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts

2009-07-13 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1741:
--

Attachment: LUCENE-1741.patch

Patch that allows configuring the chunk size. I will commit in the evening 
(CET).

 Make MMapDirectory.MAX_BBUF user configureable to support chunking the index 
 files in smaller parts
 ---

 Key: LUCENE-1741
 URL: https://issues.apache.org/jira/browse/LUCENE-1741
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-1741.patch


 This is a followup to a java-user thread: 
 http://www.lucidimagination.com/search/document/9ba9137bb5d8cb78/oom_with_2_9#9bf3b5b8f3b1fb9b
 It is easy to implement, just add a setter method for this parameter to 
 MMapDir.




[jira] Updated: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts

2009-07-13 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1741:
--

Fix Version/s: (was: 3.1)
   2.9

 Make MMapDirectory.MAX_BBUF user configureable to support chunking the index 
 files in smaller parts
 ---

 Key: LUCENE-1741
 URL: https://issues.apache.org/jira/browse/LUCENE-1741
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1741.patch


 This is a followup to a java-user thread: 
 http://www.lucidimagination.com/search/document/9ba9137bb5d8cb78/oom_with_2_9#9bf3b5b8f3b1fb9b
 It is easy to implement, just add a setter method for this parameter to 
 MMapDir.




[jira] Commented: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts

2009-07-13 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730368#action_12730368
 ] 

Michael McCandless commented on LUCENE-1741:


Should we default the chunk size to something smaller (128 MB?) on a 32-bit 
JRE?

 Make MMapDirectory.MAX_BBUF user configureable to support chunking the index 
 files in smaller parts
 ---

 Key: LUCENE-1741
 URL: https://issues.apache.org/jira/browse/LUCENE-1741
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1741.patch


 This is a followup to a java-user thread: 
 http://www.lucidimagination.com/search/document/9ba9137bb5d8cb78/oom_with_2_9#9bf3b5b8f3b1fb9b
 It is easy to implement, just add a setter method for this parameter to 
 MMapDir.




[jira] Commented: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts

2009-07-13 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730373#action_12730373
 ] 

Uwe Schindler commented on LUCENE-1741:
---

Good idea. Do we still have this 64-bit detection property in the utils? If 
yes, this could easily be done.

 Make MMapDirectory.MAX_BBUF user configureable to support chunking the index 
 files in smaller parts
 ---

 Key: LUCENE-1741
 URL: https://issues.apache.org/jira/browse/LUCENE-1741
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1741.patch


 This is a followup to a java-user thread: 
 http://www.lucidimagination.com/search/document/9ba9137bb5d8cb78/oom_with_2_9#9bf3b5b8f3b1fb9b
 It is easy to implement, just add a setter method for this parameter to 
 MMapDir.




[jira] Updated: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts

2009-07-13 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1741:
--

Attachment: LUCENE-1741.patch

Attached is a patch using JRE_IS_64BIT in Constants. I set the default to 
256 MiB (128 seems too small for large indexes; if the index is e.g. about 
1.5 GiB, you would get 6 chunks).

I have no test data on which size is good; it is just trial and error (and 
depends e.g. on how often you reboot Windows, as Eks said).
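
For illustration, the default selection amounts to (a hedged sketch, not 
necessarily the exact committed code):

{code}
import org.apache.lucene.util.Constants;

// Hedged sketch: on 64-bit JREs address space is plentiful, so chunking
// is effectively disabled; on 32-bit JREs use 256 MiB chunks.
class ChunkDefaultsSketch {
  static final int DEFAULT_MAX_CHUNK_SIZE =
      Constants.JRE_IS_64BIT ? Integer.MAX_VALUE : 256 * 1024 * 1024;
}
{code}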

 Make MMapDirectory.MAX_BBUF user configureable to support chunking the index 
 files in smaller parts
 ---

 Key: LUCENE-1741
 URL: https://issues.apache.org/jira/browse/LUCENE-1741
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1741.patch, LUCENE-1741.patch


 This is a followup to a java-user thread: 
 http://www.lucidimagination.com/search/document/9ba9137bb5d8cb78/oom_with_2_9#9bf3b5b8f3b1fb9b
 It is easy to implement, just add a setter method for this parameter to 
 MMapDir.




Re: [jira] Updated: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts

2009-07-13 Thread eks dev

I have no test data on which size is good; it is just trial and error

Sure, for this you need a bad OS and a large index; you are not as lucky as I 
am to have both :)

Anyhow, I would argue against a default value. The algorithm is quite simple: 
if you hit OOM on map(), reduce this value until it fits :)
No need to touch it if it works...




- Original Message 
 From: Uwe Schindler (JIRA) j...@apache.org
 To: java-dev@lucene.apache.org
 Sent: Monday, 13 July, 2009 17:21:15
 Subject: [jira] Updated: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user 
 configureable to support chunking the index files in smaller parts
 
 
  [ 
 https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
  
 ]
 
 Uwe Schindler updated LUCENE-1741:
 --
 
 Attachment: LUCENE-1741.patch
 
 Attached is a patch using JRE_IS_64BIT in Constants. I set the default to 
 256 MiB (128 seems too small for large indexes; if the index is e.g. about 
 1.5 GiB, you would get 6 chunks).
 
 I have no test data on which size is good; it is just trial and error (and 
 depends e.g. on how often you reboot Windows, as Eks said).
 
  Make MMapDirectory.MAX_BBUF user configureable to support chunking the index 
  files in smaller parts
  ---
 
  Key: LUCENE-1741
  URL: https://issues.apache.org/jira/browse/LUCENE-1741
  Project: Lucene - Java
   Issue Type: Improvement
 Affects Versions: 2.9
 Reporter: Uwe Schindler
 Assignee: Uwe Schindler
 Priority: Minor
  Fix For: 2.9
 
  Attachments: LUCENE-1741.patch, LUCENE-1741.patch
 
 
  This is a followup to a java-user thread: 
 http://www.lucidimagination.com/search/document/9ba9137bb5d8cb78/oom_with_2_9#9bf3b5b8f3b1fb9b
  It is easy to implement, just add a setter method for this parameter to 
 MMapDir.
 








[jira] Commented: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts

2009-07-13 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730378#action_12730378
 ] 

Michael McCandless commented on LUCENE-1741:


Patch looks good!

 Make MMapDirectory.MAX_BBUF user configureable to support chunking the index 
 files in smaller parts
 ---

 Key: LUCENE-1741
 URL: https://issues.apache.org/jira/browse/LUCENE-1741
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1741.patch, LUCENE-1741.patch


 This is a followup to a java-user thread: 
 http://www.lucidimagination.com/search/document/9ba9137bb5d8cb78/oom_with_2_9#9bf3b5b8f3b1fb9b
 It is easy to implement, just add a setter method for this parameter to 
 MMapDir.




[jira] Commented: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block

2009-07-13 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730424#action_12730424
 ] 

Jason Rutherglen commented on LUCENE-1726:
--

I was thinking the sync on all of readerPool could delay someone calling
IW.getReader, who would wait for a potentially large new segment to be
warmed. However, because IW.mergeMiddle isn't loading the term index,
IW.getReader pays the cost of loading the term index anyway. So yeah, the
change doesn't seem necessary.

 IndexWriter.readerPool create new segmentReader outside of sync block
 -

 Key: LUCENE-1726
 URL: https://issues.apache.org/jira/browse/LUCENE-1726
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Assignee: Michael McCandless
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, 
 LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, 
 LUCENE-1726.trunk.test.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 I think we will want to do something like what field cache does
 with CreationPlaceholder for IndexWriter.readerPool. Otherwise
 we have the (I think somewhat problematic) issue of all other
 readerPool.get* methods waiting for an SR to warm.
 It would be good to implement this for 2.9.




[jira] Updated: (LUCENE-1712) Set default precisionStep for NumericField and NumericRangeFilter

2009-07-13 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1712:
--

Attachment: LUCENE-1712.patch

Attached is a patch with the default precisionStep of 4. The javadocs of 
NumericRangeQuery list all possible and sensible values.

This patch also contains some cleanup in NumericUtils (renamed constants) and a 
lot of other JavaDoc fixes. It also changes the token types of the TokenStream 
(no difference between 32/64-bit values needed) and adds a test for them.

 Set default precisionStep for NumericField and NumericRangeFilter
 -

 Key: LUCENE-1712
 URL: https://issues.apache.org/jira/browse/LUCENE-1712
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1712.patch


 This is a spinoff from LUCENE-1701.
 A user using Numeric* should not need to understand what's
 under the hood in order to do their indexing & searching.
 They should be able to simply:
 {code}
 doc.add(new NumericField("price", 15.50));
 {code}
 And have a decent default precisionStep selected for them.
 Actually, if we add ctors to NumericField for each of the supported
 types (so the above code works), we can set the default per-type.  I
 think we should do that?
 4 for int and 6 for long were proposed as good defaults.
 The default need not be perfect, as advanced users can always
 optimize their precisionStep, and for users experiencing slow
 RangeQuery performance, NumericRangeQuery with any of the defaults we
 are discussing will be much faster.
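 
 For context, a hedged usage sketch assuming the setter-based 2.9-style API 
 (the per-type ctor proposed above is not committed yet); both calls rely on 
 the default precisionStep this issue introduces:

{code}
import org.apache.lucene.document.Document;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.search.NumericRangeQuery;

// Hedged sketch assuming the setter-based 2.9-style API.
class NumericDefaultsSketch {
  static Document priceDoc(double price) {
    Document doc = new Document();
    doc.add(new NumericField("price").setDoubleValue(price));
    return doc;
  }
  static NumericRangeQuery priceBetween(double min, double max) {
    // inclusive bounds on both ends
    return NumericRangeQuery.newDoubleRange(
        "price", new Double(min), new Double(max), true, true);
  }
}
{code}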




[jira] Commented: (LUCENE-1712) Set default precisionStep for NumericField and NumericRangeFilter

2009-07-13 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730538#action_12730538
 ] 

Michael McCandless commented on LUCENE-1712:


Patch looks good Uwe!

 Set default precisionStep for NumericField and NumericRangeFilter
 -

 Key: LUCENE-1712
 URL: https://issues.apache.org/jira/browse/LUCENE-1712
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Michael McCandless
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1712.patch


 This is a spinoff from LUCENE-1701.
 A user using Numeric* should not need to understand what's
 under the hood in order to do their indexing & searching.
 They should be able to simply:
 {code}
 doc.add(new NumericField("price", 15.50));
 {code}
 And have a decent default precisionStep selected for them.
 Actually, if we add ctors to NumericField for each of the supported
 types (so the above code works), we can set the default per-type.  I
 think we should do that?
 4 for int and 6 for long were proposed as good defaults.
 The default need not be perfect, as advanced users can always
 optimize their precisionStep, and for users experiencing slow
 RangeQuery performance, NumericRangeQuery with any of the defaults we
 are discussing will be much faster.




[jira] Commented: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts

2009-07-13 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730543#action_12730543
 ] 

Uwe Schindler commented on LUCENE-1741:
---

Eks Dev wrote in java-dev:

bq. I have no test data on which size is good; it is just trial and error

Sure, for this you need a bad OS and a large index; you are not as lucky as I 
am to have both :)

Anyhow, I would argue against a default value. The algorithm is quite simple: 
if you hit OOM on map(), reduce this value until it fits :)
No need to touch it if it works...


 Make MMapDirectory.MAX_BBUF user configureable to support chunking the index 
 files in smaller parts
 ---

 Key: LUCENE-1741
 URL: https://issues.apache.org/jira/browse/LUCENE-1741
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1741.patch, LUCENE-1741.patch


 This is a followup to a java-user thread: 
 http://www.lucidimagination.com/search/document/9ba9137bb5d8cb78/oom_with_2_9#9bf3b5b8f3b1fb9b
 It is easy to implement, just add a setter method for this parameter to 
 MMapDir.




[jira] Commented: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts

2009-07-13 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730547#action_12730547
 ] 

Uwe Schindler commented on LUCENE-1741:
---

OK, we have two patches; we can think about using one of them.

In my opinion, there is no problem with limiting the chunk size on 32-bit 
systems. The overhead of choosing the right chunk is negligible, as it only 
affects seeking. Normal sequential reads only have to check whether the 
current chunk has enough data and, if not, move to the next. The non-chunked 
stream does this check, too (to throw EOF). With a chunk size of 256 MB, the 
theoretical maximum number of chunks is 8 (which can never be reached...).
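
In code, the per-read overhead looks roughly like this (hypothetical names, 
not the actual MMapIndexInput code):

{code}
import java.io.IOException;
import java.nio.ByteBuffer;

// Hedged sketch with hypothetical names: a read only has to detect
// "current chunk exhausted" and hop to the next buffer, the same kind of
// bounds check an unchunked stream needs to throw EOF.
class ChunkedInputSketch {
  private final ByteBuffer[] chunks;
  private int current = 0;
  ChunkedInputSketch(ByteBuffer[] chunks) { this.chunks = chunks; }
  byte readByte() throws IOException {
    while (!chunks[current].hasRemaining()) {
      if (current == chunks.length - 1) {
        throw new IOException("read past EOF");
      }
      current++; // hop to the next mapped chunk
    }
    return chunks[current].get();
  }
}
{code}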

Any other comments?

Eks: What was your value that fixed the problem without rebooting? And how big 
was your biggest index file?

 Make MMapDirectory.MAX_BBUF user configureable to support chunking the index 
 files in smaller parts
 ---

 Key: LUCENE-1741
 URL: https://issues.apache.org/jira/browse/LUCENE-1741
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1741.patch, LUCENE-1741.patch


 This is a followup to a java-user thread: 
 http://www.lucidimagination.com/search/document/9ba9137bb5d8cb78/oom_with_2_9#9bf3b5b8f3b1fb9b
 It is easy to implement, just add a setter method for this parameter to 
 MMapDir.




[jira] Commented: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts

2009-07-13 Thread Paul Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730551#action_12730551
 ] 

Paul Smith commented on LUCENE-1741:


An algorithm is nice if no specific settings are given, but in an environment 
where large indexes are opened more frequently than in the common use cases, 
the memory layer hits OOM conditions too often, forcing too much GC activity 
to attempt the operation.

I'd vote for checking whether settings have been requested and using them, and 
if none are set, relying on a self-tuning algorithm.

In a really long-running application, the process address space may become more 
and more fragmented, and the malloc library may not be able to defragment it, 
so the auto-tuning is nice, but it may not be great for all people's needs.

For example, our specific use case (crazy as this may be) is to have many 
different indexes open at any one time, closing and opening them frequently 
(the Realtime Search stuff we are following very closely indeed.. :) ).  I'm 
just thinking that our VM (64-bit) may find it difficult to find contiguous 
non-heap space for the MMap operation after many days/weeks in operation.

Maybe I'm just paranoid. But for operational purposes, it'd be nice to know we 
could change the setting based on our observations.

thanks!



 Make MMapDirectory.MAX_BBUF user configureable to support chunking the index 
 files in smaller parts
 ---

 Key: LUCENE-1741
 URL: https://issues.apache.org/jira/browse/LUCENE-1741
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1741.patch, LUCENE-1741.patch


 This is a followup to a java-user thread: 
 http://www.lucidimagination.com/search/document/9ba9137bb5d8cb78/oom_with_2_9#9bf3b5b8f3b1fb9b
 It is easy to implement, just add a setter method for this parameter to 
 MMapDir.




[jira] Commented: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts

2009-07-13 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730560#action_12730560
 ] 

Eks Dev commented on LUCENE-1741:
-

Uwe, you convinced me. I looked at the code, and indeed, there is no 
performance penalty for this.

What helped me was 1.1 GB (I tried to find the maximum); max file size is 
1.4 GB... but 1.1 is just OS coincidence, no magic about it.

I guess 512 MB makes a good value; if memory is so fragmented that you cannot 
allocate 0.5 GB, you definitely have some other problems. We are talking here 
about VM memory, and even on Windows having 512 MB in one block is not an 
issue (or better said, I have never seen problems with this value).

@Paul: It is a misunderstanding; my algorithm was meant to be manual... no 
catching OOM and retrying (I've already burned my fingers on catching 
RuntimeException; do that only when absolutely desperate :). Uwe made this 
value user-settable anyhow.

Thanks Uwe!

 Make MMapDirectory.MAX_BBUF user configureable to support chunking the index 
 files in smaller parts
 ---

 Key: LUCENE-1741
 URL: https://issues.apache.org/jira/browse/LUCENE-1741
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1741.patch, LUCENE-1741.patch


 This is a followup to a java-user thread: 
 http://www.lucidimagination.com/search/document/9ba9137bb5d8cb78/oom_with_2_9#9bf3b5b8f3b1fb9b
 It is easy to implement, just add a setter method for this parameter to 
 MMapDir.




[jira] Commented: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts

2009-07-13 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730580#action_12730580
 ] 

Michael McCandless commented on LUCENE-1741:


I'd be more comfortable w/ 256 MB (or smaller); I think fragmentation could 
easily cause 512 MB to give the false OOM.  I don't think we'll see real perf 
costs from buffer switching unless the chunk size is very small (eg < 1 MB).

In any event, Uwe, can you add to the javadocs a description of this false OOM 
problem and what to do if you hit it?

 Make MMapDirectory.MAX_BBUF user configureable to support chunking the index 
 files in smaller parts
 ---

 Key: LUCENE-1741
 URL: https://issues.apache.org/jira/browse/LUCENE-1741
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1741.patch, LUCENE-1741.patch


 This is a followup to a java-user thread: 
 http://www.lucidimagination.com/search/document/9ba9137bb5d8cb78/oom_with_2_9#9bf3b5b8f3b1fb9b
 It is easy to implement, just add a setter method for this parameter to 
 MMapDir.




[jira] Commented: (LUCENE-1741) Make MMapDirectory.MAX_BBUF user configureable to support chunking the index files in smaller parts

2009-07-13 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730590#action_12730590
 ] 

Uwe Schindler commented on LUCENE-1741:
---

The javadocs state (in FileChannel#map): "For most operating systems, mapping 
a file into memory is more expensive than reading or writing a few tens of 
kilobytes of data via the usual read and write methods. From the standpoint of 
performance it is generally only worth mapping relatively large files into 
memory."

So it should be as big as possible. A second problem with too many buffers is 
that the MMU/TLB cannot handle too many of them effectively.

In my opinion, we could maybe enhance MMapDirectory to work together with 
FileSwitchDirectory or something like that, to use mmap only for large files 
and have all others handled by NIO/Simple. E.g. mapping the segments.gen file 
into memory is really a waste of resources. So MMapDir would only return the 
MMapIndexInput if the underlying file is > X bytes (e.g. 8 megabytes by 
default) and fall back to SimpleFSIndexInput otherwise.
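
A hedged sketch of that idea (hypothetical helper, not a committed API):

{code}
import java.io.File;
import java.io.IOException;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IndexInput;

// Hedged sketch, hypothetical helper: pick the directory per file by
// size, mmap'ing only files above a threshold.
class SizeSwitchSketch {
  private final Directory mmapDir;   // e.g. an MMapDirectory
  private final Directory simpleDir; // e.g. a SimpleFSDirectory on the same path
  private final File path;
  private long minMMapSize = 8 * 1024 * 1024; // 8 MB threshold, per the comment
  SizeSwitchSketch(Directory mmapDir, Directory simpleDir, File path) {
    this.mmapDir = mmapDir;
    this.simpleDir = simpleDir;
    this.path = path;
  }
  IndexInput openInput(String name) throws IOException {
    boolean large = new File(path, name).length() > minMMapSize;
    return (large ? mmapDir : simpleDir).openInput(name);
  }
}
{code}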

bq. In any event, Uwe, can you add to the javadocs a description of this false 
OOM problem and what to do if you hit it?

Will do this tomorrow; going to bed now.

Here are also some other numbers about this problem: 
http://groups.google.com/group/jsr203-interest/browse_thread/thread/66f6a5042f2b0c4a/12228bbd57d1956d

 Make MMapDirectory.MAX_BBUF user configureable to support chunking the index 
 files in smaller parts
 ---

 Key: LUCENE-1741
 URL: https://issues.apache.org/jira/browse/LUCENE-1741
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 2.9
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1741.patch, LUCENE-1741.patch


 This is a followup to a java-user thread: 
 http://www.lucidimagination.com/search/document/9ba9137bb5d8cb78/oom_with_2_9#9bf3b5b8f3b1fb9b
 It is easy to implement, just add a setter method for this parameter to 
 MMapDir.




Lucene contrib build failing on jdk 1.4

2009-07-13 Thread Adriano Crestani
Hi,

I checked out Lucene java trunk and ran build-contrib. When I run using
Sun JDK 1.6.0_07 it builds successfully, but using Sun JDK 1.4.2_19 I get
the error below. Am I doing something wrong?

Thanks in advance,
Adriano Crestani Campos

...
build-artifacts-and-tests:

bdb:

BUILD FAILED
/home/adcampos/lucene/trunk2/build.xml:626: The following error occurred
while executing this line:
/home/adcampos/lucene/trunk2/build.xml:616: The following error occurred
while executing this line:
/home/adcampos/lucene/trunk2/contrib/db/build.xml:29: The following error
occurred while executing this line:
/home/adcampos/lucene/trunk2/contrib/db/bdb/build.xml:14:
java.lang.UnsupportedClassVersionError: com/sleepycat/db/internal/Db
(Unsupported major.minor version 49.0)

Total time: 17 seconds


[jira] Updated: (LUCENE-1567) New flexible query parser

2009-07-13 Thread Adriano Crestani (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adriano Crestani updated LUCENE-1567:
-

Attachment: lucene_1567_adriano_crestani_07_13_2009.patch

Hey guys,

Here is a patch containing some changes I made on top of Luis' last patch 
(lucene_trunk_FlexQueryParser_2009July10_v5.patch):

- javadoc reviewed and improved

- 2 new classes: QueryParserHelper and LuceneQueryParserHelper, they make it 
easier to use the new query parser

- added the ability to set the prefix length for fuzzy queries, it was still 
missing in the new query parser

- resolved some TODOs

- AnalyzerQueryNodeProcessor is now using only the new TokenStream API...is it 
required to be compatible with the old API even if it is in contrib?

- I duplicated the test cases so they run using the query parser API directly, 
the query parser helpers, and the query parser wrappers; this way we test the 
three ways the user can actually use the query parser.

I think that is everything. I will keep reviewing and improving the 
documentation; I think there might still be some broken javadoc links.

I would also like to rename the package and everything else that references 
lucene2 to lucene. I think it does not make sense to have a package name tied 
to a version. So, the package org.apache.lucene.queryParser.lucene2 would be 
renamed to org.apache.lucene.queryParser.lucene. I know it's kind of weird, 
because lucene then appears twice in the package declaration, but I think it's 
better than lucene2. Anyway, suggestions about this are welcome :) ... if 
nobody replies I will feel free to rename it and submit a new patch soon.

I will also work on documentation for the Lucene wiki that explains how to 
easily migrate from the old query parser to the new one, but I will only add 
it to the wiki once the code is committed to trunk; wiki documentation about 
something that is not even committed doesn't make sense, agreed?

Suggestions?

Regards,
Adriano Crestani Campos

 New flexible query parser
 -

 Key: LUCENE-1567
 URL: https://issues.apache.org/jira/browse/LUCENE-1567
 Project: Lucene - Java
  Issue Type: New Feature
  Components: QueryParser
 Environment: N/A
Reporter: Luis Alves
Assignee: Grant Ingersoll
 Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, 
 lucene_trunk_FlexQueryParser_2009July09_v4.patch, 
 lucene_trunk_FlexQueryParser_2009July10_v5.patch, 
 lucene_trunk_FlexQueryParser_2009March24.patch, 
 lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, 
 QueryParser_restructure_meetup_june2009_v2.pdf


 From the "New flexible query parser" thread by Michael Busch:
 in my team at IBM we have used a different query parser than Lucene's in
 our products for quite a while. Recently we spent a significant amount
 of time in refactoring the code and designing a very generic
 architecture, so that this query parser can be easily used for different
 products with varying query syntaxes.
 This work was originally driven by Andreas Neumann (who, however, left
 our team); most of the code was written by Luis Alves, who has been a
 bit active in Lucene in the past, and Adriano Campos, who joined our
 team at IBM half a year ago. Adriano is Apache committer and PMC member
 on the Tuscany project and getting familiar with Lucene now too.
 We think this code is much more flexible and extensible than the current
 Lucene query parser, and would therefore like to contribute it to
 Lucene. I'd like to give a very brief architecture overview here,
 Adriano and Luis can then answer more detailed questions as they're much
 more familiar with the code than I am.
 The goal was to separate syntax and semantics of a query. E.g. 'a AND
 b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
 We distinguish the semantics of the different query components, e.g.
 whether and how to tokenize/lemmatize/normalize the different terms or
 which Query objects to create for the terms. We wanted to be able to
 write a parser with a new syntax, while reusing the underlying
 semantics, as quickly as possible.
 In fact, Adriano is currently working on a 100% Lucene-syntax compatible
 implementation to make it easy for people who are using Lucene's query
 parser to switch.
 The query parser has three layers and its core is what we call the
 QueryNodeTree. It is a tree that initially represents the syntax of the
 original query, e.g. for 'a AND b':
   AND
  /   \
 A     B
 The three layers are:
 1. QueryParser
 2. QueryNodeProcessor
 3. QueryBuilder
 1. The upper layer is the parsing layer which simply transforms the
 query text string into a QueryNodeTree. Currently our implementations of
 this layer use javacc.
 2. The query node processors do most of the work. It is in 

[jira] Commented: (LUCENE-1567) New flexible query parser

2009-07-13 Thread Adriano Crestani (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730627#action_12730627
 ] 

Adriano Crestani commented on LUCENE-1567:
--

Ah, I also couldn't run ant build-contrib using Java 1.4; it fails. I even 
tried a clean trunk and it did not work. Were you able to run it using 1.4, 
Luis?

I already opened a thread on the ML about this: 
http://markmail.org/thread/3fyldf7t423fhwbm

 New flexible query parser
 -

 Key: LUCENE-1567
 URL: https://issues.apache.org/jira/browse/LUCENE-1567
 Project: Lucene - Java
  Issue Type: New Feature
  Components: QueryParser
 Environment: N/A
Reporter: Luis Alves
Assignee: Grant Ingersoll
 Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, 
 lucene_trunk_FlexQueryParser_2009July09_v4.patch, 
 lucene_trunk_FlexQueryParser_2009July10_v5.patch, 
 lucene_trunk_FlexQueryParser_2009March24.patch, 
 lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, 
 QueryParser_restructure_meetup_june2009_v2.pdf


 From the "New flexible query parser" thread by Michael Busch:
 in my team at IBM we have used a different query parser than Lucene's in
 our products for quite a while. Recently we spent a significant amount
 of time in refactoring the code and designing a very generic
 architecture, so that this query parser can be easily used for different
 products with varying query syntaxes.
 This work was originally driven by Andreas Neumann (who, however, left
 our team); most of the code was written by Luis Alves, who has been a
 bit active in Lucene in the past, and Adriano Campos, who joined our
 team at IBM half a year ago. Adriano is Apache committer and PMC member
 on the Tuscany project and getting familiar with Lucene now too.
 We think this code is much more flexible and extensible than the current
 Lucene query parser, and would therefore like to contribute it to
 Lucene. I'd like to give a very brief architecture overview here,
 Adriano and Luis can then answer more detailed questions as they're much
 more familiar with the code than I am.
 The goal was to separate syntax and semantics of a query. E.g. 'a AND
 b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
 We distinguish the semantics of the different query components, e.g.
 whether and how to tokenize/lemmatize/normalize the different terms or
 which Query objects to create for the terms. We wanted to be able to
 write a parser with a new syntax, while reusing the underlying
 semantics, as quickly as possible.
 In fact, Adriano is currently working on a 100% Lucene-syntax compatible
 implementation to make it easy for people who are using Lucene's query
 parser to switch.
 The query parser has three layers and its core is what we call the
 QueryNodeTree. It is a tree that initially represents the syntax of the
 original query, e.g. for 'a AND b':
   AND
  /   \
 A     B
 The three layers are:
 1. QueryParser
 2. QueryNodeProcessor
 3. QueryBuilder
 1. The upper layer is the parsing layer which simply transforms the
 query text string into a QueryNodeTree. Currently our implementations of
 this layer use javacc.
 2. The query node processors do most of the work. It is in fact a
 configurable chain of processors. Each processors can walk the tree and
 modify nodes or even the tree's structure. That makes it possible to
 e.g. do query optimization before the query is executed or to tokenize
 terms.
 3. The third layer is also a configurable chain of builders, which
 transform the QueryNodeTree into Lucene Query objects.
 Furthermore the query parser uses flexible configuration objects, which
 are based on AttributeSource/Attribute. It also uses message classes that
 allow to attach resource bundles. This makes it possible to translate
 messages, which is an important feature of a query parser.
 This design allows us to develop different query syntaxes very quickly.
 Adriano wrote the Lucene-compatible syntax in a matter of hours, and the
 underlying processors and builders in a few days. We now have a 100%
 compatible Lucene query parser, which means the syntax is identical and
 all query parser test cases pass on the new one too using a wrapper.
 Recent posts show that there is demand for query syntax improvements,
 e.g improved range query syntax or operator precedence. There are
 already different QP implementations in Lucene+contrib, however I think
 we did not keep them all up to date and in sync. This is not too
 surprising, because usually when fixes and changes are made to the main
 query parser, people don't make the corresponding changes in the contrib
 parsers. (I'm guilty here too)
 With this new architecture it will be much easier to maintain different
 query syntaxes, as the actual code for the first layer is not 

Re: Lucene contrib build failing on jdk 1.4

2009-07-13 Thread Mark Miller
Hey Adriano,

Only core is fully 1.4. To build all of contrib you must use at least Java
1.5. If you want to build a contrib that is supposed to be 1.4 with 1.4, use
the individual build file for that contrib.

-- 
- Mark

http://www.lucidimagination.com


On Mon, Jul 13, 2009 at 7:44 PM, Adriano Crestani 
adrianocrest...@apache.org wrote:

 Hi,

 I checked out Lucene java trunk and ran build-contrib. When I run using
 Sun JDK 1.6.0_07 it builds successfully, but using Sun JDK 1.4.2_19 I get
 the error below. Am I doing something wrong?

 Thanks in advance,
 Adriano Crestani Campos

 ...
 build-artifacts-and-tests:

 bdb:

 BUILD FAILED
 /home/adcampos/lucene/trunk2/build.xml:626: The following error occurred
 while executing this line:
 /home/adcampos/lucene/trunk2/build.xml:616: The following error occurred
 while executing this line:
 /home/adcampos/lucene/trunk2/contrib/db/build.xml:29: The following error
 occurred while executing this line:
 /home/adcampos/lucene/trunk2/contrib/db/bdb/build.xml:14:
 java.lang.UnsupportedClassVersionError: com/sleepycat/db/internal/Db
 (Unsupported major.minor version 49.0)

 Total time: 17 seconds




[jira] Created: (LUCENE-1742) Wrap SegmentInfos in public class

2009-07-13 Thread Jason Rutherglen (JIRA)
Wrap SegmentInfos in public class 
--

 Key: LUCENE-1742
 URL: https://issues.apache.org/jira/browse/LUCENE-1742
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Jason Rutherglen


Wrap SegmentInfos in a public class so that subclasses of MergePolicy do not 
need to be in the org.apache.lucene.index package.  




[jira] Updated: (LUCENE-1742) Wrap SegmentInfos in public class

2009-07-13 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1742:
-

  Component/s: Index
Fix Version/s: 3.0
 Priority: Trivial  (was: Major)
Affects Version/s: 2.4.1

 Wrap SegmentInfos in public class 
 --

 Key: LUCENE-1742
 URL: https://issues.apache.org/jira/browse/LUCENE-1742
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 3.0

   Original Estimate: 48h
  Remaining Estimate: 48h

 Wrap SegmentInfos in a public class so that subclasses of MergePolicy do not 
 need to be in the org.apache.lucene.index package.  




Re: Lucene contrib build failing on jdk 1.4

2009-07-13 Thread Adriano Crestani
Hi Mark,

Thanks for the explanation, it makes sense now : )

Adriano Crestani Campos

On Mon, Jul 13, 2009 at 5:41 PM, Mark Miller markrmil...@gmail.com wrote:

 Hey Adriano,

 Only core is fully 1.4. To build all of contrib you must use at least Java
 1.5. If you want to build a contrib that is supposed to be 1.4 with 1.4, use
 the individual build file for that contrib.

 --
 - Mark

 http://www.lucidimagination.com



 On Mon, Jul 13, 2009 at 7:44 PM, Adriano Crestani 
 adrianocrest...@apache.org wrote:

 Hi,

 I checked out Lucene Java trunk and ran build-contrib. When I run using
 Sun JDK 1.6.0_07 it builds successfully, but using Sun JDK 1.4.2_19 I get
 the error below. Am I doing something wrong?

 Thanks in advance,
 Adriano Crestani Campos

 ...
 build-artifacts-and-tests:

 bdb:

 BUILD FAILED
 /home/adcampos/lucene/trunk2/build.xml:626: The following error occurred
 while executing this line:
 /home/adcampos/lucene/trunk2/build.xml:616: The following error occurred
 while executing this line:
 /home/adcampos/lucene/trunk2/contrib/db/build.xml:29: The following error
 occurred while executing this line:
 /home/adcampos/lucene/trunk2/contrib/db/bdb/build.xml:14:
 java.lang.UnsupportedClassVersionError: com/sleepycat/db/internal/Db
 (Unsupported major.minor version 49.0)

 Total time: 17 seconds






[jira] Commented: (LUCENE-1742) Wrap SegmentInfos in public class

2009-07-13 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730640#action_12730640
 ] 

Jason Rutherglen commented on LUCENE-1742:
--

In order for this class to be compatible with our current
default LogMergePolicy, we'll need to expose readers from the IW
reader pool, because subclasses may need to access readers, as
LogMergePolicy.findMergesToExpungeDeletes does.
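
For example, a merge policy that wants live deletion counts would have to
reach into the pool, roughly like this (a sketch only; getReadOnlyReader,
release and numDeletedDocs stand in for whatever accessors IW would
actually expose):

{code}
// Hypothetical pool accessors on IndexWriter:
SegmentReader reader = writer.readerPool.getReadOnlyReader(info);
try {
  if (reader.numDeletedDocs() > 0) {
    // this segment is a candidate for expungeDeletes
  }
} finally {
  writer.readerPool.release(reader);
}
{code}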

 Wrap SegmentInfos in public class 
 --

 Key: LUCENE-1742
 URL: https://issues.apache.org/jira/browse/LUCENE-1742
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 3.0

   Original Estimate: 48h
  Remaining Estimate: 48h

 Wrap SegmentInfos in a public class so that subclasses of MergePolicy do not 
 need to be in the org.apache.lucene.index package.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1567) New flexible query parser

2009-07-13 Thread Adriano Crestani (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730641#action_12730641
 ] 

Adriano Crestani commented on LUCENE-1567:
--

{quote}
Ah, I also couldn't run ant build-contrib using Java 1.4; it fails. I even 
tried a clean trunk and it did not work. Were you able to run it using 1.4, Luis?

I already opened a thread on the ML about this: 
http://markmail.org/thread/3fyldf7t423fhwbm
{quote}

Mark Miller just replied to the thread, and based on his response there is no 
need for contrib projects to be able to compile using JDK 1.4. So, Luis, could 
you roll back the changes you made to the build files?

Thanks,
Adriano Crestani Campos

 New flexible query parser
 -

 Key: LUCENE-1567
 URL: https://issues.apache.org/jira/browse/LUCENE-1567
 Project: Lucene - Java
  Issue Type: New Feature
  Components: QueryParser
 Environment: N/A
Reporter: Luis Alves
Assignee: Grant Ingersoll
 Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, 
 lucene_trunk_FlexQueryParser_2009July09_v4.patch, 
 lucene_trunk_FlexQueryParser_2009July10_v5.patch, 
 lucene_trunk_FlexQueryParser_2009March24.patch, 
 lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, 
 QueryParser_restructure_meetup_june2009_v2.pdf


 From the "New flexible query parser" thread by Michael Busch:
 in my team at IBM we have used a different query parser than Lucene's in
 our products for quite a while. Recently we spent a significant amount
 of time in refactoring the code and designing a very generic
 architecture, so that this query parser can be easily used for different
 products with varying query syntaxes.
 This work was originally driven by Andreas Neumann (who, however, left
 our team); most of the code was written by Luis Alves, who has been a
 bit active in Lucene in the past, and Adriano Campos, who joined our
 team at IBM half a year ago. Adriano is Apache committer and PMC member
 on the Tuscany project and getting familiar with Lucene now too.
 We think this code is much more flexible and extensible than the current
 Lucene query parser, and would therefore like to contribute it to
 Lucene. I'd like to give a very brief architecture overview here,
 Adriano and Luis can then answer more detailed questions as they're much
 more familiar with the code than I am.
 The goal was to separate the syntax and semantics of a query. E.g. 'a AND
 b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
 We distinguish the semantics of the different query components, e.g.
 whether and how to tokenize/lemmatize/normalize the different terms or
 which Query objects to create for the terms. We wanted to be able to
 write a parser with a new syntax, while reusing the underlying
 semantics, as quickly as possible.
 In fact, Adriano is currently working on a 100% Lucene-syntax compatible
 implementation to make it easy for people who are using Lucene's query
 parser to switch.
 The query parser has three layers and its core is what we call the
 QueryNodeTree. It is a tree that initially represents the syntax of the
 original query, e.g. for 'a AND b':
    AND
   /   \
  a     b
 The three layers are:
 1. QueryParser
 2. QueryNodeProcessor
 3. QueryBuilder
 1. The upper layer is the parsing layer which simply transforms the
 query text string into a QueryNodeTree. Currently our implementations of
 this layer use javacc.
 2. The query node processors do most of the work. It is in fact a
 configurable chain of processors. Each processor can walk the tree and
 modify nodes or even the tree's structure. That makes it possible to
 e.g. do query optimization before the query is executed or to tokenize
 terms.
 3. The third layer is also a configurable chain of builders, which
 transform the QueryNodeTree into Lucene Query objects.
 Furthermore the query parser uses flexible configuration objects, which
 are based on AttributeSource/Attribute. It also uses message classes that
 allow attaching resource bundles. This makes it possible to translate
 messages, which is an important feature of a query parser.
 This design allows us to develop different query syntaxes very quickly.
 Adriano wrote the Lucene-compatible syntax in a matter of hours, and the
 underlying processors and builders in a few days. We now have a 100%
 compatible Lucene query parser, which means the syntax is identical and
 all query parser test cases pass on the new one too using a wrapper.
 Recent posts show that there is demand for query syntax improvements,
 e.g improved range query syntax or operator precedence. There are
 already different QP implementations in Lucene+contrib, however I think
 we did not keep them all up to date and in sync. This is not too
 surprising, because usually when fixes and changes are 
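
To make the three layers concrete, a minimal sketch of the flow for 'a AND
b' (SyntaxParser, QueryNodeProcessor and QueryBuilder are simplified
stand-ins here, not necessarily the patch's exact API; written 1.4-style,
matching the compatibility discussion above):

{code}
// Layer 1: parse the query text into a QueryNodeTree.
QueryNode tree = new SyntaxParser().parse("a AND b");

// Layer 2: run the configurable processor chain; each processor may
// rewrite nodes or restructure the tree (tokenize terms, optimize, ...).
java.util.Iterator it = processorChain.iterator();
while (it.hasNext()) {
  QueryNodeProcessor p = (QueryNodeProcessor) it.next();
  tree = p.process(tree);
}

// Layer 3: build a Lucene Query object from the processed tree.
Query query = queryBuilder.build(tree);
{code}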

[jira] Updated: (LUCENE-1742) Wrap SegmentInfos in public class

2009-07-13 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1742:
-

Attachment: LUCENE-1742.patch

After looking at it, I wasn't sure why we couldn't simply make
the read-only methods in SegmentInfo and SegmentInfos (and the
classes themselves) public.

Maybe this can make it into 2.9?

 Wrap SegmentInfos in public class 
 --

 Key: LUCENE-1742
 URL: https://issues.apache.org/jira/browse/LUCENE-1742
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 3.0

 Attachments: LUCENE-1742.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 Wrap SegmentInfos in a public class so that subclasses of MergePolicy do not 
 need to be in the org.apache.lucene.index package.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1567) New flexible query parser

2009-07-13 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730653#action_12730653
 ] 

Mark Miller commented on LUCENE-1567:
-

Hang on a sec - it sounds like the target was 1.4 because this was going to 
replace a core (and therefore 1.4) piece of functionality.

I don't know that all of the details are fully straightened out though.

1. I'm not pro moving the QueryParser to contrib myself, unless we actually 
move forward on that 'modules' thread - if not, it doesn't appear very helpful 
to me.

2. If we move this to contrib, perhaps it can be 1.5? But then in 3.0, can we 
have 1.5 already? Or is that 3.1? If it's 3.1, then if we remove the deprecated 
query parser in 3.0, you won't have a Java 1.4 replacement to move to (of 
course we could keep the old QueryParser till 4.0 ... ). I'm not clear that we 
can't add new functionality in 3.0 though. I know Mike has mentioned it, but I 
can't find where it says that - I just see that we can remove deprecations, not 
that we can't also add new features. I may be missing something though?

We should get things fully straightened out before you spend too much time 
switching between 1.4 and 1.5 though.

 New flexible query parser
 -

 Key: LUCENE-1567
 URL: https://issues.apache.org/jira/browse/LUCENE-1567
 Project: Lucene - Java
  Issue Type: New Feature
  Components: QueryParser
 Environment: N/A
Reporter: Luis Alves
Assignee: Grant Ingersoll
 Attachments: lucene_1567_adriano_crestani_07_13_2009.patch, 
 lucene_trunk_FlexQueryParser_2009July09_v4.patch, 
 lucene_trunk_FlexQueryParser_2009July10_v5.patch, 
 lucene_trunk_FlexQueryParser_2009March24.patch, 
 lucene_trunk_FlexQueryParser_2009March26_v3.patch, new_query_parser_src.tar, 
 QueryParser_restructure_meetup_june2009_v2.pdf


 From the "New flexible query parser" thread by Michael Busch:
 in my team at IBM we have used a different query parser than Lucene's in
 our products for quite a while. Recently we spent a significant amount
 of time in refactoring the code and designing a very generic
 architecture, so that this query parser can be easily used for different
 products with varying query syntaxes.
 This work was originally driven by Andreas Neumann (who, however, left
 our team); most of the code was written by Luis Alves, who has been a
 bit active in Lucene in the past, and Adriano Campos, who joined our
 team at IBM half a year ago. Adriano is Apache committer and PMC member
 on the Tuscany project and getting familiar with Lucene now too.
 We think this code is much more flexible and extensible than the current
 Lucene query parser, and would therefore like to contribute it to
 Lucene. I'd like to give a very brief architecture overview here,
 Adriano and Luis can then answer more detailed questions as they're much
 more familiar with the code than I am.
 The goal was to separate the syntax and semantics of a query. E.g. 'a AND
 b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query.
 We distinguish the semantics of the different query components, e.g.
 whether and how to tokenize/lemmatize/normalize the different terms or
 which Query objects to create for the terms. We wanted to be able to
 write a parser with a new syntax, while reusing the underlying
 semantics, as quickly as possible.
 In fact, Adriano is currently working on a 100% Lucene-syntax compatible
 implementation to make it easy for people who are using Lucene's query
 parser to switch.
 The query parser has three layers and its core is what we call the
 QueryNodeTree. It is a tree that initially represents the syntax of the
 original query, e.g. for 'a AND b':
    AND
   /   \
  a     b
 The three layers are:
 1. QueryParser
 2. QueryNodeProcessor
 3. QueryBuilder
 1. The upper layer is the parsing layer which simply transforms the
 query text string into a QueryNodeTree. Currently our implementations of
 this layer use javacc.
 2. The query node processors do most of the work. It is in fact a
 configurable chain of processors. Each processor can walk the tree and
 modify nodes or even the tree's structure. That makes it possible to
 e.g. do query optimization before the query is executed or to tokenize
 terms.
 3. The third layer is also a configurable chain of builders, which
 transform the QueryNodeTree into Lucene Query objects.
 Furthermore the query parser uses flexible configuration objects, which
 are based on AttributeSource/Attribute. It also uses message classes that
 allow attaching resource bundles. This makes it possible to translate
 messages, which is an important feature of a query parser.
 This design allows us to develop different query syntaxes very quickly.
 Adriano wrote the Lucene-compatible syntax in a matter of hours, and the
 underlying processors and builders in a few 

[jira] Commented: (LUCENE-1566) Large Lucene index can hit false OOM due to Sun JRE issue

2009-07-13 Thread Thulasi Ram Naidu P (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730704#action_12730704
 ] 

Thulasi Ram Naidu P commented on LUCENE-1566:
-

Is there any temporary workaround for this problem?

 Large Lucene index can hit false OOM due to Sun JRE issue
 -

 Key: LUCENE-1566
 URL: https://issues.apache.org/jira/browse/LUCENE-1566
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.4.1
Reporter: Michael McCandless
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1566.patch, LUCENE-1566.patch


 This is not a Lucene issue, but I want to open this so future Google
 diggers can more easily find it.
 There's this nasty bug in Sun's JRE:
   http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6478546
 The gist seems to be: if you try to read a large (e.g. 200 MB) number of
 bytes during a single RandomAccessFile.read call, you can incorrectly
 hit OOM.  Lucene does this, with norms, since we read in one byte per
 doc per field with norms, as a contiguous array of length maxDoc().
 The workaround was a custom patch to do large file reads as several
 smaller reads.
 Background here:
   http://www.nabble.com/problems-with-large-Lucene-index-td22347854.html
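
A minimal sketch of the chunked-read workaround described above (readFully
and the 64 MB chunk size are illustrative assumptions, not the actual
patch):

{code}
private static final int CHUNK = 64 * 1024 * 1024; // stay well below the problematic size

// Read len bytes in bounded chunks so that no single
// RandomAccessFile.read call asks for hundreds of MB at once.
static void readFully(java.io.RandomAccessFile f, byte[] b, int off, int len)
    throws java.io.IOException {
  while (len > 0) {
    int n = f.read(b, off, Math.min(len, CHUNK));
    if (n < 0)
      throw new java.io.EOFException();
    off += n;
    len -= n;
  }
}
{code}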

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org