RE: software grants

2009-07-08 Thread Uwe Schindler
Hi Grant,

 

 I think it is pretty clear that when the code lives in public
 somewhere else (e.g. SourceForge or Google Code) it needs to go
 through a grant.

 

 That being said, I'm not particularly concerned about Trie, for the

 record.

 

Trie was in SourceForge's SVN as part of panFMP, so it previously lived in
public. The last revision was 342:

 

http://panfmp.svn.sourceforge.net/viewvc/panfmp/main/trunk/src/de/pangaea/metadataportal/search/TrieRangeQuery.java?revision=315&view=markup&pathrev=342

http://panfmp.svn.sourceforge.net/viewvc/panfmp/main/trunk/src/de/pangaea/metadataportal/utils/TrieUtils.java?revision=308&view=markup&pathrev=342

 

The first version in Lucene's contrib was a modified version of the above
SVN revision (see LUCENE-1470).

 

After that it was deleted from panFMP's SVN, and the new, further-optimized
Lucene version was used for this project instead. If you like, we can fill
out a software grant to be sure (if it is still possible to do this after
the code transfer). I am the only person who must sign the grant on my
side. I can do a checkout of these two files, then tar and md5 them.
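A minimal sketch of that packaging step using Python's standard library (the two file names come from the panFMP URLs above; the tarball name is illustrative, and the actual checkout from SVN is assumed to have happened already):

```python
import hashlib
import os
import tarfile

def archive_and_digest(paths, tar_path):
    """Tar the given files and return the MD5 hex digest of the tarball."""
    with tarfile.open(tar_path, "w:gz") as tar:
        for p in paths:
            # Store only the file name, not the full checkout path.
            tar.add(p, arcname=os.path.basename(p))
    md5 = hashlib.md5()
    with open(tar_path, "rb") as f:
        # Hash the tarball in chunks to keep memory use flat.
        for chunk in iter(lambda: f.read(8192), b""):
            md5.update(chunk)
    return md5.hexdigest()

# Hypothetical usage after checking the files out of panFMP's SVN:
# digest = archive_and_digest(
#     ["TrieRangeQuery.java", "TrieUtils.java"], "trie-grant.tar.gz")
```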

 

Uwe



Re: broken links when building web-site

2009-07-08 Thread Grant Ingersoll
Yes, I've seen those too and have always written them off as Forrest
errors.  I could never track down anything actually wrong on the site,
so I ignored it.  The broken-links.xml file has been checked in for a
good long time, I believe.



On Jul 7, 2009, at 3:00 PM, Uwe Schindler wrote:


I tried to build the docs inside trunk and also the docs in the site
(https://svn.apache.org/repos/asf/lucene/java/site); both fail to build.
The error is the same here (Win XP), except that it says it cannot find
the images (which are indeed not available).

The last time I generated the site docs was for revision 784758; after
that, Grant applied LUCENE-1706. Maybe he forgot to commit some new
images for the Lucid Imagination-powered search.

But from the change in broken-links.xml, I see that Grant must have seen
the same error but ignored it. The docs seem to be correct, so I think
this error is not fatal.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


-Original Message-
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Tuesday, July 07, 2009 8:25 PM
To: java-dev@lucene.apache.org
Subject: broken links when building web-site

I'm trying to regen the web site docs (w/ Forrest) for LUCENE-1522,
but I'm hitting a BUILD FAILED at the end, I think because of these
broken links:

X [0] images/instruction_arrow.png BROKEN:
/lucene/h2.1522/src/site/src/documentation/content/xdocs/images.instruction_arrow.png
(No such file or directory)
X [0] skin/images/current.gif BROKEN:
/tango/offload/usr/local/src/apache-forrest-0.8/main/webapp/. (Is a
directory)
X [0] skin/images/chapter.gif BROKEN:
/tango/offload/usr/local/src/apache-forrest-0.8/main/webapp/. (Is a
directory)
X [0] skin/images/page.gif BROKEN:
/tango/offload/usr/local/src/apache-forrest-0.8/main/webapp/. (Is a
directory)

Does anyone else see this?

Mike

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org







--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search





Re: software grants

2009-07-08 Thread Yonik Seeley
On Tue, Jul 7, 2009 at 10:27 PM, Grant Ingersoll gsing...@apache.org wrote:
 I think it is pretty clear that when the code lives in public somewhere
 else (e.g. SourceForge or Google Code) it needs to go through a
 grant.

It's not clear to me... I think it's just another factor to consider.
It also matters how big a body of code it is, how many people
developed it over how long, what licenses were used over its
development history, etc.  Just because someone may make a patch or
feature available on GitHub first does not mean a software grant is
automatically needed.

-Yonik
http://www.lucidimagination.com




[jira] Updated: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block

2009-07-08 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-1726:
-

Attachment: LUCENE-1726.trunk.test.patch

I tried the test on trunk and get the same error. They're all
docstore related files so maybe extra doc stores are being
opened?

{code} 
[junit] MockRAMDirectory: cannot close: there are still open files: 
{_s4.fdt=2, _g2.fdx=2, _s4.fdx=2, _g2.tvf=2, _dw.fdx=2, _g2.tvd=2, _g2.tvx=2, 
_ks.tvf=2, _n9.tvx=2, _ks.tvx=2, _n9.fdx=2, _ks.fdx=2, _dw.cfx=1, _n9.tvf=2, 
_cp.cfx=1, _s4.tvf=2, _dw.tvx=2, _87.fdx=2, _fr.tvx=2, _87.tvf=2, _fr.tvd=2, 
_87.fdt=2, _ks.tvd=2, _s4.tvd=2, _dw.tvd=2, _n9.fdt=2, _g2.fdt=2, _87.tvd=2, 
_fr.fdt=2, _dw.fdt=2, _dj.cfx=1, _s4.tvx=2, _ks.fdt=2, _n9.tvd=2, _fr.tvf=2, 
_fr.fdx=2, _dw.tvf=2, _87.tvx=2}
[junit] java.lang.RuntimeException: MockRAMDirectory: cannot close: there 
are still open files: {_s4.fdt=2, _g2.fdx=2, _s4.fdx=2, _g2.tvf=2, _dw.fdx=2, 
_g2.tvd=2, _g2.tvx=2, _ks.tvf=2, _n9.tvx=2, _ks.tvx=2, _n9.fdx=2, _ks.fdx=2, 
_dw.cfx=1, _n9.tvf=2, _cp.cfx=1, _s4.tvf=2, _dw.tvx=2, _87.fdx=2, _fr.tvx=2, 
_87.tvf=2, _fr.tvd=2, _87.fdt=2, _ks.tvd=2, _s4.tvd=2, _dw.tvd=2, _n9.fdt=2, 
_g2.fdt=2, _87.tvd=2, _fr.fdt=2, _dw.fdt=2, _dj.cfx=1, _s4.tvx=2, _ks.fdt=2, 
_n9.tvd=2, _fr.tvf=2, _fr.fdx=2, _dw.tvf=2, _87.tvx=2}
[junit] at 
org.apache.lucene.store.MockRAMDirectory.close(MockRAMDirectory.java:278)
[junit] at 
org.apache.lucene.index.Test1726.testIndexing(Test1726.java:48)
[junit] at 
org.apache.lucene.util.LuceneTestCase.runTest(LuceneTestCase.java:88)
{code}

 IndexWriter.readerPool create new segmentReader outside of sync block
 -

 Key: LUCENE-1726
 URL: https://issues.apache.org/jira/browse/LUCENE-1726
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Assignee: Michael McCandless
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, 
 LUCENE-1726.patch, LUCENE-1726.trunk.test.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 I think we will want to do something like what field cache does
 with CreationPlaceholder for IndexWriter.readerPool. Otherwise
 we have the (I think somewhat problematic) issue of all other
 readerPool.get* methods waiting for an SR to warm.
 It would be good to implement this for 2.9.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Issue Comment Edited: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block

2009-07-08 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728787#action_12728787
 ] 

Jason Rutherglen edited comment on LUCENE-1726 at 7/8/09 9:47 AM:
--

I tried the test on trunk and get the same error. They're all
docstore related files so maybe extra doc stores are being
opened?

{code} 
    [junit] MockRAMDirectory: cannot close: there are still open files: 
{_s4.fdt=2, _g2.fdx=2, _s4.fdx=2, _g2.tvf=2, _dw.fdx=2, _g2.tvd=2, _g2.tvx=2, 
_ks.tvf=2, _n9.tvx=2, _ks.tvx=2, _n9.fdx=2, _ks.fdx=2, _dw.cfx=1, _n9.tvf=2, 
_cp.cfx=1, _s4.tvf=2, _dw.tvx=2, _87.fdx=2, _fr.tvx=2, _87.tvf=2, _fr.tvd=2, 
_87.fdt=2, _ks.tvd=2, _s4.tvd=2, _dw.tvd=2, _n9.fdt=2, _g2.fdt=2, _87.tvd=2, 
_fr.fdt=2, _dw.fdt=2, _dj.cfx=1, _s4.tvx=2, _ks.fdt=2, _n9.tvd=2, _fr.tvf=2, 
_fr.fdx=2, _dw.tvf=2, _87.tvx=2}
    [junit] java.lang.RuntimeException: MockRAMDirectory: cannot close: there 
are still open files: {_s4.fdt=2, _g2.fdx=2, _s4.fdx=2, _g2.tvf=2, _dw.fdx=2, 
_g2.tvd=2, _g2.tvx=2, _ks.tvf=2, _n9.tvx=2, _ks.tvx=2, _n9.fdx=2, _ks.fdx=2, 
_dw.cfx=1, _n9.tvf=2, _cp.cfx=1, _s4.tvf=2, _dw.tvx=2, _87.fdx=2, _fr.tvx=2, 
_87.tvf=2, _fr.tvd=2, _87.fdt=2, _ks.tvd=2, _s4.tvd=2, _dw.tvd=2, _n9.fdt=2, 
_g2.fdt=2, _87.tvd=2, _fr.fdt=2, _dw.fdt=2, _dj.cfx=1, _s4.tvx=2, _ks.fdt=2, 
_n9.tvd=2, _fr.tvf=2, _fr.fdx=2, _dw.tvf=2, _87.tvx=2}
    [junit] at 
org.apache.lucene.store.MockRAMDirectory.close(MockRAMDirectory.java:278)
    [junit] at 
org.apache.lucene.index.Test1726.testIndexing(Test1726.java:48)
    [junit] at 
org.apache.lucene.util.LuceneTestCase.runTest(LuceneTestCase.java:88)
{code}

  was (Author: jasonrutherglen):
I tried the test on trunk and get the same error. They're all
docstore related files so maybe extra doc stores are being
opened?

{code} 
[junit] MockRAMDirectory: cannot close: there are still open files: 
{_s4.fdt=2, _g2.fdx=2, _s4.fdx=2, _g2.tvf=2, _dw.fdx=2, _g2.tvd=2, _g2.tvx=2, 
_ks.tvf=2, _n9.tvx=2, _ks.tvx=2, _n9.fdx=2, _ks.fdx=2, _dw.cfx=1, _n9.tvf=2, 
_cp.cfx=1, _s4.tvf=2, _dw.tvx=2, _87.fdx=2, _fr.tvx=2, _87.tvf=2, _fr.tvd=2, 
_87.fdt=2, _ks.tvd=2, _s4.tvd=2, _dw.tvd=2, _n9.fdt=2, _g2.fdt=2, _87.tvd=2, 
_fr.fdt=2, _dw.fdt=2, _dj.cfx=1, _s4.tvx=2, _ks.fdt=2, _n9.tvd=2, _fr.tvf=2, 
_fr.fdx=2, _dw.tvf=2, _87.tvx=2}
[junit] java.lang.RuntimeException: MockRAMDirectory: cannot close: there 
are still open files: {_s4.fdt=2, _g2.fdx=2, _s4.fdx=2, _g2.tvf=2, _dw.fdx=2, 
_g2.tvd=2, _g2.tvx=2, _ks.tvf=2, _n9.tvx=2, _ks.tvx=2, _n9.fdx=2, _ks.fdx=2, 
_dw.cfx=1, _n9.tvf=2, _cp.cfx=1, _s4.tvf=2, _dw.tvx=2, _87.fdx=2, _fr.tvx=2, 
_87.tvf=2, _fr.tvd=2, _87.fdt=2, _ks.tvd=2, _s4.tvd=2, _dw.tvd=2, _n9.fdt=2, 
_g2.fdt=2, _87.tvd=2, _fr.fdt=2, _dw.fdt=2, _dj.cfx=1, _s4.tvx=2, _ks.fdt=2, 
_n9.tvd=2, _fr.tvf=2, _fr.fdx=2, _dw.tvf=2, _87.tvx=2}
[junit] at 
org.apache.lucene.store.MockRAMDirectory.close(MockRAMDirectory.java:278)
[junit] at 
org.apache.lucene.index.Test1726.testIndexing(Test1726.java:48)
[junit] at 
org.apache.lucene.util.LuceneTestCase.runTest(LuceneTestCase.java:88)
{code}
  
 IndexWriter.readerPool create new segmentReader outside of sync block
 -

 Key: LUCENE-1726
 URL: https://issues.apache.org/jira/browse/LUCENE-1726
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Assignee: Michael McCandless
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, 
 LUCENE-1726.patch, LUCENE-1726.trunk.test.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 I think we will want to do something like what field cache does
 with CreationPlaceholder for IndexWriter.readerPool. Otherwise
 we have the (I think somewhat problematic) issue of all other
 readerPool.get* methods waiting for an SR to warm.
 It would be good to implement this for 2.9.






[jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API improvements

2009-07-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728823#action_12728823
 ] 

Mark Miller commented on LUCENE-1693:
-

Mr. Busch my friend, I'll buy both you and Uwe *many* beers if you resolve this 
issue soon!

 AttributeSource/TokenStream API improvements
 

 Key: LUCENE-1693
 URL: https://issues.apache.org/jira/browse/LUCENE-1693
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, 
 LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, lucene-1693.patch, 
 TestCompatibility.java, TestCompatibility.java, TestCompatibility.java, 
 TestCompatibility.java


 This patch makes the following improvements to AttributeSource and
 TokenStream/Filter:
 - removes the set/getUseNewAPI() methods (including the standard
   ones). Instead by default incrementToken() throws a subclass of
   UnsupportedOperationException. The indexer tries to call
   incrementToken() initially once to see if the exception is thrown;
   if so, it falls back to the old API.
 - introduces interfaces for all Attributes. The corresponding
   implementations have the postfix 'Impl', e.g. TermAttribute and
   TermAttributeImpl. AttributeSource now has a factory for creating
   the Attribute instances; the default implementation looks for
   implementing classes with the postfix 'Impl'. Token now implements
   all 6 TokenAttribute interfaces.
 - new method added to AttributeSource:
   addAttributeImpl(AttributeImpl). Using reflection it walks up in the
   class hierarchy of the passed in object and finds all interfaces
   that the class or superclasses implement and that extend the
   Attribute interface. It then adds the interface-instance mappings
   to the attribute map for each of the found interfaces.
 - AttributeImpl now has a default implementation of toString that uses
   reflection to print out the values of the attributes in a default
   formatting. This makes it a bit easier to implement AttributeImpl,
   because toString() was declared abstract before.
 - Cloning is now done much more efficiently in
   captureState. The method figures out which unique AttributeImpl
   instances are contained as values in the attributes map, because
   those are the ones that need to be cloned. It creates a single
   linked list that supports deep cloning (in the inner class
   AttributeSource.State). AttributeSource keeps track of when this
   state changes, i.e. whenever new attributes are added to the
   AttributeSource. Only in that case will captureState recompute the
   state, otherwise it will simply clone the precomputed state and
   return the clone. restoreState(AttributeSource.State) walks the
   linked list and uses the copyTo() method of AttributeImpl to copy
   all values over into the attribute that the source stream
   (e.g. SinkTokenizer) uses. 
 The cloning performance can be greatly improved if not multiple
 AttributeImpl instances are used in one TokenStream. A user can
 e.g. simply add a Token instance to the stream instead of the individual
 attributes. Or the user could implement a subclass of AttributeImpl that
 implements exactly the Attribute interfaces needed. I think this
 should be considered an expert API (addAttributeImpl), as this manual
 optimization is only needed if cloning performance is crucial. I ran
 some quick performance tests using Tee/Sink tokenizers (which do
 cloning) and the performance was roughly 20% faster with the new
 API. I'll run some more performance tests and post more numbers then.
 Note also that when we add serialization to the Attributes, e.g. for
 supporting storing serialized TokenStreams in the index, then the
 serialization should benefit even significantly more from the new API
 than cloning. 
 Also, the TokenStream API does not change, except for the removal 
 of the set/getUseNewAPI methods. So the patches in LUCENE-1460
 should still work.
 All core tests pass, however, I need to update all the documentation
 and also add some unit tests for the new AttributeSource
 functionality. So this patch is not ready to commit yet, but I wanted
 to post it already for some feedback. 
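A language-agnostic sketch of the caching scheme described above (simplified, hypothetical names; the real Lucene classes are AttributeSource, AttributeImpl, and AttributeSource.State): the state is a linked list of the unique attribute instances, recomputed only when the attribute set changes; capture clones that list, and restore copies the captured values back.

```python
import copy

class Attribute:
    """Simplified stand-in for Lucene's AttributeImpl: one mutable value."""
    def __init__(self, value=None):
        self.value = value
    def copy_to(self, target):
        target.value = self.value

class State:
    """Singly linked list node of captured attributes (cf. AttributeSource.State)."""
    def __init__(self, keys, attribute, next_state=None):
        self.keys = keys            # which interface names this instance backs
        self.attribute = attribute
        self.next = next_state
    def clone(self):
        # Deep-clone the whole list so the captured state is independent.
        return State(self.keys, copy.copy(self.attribute),
                     self.next.clone() if self.next else None)

class AttributeSource:
    def __init__(self):
        self.attributes = {}   # interface name -> shared Attribute instance
        self._state = None     # precomputed template; invalidated on add

    def add_attribute(self, name, attribute):
        self.attributes[name] = attribute
        self._state = None     # attribute set changed: recompute lazily

    def _compute_state(self):
        # Only unique instances need cloning (one impl may back many interfaces).
        groups = {}  # id(attr) -> (list of keys, attr)
        for name, attr in self.attributes.items():
            groups.setdefault(id(attr), ([], attr))[0].append(name)
        head = None
        for keys, attr in groups.values():
            head = State(tuple(keys), attr, head)
        return head

    def capture_state(self):
        # Recompute only when attributes changed; otherwise just clone.
        if self._state is None:
            self._state = self._compute_state()
        return self._state.clone() if self._state else None

    def restore_state(self, state):
        # Walk the captured list and copy values back into the live attributes.
        node = state
        while node is not None:
            for key in node.keys:
                node.attribute.copy_to(self.attributes[key])
            node = node.next
```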






[jira] Commented: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block

2009-07-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728824#action_12728824
 ] 

Michael McCandless commented on LUCENE-1726:


Hmm... I'll dig into this test case.

 IndexWriter.readerPool create new segmentReader outside of sync block
 -

 Key: LUCENE-1726
 URL: https://issues.apache.org/jira/browse/LUCENE-1726
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Assignee: Michael McCandless
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, 
 LUCENE-1726.patch, LUCENE-1726.trunk.test.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 I think we will want to do something like what field cache does
 with CreationPlaceholder for IndexWriter.readerPool. Otherwise
 we have the (I think somewhat problematic) issue of all other
 readerPool.get* methods waiting for an SR to warm.
 It would be good to implement this for 2.9.






[jira] Commented: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block

2009-07-08 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728833#action_12728833
 ] 

Jason Rutherglen commented on LUCENE-1726:
--

Mike,

I was wondering if you could recommend techniques or tools for
debugging this type of multithreading issue? (i.e., how do you go
about figuring this kind of issue out?)

 IndexWriter.readerPool create new segmentReader outside of sync block
 -

 Key: LUCENE-1726
 URL: https://issues.apache.org/jira/browse/LUCENE-1726
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Assignee: Michael McCandless
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, 
 LUCENE-1726.patch, LUCENE-1726.trunk.test.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 I think we will want to do something like what field cache does
 with CreationPlaceholder for IndexWriter.readerPool. Otherwise
 we have the (I think somewhat problematic) issue of all other
 readerPool.get* methods waiting for an SR to warm.
 It would be good to implement this for 2.9.






[jira] Commented: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block

2009-07-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728853#action_12728853
 ] 

Michael McCandless commented on LUCENE-1726:


I don't have any particular tools...

First I simplify the test as much as possible while still hitting the
failure (e.g. this failure happens with only 2 threads), then see if the
error still happens when I turn on IndexWriter's infoStream (it doesn't
for this one, so far).  If so, I scrutinize the series of events to find
the hazard; else, I turn off infoStream and add back in a small number
of prints, as long as the failure still happens.

Often I use a simple Python script that runs the test over & over
until a failure happens, saving the log, and then scrutinize that.

It's good to start with a rough guess, e.g. this failure involves only
doc stores, so it seems likely the merging logic that opens doc stores
just before kicking off the merge is to blame.
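The run-until-failure script Mike mentions might look something like this (a sketch; the test command and log path in the usage note are hypothetical):

```python
import subprocess

def run_until_failure(cmd, log_path, max_runs=1000):
    """Run `cmd` repeatedly; on the first failing run, save its output to
    `log_path` and return the run number. Returns None if no run fails."""
    for run in range(1, max_runs + 1):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        output = proc.stdout + proc.stderr
        if proc.returncode != 0:
            # Keep the log of the failing run for scrutiny.
            with open(log_path, "w") as f:
                f.write(output)
            return run
    return None

# Hypothetical usage against a single Lucene test:
# failed = run_until_failure(
#     ["ant", "test-core", "-Dtestcase=TestIndexWriterReader"],
#     "failure.log")
```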


 IndexWriter.readerPool create new segmentReader outside of sync block
 -

 Key: LUCENE-1726
 URL: https://issues.apache.org/jira/browse/LUCENE-1726
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Assignee: Michael McCandless
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, 
 LUCENE-1726.patch, LUCENE-1726.trunk.test.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 I think we will want to do something like what field cache does
 with CreationPlaceholder for IndexWriter.readerPool. Otherwise
 we have the (I think somewhat problematic) issue of all other
 readerPool.get* methods waiting for an SR to warm.
 It would be good to implement this for 2.9.






[jira] Commented: (LUCENE-1693) AttributeSource/TokenStream API improvements

2009-07-08 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728879#action_12728879
 ] 

Michael Busch commented on LUCENE-1693:
---

Alright, I hope you are coming to Oakland in November! 

I had a few (literally) sleepless nights last week to meet some internal 
deadlines; but it looks like I'll now have time to work on Lucene, so I'll 
continue on this issue tonight!

 AttributeSource/TokenStream API improvements
 

 Key: LUCENE-1693
 URL: https://issues.apache.org/jira/browse/LUCENE-1693
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, 
 LUCENE-1693.patch, LUCENE-1693.patch, LUCENE-1693.patch, lucene-1693.patch, 
 TestCompatibility.java, TestCompatibility.java, TestCompatibility.java, 
 TestCompatibility.java


 This patch makes the following improvements to AttributeSource and
 TokenStream/Filter:
 - removes the set/getUseNewAPI() methods (including the standard
   ones). Instead by default incrementToken() throws a subclass of
   UnsupportedOperationException. The indexer tries to call
   incrementToken() initially once to see if the exception is thrown;
   if so, it falls back to the old API.
 - introduces interfaces for all Attributes. The corresponding
   implementations have the postfix 'Impl', e.g. TermAttribute and
   TermAttributeImpl. AttributeSource now has a factory for creating
   the Attribute instances; the default implementation looks for
   implementing classes with the postfix 'Impl'. Token now implements
   all 6 TokenAttribute interfaces.
 - new method added to AttributeSource:
   addAttributeImpl(AttributeImpl). Using reflection it walks up in the
   class hierarchy of the passed in object and finds all interfaces
   that the class or superclasses implement and that extend the
   Attribute interface. It then adds the interface-instance mappings
   to the attribute map for each of the found interfaces.
 - AttributeImpl now has a default implementation of toString that uses
   reflection to print out the values of the attributes in a default
   formatting. This makes it a bit easier to implement AttributeImpl,
   because toString() was declared abstract before.
 - Cloning is now done much more efficiently in
   captureState. The method figures out which unique AttributeImpl
   instances are contained as values in the attributes map, because
   those are the ones that need to be cloned. It creates a single
   linked list that supports deep cloning (in the inner class
   AttributeSource.State). AttributeSource keeps track of when this
   state changes, i.e. whenever new attributes are added to the
   AttributeSource. Only in that case will captureState recompute the
   state, otherwise it will simply clone the precomputed state and
   return the clone. restoreState(AttributeSource.State) walks the
   linked list and uses the copyTo() method of AttributeImpl to copy
   all values over into the attribute that the source stream
   (e.g. SinkTokenizer) uses. 
 The cloning performance can be greatly improved if not multiple
 AttributeImpl instances are used in one TokenStream. A user can
 e.g. simply add a Token instance to the stream instead of the individual
 attributes. Or the user could implement a subclass of AttributeImpl that
 implements exactly the Attribute interfaces needed. I think this
 should be considered an expert API (addAttributeImpl), as this manual
 optimization is only needed if cloning performance is crucial. I ran
 some quick performance tests using Tee/Sink tokenizers (which do
 cloning) and the performance was roughly 20% faster with the new
 API. I'll run some more performance tests and post more numbers then.
 Note also that when we add serialization to the Attributes, e.g. for
 supporting storing serialized TokenStreams in the index, then the
 serialization should benefit even significantly more from the new API
 than cloning. 
 Also, the TokenStream API does not change, except for the removal 
 of the set/getUseNewAPI methods. So the patches in LUCENE-1460
 should still work.
 All core tests pass, however, I need to update all the documentation
 and also add some unit tests for the new AttributeSource
 functionality. So this patch is not ready to commit yet, but I wanted
 to post it already for some feedback. 




[jira] Updated: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block

2009-07-08 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1726:
---

Attachment: LUCENE-1726.patch

OK the problem happens when a segment is first opened by a merge that
doesn't need to merge the doc stores; later, an NRT reader is opened
that separately opens the doc stores of the same [pooled]
SegmentReader, but then it's the merge that closes the read-only clone
of the reader.

In this case the separately opened (by the NRT reader) doc stores are
not closed by the merge thread.  It's the mirror image of LUCENE-1639.

I've fixed it by pulling all the shared readers in a SegmentReader into a
separate static class (CoreReaders).  Cloned SegmentReaders share the
same instance of this class, so that if a clone later opens the doc
stores, any prior ancestor (that the clone was created from) will also
close those readers if it is the one whose decRef drops the count to 0.
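The sharing scheme can be sketched as a ref-counted core: clones incRef the same core, and whichever reader decRefs it to zero closes everything the core ever opened, including doc stores opened later by a different clone (simplified, hypothetical names; the real class is SegmentReader.CoreReaders):

```python
class CoreReaders:
    """State shared by a SegmentReader and all of its clones (simplified)."""
    def __init__(self):
        self.ref_count = 1
        self.doc_stores_open = False
        self.closed = False

    def inc_ref(self):
        self.ref_count += 1

    def open_doc_stores(self):
        # Any clone may open the doc stores later; the core owns them.
        self.doc_stores_open = True

    def dec_ref(self):
        self.ref_count -= 1
        if self.ref_count == 0:
            # The last reader standing closes everything the core opened,
            # no matter which clone opened it.
            self.doc_stores_open = False
            self.closed = True

class Reader:
    def __init__(self, core=None):
        if core is None:
            core = CoreReaders()
        else:
            core.inc_ref()   # a clone shares its ancestor's core
        self.core = core

    def clone(self):
        return Reader(self.core)

    def close(self):
        self.core.dec_ref()
```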

I did something similar for LUCENE-1609 (which I'll now hit conflicts
on after committing this... sigh).

I plan to commit in a day or so.


 IndexWriter.readerPool create new segmentReader outside of sync block
 -

 Key: LUCENE-1726
 URL: https://issues.apache.org/jira/browse/LUCENE-1726
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Assignee: Michael McCandless
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, 
 LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.trunk.test.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 I think we will want to do something like what field cache does
 with CreationPlaceholder for IndexWriter.readerPool. Otherwise
 we have the (I think somewhat problematic) issue of all other
 readerPool.get* methods waiting for an SR to warm.
 It would be good to implement this for 2.9.






[jira] Commented: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block

2009-07-08 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12728909#action_12728909
 ] 

Jason Rutherglen commented on LUCENE-1726:
--

The test now passes; it needs to go into the patch, perhaps in
TestIndexWriterReader? Great work on this. It's easier to
understand SegmentReader now that all the shared objects are in
one place (CoreReaders); it should make debugging go more
smoothly.

Is there a reason we're not synchronizing on SR.core in
openDocStores? Couldn't we synchronize on core for the cloning
methods? 

 IndexWriter.readerPool create new segmentReader outside of sync block
 -

 Key: LUCENE-1726
 URL: https://issues.apache.org/jira/browse/LUCENE-1726
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Assignee: Michael McCandless
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.patch, 
 LUCENE-1726.patch, LUCENE-1726.patch, LUCENE-1726.trunk.test.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 I think we will want to do something like what field cache does
 with CreationPlaceholder for IndexWriter.readerPool. Otherwise
 we have the (I think somewhat problematic) issue of all other
 readerPool.get* methods waiting for an SR to warm.
 It would be good to implement this for 2.9.






Re: A Comparison of Open Source Search Engines

2009-07-08 Thread Otis Gospodnetic

Interesting, I never realized there was lucene-java-...@apache.org .

My thoughts are on 
http://www.jroller.com/otis/entry/open_source_search_engine_benchmark (and in 
several comments in the blog itself).

Otis



- Original Message 
 From: Sean Owen sro...@gmail.com
 To: lucene-java-...@apache.org
 Sent: Monday, July 6, 2009 11:06:14 AM
 Subject: A Comparison of Open Source Search Engines
 
 http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/
 
 I imagine many of you already saw this -- Lucene does pretty well in
 this shootout.
 The only area it tended to lag, it seems, is memory usage and speed in
 some cases.
 





[jira] Commented: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block

2009-07-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728938#action_12728938
 ] 

Michael McCandless commented on LUCENE-1726:


bq. Is there a reason we're not synchronizing on SR.core in openDocStores?

I was going to say it's because IW synchronizes, but in fact it
doesn't, properly, because when merging we open doc stores in an
unsynchronized context.  So I'll synchronize(core) in
SR.openDocStores.

bq. Couldn't we synchronize on core for the cloning methods?

I don't think that's needed?  The core is simply carried over to the
newly cloned reader.






Re: A Comparison of Open Source Search Engines

2009-07-08 Thread Jorge Handl
On Mon, Jul 6, 2009 at 6:01 PM, Earwin Burrfoot ear...@gmail.com wrote:

 Does anybody know of other interesting open-source search engines?


http://hounder.org


[jira] Commented: (LUCENE-1706) Site search powered by Lucene/Solr

2009-07-08 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729007#action_12729007
 ] 

Grant Ingersoll commented on LUCENE-1706:
-

Checking...

 Site search powered by Lucene/Solr
 --

 Key: LUCENE-1706
 URL: https://issues.apache.org/jira/browse/LUCENE-1706
 Project: Lucene - Java
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 2.9

 Attachments: LUCENE-1706.patch, LUCENE-1706.patch


 For a number of years now, the Lucene community has been criticized for not 
 eating our own dog food when it comes to search. My company has built and 
 hosts a site search (http://www.lucidimagination.com/search) that is powered 
 by Apache Solr and Lucene, and we'd like to donate its use to the Lucene 
 community. Additionally, it allows one to search all of the Lucene content 
 from a single place, including web, wiki, JIRA and mail archives. See also 
 http://www.lucidimagination.com/search/document/bf22a570bf9385c7/search_on_lucene_apache_org
 You can see it live on Mahout, Tika and Solr.
 Lucid has a fault-tolerant setup with replication and failover, as well as 
 monitoring services in place. We are committed to maintaining and expanding 
 the search capabilities on the site.
 The following patch adds a skin to the Forrest site that enables the Lucene 
 site to search Lucene-only content using Lucene/Solr. When a search is 
 submitted, it automatically selects the Lucene facet so that only Lucene 
 content is searched. From there, users can then narrow or broaden their search 
 criteria.
 I plan on committing in 3 or 4 days.




[jira] Commented: (LUCENE-1726) IndexWriter.readerPool create new segmentReader outside of sync block

2009-07-08 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729024#action_12729024
 ] 

Jason Rutherglen commented on LUCENE-1726:
--

{quote}I don't think that's needed? The core is simply carried
over to the newly cloned reader.{quote}

Right, however wouldn't it be somewhat cleaner to sync on core
for all clone operations, given we don't want those to occur
(external to IW) at the same time? Ultimately we want core to be
the controller of its resources rather than the SR being cloned.

I ran the test with the SRMapValue sync code (4 threads), with
the sync on SR.core in openDocStores, for 10 minutes on a 2-core
Windows XP laptop with Java 6u14, and saw no errors. Then the same
with 2 threads for 5 minutes, and no errors. I'll keep on running
it to see if we can get an error.

I'm still a little confused as to why we're going to see the bug
if readerPool.get is syncing on the SRMapValue. I guess there's
a slight possibility of the error, and perhaps a more randomized
test would produce it.




[jira] Commented: (LUCENE-1731) Allow ConstantScoreQuery to use custom rewrite method if using for highlighting

2009-07-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729034#action_12729034
 ] 

Mark Miller commented on LUCENE-1731:
-

Hey Ashley,

This was added to the SpanScorer for the Highlighter a while back as 
part of resolving that Solr issue. Hopefully I will have time to make it the 
default by 2.9's release, but it's there as an option now if you use the 
SpanScorer.

The issue was:   LUCENE-1425 - Add ConstantScore highlighting support 
to SpanScorer

 Allow ConstantScoreQuery to use custom rewrite method if using for 
 highlighting
 ---

 Key: LUCENE-1731
 URL: https://issues.apache.org/jira/browse/LUCENE-1731
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/highlighter
Affects Versions: 2.4, 2.4.1
Reporter: Ashley Sole
Priority: Minor

 I'd like to submit a patch for ConstantScoreQuery which simply contains a 
 setter method to state whether it is being used for highlighting or not. 
 If it is being used for highlighting, then the rewrite method can take each 
 of the terms in the filter and create a BooleanQuery to return (if the number 
 of terms in the filter is less than 1024); otherwise it simply uses the old 
 rewrite method.
 This allows you to highlight up to 1024 terms when using a ConstantScoreQuery, 
 which, since it is a filter, will currently not be highlighted.
 The idea for this came from Mark Miller's article Bringing the Highlighter 
 back to Wildcard Queries in Solr 1.4; I would just like to make it available 
 in core Lucene.
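
The proposed rewrite could look roughly like the sketch below. This is a hedged illustration only: the class, method names, and string representation are hypothetical, not Lucene's API. It shows the decision the description outlines, which is to expand the filter's terms into a BooleanQuery-style disjunction (so the highlighter can extract terms) when the query is marked for highlighting and the term count is under the 1024 cap, and to fall back to the old rewrite otherwise.

```java
import java.util.List;

// Illustrative decision logic for the proposed highlighting-aware rewrite.
class HighlightAwareRewrite {
    static final int MAX_CLAUSES = 1024;    // the cap mentioned in the proposal

    static String rewrite(List<String> filterTerms, boolean usedForHighlighting) {
        if (!usedForHighlighting || filterTerms.size() >= MAX_CLAUSES) {
            return "ConstantScore(filter)"; // old rewrite: opaque to the highlighter
        }
        // Expand the filter's terms into an OR query the highlighter can see
        return String.join(" OR ", filterTerms);
    }
}
```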
