[jira] [Created] (SOLR-2516) Solr should not cache Searchers
Solr should not cache Searchers
-------------------------------

Key: SOLR-2516
URL: https://issues.apache.org/jira/browse/SOLR-2516
Project: Solr
Issue Type: Bug
Components: search
Reporter: John Wang

Only IndexReaders should be cached, since that is where the data resides. A Searcher is a thin execution wrapper around a reader and thus should not be cached.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
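The distinction the report draws can be sketched with a toy cache (all class names here are hypothetical stand-ins, not Solr's actual API): the expensive, data-bearing object is opened once and cached, while the cheap wrapper is built per request.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the proposal: cache the heavy, data-bearing object (the
// "reader") and build the thin execution wrapper (the "searcher") on
// demand. Class names and the index path are illustrative only.
public class ReaderCacheSketch {
    static class Reader {               // stands in for an IndexReader: expensive to open
        final String indexPath;
        Reader(String indexPath) { this.indexPath = indexPath; }
    }

    static class Searcher {             // stands in for a Searcher: a thin wrapper
        final Reader reader;
        Searcher(Reader reader) { this.reader = reader; }
    }

    private final Map<String, Reader> cache = new HashMap<>();

    // The reader is cached per index; a fresh searcher is created per request.
    public Searcher acquire(String indexPath) {
        Reader r = cache.computeIfAbsent(indexPath, Reader::new);
        return new Searcher(r);
    }

    public static void main(String[] args) {
        ReaderCacheSketch core = new ReaderCacheSketch();
        Searcher s1 = core.acquire("/var/index");
        Searcher s2 = core.acquire("/var/index");
        System.out.println(s1.reader == s2.reader);  // true: reader is reused
        System.out.println(s1 == s2);                // false: wrapper is per-request
    }
}
```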
[jira] [Created] (SOLR-2515) Custom written Similarity class does not read solr parameter values from schema.xml
Custom written Similarity class does not read solr parameter values from schema.xml
-----------------------------------------------------------------------------------

Key: SOLR-2515
URL: https://issues.apache.org/jira/browse/SOLR-2515
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Reporter: Pradeep
Priority: Minor

A custom Similarity class extending DefaultSimilarity does not have its parameter values from schema.xml set.
[jira] [Created] (LUCENE-3096) MultiSearcher does not work correctly with Not on NumericRange
MultiSearcher does not work correctly with Not on NumericRange
--------------------------------------------------------------

Key: LUCENE-3096
URL: https://issues.apache.org/jira/browse/LUCENE-3096
Project: Lucene - Java
Issue Type: Bug
Components: Search
Affects Versions: 3.0.2
Reporter: John Wang

Hi, Keith

My colleague Xiaoyang and I just confirmed that this is actually due to a Lucene bug in MultiSearcher. In particular, if we search with Not on NumericRange using MultiSearcher, we get wrong search results (whereas if we use IndexSearcher, the results are correct). Basically, the Not on the NumericRange has no effect under MultiSearcher. We suspect the cause is the createWeight() function in MultiSearcher, and we hope you can help us fix this Lucene bug. I attached code to reproduce the case; please check it out.

In the attached code there are two separate functions:

(1) testNumericRangeSingleSearcher(Query query), where I create 6 documents with a field called "id" = 1, 2, 3, 4, 5, 6 respectively. I then search with the query +MatchAllDocs -NumericRange(3,3). The expected result is 5 hits, since document 3 is MUST_NOT.

(2) testNumericRangeMultiSearcher(Query query), where I create 2 RAMDirectory() instances, one holding documents 1, 2, 3 and the other 4, 5, 6. I then search with the same query as above using a MultiSearcher. The expected result should also be 5 hits.

However, from (1) we get 5 hits (the expected result), while from (2) we get 6 hits (not the expected result). We also verified this with our zoie/bobo open-source tools and got the same results, because our multi-bobo-browser is built on Lucene's MultiSearcher. I already emailed the Lucene community group; hopefully we can get some feedback soon. If you have any further concern, please let me know! Thank you very much!
Code: (based on Lucene 3.0.x; unused imports removed)

import java.io.IOException;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.MultiSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Searchable;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.LockObtainFailedException;
import org.apache.lucene.store.RAMDirectory;

public class TestNumericRange {

    public static void main(String[] args) {
        try {
            BooleanQuery query = new BooleanQuery();
            query.add(NumericRangeQuery.newIntRange("numId", 3, 3, true, true), Occur.MUST_NOT);
            query.add(new MatchAllDocsQuery(), Occur.MUST);
            testNumericRangeSingleSearcher(query);
            testNumericRangeMultiSearcher(query);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void addDocs(IndexWriter writer, String[] ids) throws IOException {
        for (String id : ids) {
            Document doc = new Document();
            doc.add(new Field("id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new NumericField("numId").setIntValue(Integer.valueOf(id)));
            writer.addDocument(doc);
        }
    }

    public static void testNumericRangeSingleSearcher(Query query)
            throws CorruptIndexException, LockObtainFailedException, IOException {
        Directory directory = new RAMDirectory();
        IndexWriter writer = new IndexWriter(directory, new WhitespaceAnalyzer(),
                IndexWriter.MaxFieldLength.UNLIMITED);
        addDocs(writer, new String[] {"1", "2", "3", "4", "5", "6"});
        writer.close();
        IndexSearcher searcher = new IndexSearcher(directory);
        TopDocs docs = searcher.search(query, 10);
        System.out.println("SingleSearcher: testNumericRange: hitNum: " + docs.totalHits);
        for (ScoreDoc doc : docs.scoreDocs) {
            System.out.println(searcher.explain(query, doc.doc));
        }
        searcher.close();
        directory.close();
    }

    // The original message was truncated partway through this method; the
    // remainder is reconstructed from the description above (two 3-document
    // indexes searched through a single MultiSearcher).
    public static void testNumericRangeMultiSearcher(Query query)
            throws CorruptIndexException, LockObtainFailedException, IOException {
        Directory directory1 = new RAMDirectory();
        IndexWriter writer1 = new IndexWriter(directory1, new WhitespaceAnalyzer(),
                IndexWriter.MaxFieldLength.UNLIMITED);
        addDocs(writer1, new String[] {"1", "2", "3"});
        writer1.close();

        Directory directory2 = new RAMDirectory();
        IndexWriter writer2 = new IndexWriter(directory2, new WhitespaceAnalyzer(),
                IndexWriter.MaxFieldLength.UNLIMITED);
        addDocs(writer2, new String[] {"4", "5", "6"});
        writer2.close();

        MultiSearcher searcher = new MultiSearcher(new Searchable[] {
                new IndexSearcher(directory1), new IndexSearcher(directory2)});
        TopDocs docs = searcher.search(query, 10);
        System.out.println("MultiSearcher: testNumericRange: hitNum: " + docs.totalHits);
        searcher.close();
        directory1.close();
        directory2.close();
    }
}
Re: 3.2.0 (or 3.1.1)
+1 for 3.2!

And also, we should adopt that approach going forward (no more bug-fix releases for the stable branch, except for the last release before 4.0 is out). That means updating the release TODO with, e.g., not creating a branch for 3.2.x, only tagging it. When 4.0 is out, we branch 3.x.y from the last 3.x tag.

Shai

On Saturday, May 14, 2011, Ryan McKinley wrote:
> On Fri, May 13, 2011 at 6:40 PM, Grant Ingersoll wrote:
>> It's been just over 1 month since the last release. We've all said we want
>> to get to about a 3 month release cycle (if not more often). I think this
>> means we should start shooting for a next release sometime in June. Which,
>> in my mind, means we should start working on wrapping up issues now, IMO.
>>
>> Here's what's open for 3.2 against:
>> Lucene: https://issues.apache.org/jira/browse/LUCENE/fixforversion/12316070
>> Solr: https://issues.apache.org/jira/browse/SOLR/fixforversion/12316172
>>
>> Thoughts?
>
> +1 for 3.2 with a new feature freeze pretty soon
[jira] [Updated] (SOLR-2480) Text extraction of password protected files
[ https://issues.apache.org/jira/browse/SOLR-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated SOLR-2480:
---------------------------------
Attachment: password-is-solrcell.docx
            SOLR-2480.patch

Attached the next patch along with the password-protected Word file used for the test. I added test cases for the ignoreTikaException=true|false cases. I think this is ready to commit.

> Text extraction of password protected files
> -------------------------------------------
>
> Key: SOLR-2480
> URL: https://issues.apache.org/jira/browse/SOLR-2480
> Project: Solr
> Issue Type: Improvement
> Components: contrib - Solr Cell (Tika extraction)
> Affects Versions: 1.4.1, 3.1
> Reporter: Shinichiro Abe
> Assignee: Koji Sekiguchi
> Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: SOLR-2480-idea1.patch, SOLR-2480.patch, SOLR-2480.patch, password-is-solrcell.docx
>
>
> Proposal:
> There are password-protected files. PDF, Office documents in 2007 format/97 format.
> These files are posted using SolrCell.
> We do not have to read these files if we do not know the reading password of files.
> So, these files may not be extracted text.
> My requirement is that these files should be processed normally without
> extracting text, and without throwing exception.
> This background:
> Now, when you post a password-protected file, solr returns 500 server error.
> Solr catches the error in ExtractingDocumentLoader and throws TikException.
> I use ManifoldCF.
> If the solr server responds 500, ManifoldCF judge is that "this
> document should be retried because I have absolutely no idea what
> happened".
> And it attempts to retry posting many times without getting the password.
> In the other case, my customer posts the files with embedded images.
> Sometimes it seems that solr throws TikaException of unknown cause.
> He wants to post just metadata without extracting text, but makes him stop
> posting by the exception.
[jira] [Updated] (SOLR-2480) Text extraction of password protected files
[ https://issues.apache.org/jira/browse/SOLR-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-2480: - Attachment: SOLR-2480.patch A patch that introduces ignoreTikaException flag. > Text extraction of password protected files > --- > > Key: SOLR-2480 > URL: https://issues.apache.org/jira/browse/SOLR-2480 > Project: Solr > Issue Type: Improvement > Components: contrib - Solr Cell (Tika extraction) >Affects Versions: 1.4.1, 3.1 >Reporter: Shinichiro Abe >Assignee: Koji Sekiguchi >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: SOLR-2480-idea1.patch, SOLR-2480.patch > > > Proposal: > There are password-protected files. PDF, Office documents in 2007 format/97 > format. > These files are posted using SolrCell. > We do not have to read these files if we do not know the reading password of > files. > So, these files may not be extracted text. > My requirement is that these files should be processed normally without > extracting text, and without throwing exception. > This background: > Now, when you post a password-protected file, solr returns 500 server error. > Solr catches the error in ExtractingDocumentLoader and throws TikException. > I use ManifoldCF. > If the solr server responds 500, ManifoldCF judge is that "this > document should be retried because I have absolutely no idea what > happened". > And it attempts to retry posting many times without getting the password. > In the other case, my customer posts the files with embedded images. > Sometimes it seems that solr throws TikaException of unknown cause. > He wants to post just metadata without extracting text, but makes him stop > posting by the exception. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
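As a rough sketch of how a client might use the new flag, assuming it is exposed as a request parameter named ignoreTikaException (the name used in the patch's test cases); the host, port, and literal.id value below are placeholders:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Builds a Solr Cell extract request URL carrying the proposed flag.
// Host, core, and the literal.id value are placeholders; the parameter
// name ignoreTikaException comes from the patch's tests.
public class ExtractRequestSketch {
    static String buildUrl(String docId) throws UnsupportedEncodingException {
        return "http://localhost:8983/solr/update/extract"
                + "?literal.id=" + URLEncoder.encode(docId, "UTF-8")
                + "&ignoreTikaException=true";  // index metadata even if text extraction fails
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(buildUrl("doc1"));
        // http://localhost:8983/solr/update/extract?literal.id=doc1&ignoreTikaException=true
    }
}
```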
Re: GSoC: LUCENE-2308: Separately specify a field's type
2011/5/14 Nikola Tanković > 2011/5/12 Michael McCandless > >> 2011/5/9 Nikola Tanković : >> >> >> >> > Introduction of a FieldType class that will hold all the extra >> >> > properties >> >> > now stored inside Field instance other than field value itself. >> >> >> >> Seems like this is an easy first baby step -- leave current Field >> >> class, but break out the "type" details into a separate class that can >> >> be shared across Field instances. >> > >> > Yes, I agree, this could be a good first step. Mike submitted a patch on >> > issue #2308. I think it's a solid base for this. >> >> Make that Chris. >> > > Ouch, sorry! > > >> >> >> > New FieldTypeAttribute interface will be added to handle extension >> with >> >> > new >> >> > field properties inspired by IndexWriterConfig. >> >> >> >> How would this work? What's an example compelling usage? An app >> >> could use this for extensibility, and then make a matching codec that >> >> picks up this attr? EG, say, maybe for marking that a field is a >> >> "primary key field" and then codec could optimize accordingly...? >> > >> > Well, that could be a very interesting scenario. It didn't ring a bell for >> me >> > as a possible codec usage, but it seems very reasonable. Attributes >> otherwise >> > don't make much sense, unless properly used in custom codecs. >> > >> > How will we ensure attribute and codec compatibility? >> >> I'm just thinking we should have concrete reasons in mind for cutting >> over to attributes here... I'd rather see a fixed, well thought out >> concrete FieldType hierarchy first... >> > > Yes, I couldn't agree more, and I also think Chris has some great ideas on >> this field, given his work on Spatial indexing, which tends to make use of >> these additional attributes. > I think Attributes should be used sparingly, but I do think they make sense. I use a similar idea in some spatial work where different fields have different requirements but need to work with the same set of strategies.
I feel this is metadata and doesn't belong in an extension to Field. But equally it's not 'core' to FieldType either, which is why I added the FieldTypeAttribute idea. In the end I feel we should provide maximum flexibility here, especially if we are going to move over to a more minimal API for the indexer. We need to allow custom extensions to FieldType, and I'm not sure that having 'instanceof' statements every time I need to do something specific to a subtype is the best way to go. > > >> >> >> > Refactoring and dividing of settings for term frequency and >> positioning >> >> > can >> >> > also be done (LUCENE-2048) >> >> >> >> Ahh great! So we can omit-positions-but-not-TF. >> >> >> >> > Discuss possible effects of completion of LUCENE-2310 on this project >> >> >> >> This one is badly needed... but we should keep your project focused. >> > >> > >> > We'll tackle this one afterwards. >> >> Good. >> >> >> >> > Adequate Factory class for easier configuration of new Field >> instances >> >> > together with manually added new FieldTypeAttributes >> >> > FieldType, once instantiated, is read-only. Only the field's value can be >> >> > changed. >> >> >> >> OK. >> >> >> >> > Simple hierarchy of Field classes with core properties logically >> >> > predefaulted. E.g.: >> >> > >> >> > NumberField, >> >> >> >> Can't this just be our existing NumericField? >> > >> > Yes, this is classic NumericField with changes proposed in LUCENE-2310. >> Tim >> > Smith mentioned that Fieldable class should be kept for custom >> > implementations to reduce the number of setters (for defaults). >> > Chris Male suggested a new CoreFieldTypeAttribute interface, so maybe it >> > should be implemented instead of Fieldable for custom implementations, >> so >> > both Fieldable and AbstractField are not needed anymore. >> > In my opinion Field should become abstract, extended by the others.
>> > Another proposal: how about keeping only Field (with no hierarchy) and >> move >> > hierarchy to FieldType, such as NumericFieldType, StringFieldType since >> this >> > hierarchy concerns type information only? >> >> I think hierarchy of both types and the "value containers" that hold >> the corresponding values could make sense? >> > > Hmm, I think we should get more opinions on this one also. > I'm unsure about this. What information would a StringFieldType have over a NumericFieldType? I can imagine NumericFieldType maybe having precision step. Couldn't that be an Attribute? I can see the benefit of a StringField though, and a NumericField, since they are providing different implementations of the same fundamental needs of a Field; its name, its value, its type and its tokenstream. I think we should use hierarchies sparingly as well, since really we want to make this as simple as possible. But we should also keep our eye on those fundamental needs of the indexer. > > >> >> > e.g. Usage: >> > FieldType number = new NumericFieldType(); >> > Field price = new Field(); >> > price.setType(number); >
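The usage proposed at the end of the thread (a read-only FieldType shared by many Field instances, set via setType) can be sketched in plain Java. This is a hypothetical illustration of the separation under discussion, not the committed Lucene API:

```java
// Sketch of the proposed split (hypothetical API, not the final design):
// an immutable FieldType carries the shared configuration, while each
// Field instance carries only its name and value plus a type reference.
public class FieldTypeSketch {
    static class FieldType {
        final boolean stored;
        final boolean indexed;
        FieldType(boolean stored, boolean indexed) {  // read-only once constructed
            this.stored = stored;
            this.indexed = indexed;
        }
    }

    static class Field {
        final String name;
        final FieldType type;
        String value;                                  // only the value is mutable
        Field(String name, FieldType type, String value) {
            this.name = name;
            this.type = type;
            this.value = value;
        }
    }

    public static void main(String[] args) {
        FieldType keyword = new FieldType(true, true); // one type instance ...
        Field id = new Field("id", keyword, "1");
        Field sku = new Field("sku", keyword, "A-7");  // ... shared by many fields
        System.out.println(id.type == sku.type);       // true: the type is shared
    }
}
```

The design point being debated is exactly this sharing: properties live once per type, not once per field, so changing a type's configuration cannot silently diverge between documents.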
[jira] [Commented] (SOLR-2480) Text extraction of password protected files
[ https://issues.apache.org/jira/browse/SOLR-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033429#comment-13033429 ]

Koji Sekiguchi commented on SOLR-2480:
--------------------------------------

bq. And I think SOLR-445 can resolve improvement ideas(2).

No. You should consider the difference between this issue and SOLR-445 (see my comment above).

As I understand it, the requirement described in the Description is quite similar to SOLR-2512, which has been resolved, so I'll try a patch that has an ignoreErrors flag for TikaException. In SOLR-2512 I added the ability to ignore exceptions when trying to extract metadata from text, i.e. Solr indexed the text but gave up on the metadata. The ignore flag in this ticket, on the other hand, is for giving up the text but indexing the metadata. It cannot be resolved by SOLR-445.

> Text extraction of password protected files
> -------------------------------------------
>
> Key: SOLR-2480
> URL: https://issues.apache.org/jira/browse/SOLR-2480
> Project: Solr
> Issue Type: Improvement
> Components: contrib - Solr Cell (Tika extraction)
> Affects Versions: 1.4.1, 3.1
> Reporter: Shinichiro Abe
> Assignee: Koji Sekiguchi
> Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: SOLR-2480-idea1.patch
>
>
> Proposal:
> There are password-protected files. PDF, Office documents in 2007 format/97 format.
> These files are posted using SolrCell.
> We do not have to read these files if we do not know the reading password of files.
> So, these files may not be extracted text.
> My requirement is that these files should be processed normally without
> extracting text, and without throwing exception.
> This background:
> Now, when you post a password-protected file, solr returns 500 server error.
> Solr catches the error in ExtractingDocumentLoader and throws TikException.
> I use ManifoldCF.
> If the solr server responds 500, ManifoldCF judge is that "this
> document should be retried because I have absolutely no idea what
> happened".
> And it attempts to retry posting many times without getting the password.
> In the other case, my customer posts the files with embedded images.
> Sometimes it seems that solr throws TikaException of unknown cause.
> He wants to post just metadata without extracting text, but makes him stop
> posting by the exception.
[jira] [Updated] (SOLR-2480) Text extraction of password protected files
[ https://issues.apache.org/jira/browse/SOLR-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-2480: - Affects Version/s: 1.4.1 Fix Version/s: 4.0 3.2 > Text extraction of password protected files > --- > > Key: SOLR-2480 > URL: https://issues.apache.org/jira/browse/SOLR-2480 > Project: Solr > Issue Type: Improvement > Components: contrib - Solr Cell (Tika extraction) >Affects Versions: 1.4.1, 3.1 >Reporter: Shinichiro Abe >Assignee: Koji Sekiguchi >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: SOLR-2480-idea1.patch > > > Proposal: > There are password-protected files. PDF, Office documents in 2007 format/97 > format. > These files are posted using SolrCell. > We do not have to read these files if we do not know the reading password of > files. > So, these files may not be extracted text. > My requirement is that these files should be processed normally without > extracting text, and without throwing exception. > This background: > Now, when you post a password-protected file, solr returns 500 server error. > Solr catches the error in ExtractingDocumentLoader and throws TikException. > I use ManifoldCF. > If the solr server responds 500, ManifoldCF judge is that "this > document should be retried because I have absolutely no idea what > happened". > And it attempts to retry posting many times without getting the password. > In the other case, my customer posts the files with embedded images. > Sometimes it seems that solr throws TikaException of unknown cause. > He wants to post just metadata without extracting text, but makes him stop > posting by the exception. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Reopened] (SOLR-2113) Create TermsQParser that deals with toInternal() conversion of external terms
[ https://issues.apache.org/jira/browse/SOLR-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man reopened SOLR-2113:
----------------------------
Assignee: Hoss Man

> Create TermsQParser that deals with toInternal() conversion of external terms
> -----------------------------------------------------------------------------
>
> Key: SOLR-2113
> URL: https://issues.apache.org/jira/browse/SOLR-2113
> Project: Solr
> Issue Type: New Feature
> Components: search
> Reporter: Hoss Man
> Assignee: Hoss Man
> Fix For: 3.2, 4.0
>
> Attachments: SOLR-2113.patch
>
>
> For converting facet.field response constraints into filter queries, it would
> be helpful to have a QParser that generated a TermQuery using the
> toInternal() converted result of the raw "q" param
[jira] [Resolved] (SOLR-2113) Create TermsQParser that deals with toInternal() conversion of external terms
[ https://issues.apache.org/jira/browse/SOLR-2113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved SOLR-2113.
----------------------------
Resolution: Fixed
Fix Version/s: 3.2

Committed revision 1102922. - 3x backport

> Create TermsQParser that deals with toInternal() conversion of external terms
> -----------------------------------------------------------------------------
>
> Key: SOLR-2113
> URL: https://issues.apache.org/jira/browse/SOLR-2113
> Project: Solr
> Issue Type: New Feature
> Components: search
> Reporter: Hoss Man
> Assignee: Hoss Man
> Fix For: 3.2, 4.0
>
> Attachments: SOLR-2113.patch
>
>
> For converting facet.field response constraints into filter queries, it would
> be helpful to have a QParser that generated a TermQuery using the
> toInternal() converted result of the raw "q" param
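The intended use, per the issue description, is echoing a raw facet.field constraint back as a filter query whose value the parser runs through toInternal(). Assuming the parser is registered under the local-params name term (an assumption, as is the field/value pair below), a client might build the fq parameter like this:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Builds a filter query that hands the *raw* external value to the
// QParser, which applies toInternal() on the server side. The parser
// name "term" and the field/value are illustrative assumptions.
public class TermFilterSketch {
    static String termFilter(String field, String rawFacetValue) throws UnsupportedEncodingException {
        // Local-params syntax: {!term f=<field>}<raw value>, URL-encoded for the fq param
        return URLEncoder.encode("{!term f=" + field + "}" + rawFacetValue, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // e.g. a constraint string returned verbatim by facet.field=weight
        System.out.println("fq=" + termFilter("weight", "1.5"));
    }
}
```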
Re: 3.2.0 (or 3.1.1)
On Fri, May 13, 2011 at 6:40 PM, Grant Ingersoll wrote:
> It's been just over 1 month since the last release. We've all said we want
> to get to about a 3 month release cycle (if not more often). I think this
> means we should start shooting for a next release sometime in June. Which,
> in my mind, means we should start working on wrapping up issues now, IMO.
>
> Here's what's open for 3.2 against:
> Lucene: https://issues.apache.org/jira/browse/LUCENE/fixforversion/12316070
> Solr: https://issues.apache.org/jira/browse/SOLR/fixforversion/12316172
>
> Thoughts?

+1 for 3.2 with a new feature freeze pretty soon
Re: 3.2.0 (or 3.1.1)
On Fri, May 13, 2011 at 6:40 PM, Grant Ingersoll wrote:
> It's been just over 1 month since the last release. We've all said we want
> to get to about a 3 month release cycle (if not more often). I think this
> means we should start shooting for a next release sometime in June. Which,
> in my mind, means we should start working on wrapping up issues now, IMO.
>
> Here's what's open for 3.2 against:
> Lucene: https://issues.apache.org/jira/browse/LUCENE/fixforversion/12316070
> Solr: https://issues.apache.org/jira/browse/SOLR/fixforversion/12316172
>
> Thoughts?
>
> -Grant

My vote would be to just spend our time on 3.2. People get bug fixes, better test coverage, and a couple of new features and optimizations, too. Is it really going to be harder to release 3.2 than to release 3.1.1? We could just announce in advance that we'd like to feature freeze 3.2 on ?
[jira] [Updated] (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated SOLR-139: --- Fix Version/s: (was: 3.2) > Support updateable/modifiable documents > --- > > Key: SOLR-139 > URL: https://issues.apache.org/jira/browse/SOLR-139 > Project: Solr > Issue Type: New Feature > Components: update >Reporter: Ryan McKinley > Attachments: Eriks-ModifiableDocument.patch, > Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, > Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, > Eriks-ModifiableDocument.patch, SOLR-139-IndexDocumentCommand.patch, > SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, > SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, > SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, > SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, > SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, > SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, > SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, > SOLR-139-XmlUpdater.patch, > SOLR-269+139-ModifiableDocumentUpdateProcessor.patch, getStoredFields.patch, > getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, > getStoredFields.patch > > > It would be nice to be able to update some fields on a document without > having to insert the entire document. > Given the way lucene is structured, (for now) one can only modify stored > fields. > While we are at it, we can support incrementing an existing value - I think > this only makes sense for numbers. > for background, see: > http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293 -- This message is automatically generated by JIRA. 
3.2.0 (or 3.1.1)
It's been just over 1 month since the last release. We've all said we want to get to about a 3 month release cycle (if not more often). I think this means we should start shooting for a next release sometime in June. Which, in my mind, means we should start working on wrapping up issues now, IMO.

Here's what's open for 3.2 against:
Lucene: https://issues.apache.org/jira/browse/LUCENE/fixforversion/12316070
Solr: https://issues.apache.org/jira/browse/SOLR/fixforversion/12316172

Thoughts?

-Grant
[jira] [Resolved] (SOLR-2451) Enhance SolrTestCaseJ4 to allow tests to account for small deltas when comparing floats/doubles
[ https://issues.apache.org/jira/browse/SOLR-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-2451. Resolution: Fixed Fix Version/s: 3.2 Committed revision 1102910. - 3x > Enhance SolrTestCaseJ4 to allow tests to account for small deltas when > comparing floats/doubles > --- > > Key: SOLR-2451 > URL: https://issues.apache.org/jira/browse/SOLR-2451 > Project: Solr > Issue Type: Improvement >Reporter: David Smiley >Assignee: Hoss Man >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: SOLR-2451.patch, SOLR-2451.patch, > SOLR-2451_assertQScore.patch > > > Attached is a patch that adds the following method to SolrTestCaseJ4: (just > javadoc & signature shown) > {code:java} > /** >* Validates that the document at the specified index in the results has > the specified score, within 0.0001. >*/ > public static void assertQScore(SolrQueryRequest req, int docIdx, float > targetScore) { > {code} > This is especially useful for geospatial in which slightly different > precision deltas might occur when trying different geospatial indexing > strategies are used, assuming the score is some geospatial distance. This > patch makes a simple modification to DistanceFunctionTest to use it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Reopened] (SOLR-2451) Enhance SolrTestCaseJ4 to allow tests to account for small deltas when comparing floats/doubles
[ https://issues.apache.org/jira/browse/SOLR-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man reopened SOLR-2451: forgot i was in the middle of backporting > Enhance SolrTestCaseJ4 to allow tests to account for small deltas when > comparing floats/doubles > --- > > Key: SOLR-2451 > URL: https://issues.apache.org/jira/browse/SOLR-2451 > Project: Solr > Issue Type: Improvement >Reporter: David Smiley >Assignee: Hoss Man >Priority: Minor > Fix For: 4.0 > > Attachments: SOLR-2451.patch, SOLR-2451.patch, > SOLR-2451_assertQScore.patch > > > Attached is a patch that adds the following method to SolrTestCaseJ4: (just > javadoc & signature shown) > {code:java} > /** >* Validates that the document at the specified index in the results has > the specified score, within 0.0001. >*/ > public static void assertQScore(SolrQueryRequest req, int docIdx, float > targetScore) { > {code} > This is especially useful for geospatial in which slightly different > precision deltas might occur when trying different geospatial indexing > strategies are used, assuming the score is some geospatial distance. This > patch makes a simple modification to DistanceFunctionTest to use it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2510) Proximity search is not symmetric
[ https://issues.apache.org/jira/browse/SOLR-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man resolved SOLR-2510.
----------------------------
Resolution: Not A Problem

This is the expected behavior for phrase queries. "slop" is specified as an edit distance...

http://lucene.apache.org/java/2_9_1/api/core/org/apache/lucene/search/PhraseQuery.html#setSlop%28int%29

These two queries are not equivalent...

{noformat}
"WORD_D WORD_G"~3
"WORD_G WORD_D"~3
{noformat}

The order of the terms as specified in the PhraseQuery matters for determining the edit distance.

> Proximity search is not symmetric
> ---------------------------------
>
> Key: SOLR-2510
> URL: https://issues.apache.org/jira/browse/SOLR-2510
> Project: Solr
> Issue Type: Bug
> Components: search, web gui
> Affects Versions: 3.1
> Environment: Ubuntu 10.04
> Reporter: mark risher
>
> The proximity search is incorrect on words occurring *before* the matching
> term. It matches documents that are _less-than_ N words before and
> _less-than-or-equal-to_ N words after.
> For example, use the following document:
> {{WORD_A WORD_B WORD_C WORD_D WORD_E WORD_F WORD_G}}
> *Expected result:* Both of the following queries should match:
> 1) {{"WORD_D WORD_G"~3}}
> 2) {{"WORD_G WORD_D"~3}}
> *Actual result:* Only #1 matches. For some reason, it thinks the distance
> from D to G is 3, but from G to D is 4.
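The asymmetry can be checked with a little arithmetic. For a two-term phrase, the second query term is expected immediately after the position where the first term matched, and the slop is how many positions the actual occurrence must move to get there. A simplified model of that edit distance (valid for the two-term case; PhraseQuery's general algorithm is more involved), using 0-based positions in the document above:

```java
// Simplified slop calculation for a two-term phrase against the document
// "WORD_A ... WORD_G" (positions 0..6). This models the edit distance for
// the two-term case only; PhraseQuery's general matching is more involved.
public class SlopSketch {
    // Moves needed so the second query term lands right after the first.
    static int neededSlop(int firstTermPos, int secondTermPos) {
        return Math.abs(secondTermPos - (firstTermPos + 1));
    }

    public static void main(String[] args) {
        int posD = 3, posG = 6;                      // WORD_D and WORD_G, 0-based
        System.out.println(neededSlop(posD, posG));  // "WORD_D WORD_G": 2 moves, so ~3 matches
        System.out.println(neededSlop(posG, posD));  // "WORD_G WORD_D": 4 moves, so ~3 does not
    }
}
```

With the query terms in document order, G is 2 positions past where the phrase expects it; with the terms reversed, D would have to move 4 positions forward, which exceeds a slop of 3. That is exactly the match/no-match split reported in the issue.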
[jira] [Updated] (SOLR-2451) Enhance SolrTestCaseJ4 to allow tests to account for small deltas when comparing floats/doubles
[ https://issues.apache.org/jira/browse/SOLR-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-2451: --- Affects Version/s: (was: 3.2) Fix Version/s: 4.0 Assignee: Hoss Man Summary: Enhance SolrTestCaseJ4 to allow tests to account for small deltas when comparing floats/doubles (was: Add assertQScore() to SolrTestCaseJ4 to account for small deltas ) Committed revision 1102907. > Enhance SolrTestCaseJ4 to allow tests to account for small deltas when > comparing floats/doubles > --- > > Key: SOLR-2451 > URL: https://issues.apache.org/jira/browse/SOLR-2451 > Project: Solr > Issue Type: Improvement >Reporter: David Smiley >Assignee: Hoss Man >Priority: Minor > Fix For: 4.0 > > Attachments: SOLR-2451.patch, SOLR-2451.patch, > SOLR-2451_assertQScore.patch > > > Attached is a patch that adds the following method to SolrTestCaseJ4: (just > javadoc & signature shown) > {code:java} > /** >* Validates that the document at the specified index in the results has > the specified score, within 0.0001. >*/ > public static void assertQScore(SolrQueryRequest req, int docIdx, float > targetScore) { > {code} > This is especially useful for geospatial in which slightly different > precision deltas might occur when trying different geospatial indexing > strategies are used, assuming the score is some geospatial distance. This > patch makes a simple modification to DistanceFunctionTest to use it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2451) Enhance SolrTestCaseJ4 to allow tests to account for small deltas when comparing floats/doubles
[ https://issues.apache.org/jira/browse/SOLR-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-2451. Resolution: Fixed thanks for bringing this up david > Enhance SolrTestCaseJ4 to allow tests to account for small deltas when > comparing floats/doubles > --- > > Key: SOLR-2451 > URL: https://issues.apache.org/jira/browse/SOLR-2451 > Project: Solr > Issue Type: Improvement >Reporter: David Smiley >Assignee: Hoss Man >Priority: Minor > Fix For: 4.0 > > Attachments: SOLR-2451.patch, SOLR-2451.patch, > SOLR-2451_assertQScore.patch > > > Attached is a patch that adds the following method to SolrTestCaseJ4: (just > javadoc & signature shown) > {code:java} > /** >* Validates that the document at the specified index in the results has > the specified score, within 0.0001. >*/ > public static void assertQScore(SolrQueryRequest req, int docIdx, float > targetScore) { > {code} > This is especially useful for geospatial in which slightly different > precision deltas might occur when trying different geospatial indexing > strategies are used, assuming the score is some geospatial distance. This > patch makes a simple modification to DistanceFunctionTest to use it. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
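[Editor's note] The fixed-tolerance comparison the patch describes reduces to a simple delta check. A minimal standalone sketch follows; the method name and the 0.0001 delta mirror the javadoc quoted above, but this is not the committed Solr code (the real assertQScore pulls the score out of a SolrQueryRequest):

```java
// Standalone sketch of a score assertion with a fixed tolerance, in the
// spirit of SolrTestCaseJ4.assertQScore described above. Illustrative
// only; the committed implementation takes a SolrQueryRequest and docIdx.
public class ScoreAssert {
    static final float DELTA = 0.0001f;

    // Throws AssertionError when the scores differ by more than DELTA.
    static void assertScore(float expected, float actual) {
        if (Math.abs(expected - actual) > DELTA) {
            throw new AssertionError(
                "expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        assertScore(1.0f, 1.00005f);  // passes: within 0.0001
        try {
            assertScore(1.0f, 1.01f); // off by 0.01: should throw
            throw new IllegalStateException("should have thrown");
        } catch (AssertionError expected) {
            System.out.println("caught: " + expected.getMessage());
        }
    }
}
```

This is exactly the kind of tolerance that absorbs the small precision deltas between different geospatial indexing strategies mentioned in the issue.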
[jira] [Updated] (LUCENE-3095) TestIndexWriter#testThreadInterruptDeadlock fails with OOM
[ https://issues.apache.org/jira/browse/LUCENE-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3095: --- Attachment: LUCENE-3095.patch Nice catch selckin! I was finally able to repro the OOME. I think the attached patch should fix it. > TestIndexWriter#testThreadInterruptDeadlock fails with OOM > --- > > Key: LUCENE-3095 > URL: https://issues.apache.org/jira/browse/LUCENE-3095 > Project: Lucene - Java > Issue Type: Bug > Components: Index, Tests >Affects Versions: 4.0 >Reporter: Simon Willnauer >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3095.patch > > > Selckin reported a repeatedly failing test that throws OOM Exceptions. > According to the heapdump the MockDirectoryWrapper#createdFiles HashSet takes > about 400MB heapspace containing 4194304 entries. Seems kind of way too many > though :) > {noformat} > [junit] java.lang.OutOfMemoryError: Java heap space > [junit] Dumping heap to /tmp/java_pid25990.hprof ... > [junit] Heap dump file created [520807744 bytes in 4.250 secs] > [junit] Testsuite: org.apache.lucene.index.TestIndexWriter > [junit] Testcase: > testThreadInterruptDeadlock(org.apache.lucene.index.TestIndexWriter): FAILED > [junit] > [junit] junit.framework.AssertionFailedError: > [junit] at > org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock(TestIndexWriter.java:2249) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211) > [junit] > [junit] > [junit] Testcase: > testThreadInterruptDeadlock(org.apache.lucene.index.TestIndexWriter): FAILED > [junit] Some threads threw uncaught exceptions! > [junit] junit.framework.AssertionFailedError: Some threads threw uncaught > exceptions! 
> [junit] at > org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:557) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211) > [junit] > [junit] > [junit] Tests run: 67, Failures: 2, Errors: 0, Time elapsed: 3,254.884 sec > [junit] > [junit] - Standard Output --- > [junit] FAILED; unexpected exception > [junit] java.lang.OutOfMemoryError: Java heap space > [junit] at org.apache.lucene.store.RAMFile.newBuffer(RAMFile.java:85) > [junit] at org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:58) > [junit] at > org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:132) > [junit] at > org.apache.lucene.store.RAMOutputStream.copyBytes(RAMOutputStream.java:171) > [junit] at > org.apache.lucene.store.MockIndexOutputWrapper.copyBytes(MockIndexOutputWrapper.java:155) > [junit] at > org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java:223) > [junit] at > org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:189) > [junit] at > org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:138) > [junit] at > org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3344) > [junit] at > org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2959) > [junit] at > org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37) > [junit] at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1763) > [junit] at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1758) > [junit] at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1754) > [junit] at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1373) > [junit] at > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1230) > [junit] at > 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1211) > [junit] at > org.apache.lucene.index.TestIndexWriter$IndexerThreadInterrupt.run(TestIndexWriter.java:2154) > [junit] - --- > [junit] - Standard Error - > [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter > -Dtestmethod=testThreadInterruptDeadlock > -Dtests.seed=7183538093651149:3431510331342554160 > [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter > -Dtestmethod=testThreadInterruptDeadlock > -Dtests.seed=7183538093651149:3431510331342554160 > [ju
[jira] [Assigned] (LUCENE-3095) TestIndexWriter#testThreadInterruptDeadlock fails with OOM
[ https://issues.apache.org/jira/browse/LUCENE-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-3095: -- Assignee: Michael McCandless > TestIndexWriter#testThreadInterruptDeadlock fails with OOM > --- > > Key: LUCENE-3095 > URL: https://issues.apache.org/jira/browse/LUCENE-3095 > Project: Lucene - Java > Issue Type: Bug > Components: Index, Tests >Affects Versions: 4.0 >Reporter: Simon Willnauer >Assignee: Michael McCandless > Fix For: 4.0 > > > Selckin reported a repeatedly failing test that throws OOM Exceptions. > According to the heapdump the MockDirectoryWrapper#createdFiles HashSet takes > about 400MB heapspace containing 4194304 entries. Seems kind of way too many > though :) > {noformat} > [junit] java.lang.OutOfMemoryError: Java heap space > [junit] Dumping heap to /tmp/java_pid25990.hprof ... > [junit] Heap dump file created [520807744 bytes in 4.250 secs] > [junit] Testsuite: org.apache.lucene.index.TestIndexWriter > [junit] Testcase: > testThreadInterruptDeadlock(org.apache.lucene.index.TestIndexWriter): FAILED > [junit] > [junit] junit.framework.AssertionFailedError: > [junit] at > org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock(TestIndexWriter.java:2249) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211) > [junit] > [junit] > [junit] Testcase: > testThreadInterruptDeadlock(org.apache.lucene.index.TestIndexWriter): FAILED > [junit] Some threads threw uncaught exceptions! > [junit] junit.framework.AssertionFailedError: Some threads threw uncaught > exceptions! 
> [junit] at > org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:557) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211) > [junit] > [junit] > [junit] Tests run: 67, Failures: 2, Errors: 0, Time elapsed: 3,254.884 sec > [junit] > [junit] - Standard Output --- > [junit] FAILED; unexpected exception > [junit] java.lang.OutOfMemoryError: Java heap space > [junit] at org.apache.lucene.store.RAMFile.newBuffer(RAMFile.java:85) > [junit] at org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:58) > [junit] at > org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:132) > [junit] at > org.apache.lucene.store.RAMOutputStream.copyBytes(RAMOutputStream.java:171) > [junit] at > org.apache.lucene.store.MockIndexOutputWrapper.copyBytes(MockIndexOutputWrapper.java:155) > [junit] at > org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java:223) > [junit] at > org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:189) > [junit] at > org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:138) > [junit] at > org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3344) > [junit] at > org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2959) > [junit] at > org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37) > [junit] at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1763) > [junit] at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1758) > [junit] at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1754) > [junit] at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1373) > [junit] at > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1230) > [junit] at > 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1211) > [junit] at > org.apache.lucene.index.TestIndexWriter$IndexerThreadInterrupt.run(TestIndexWriter.java:2154) > [junit] - --- > [junit] - Standard Error - > [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter > -Dtestmethod=testThreadInterruptDeadlock > -Dtests.seed=7183538093651149:3431510331342554160 > [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter > -Dtestmethod=testThreadInterruptDeadlock > -Dtests.seed=7183538093651149:3431510331342554160 > [junit] The following exceptions were thrown by threads: > [junit] *** Thread: Thread-379 *** > [junit] java.lang.RuntimeException: Mock
[jira] [Resolved] (LUCENE-3058) FST should allow more than one output for the same input
[ https://issues.apache.org/jira/browse/LUCENE-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3058. Resolution: Fixed > FST should allow more than one output for the same input > > > Key: LUCENE-3058 > URL: https://issues.apache.org/jira/browse/LUCENE-3058 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3058.patch, LUCENE-3058.patch, LUCENE-3058.patch > > > For the block tree terms dict, it turns out I need this case. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3058) FST should allow more than one output for the same input
[ https://issues.apache.org/jira/browse/LUCENE-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033314#comment-13033314 ] Michael McCandless commented on LUCENE-3058: OK thanks Uwe... I'll commit. > FST should allow more than one output for the same input > > > Key: LUCENE-3058 > URL: https://issues.apache.org/jira/browse/LUCENE-3058 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3058.patch, LUCENE-3058.patch, LUCENE-3058.patch > > > For the block tree terms dict, it turns out I need this case. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3058) FST should allow more than one output for the same input
[ https://issues.apache.org/jira/browse/LUCENE-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033309#comment-13033309 ] Uwe Schindler commented on LUCENE-3058: --- After reviewing, this seems the only solution. The cast is guarded by the instanceof check, but compiler does not know this. Only the (Object) cast in second param is not needed: {code} @SuppressWarnings("unchecked") final Builder b = (Builder) builder; b.add(pair.input, _outputs.get(twoLongs.first)); b.add(pair.input, _outputs.get(twoLongs.second)); {code} > FST should allow more than one output for the same input > > > Key: LUCENE-3058 > URL: https://issues.apache.org/jira/browse/LUCENE-3058 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3058.patch, LUCENE-3058.patch, LUCENE-3058.patch > > > For the block tree terms dict, it turns out I need this case. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
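[Editor's note] The pattern Uwe describes — an unchecked cast that is safe at runtime because an instanceof check guards it, even though the compiler cannot prove it — looks like this in isolation (a generic illustration, not the actual TestFSTs code):

```java
import java.util.List;

// Standalone illustration of the pattern discussed above: a cast that is
// provably safe at runtime (guarded by instanceof) but whose element type
// the compiler cannot verify, so the warning is suppressed at the
// narrowest possible scope. Not the actual TestFSTs/Builder code.
public class UncheckedCastSketch {
    static int sizeOfStringList(Object o) {
        if (o instanceof List) {
            // The instanceof check guards this cast, but erasure means
            // the compiler cannot check <String>, hence the suppression
            // on just this local declaration.
            @SuppressWarnings("unchecked")
            final List<String> list = (List<String>) o;
            return list.size();
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> words = java.util.Arrays.asList("fst", "output");
        System.out.println(sizeOfStringList(words));       // 2
        System.out.println(sizeOfStringList("not a list")); // -1
    }
}
```

Annotating the local variable, as in Uwe's snippet, keeps the suppression from hiding other unchecked warnings elsewhere in the method.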
[jira] [Commented] (LUCENE-3095) TestIndexWriter#testThreadInterruptDeadlock fails with OOM
[ https://issues.apache.org/jira/browse/LUCENE-3095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033296#comment-13033296 ] selckin commented on LUCENE-3095: - I believe the test is wrong, and it can get back into the thread's while(true) before setting the finish flag and after the last interrupt and therefore never end; the inner while(true) should probably be a while(!finish) as well. > TestIndexWriter#testThreadInterruptDeadlock fails with OOM > --- > > Key: LUCENE-3095 > URL: https://issues.apache.org/jira/browse/LUCENE-3095 > Project: Lucene - Java > Issue Type: Bug > Components: Index, Tests >Affects Versions: 4.0 >Reporter: Simon Willnauer > Fix For: 4.0 > > > Selckin reported a repeatedly failing test that throws OOM Exceptions. > According to the heapdump the MockDirectoryWrapper#createdFiles HashSet takes > about 400MB heapspace containing 4194304 entries. Seems kind of way too many > though :) > {noformat} > [junit] java.lang.OutOfMemoryError: Java heap space > [junit] Dumping heap to /tmp/java_pid25990.hprof ... > [junit] Heap dump file created [520807744 bytes in 4.250 secs] > [junit] Testsuite: org.apache.lucene.index.TestIndexWriter > [junit] Testcase: > testThreadInterruptDeadlock(org.apache.lucene.index.TestIndexWriter): FAILED > [junit] > [junit] junit.framework.AssertionFailedError: > [junit] at > org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock(TestIndexWriter.java:2249) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211) > [junit] > [junit] > [junit] Testcase: > testThreadInterruptDeadlock(org.apache.lucene.index.TestIndexWriter): FAILED > [junit] Some threads threw uncaught exceptions! > [junit] junit.framework.AssertionFailedError: Some threads threw uncaught > exceptions! 
> [junit] at > org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:557) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282) > [junit] at > org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211) > [junit] > [junit] > [junit] Tests run: 67, Failures: 2, Errors: 0, Time elapsed: 3,254.884 sec > [junit] > [junit] - Standard Output --- > [junit] FAILED; unexpected exception > [junit] java.lang.OutOfMemoryError: Java heap space > [junit] at org.apache.lucene.store.RAMFile.newBuffer(RAMFile.java:85) > [junit] at org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:58) > [junit] at > org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:132) > [junit] at > org.apache.lucene.store.RAMOutputStream.copyBytes(RAMOutputStream.java:171) > [junit] at > org.apache.lucene.store.MockIndexOutputWrapper.copyBytes(MockIndexOutputWrapper.java:155) > [junit] at > org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java:223) > [junit] at > org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:189) > [junit] at > org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:138) > [junit] at > org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3344) > [junit] at > org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2959) > [junit] at > org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37) > [junit] at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1763) > [junit] at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1758) > [junit] at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1754) > [junit] at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1373) > [junit] at > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1230) > [junit] at > 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1211) > [junit] at > org.apache.lucene.index.TestIndexWriter$IndexerThreadInterrupt.run(TestIndexWriter.java:2154) > [junit] - --- > [junit] - Standard Error - > [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter > -Dtestmethod=testThreadInterruptDeadlock > -Dtests.seed=7183538093651149:3431510331342554160 > [junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter > -Dtestmethod=testThreadInterruptDeadlock > -Dtests.seed=71
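[Editor's note] selckin's diagnosis — an inner while(true) that should re-check the finish flag — is a common shape of test-thread bug. A minimal standalone sketch of the corrected pattern (illustrative only, not the actual TestIndexWriter code):

```java
// Minimal sketch of the loop-termination pattern discussed above: a
// worker whose loop condition re-checks a volatile finish flag, so it
// cannot re-enter the loop after the controlling thread asks it to stop.
// Illustrative only; not the actual TestIndexWriter code.
public class FinishFlagSketch {
    static volatile boolean finish = false;

    // Starts a worker whose loop condition checks the flag, asks it to
    // stop, and reports whether it actually terminated.
    static boolean stopsCleanly() {
        finish = false;
        Thread worker = new Thread(() -> {
            // The fix: while (!finish), not while (true). A plain
            // while (true) could spin forever once the last interrupt
            // has already been delivered.
            while (!finish) {
                Thread.onSpinWait(); // simulate a unit of indexing work
            }
        });
        worker.start();
        finish = true; // ask the worker to stop
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + stopsCleanly()); // true
    }
}
```

Because the flag is volatile, the write by the controlling thread is guaranteed visible to the worker's next loop-condition check.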
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8027 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8027/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext
[ https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033279#comment-13033279 ] Earwin Burrfoot commented on LUCENE-2793: - As mentioned @LUCENE-3092, it would be nice not to include the OneMerge, but some meaningful value like 'expectedSize', 'expectedSegmentSize' or whatnot, that would work both for merges *and* flushes, and also won't introduce needless dependency on MergePolicy. > Directory createOutput and openInput should take an IOContext > - > > Key: LUCENE-2793 > URL: https://issues.apache.org/jira/browse/LUCENE-2793 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless >Assignee: Simon Willnauer > Labels: gsoc2011, lucene-gsoc-11, mentor > Attachments: LUCENE-2793.patch > > > Today for merging we pass down a larger readBufferSize than for searching > because we get better performance. > I think we should generalize this to a class (IOContext), which would hold > the buffer size, but then could hold other flags like DIRECT (bypass OS's > buffer cache), SEQUENTIAL, etc. > Then, we can make the DirectIOLinuxDirectory fully usable because we would > only use DIRECT/SEQUENTIAL during merging. > This will require fixing how IW pools readers, so that a reader opened for > merging is not then used for searching, and vice/versa. Really, it's only > all the open file handles that need to be different -- we could in theory > share del docs, norms, etc, if that were somehow possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
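[Editor's note] Earwin's suggestion — carry an expected size in the context rather than the OneMerge itself, so the same type serves merges and flushes without a MergePolicy dependency — could look roughly like this. All names here are hypothetical, not the committed IOContext API:

```java
// Hypothetical sketch of an IOContext carrying an expected segment size
// instead of a OneMerge, per the comment above. Class, field, and hint
// names are illustrative; the actual committed API may differ.
public class IOContextSketch {
    enum Hint { DEFAULT, SEQUENTIAL, DIRECT } // e.g. bypass OS cache

    final Hint hint;
    final long expectedSegmentSizeBytes; // meaningful for merges *and* flushes

    IOContextSketch(Hint hint, long expectedSegmentSizeBytes) {
        this.hint = hint;
        this.expectedSegmentSizeBytes = expectedSegmentSizeBytes;
    }

    public static void main(String[] args) {
        // A merge expected to write roughly a 64 MB segment, sequentially:
        IOContextSketch ctx =
            new IOContextSketch(Hint.SEQUENTIAL, 64L << 20);
        System.out.println(ctx.expectedSegmentSizeBytes); // 67108864
    }
}
```

A Directory implementation can then size its read/write buffers (or choose DIRECT I/O) from the hint and expected size alone, without ever seeing merge-policy types.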
[jira] [Updated] (LUCENE-3058) FST should allow more than one output for the same input
[ https://issues.apache.org/jira/browse/LUCENE-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-3058: --- Attachment: LUCENE-3058.patch I think we should just suppress the warning? > FST should allow more than one output for the same input > > > Key: LUCENE-3058 > URL: https://issues.apache.org/jira/browse/LUCENE-3058 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3058.patch, LUCENE-3058.patch, LUCENE-3058.patch > > > For the block tree terms dict, it turns out I need this case. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8030 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8030/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Reopened] (LUCENE-3058) FST should allow more than one output for the same input
[ https://issues.apache.org/jira/browse/LUCENE-3058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reopened LUCENE-3058: Reopening -- this commit has generics violations in TestFSTs. > FST should allow more than one output for the same input > > > Key: LUCENE-3058 > URL: https://issues.apache.org/jira/browse/LUCENE-3058 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Michael McCandless >Assignee: Michael McCandless > Fix For: 4.0 > > Attachments: LUCENE-3058.patch, LUCENE-3058.patch > > > For the block tree terms dict, it turns out I need this case. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8026 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8026/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-docvalues-branch - Build # 1113 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-docvalues-branch/1113/ No tests ran. Build Log (for compile errors): [...truncated 26 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8029 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8029/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8025 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8025/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2514) Upgrade velocity-tools to released version
Upgrade velocity-tools to released version -- Key: SOLR-2514 URL: https://issues.apache.org/jira/browse/SOLR-2514 Project: Solr Issue Type: Task Components: web gui Affects Versions: 3.1 Environment: JBoss 6.0.0.Final on FreeBSD 8.2 Reporter: Craig Lewis Priority: Minor I'm deploying Solr 3.1.0 in JBoss 6.0.0.Final In JBoss, I'm trying to deploy apache-solr-3.1.0/example/webapps/solr.war as a Web Application. During deployment, JBoss returns an error: Deployment "vfs:///usr/local/jboss-6.0.0.Final/server/default/deploy/solr.war" is in error due to the following reason(s): org.xml.sax.SAXException: Element type "tlibversion" must be declared. @ vfs:///usr/local/jboss-6.0.0.Final/server/default/deploy/solr.war/WEB-INF/lib/velocity-tools-2.0-beta3.jar/META-INF/velocity-view.tld[22,16] at org.rhq.plugins.jbossas5.util.DeploymentUtils.deployArchive(DeploymentUtils.java:146) [jopr-jboss-as-5-plugin-3.0.0.jar:3.0.0] at org.rhq.plugins.jbossas5.deploy.AbstractDeployer.deploy(AbstractDeployer.java:119) [jopr-jboss-as-5-plugin-3.0.0.jar:3.0.0] at org.rhq.plugins.jbossas5.helper.CreateChildResourceFacetDelegate.createContentBasedResource(CreateChildResourceFacetDelegate.java:124) [jopr-jboss-as-5-plugin-3.0.0.jar:3.0.0] at org.rhq.plugins.jbossas5.helper.CreateChildResourceFacetDelegate.createResource(CreateChildResourceFacetDelegate.java:56) [jopr-jboss-as-5-plugin-3.0.0.jar:3.0.0] at org.rhq.plugins.jbossas5.ApplicationServerComponent.createResource(ApplicationServerComponent.java:304) [jopr-jboss-as-5-plugin-3.0.0.jar:3.0.0] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [:1.6.0] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) [:1.6.0] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [:1.6.0] at java.lang.reflect.Method.invoke(Method.java:616) [:1.6.0] at org.rhq.core.pc.inventory.ResourceContainer$ComponentInvocationThread.call(ResourceContainer.java:525) [:3.0.0] at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [:1.6.0] at java.util.concurrent.FutureTask.run(FutureTask.java:166) [:1.6.0] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [:1.6.0] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [:1.6.0] at java.lang.Thread.run(Thread.java:679) [:1.6.0] After a bit of digging, I found that there was a bug in solr.war/WEB-INF/lib/velocity-tools-2.0-beta3.jar/META-INF/velocity-view.tld, [https://issues.apache.org/jira/browse/VELTOOLS-120] The latest version of velocity-tools, velocity-tools-2.0.jar (available at [http://velocity.apache.org/download.cgi] ), includes this bugfix. To test, I unzipped solr.war, deleted solr.war/WEB-INF/lib/velocity-tools-2.0-beta3.jar, added solr.war/WEB-INF/lib/velocity-tools-2.0.jar, and re-zipped solr.war. I am able to deploy this new .war file. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8028 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8028/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3092) NRTCachingDirectory, to buffer small segments in a RAMDir
[ https://issues.apache.org/jira/browse/LUCENE-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033240#comment-13033240 ] Michael McCandless commented on LUCENE-3092: {quote} IOCtx should have a value 'expectedSize', or 'priority', or something similar. This does not introduce a transitive dependency of Directory from MergePolicy (to please you once more - a true WTF), {quote} Ahh, good point. So, for this dir impl I want to say "if net seg size is < X MB, cache it in RAM", so I guess we could have something like "expectedSizeOfSegmentMB" (covers all files that will be flushed for this segment, hmm minus the doc stores) in the IOCtx. > NRTCachingDirectory, to buffer small segments in a RAMDir > - > > Key: LUCENE-3092 > URL: https://issues.apache.org/jira/browse/LUCENE-3092 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Michael McCandless >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3092-listener.patch, LUCENE-3092.patch > > > I created this simply Directory impl, whose goal is reduce IO > contention in a frequent reopen NRT use case. > The idea is, when reopening quickly, but not indexing that much > content, you wind up with many small files created with time, that can > possibly stress the IO system eg if merges, searching are also > fighting for IO. > So, NRTCachingDirectory puts these newly created files into a RAMDir, > and only when they are merged into a too-large segment, does it then > write-through to the real (delegate) directory. > This lets you spend some RAM to reduce I0. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
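[Editor's note] The caching rule Michael describes ("if net seg size is < X MB, cache it in RAM") reduces to a threshold check at write time. A hedged sketch of that decision; the class and method names are illustrative, not the NRTCachingDirectory API:

```java
// Sketch of the write-caching decision described above: small newly
// flushed segments are buffered in a RAMDirectory, large ones write
// through to the real (delegate) directory. Names are illustrative,
// not the actual NRTCachingDirectory API.
public class NRTCacheSketch {
    final long maxCachedSegmentBytes;

    NRTCacheSketch(long maxCachedSegmentMB) {
        this.maxCachedSegmentBytes = maxCachedSegmentMB << 20;
    }

    // True when the new segment is small enough to keep in RAM, trading
    // some heap for fewer small-file writes under frequent NRT reopens.
    boolean cacheInRam(long expectedSegmentSizeBytes) {
        return expectedSegmentSizeBytes < maxCachedSegmentBytes;
    }

    public static void main(String[] args) {
        NRTCacheSketch dir = new NRTCacheSketch(5); // cache segments < 5 MB
        System.out.println(dir.cacheInRam(1L << 20));  // true: 1 MB segment
        System.out.println(dir.cacheInRam(64L << 20)); // false: 64 MB merge
    }
}
```

This is also where the expectedSegmentSize value discussed on LUCENE-2793 would plug in: the directory needs only an estimated size, not the merge itself.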
[Lucene.Net] [jira] [Commented] (LUCENENET-412) Replacing ArrayLists, Hashtables etc. with appropriate Generics.
[ https://issues.apache.org/jira/browse/LUCENENET-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033235#comment-13033235 ] Digy commented on LUCENENET-412: Some samples to show the diffs of 2.9.4 & 2.9.4g on Readability. {code} From: ((System.Collections.IList) ((System.Collections.ArrayList) segmentInfos).GetRange(start, start + merge.segments.Count - start)).Clear(); To: segmentInfos.RemoveRange(start, start + merge.segments.Count - start); - From: System.Collections.IEnumerator it = ((System.Collections.ICollection) readerToFields[reader]).GetEnumerator(); while (it.MoveNext()) { if (fieldSelector.Accept((System.String) it.Current) != FieldSelectorResult.NO_LOAD) { include = true; break; } } To: foreach (string x in readerToFields[reader]) { if (fieldSelector.Accept(x) != FieldSelectorResult.NO_LOAD) { include = true; break; } } - From: for (System.Collections.IEnumerator iter = weights.GetEnumerator(); iter.MoveNext(); ) { ((Weight) iter.Current).Normalize(norm); } To: foreach(Weight w in weights) { w.Normalize(norm); } - From: public virtual System.Collections.IList GetTermArrays() { return (System.Collections.IList) System.Collections.ArrayList.ReadOnly(new System.Collections.ArrayList(termArrays)); } To: public virtual List GetTermArrays() { return new List(termArrays); } - From: System.Collections.ArrayList results = new System.Collections.ArrayList(); return (TermFreqVector[]) results.ToArray(typeof(TermFreqVector)); To: List results = new List(); ... return results.ToArray(); {code} DIGY > Replacing ArrayLists, Hashtables etc. with appropriate Generics. 
> > > Key: LUCENENET-412 > URL: https://issues.apache.org/jira/browse/LUCENENET-412 > Project: Lucene.Net > Issue Type: Improvement >Affects Versions: Lucene.Net 2.9.4 >Reporter: Digy >Priority: Minor > Fix For: Lucene.Net 2.9.4 > > Attachments: IEquatable for Query&Subclasses.patch, > LUCENENET-412.patch, lucene_2.9.4g_exceptions_fix > > > This will move Lucene.Net.2.9.4 closer to lucene.3.0.3 and allow some > performance gains.
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8024 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8024/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[jira] [Resolved] (LUCENE-3094) optimize lev automata construction
[ https://issues.apache.org/jira/browse/LUCENE-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3094. - Resolution: Fixed Committed revision 1102875. > optimize lev automata construction > > Key: LUCENE-3094 > URL: https://issues.apache.org/jira/browse/LUCENE-3094 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Fix For: 4.0 > > Attachments: LUCENE-3094.patch, after.png, before.png > > > in our lev automata algorithm, we compute an upper bound of the maximum > possible states (not the true number), and > create some "useless" unconnected states "floating around". > this isn't harmful; in the original impl the Automaton was simply a > pointer to the initial state, and all algorithms > traversed this list, so effectively the useless states were dropped > immediately. But recently we changed Automaton to > cache its numberedStates, and we set them here, so these useless states are > being kept around. > it has no impact on performance, but can be really confusing if you are > debugging (e.g. toString). Thanks to Dawid Weiss > for noticing this. > at the same time, forcing an extra traversal is a bit scary, so I did some > benchmarking with really long strings and found > that it is actually helpful to reduce() the number of transitions (typically > cuts them in half) for these long strings, as it > speeds up some later algorithms. > we won't see any speedup for short terms, but I think it's easier to work with > these simpler automata anyway, and it eliminates > the confusion of seeing the redundant states without slowing anything down.
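The cleanup the issue describes, dropping states not connected to the initial state, amounts to a reachability pass. A minimal language-neutral sketch in Python (the helper name and the dict-of-lists transition encoding are illustrative, not Lucene's Automaton API):

```python
from collections import deque


def reachable_states(initial, transitions):
    """Keep only states reachable from `initial`.

    `transitions` maps a state to an iterable of successor states;
    anything not discovered by this BFS is one of the 'useless'
    unconnected states the issue talks about."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        for t in transitions.get(s, ()):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen
```

Caching numbered states from only this reachable set is what removes the floating, unconnected states from toString() and debugging output.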
[jira] [Commented] (LUCENE-3094) optimize lev automata construction
[ https://issues.apache.org/jira/browse/LUCENE-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033229#comment-13033229 ] Robert Muir commented on LUCENE-3094: - Thanks guys, I'll add a test for this and commit. > optimize lev automata construction > > Key: LUCENE-3094 > URL: https://issues.apache.org/jira/browse/LUCENE-3094
[jira] [Commented] (SOLR-2513) Allow to subclass org.apache.solr.response.XMLWriter
[ https://issues.apache.org/jira/browse/SOLR-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033227#comment-13033227 ] Gabriele Kahlout commented on SOLR-2513: +1 for @lucene.internal. I'd use language features such as final only if there was a 'technical' reason (@see http://download.oracle.com/javase/tutorial/java/IandI/final.html). > Allow to subclass org.apache.solr.response.XMLWriter > > Key: SOLR-2513 > URL: https://issues.apache.org/jira/browse/SOLR-2513 > Project: Solr > Issue Type: Improvement > Components: Response Writers >Reporter: Gabriele Kahlout >Assignee: Ryan McKinley >Priority: Trivial > Attachments: SOLR-2513.patch > > Original Estimate: 1m > Remaining Estimate: 1m > > Hacking/debugging/extending Solr with one's own ResponseWriter one might want > to inherit functionality from XMLWriter. A trivial example is overriding > writeDate(..) to use a different calendar/format. > I asked about why it's made final on the mailing list[1]. > [1] > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201105.mbox/%3cbanlktin4mxiybzw3ck-k4gwq4o6nnc2...@mail.gmail.com%3E
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8027 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8027/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[jira] [Commented] (SOLR-2513) Allow to subclass org.apache.solr.response.XMLWriter
[ https://issues.apache.org/jira/browse/SOLR-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033215#comment-13033215 ] Ryan McKinley commented on SOLR-2513: - ResponseWriters in general are pretty ugly; maybe just mark @lucene.internal and let people subclass at their own risk. The likelihood of this getting a real cleanup soon is pretty low. > Allow to subclass org.apache.solr.response.XMLWriter > > Key: SOLR-2513 > URL: https://issues.apache.org/jira/browse/SOLR-2513
[jira] [Commented] (LUCENE-3094) optimize lev automata construction
[ https://issues.apache.org/jira/browse/LUCENE-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033210#comment-13033210 ] Michael McCandless commented on LUCENE-3094: +1 -- we shouldn't create these scary states. > optimize lev automata construction > > Key: LUCENE-3094 > URL: https://issues.apache.org/jira/browse/LUCENE-3094
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8023 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8023/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[jira] [Commented] (LUCENE-3094) optimize lev automata construction
[ https://issues.apache.org/jira/browse/LUCENE-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033209#comment-13033209 ] Dawid Weiss commented on LUCENE-3094: - This looks good to me. And even if it doesn't affect performance it definitely should help those poor souls wishing to actually understand this algorithm :) > optimize lev automata construction > > Key: LUCENE-3094 > URL: https://issues.apache.org/jira/browse/LUCENE-3094
[jira] [Commented] (SOLR-2513) Allow to subclass org.apache.solr.response.XMLWriter
[ https://issues.apache.org/jira/browse/SOLR-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033207#comment-13033207 ] Hoss Man commented on SOLR-2513: There was some discussion about this in the past. The crux of the concern was that the API is really ugly, wide open, and never really intended for general use (I think the class was initially a static private inner class of the XmlResponseWriter before some refactoring), and it should probably be cleaned up before encouraging general use from people who write plugins. > Allow to subclass org.apache.solr.response.XMLWriter > > Key: SOLR-2513 > URL: https://issues.apache.org/jira/browse/SOLR-2513
[jira] [Updated] (SOLR-2513) Allow to subclass org.apache.solr.response.XMLWriter
[ https://issues.apache.org/jira/browse/SOLR-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated SOLR-2513: Attachment: SOLR-2513.patch trivial patch (but added some cleanup while we are at it) > Allow to subclass org.apache.solr.response.XMLWriter > > Key: SOLR-2513 > URL: https://issues.apache.org/jira/browse/SOLR-2513
[jira] [Updated] (SOLR-2513) Allow to subclass org.apache.solr.response.XMLWriter
[ https://issues.apache.org/jira/browse/SOLR-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan McKinley updated SOLR-2513: Assignee: Ryan McKinley any objections? > Allow to subclass org.apache.solr.response.XMLWriter > > Key: SOLR-2513 > URL: https://issues.apache.org/jira/browse/SOLR-2513
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8026 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8026/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[jira] [Created] (SOLR-2513) Allow to subclass org.apache.solr.response.XMLWriter
Allow to subclass org.apache.solr.response.XMLWriter - Key: SOLR-2513 URL: https://issues.apache.org/jira/browse/SOLR-2513 Project: Solr Issue Type: Improvement Components: Response Writers Reporter: Gabriele Kahlout Priority: Trivial Hacking/debugging/extending Solr with one's own ResponseWriter one might want to inherit functionality from XMLWriter. A trivial example is overriding writeDate(..) to use a different calendar/format. I asked about why it's made final on the mailing list[1]. [1] http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201105.mbox/%3cbanlktin4mxiybzw3ck-k4gwq4o6nnc2...@mail.gmail.com%3E
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8022 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8022/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8025 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8025/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8021 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8021/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8024 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8024/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
Re: GSoC: LUCENE-2308: Separately specify a field's type
2011/5/12 Michael McCandless > 2011/5/9 Nikola Tanković : > > >> > Introduction of a FieldType class that will hold all the extra > >> > properties > >> > now stored inside a Field instance other than the field value itself. > >> > >> Seems like this is an easy first baby step -- leave current Field > >> class, but break out the "type" details into a separate class that can > >> be shared across Field instances. > > > > Yes, I agree, this could be a good first step. Mike submitted a patch on > > issue #2308. I think it's a solid base for this. > > Make that Chris. > Ouch, sorry! > > >> > A new FieldTypeAttribute interface will be added to handle extension with > >> > new > >> > field properties, inspired by IndexWriterConfig. > >> > >> How would this work? What's an example compelling usage? An app > >> could use this for extensibility, and then make a matching codec that > >> picks up this attr? EG, say, maybe for marking that a field is a > >> "primary key field" and then codec could optimize accordingly...? > > > > Well, that could be a very interesting scenario. It didn't ring a bell for me > > as a possible codec usage, but it seems very reasonable. Attributes otherwise > > don't make much sense, unless properly used in custom codecs. > > > > How will we ensure attribute and codec compatibility? > > I'm just thinking we should have concrete reasons in mind for cutting > over to attributes here... I'd rather see a fixed, well thought out > concrete FieldType hierarchy first... > Yes, I couldn't agree more, and I also think Chris has some great ideas in this area, given his work on Spatial indexing, which tends to make use of these additional attributes. > > >> > Refactoring and dividing of settings for term frequency and positioning > >> > can > >> > also be done (LUCENE-2048) > >> > >> Ahh great! So we can omit-positions-but-not-TF. > >> > >> > Discuss possible effects of completion of LUCENE-2310 on this project > >> > >> This one is badly needed...
but we should keep your project focused. > > > > We'll tackle this one afterwards. > > Good. > > >> > Adequate Factory class for easier configuration of new Field instances > >> > together with manually added new FieldTypeAttributes > >> > FieldType, once instantiated, is read-only. Only the field's value can be > >> > changed. > >> > >> OK. > >> > >> > Simple hierarchy of Field classes with core properties logically > >> > predefaulted. E.g.: > >> > > >> > NumberField, > >> > >> Can't this just be our existing NumericField? > > > > Yes, this is classic NumericField with changes proposed in LUCENE-2310. Tim > > Smith mentioned that the Fieldable class should be kept for custom > > implementations to reduce the number of setters (for defaults). > > Chris Male suggested a new CoreFieldTypeAttribute interface, so maybe it > > should be implemented instead of Fieldable for custom implementations, so > > both Fieldable and AbstractField are not needed anymore. > > In my opinion Field should become abstract and be extended by others. > > Another proposal: how about keeping only Field (with no hierarchy) and moving > > the hierarchy to FieldType, such as NumericFieldType, StringFieldType, since this > > hierarchy concerns type information only? > > I think a hierarchy of both types and the "value containers" that hold > the corresponding values could make sense? > Hmm, I think we should get more opinions on this one also. > > > e.g. Usage: > > FieldType number = new NumericFieldType(); > > Field price = new Field(); > > price.setType(number); > > // but this is much cleaner... > > Field price = new NumericField(); > > so maybe we should have parallel XYZField with XYZFieldType... > > Am I complicating? > >> > >> > StringField, > >> > >> This would be like NOT_ANALYZED? > > > > Yes, strings are often one word only. Or maybe we can name it NameField, > > NonAnalyzedField or something. > > StringField sounds good actually... > > >> > TextField, > >> > >> This would be ANALYZED? > > > > Yes.
> > > > OK. > > >> > What is the best way to break this into small baby steps? > >> > >> Hopefully this becomes clearer as we iterate. > > > > Well, we know the first step: moving type details into a FieldType class. > > Yes! > > Somehow tying into this as well is a stronger decoupling of the > indexer from analysis/document. Ie, what the indexer needs of a document > is very minimal -- just an iterable over indexed & stored values. > Separately we can still provide a "full featured" Document class w/ > add, get, remove, etc., but that's "outside" of the indexer. > I'll get back to this one after additional research. Maybe we should do a couple more iterations, then I'll summarize the conclusions. > > Mike > > http://blog.mikemccandless.com Nikola
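The split the thread converges on, an immutable FieldType shared across Field instances that hold only a name and a value, can be sketched concretely. This is a hypothetical mock-up of the design being discussed, not the committed Lucene API; all names are illustrative:

```python
class FieldType:
    """Immutable bundle of per-field index settings, shared across fields."""

    def __init__(self, indexed=True, stored=False, tokenized=True):
        self.indexed = indexed
        self.stored = stored
        self.tokenized = tokenized


class Field:
    """Holds only name + value; every other property lives in the type."""

    def __init__(self, name, value, ftype):
        self.name = name
        self.value = value
        self.type = ftype


# one type instance can be shared by many fields
TEXT = FieldType(indexed=True, stored=True, tokenized=True)       # ANALYZED
STRING = FieldType(indexed=True, stored=True, tokenized=False)    # NOT_ANALYZED
```

The "parallel XYZField with XYZFieldType" question then becomes whether subclasses like `TextField` simply pre-wire one of these shared type instances, which is the lighter of the two hierarchies proposed above.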
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8020 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8020/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[jira] [Resolved] (LUCENE-3040) analysis consumers should use reusable tokenstreams
[ https://issues.apache.org/jira/browse/LUCENE-3040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3040. - Resolution: Fixed Committed revision 1102817, 1102820 > analysis consumers should use reusable tokenstreams > > > Key: LUCENE-3040 > URL: https://issues.apache.org/jira/browse/LUCENE-3040 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3040.patch, LUCENE-3040.patch > > > Some analysis consumers (highlighter, more like this, memory index, contrib > queryparser, ...) are using Analyzer.tokenStream but should be using > Analyzer.reusableTokenStream instead for better performance.
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8023 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8023/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[jira] [Resolved] (LUCENE-3064) add checks to MockTokenizer to enforce proper consumption
[ https://issues.apache.org/jira/browse/LUCENE-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3064. - Resolution: Fixed backported to 3.x in revision 1102812 > add checks to MockTokenizer to enforce proper consumption > > > Key: LUCENE-3064 > URL: https://issues.apache.org/jira/browse/LUCENE-3064 > Project: Lucene - Java > Issue Type: Test >Reporter: Robert Muir > Fix For: 3.2, 4.0 > > Attachments: LUCENE-3064.patch, LUCENE-3064.patch, LUCENE-3064.patch, > LUCENE-3064.patch > > > we can enforce that a consumer properly iterates through the TokenStream > lifecycle > via MockTokenizer. this could catch bugs in consumers that don't call > reset(), etc.
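The lifecycle checks described above, catching consumers that skip reset() or call methods out of order, can be modeled as a small state machine. A hedged Python sketch of the idea; the class and method names are invented for illustration and are not MockTokenizer's actual API:

```python
class LifecycleCheckingStream:
    """Enforce a TokenStream-style contract: reset() before
    incrementToken(), and end() only after the stream is exhausted."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.state = "created"
        self.pos = 0

    def reset(self):
        self.state = "reset"
        self.pos = 0

    def increment_token(self):
        if self.state not in ("reset", "incrementing"):
            raise AssertionError("incrementToken() called before reset()")
        self.state = "incrementing"
        if self.pos < len(self.tokens):
            self.pos += 1
            return True
        self.state = "exhausted"  # returned False: consumer may call end()
        return False

    def end(self):
        if self.state != "exhausted":
            raise AssertionError("end() called before stream was exhausted")
        self.state = "ended"
```

A consumer that forgets reset(), or calls end() early, trips an assertion immediately instead of silently producing wrong tokens, which is exactly the class of bug the issue says this caught.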
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8019 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8019/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8022 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8022/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8018 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8018/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[JENKINS] Lucene-Solr-tests-only-docvalues-branch - Build # 1112 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-docvalues-branch/1112/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8021 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8021/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8017 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8017/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8020 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8020/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[jira] [Commented] (LUCENE-3090) DWFlushControl does not take active DWPT out of the loop on fullFlush
[ https://issues.apache.org/jira/browse/LUCENE-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033080#comment-13033080 ] Simon Willnauer commented on LUCENE-3090: - bq. But shouldn't stallControl kick in in that case? Ie, we stall all indexing if the number of flush-pending DWPTs is >= the number of active DWPTs, I think? Right, so let's say we have two active thread states: 1. thread 1 starts indexing (max ram is 16M); it indexes n docs and has 15.9 MB ram used. Now doc n+1 comes in at 5MB (active mem = 20.9M, flush mem = 0M) 2. take it out for flush (active mem = 0M, flush mem = 20.9M) 3. thread 2 starts indexing and fills ram quickly, ending up with 18M memory (active mem = 18M, flush mem = 20.9M) 4. take thread 2 out for flush (active mem = 0M, flush mem = 38.9M) 5. thread 3 has already started indexing and reaches the RAM threshold (16M), so we have: (active mem = 16M, flush mem = 38.9M) 6. take it out for flushing (now we stall, currently) (active mem = 0M, flush mem = 54.9M) - this is more than 3x the max ram buffer. We currently stall when the number of flush-pending DWPTs is > (num active DWPTs + 1); we can reduce that, but maybe we should swap back to RAM-based stalling? > DWFlushControl does not take active DWPT out of the loop on fullFlush > > > Key: LUCENE-3090 > URL: https://issues.apache.org/jira/browse/LUCENE-3090 > Project: Lucene - Java > Issue Type: Bug > Components: Index >Affects Versions: 4.0 >Reporter: Simon Willnauer >Assignee: Simon Willnauer >Priority: Critical > Fix For: 4.0 > > Attachments: LUCENE-3090.patch, LUCENE-3090.patch > > > We have seen several OOMs on TestNRTThreads, and all of them are caused by > DWFlushControl missing DWPTs that are set as flushPending but can't flush due > to a full flush going on. Yet that means that those DWPTs are filling up in > the background while they should actually be checked out and blocked until > the full flush finishes. Even further, we currently stall on > maxNumThreadStates while we should stall on the number of active thread states. > I will attach a patch tomorrow.
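The memory accounting in Simon's six-step walk-through can be checked directly. A quick sketch, with the numbers taken from the comment above:

```python
max_ram = 16.0   # configured IndexWriter RAM buffer, in MB
flush_mem = 0.0  # memory held by DWPTs checked out for flushing

# thread 1: 15.9 MB indexed, then a 5 MB document arrives
active = 15.9 + 5.0        # 20.9 MB active
flush_mem += active        # step 2: checked out for flush -> 20.9 MB
active = 0.0

# thread 2 fills up to 18 MB before it is taken out (steps 3-4)
flush_mem += 18.0          # 38.9 MB flushing

# thread 3 reaches the 16 MB threshold before stalling kicks in (steps 5-6)
flush_mem += 16.0          # 54.9 MB flushing

# more than 3x the configured buffer is tied up before we stall
assert flush_mem > 3 * max_ram
```

This is the argument for RAM-based stalling: counting pending DWPTs lets flushing memory grow to a multiple of the configured buffer, while a byte threshold would cap it directly.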
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8016 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8016/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[jira] [Resolved] (SOLR-2512) uima: add an ability to skip runtime error in AnalysisEngine
[ https://issues.apache.org/jira/browse/SOLR-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-2512. -- Resolution: Fixed trunk: Committed revision 1102785. 3x: Committed revision 1102789. > uima: add an ability to skip runtime error in AnalysisEngine > > > Key: SOLR-2512 > URL: https://issues.apache.org/jira/browse/SOLR-2512 > Project: Solr > Issue Type: Improvement > Affects Versions: 3.1 > Reporter: Koji Sekiguchi > Assignee: Koji Sekiguchi > Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: SOLR-2512.patch, SOLR-2512.patch, SOLR-2512.patch, SOLR-2512.patch, SOLR-2512.patch > > > Currently, if AnalysisEngine throws an exception while processing a text, the whole add-documents operation fails. Because online NLP services are error-prone, users should be able to choose whether Solr skips the text processing for that document (the source text can still be indexed) or throws a runtime exception so that Solr stops adding documents entirely.
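The skip-vs-fail behavior SOLR-2512 describes can be pictured as one flag around the per-document engine call. The sketch below is hypothetical Java: the names are invented and this is not the actual patch or the Solr/UIMA API, only the decision the issue describes.

```java
// Hypothetical sketch of the behavior SOLR-2512 describes; invented names,
// not the actual patch or the Solr/UIMA API. One flag chooses between
// skipping NLP enrichment for the failing document (its source text still
// gets indexed) and failing the whole add.
import java.util.List;

class SkippingAnalysisRunner {
    private final boolean ignoreErrors;

    SkippingAnalysisRunner(boolean ignoreErrors) {
        this.ignoreErrors = ignoreErrors;
    }

    /** Returns how many documents were successfully enriched. */
    int process(List<String> docs) {
        int enriched = 0;
        for (String text : docs) {
            try {
                analyze(text); // stand-in for the AnalysisEngine call
                enriched++;
            } catch (RuntimeException e) {
                if (!ignoreErrors) {
                    throw e; // abort the whole add, the pre-patch behavior
                }
                // otherwise skip enrichment for this doc and keep going
            }
        }
        return enriched;
    }

    // Fake "error-prone online NLP service": fails on empty input.
    private void analyze(String text) {
        if (text.isEmpty()) {
            throw new RuntimeException("engine error");
        }
    }
}
```

With `ignoreErrors` set, a batch containing one bad document still yields the other documents enriched; without it, the first failure propagates and the whole add fails.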
[jira] [Issue Comment Edited] (LUCENE-3094) optimize lev automata construction
[ https://issues.apache.org/jira/browse/LUCENE-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033072#comment-13033072 ] Simon Willnauer edited comment on LUCENE-3094 at 5/13/11 3:19 PM: -- Display the attached images before: !before.png! after: !after.png! was (Author: simonw): Display the attached images before: !before.png|thumbnail! after: !after.png|thumbnail! > optimize lev automata construction > -- > > Key: LUCENE-3094 > URL: https://issues.apache.org/jira/browse/LUCENE-3094 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Fix For: 4.0 > > Attachments: LUCENE-3094.patch, after.png, before.png > > > in our lev automata algorithm, we compute an upperbound of the maximum > possible states (not the true number), and > create some "useless" unconnected states "floating around". > this isn't harmful, in the original impl we did the Automaton is simply a > pointer to the initial state, and all algorithms > traverse this list, so effectively the useless states were dropped > immediately. But recently we changed automaton to > cache its numberedStates, and we set them here, so these useless states are > being kept around. > it has no impact on performance, but can be really confusing if you are > debugging (e.g. toString). Thanks to Dawid Weiss > for noticing this. > at the same time, forcing an extra traversal is a bit scary, so i did some > benchmarking with really long strings and found > that actually its helpful to reduce() the number of transitions (typically > cuts them in half) for these long strings, as it > speeds up some later algorithms. > won't see any speedup for short terms, but I think its easier to work with > these simpler automata anyway, and it eliminates > the confusion of seeing the redundant states without slowing anything down. -- This message is automatically generated by JIRA. 
[jira] [Commented] (LUCENE-3090) DWFlushControl does not take active DWPT out of the loop on fullFlush
[ https://issues.apache.org/jira/browse/LUCENE-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033073#comment-13033073 ] Michael McCandless commented on LUCENE-3090: {quote} bq. Could we add an assert that net flushPending + active RAM never exceeds some multiplier (2X?) of the configured max RAM? net flush pending means? we only distinguish between flushing RAM and active RAM, so flushing RAM can easily get above such a limit if IO is slow... {quote} But shouldn't stallControl kick in in that case? Ie, we stall all indexing if the number of flush-pending DWPTs is >= the number of active DWPTs, I think? > DWFlushControl does not take active DWPT out of the loop on fullFlush > - > > Key: LUCENE-3090 > URL: https://issues.apache.org/jira/browse/LUCENE-3090 > Project: Lucene - Java > Issue Type: Bug > Components: Index > Affects Versions: 4.0 > Reporter: Simon Willnauer > Assignee: Simon Willnauer > Priority: Critical > Fix For: 4.0 > > Attachments: LUCENE-3090.patch, LUCENE-3090.patch > > > We have seen several OOMs on TestNRTThreads, and all of them are caused by DWFlushControl missing DWPTs that are set as flushPending but can't flush due to a full flush going on. That means those DWPTs are filling up in the background while they should actually be checked out and blocked until the full flush finishes. Further, we currently stall on maxNumThreadStates while we should stall on the number of active thread states. I will attach a patch tomorrow.
[jira] [Issue Comment Edited] (LUCENE-3094) optimize lev automata construction
[ https://issues.apache.org/jira/browse/LUCENE-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033072#comment-13033072 ] Simon Willnauer edited comment on LUCENE-3094 at 5/13/11 3:19 PM: -- Display the attached images before: !before.png|thumbnail! after: !after.png|thumbnail! was (Author: simonw): Display the attached images before: !before.png! after: !after.png! > optimize lev automata construction > -- > > Key: LUCENE-3094 > URL: https://issues.apache.org/jira/browse/LUCENE-3094 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Fix For: 4.0 > > Attachments: LUCENE-3094.patch, after.png, before.png > > > in our lev automata algorithm, we compute an upperbound of the maximum > possible states (not the true number), and > create some "useless" unconnected states "floating around". > this isn't harmful, in the original impl we did the Automaton is simply a > pointer to the initial state, and all algorithms > traverse this list, so effectively the useless states were dropped > immediately. But recently we changed automaton to > cache its numberedStates, and we set them here, so these useless states are > being kept around. > it has no impact on performance, but can be really confusing if you are > debugging (e.g. toString). Thanks to Dawid Weiss > for noticing this. > at the same time, forcing an extra traversal is a bit scary, so i did some > benchmarking with really long strings and found > that actually its helpful to reduce() the number of transitions (typically > cuts them in half) for these long strings, as it > speeds up some later algorithms. > won't see any speedup for short terms, but I think its easier to work with > these simpler automata anyway, and it eliminates > the confusion of seeing the redundant states without slowing anything down. -- This message is automatically generated by JIRA. 
[jira] [Created] (LUCENE-3095) TestIndexWriter#testThreadInterruptDeadlock fails with OOM
TestIndexWriter#testThreadInterruptDeadlock fails with OOM
---
Key: LUCENE-3095
URL: https://issues.apache.org/jira/browse/LUCENE-3095
Project: Lucene - Java
Issue Type: Bug
Components: Index, Tests
Affects Versions: 4.0
Reporter: Simon Willnauer
Fix For: 4.0

Selckin reported a repeatedly failing test that throws OOM exceptions. According to the heap dump, the MockDirectoryWrapper#createdFiles HashSet takes about 400MB of heap space, containing 4194304 entries. Seems kind of way too many though :)
{noformat}
[junit] java.lang.OutOfMemoryError: Java heap space
[junit] Dumping heap to /tmp/java_pid25990.hprof ...
[junit] Heap dump file created [520807744 bytes in 4.250 secs]
[junit] Testsuite: org.apache.lucene.index.TestIndexWriter
[junit] Testcase: testThreadInterruptDeadlock(org.apache.lucene.index.TestIndexWriter): FAILED
[junit]
[junit] junit.framework.AssertionFailedError:
[junit] at org.apache.lucene.index.TestIndexWriter.testThreadInterruptDeadlock(TestIndexWriter.java:2249)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211)
[junit]
[junit]
[junit] Testcase: testThreadInterruptDeadlock(org.apache.lucene.index.TestIndexWriter): FAILED
[junit] Some threads threw uncaught exceptions!
[junit] junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
[junit] at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:557)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1282)
[junit] at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1211)
[junit]
[junit]
[junit] Tests run: 67, Failures: 2, Errors: 0, Time elapsed: 3,254.884 sec
[junit]
[junit] - Standard Output ---
[junit] FAILED; unexpected exception
[junit] java.lang.OutOfMemoryError: Java heap space
[junit] at org.apache.lucene.store.RAMFile.newBuffer(RAMFile.java:85)
[junit] at org.apache.lucene.store.RAMFile.addBuffer(RAMFile.java:58)
[junit] at org.apache.lucene.store.RAMOutputStream.switchCurrentBuffer(RAMOutputStream.java:132)
[junit] at org.apache.lucene.store.RAMOutputStream.copyBytes(RAMOutputStream.java:171)
[junit] at org.apache.lucene.store.MockIndexOutputWrapper.copyBytes(MockIndexOutputWrapper.java:155)
[junit] at org.apache.lucene.index.CompoundFileWriter.copyFile(CompoundFileWriter.java:223)
[junit] at org.apache.lucene.index.CompoundFileWriter.close(CompoundFileWriter.java:189)
[junit] at org.apache.lucene.index.SegmentMerger.createCompoundFile(SegmentMerger.java:138)
[junit] at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3344)
[junit] at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2959)
[junit] at org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:37)
[junit] at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1763)
[junit] at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1758)
[junit] at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1754)
[junit] at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1373)
[junit] at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1230)
[junit] at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1211)
[junit] at org.apache.lucene.index.TestIndexWriter$IndexerThreadInterrupt.run(TestIndexWriter.java:2154)
[junit] - ---
[junit] - Standard Error -
[junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testThreadInterruptDeadlock -Dtests.seed=7183538093651149:3431510331342554160
[junit] NOTE: reproduce with: ant test -Dtestcase=TestIndexWriter -Dtestmethod=testThreadInterruptDeadlock -Dtests.seed=7183538093651149:3431510331342554160
[junit] The following exceptions were thrown by threads:
[junit] *** Thread: Thread-379 ***
[junit] java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still open files: {_3r1n_0.tib=1, _3r1n_0.frq=1, _3r1n_0.pos=1, _3r1m.cfs=1, _3r1n_0.doc=1, _3r1n.tvf=1, _3r1n.tvd=1, _3r1n.tvx=1, _3r1n.fdx=1, _3r1n.fdt=1, _3r1q.cfs=1, _3r1o.cfs=1, _3r1n_0.skp=1, _3r1n_0.pyl=1}
[junit] at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:448)
[junit] at org.apache.lucene.index.TestIndexWriter$In
[jira] [Commented] (LUCENE-3094) optimize lev automata construction
[ https://issues.apache.org/jira/browse/LUCENE-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033072#comment-13033072 ] Simon Willnauer commented on LUCENE-3094: - Display the attached images before: !before.png! after: !after.png! > optimize lev automata construction > -- > > Key: LUCENE-3094 > URL: https://issues.apache.org/jira/browse/LUCENE-3094 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Fix For: 4.0 > > Attachments: LUCENE-3094.patch, after.png, before.png > > > in our lev automata algorithm, we compute an upperbound of the maximum > possible states (not the true number), and > create some "useless" unconnected states "floating around". > this isn't harmful, in the original impl we did the Automaton is simply a > pointer to the initial state, and all algorithms > traverse this list, so effectively the useless states were dropped > immediately. But recently we changed automaton to > cache its numberedStates, and we set them here, so these useless states are > being kept around. > it has no impact on performance, but can be really confusing if you are > debugging (e.g. toString). Thanks to Dawid Weiss > for noticing this. > at the same time, forcing an extra traversal is a bit scary, so i did some > benchmarking with really long strings and found > that actually its helpful to reduce() the number of transitions (typically > cuts them in half) for these long strings, as it > speeds up some later algorithms. > won't see any speedup for short terms, but I think its easier to work with > these simpler automata anyway, and it eliminates > the confusion of seeing the redundant states without slowing anything down.
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8019 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8019/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8015 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8015/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[jira] [Updated] (LUCENE-3094) optimize lev automata construction
[ https://issues.apache.org/jira/browse/LUCENE-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3094: Attachment: after.png before.png > optimize lev automata construction > -- > > Key: LUCENE-3094 > URL: https://issues.apache.org/jira/browse/LUCENE-3094 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Fix For: 4.0 > > Attachments: LUCENE-3094.patch, after.png, before.png > > > in our lev automata algorithm, we compute an upperbound of the maximum > possible states (not the true number), and > create some "useless" unconnected states "floating around". > this isn't harmful, in the original impl we did the Automaton is simply a > pointer to the initial state, and all algorithms > traverse this list, so effectively the useless states were dropped > immediately. But recently we changed automaton to > cache its numberedStates, and we set them here, so these useless states are > being kept around. > it has no impact on performance, but can be really confusing if you are > debugging (e.g. toString). Thanks to Dawid Weiss > for noticing this. > at the same time, forcing an extra traversal is a bit scary, so i did some > benchmarking with really long strings and found > that actually its helpful to reduce() the number of transitions (typically > cuts them in half) for these long strings, as it > speeds up some later algorithms. > won't see any speedup for short terms, but I think its easier to work with > these simpler automata anyway, and it eliminates > the confusion of seeing the redundant states without slowing anything down.
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8018 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8018/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[jira] [Resolved] (SOLR-2511) Make it easier to override SolrContentHandler newDocument
[ https://issues.apache.org/jira/browse/SOLR-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved SOLR-2511. --- Resolution: Fixed Fix Version/s: 4.0 3.2 > Make it easier to override SolrContentHandler newDocument > - > > Key: SOLR-2511 > URL: https://issues.apache.org/jira/browse/SOLR-2511 > Project: Solr > Issue Type: Improvement >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: SOLR-2511.patch > > > The SolrContentHandler's newDocument method does a variety of things: adds > metadata, literals, content and captured content. We could split this out > into protected methods for each, which makes it easier to override.
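The refactor SOLR-2511 describes is the classic template-method split. The sketch below is a guess at its shape with invented names (the real SolrContentHandler builds a SolrInputDocument; this is not the actual patch): newDocument() stays the entry point, while each piece becomes a protected hook a subclass can override on its own.

```java
// Hypothetical shape of the SOLR-2511 refactor; invented names, not the
// actual Solr API. newDocument() is the template method; the four hooks are
// protected so a subclass can replace exactly one of them.
import java.util.HashMap;
import java.util.Map;

class ContentHandlerSketch {
    final Map<String, Object> doc = new HashMap<>();

    final Map<String, Object> newDocument() {
        addMetadata();
        addLiterals();
        addContent();
        addCapturedContent();
        return doc;
    }

    protected void addMetadata() { doc.put("metadata", "defaults"); }
    protected void addLiterals() { doc.put("literals", "defaults"); }
    protected void addContent() { doc.put("content", "defaults"); }
    protected void addCapturedContent() { doc.put("captured", "defaults"); }
}
```

A subclass that only wants different content handling overrides addContent() and inherits the other three steps unchanged, which is exactly the ease-of-override the issue asks for.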
[jira] [Commented] (SOLR-2512) uima: add an ability to skip runtime error in AnalysisEngine
[ https://issues.apache.org/jira/browse/SOLR-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033060#comment-13033060 ] Tommaso Teofili commented on SOLR-2512: --- +1 > uima: add an ability to skip runtime error in AnalysisEngine > > > Key: SOLR-2512 > URL: https://issues.apache.org/jira/browse/SOLR-2512 > Project: Solr > Issue Type: Improvement >Affects Versions: 3.1 >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: SOLR-2512.patch, SOLR-2512.patch, SOLR-2512.patch, > SOLR-2512.patch, SOLR-2512.patch > > > Currently, if AnalysisEngine throws an exception during processing a text, > whole adding docs go fail. Because online NLP services are error-prone, users > should be able to choose whether solr skips the text processing (but source > text can be indexed) for the document or throws a runtime exception so that > solr can stop adding documents entirely.
Re: [Lucene.Net] [jira] [Commented] (LUCENENET-414) The definition of CharArraySet is dangerously confusing and leads to bugs when used.
Hi Vincent,
My first goal was to replace ArrayList, Hashtables, Enumerators etc. as quickly as possible. Applying best practices could wait until the code was cleaner. The purpose of Support.Set was to have a collection that can be accessed with an indexer and also implements the method "Contains". It was a quick solution to the problem. Similarly, Support.Dictionary was just to be able to return null when a collection didn't contain the item (without throwing an exception). Changing zillions of lines with if(coll.ContainsKey(...)) seemed too hard to me at the time (forgetting one results in weird effects at runtime, not at compile time).
DIGY

On Fri, May 13, 2011 at 4:22 PM, Van Den Berghe, Vincent (JIRA) < j...@apache.org> wrote: > >[ > https://issues.apache.org/jira/browse/LUCENENET-414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033031#comment-13033031] > > Van Den Berghe, Vincent commented on LUCENENET-414: > --- > > Hello Digy, > > Thanks for your response. > I don't want to sound overly pedantic (but please tell me if I do), but > this changed implementation solves only part of the problem. > Now, CharArraySet derives from Set, which itself derives from List. > Items are now stored both in this base class, as in the private > HashSet _Set. > However, because List doesn't define its modifiers Add(T), Clear() and > Remove(T) as virtual, the derived implementation defines them as "new". > This violates a variant of the Liskov substitution principle: an operation > on the derived type has not the same effect as the same operation on the > base type. > In this case, it means that the following code will cause the items in the > List base type and in the _Set to be desynchronized: > >CharArraySet set=... >List same=set; >same.Add("whatever"); >// at this point, same.Contains("whatever")==true but > set.Contains("whatever")==false even though it's the same instance.
> You might rightfully retort that this never happens and I should mind my own business, but I know at least one poor soul who did just that: me :-(.
>
> On a completely unrelated matter, the new implementation has 2 methods:
>
>    public void Add(System.Collections.Generic.IList<T> items)
>    public void Add(Support.Set<T> items)
>
> .. which can be collapsed into one, since the only thing used in both cases is the enumerator:
>
>    public void Add(IEnumerable<T> items)
>
> I don't recall the design rule, but it's something like "to increase reuse, make your function parameters as general as possible, but their return value as specific as possible".
> I am unable to get 2.9.4g to investigate further, but if you are moving towards the Generic collections in Lucene, the following implementation should be a drop-in replacement, without suffering from the aforementioned quirks:
>
>    [Serializable]
>    public class Set<T> : ICollection<T>
>    {
>        private readonly System.Collections.Generic.HashSet<T> _Set = new System.Collections.Generic.HashSet<T>();
>        bool _ReadOnly = false;
>
>        public Set()
>        {
>        }
>
>        public Set(bool readOnly)
>        {
>            this._ReadOnly = readOnly;
>        }
>
>        public bool ReadOnly
>        {
>            set
>            {
>                _ReadOnly = value;
>            }
>            get
>            {
>                return _ReadOnly;
>            }
>        }
>
>        public virtual void Add(T item)
>        {
>            if (_ReadOnly) throw new NotSupportedException();
>            if (_Set.Contains(item)) return;
>            _Set.Add(item);
>        }
>
>        public void Add(IEnumerable<T> items)
>        {
>            if (_ReadOnly) throw new NotSupportedException();
>            foreach (T item in items)
>            {
>                if (_Set.Contains(item)) continue;
>                _Set.Add(item);
>            }
>        }
>
>        public void Clear()
>        {
>            if (_ReadOnly) throw new NotSupportedException();
>            _Set.Clear();
>        }
>
>        public b
[jira] [Commented] (SOLR-2512) uima: add an ability to skip runtime error in AnalysisEngine
[ https://issues.apache.org/jira/browse/SOLR-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033058#comment-13033058 ] Koji Sekiguchi commented on SOLR-2512: -- I'll commit soon. > uima: add an ability to skip runtime error in AnalysisEngine > > > Key: SOLR-2512 > URL: https://issues.apache.org/jira/browse/SOLR-2512 > Project: Solr > Issue Type: Improvement >Affects Versions: 3.1 >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: SOLR-2512.patch, SOLR-2512.patch, SOLR-2512.patch, > SOLR-2512.patch, SOLR-2512.patch > > > Currently, if AnalysisEngine throws an exception during processing a text, > whole adding docs go fail. Because online NLP services are error-prone, users > should be able to choose whether solr skips the text processing (but source > text can be indexed) for the document or throws a runtime exception so that > solr can stop adding documents entirely.
[jira] [Assigned] (SOLR-2512) uima: add an ability to skip runtime error in AnalysisEngine
[ https://issues.apache.org/jira/browse/SOLR-2512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reassigned SOLR-2512: Assignee: Koji Sekiguchi > uima: add an ability to skip runtime error in AnalysisEngine > > > Key: SOLR-2512 > URL: https://issues.apache.org/jira/browse/SOLR-2512 > Project: Solr > Issue Type: Improvement >Affects Versions: 3.1 >Reporter: Koji Sekiguchi >Assignee: Koji Sekiguchi >Priority: Minor > Fix For: 3.2, 4.0 > > Attachments: SOLR-2512.patch, SOLR-2512.patch, SOLR-2512.patch, > SOLR-2512.patch, SOLR-2512.patch > > > Currently, if AnalysisEngine throws an exception during processing a text, > whole adding docs go fail. Because online NLP services are error-prone, users > should be able to choose whether solr skips the text processing (but source > text can be indexed) for the document or throws a runtime exception so that > solr can stop adding documents entirely.
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8014 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8014/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[jira] [Updated] (LUCENE-3094) optimize lev automata construction
[ https://issues.apache.org/jira/browse/LUCENE-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3094: Attachment: LUCENE-3094.patch > optimize lev automata construction > -- > > Key: LUCENE-3094 > URL: https://issues.apache.org/jira/browse/LUCENE-3094 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Robert Muir > Fix For: 4.0 > > Attachments: LUCENE-3094.patch > > > in our lev automata algorithm, we compute an upperbound of the maximum > possible states (not the true number), and > create some "useless" unconnected states "floating around". > this isn't harmful, in the original impl we did the Automaton is simply a > pointer to the initial state, and all algorithms > traverse this list, so effectively the useless states were dropped > immediately. But recently we changed automaton to > cache its numberedStates, and we set them here, so these useless states are > being kept around. > it has no impact on performance, but can be really confusing if you are > debugging (e.g. toString). Thanks to Dawid Weiss > for noticing this. > at the same time, forcing an extra traversal is a bit scary, so i did some > benchmarking with really long strings and found > that actually its helpful to reduce() the number of transitions (typically > cuts them in half) for these long strings, as it > speeds up some later algorithms. > won't see any speedup for short terms, but I think its easier to work with > these simpler automata anyway, and it eliminates > the confusion of seeing the redundant states without slowing anything down.
[jira] [Created] (LUCENE-3094) optimize lev automata construction
optimize lev automata construction
--
Key: LUCENE-3094
URL: https://issues.apache.org/jira/browse/LUCENE-3094
Project: Lucene - Java
Issue Type: Improvement
Reporter: Robert Muir
Fix For: 4.0

In our lev automata algorithm, we compute an upper bound on the maximum possible number of states (not the true number), and create some "useless" unconnected states "floating around". This isn't harmful: in the original impl the Automaton was simply a pointer to the initial state, and all algorithms traversed from there, so effectively the useless states were dropped immediately. But recently we changed Automaton to cache its numberedStates, and we set them here, so these useless states are being kept around. It has no impact on performance, but can be really confusing if you are debugging (e.g. toString). Thanks to Dawid Weiss for noticing this. At the same time, forcing an extra traversal is a bit scary, so I did some benchmarking with really long strings and found that it's actually helpful to reduce() the number of transitions (typically cuts them in half) for these long strings, as it speeds up some later algorithms. We won't see any speedup for short terms, but I think it's easier to work with these simpler automata anyway, and it eliminates the confusion of seeing the redundant states without slowing anything down.
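The "useless floating states" the issue describes are exactly the states a traversal never sees. The sketch below is not Lucene's Automaton API, just a minimal reachability pass over an invented adjacency-map representation, showing why the old pointer-based representation dropped the floaters implicitly.

```java
// Minimal illustration of the issue's point; not Lucene's Automaton API.
// When states are allocated from an upper bound, some are never connected to
// the initial state. A traversal from the initial state only ever visits the
// reachable ones, so the unconnected "floaters" are invisible to it.
import java.util.*;

class Reachability {
    static Set<Integer> reachable(int initial, Map<Integer, List<Integer>> transitions) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(initial);
        while (!stack.isEmpty()) {
            int state = stack.pop();
            if (!seen.add(state)) {
                continue; // already visited
            }
            for (int next : transitions.getOrDefault(state, Collections.emptyList())) {
                stack.push(next);
            }
        }
        return seen;
    }
}
```

For example, with five allocated states 0..4 but only the edges 0 to 1 and 1 to 2, a traversal from state 0 sees {0, 1, 2}; states 3 and 4 are the floaters, and a cached numbered-states list is the only thing that keeps them visible.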
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8017 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8017/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #119: POMs out of sync
Build: https://builds.apache.org/hudson/job/Lucene-Solr-Maven-trunk/119/ No tests ran. Build Log (for compile errors): [...truncated 40 lines...]
[jira] [Commented] (LUCENE-3090) DWFlushControl does not take active DWPT out of the loop on fullFlush
[ https://issues.apache.org/jira/browse/LUCENE-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033043#comment-13033043 ] Robert Muir commented on LUCENE-3090: - bq. net flush pending means? we only differ between flushing ram and active ram so flushing ram can easily get above such a limit if IO is slow... I/O or just "O"? Should we add a ThrottledIndexInput too? :) > DWFlushControl does not take active DWPT out of the loop on fullFlush > - > > Key: LUCENE-3090 > URL: https://issues.apache.org/jira/browse/LUCENE-3090 > Project: Lucene - Java > Issue Type: Bug > Components: Index >Affects Versions: 4.0 >Reporter: Simon Willnauer >Assignee: Simon Willnauer >Priority: Critical > Fix For: 4.0 > > Attachments: LUCENE-3090.patch, LUCENE-3090.patch > > > We have seen several OOM on TestNRTThreads and all of them are caused by > DWFlushControl missing DWPT that are set as flushPending but can't full due > to a full flush going on. Yet that means that those DWPT are filling up in > the background while they should actually be checked out and blocked until > the full flush finishes. Even further we currently stall on the > maxNumThreadStates while we should stall on the num of active thread states. > I will attach a patch tomorrow.
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8013 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8013/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8016 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8016/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8012 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8012/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]
[Lucene.Net] [jira] [Commented] (LUCENENET-414) The definition of CharArraySet is dangerously confusing and leads to bugs when used.
[ https://issues.apache.org/jira/browse/LUCENENET-414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033031#comment-13033031 ]

Van Den Berghe, Vincent commented on LUCENENET-414:
---------------------------------------------------

Hello Digy,

Thanks for your response. I don't want to sound overly pedantic (but please tell me if I do), but this changed implementation solves only part of the problem. Now, CharArraySet derives from Set<string>, which itself derives from List<string>. Items are now stored both in this base class and in the private HashSet<string> _Set. However, because List<T> doesn't declare its modifiers Add(T), Clear() and Remove(T) as virtual, the derived implementation defines them as "new". This violates a variant of the Liskov substitution principle: an operation on the derived type does not have the same effect as the same operation on the base type. In this case, it means that the following code will cause the items in the List<string> base type and in the _Set to become desynchronized:

CharArraySet set = ...;
List<string> same = set;
same.Add("whatever");
// At this point, same.Contains("whatever") == true but
// set.Contains("whatever") == false, even though it's the same instance.

You might rightfully retort that this never happens and I should mind my own business, but I know at least one poor soul who did just that: me :-(.

On a completely unrelated matter, the new implementation has 2 methods:

public void Add(System.Collections.Generic.IList<string> items)
public void Add(Support.Set<string> items)

which can be collapsed into one, since the only thing used in both cases is the enumerator:

public void Add(IEnumerable<string> items)

I don't recall the exact design rule, but it's something like "to increase reuse, make your function parameters as general as possible, but their return values as specific as possible".
I am unable to get 2.9.4g to investigate further, but if you are moving towards the Generic collections in Lucene, the following implementation should be a drop-in replacement, without suffering from the aforementioned quirks:

[Serializable]
public class Set<T> : ICollection<T>
{
    private readonly System.Collections.Generic.HashSet<T> _Set = new System.Collections.Generic.HashSet<T>();
    bool _ReadOnly = false;

    public Set() { }

    public Set(bool readOnly)
    {
        this._ReadOnly = readOnly;
    }

    public bool ReadOnly
    {
        set { _ReadOnly = value; }
        get { return _ReadOnly; }
    }

    public virtual void Add(T item)
    {
        if (_ReadOnly) throw new NotSupportedException();
        if (_Set.Contains(item)) return;
        _Set.Add(item);
    }

    public void Add(IEnumerable<T> items)
    {
        if (_ReadOnly) throw new NotSupportedException();
        foreach (T item in items)
        {
            if (_Set.Contains(item)) continue;
            _Set.Add(item);
        }
    }

    public void Clear()
    {
        if (_ReadOnly) throw new NotSupportedException();
        _Set.Clear();
    }

    public bool Contains(T item)
    {
        return _Set.Contains(item);
    }

    public void CopyTo(T[] array, int arrayIndex)
    {
        _Set.CopyTo(array, arrayIndex);
    }

    public int Count
    {
        get { return _Set.Count; }
    }

    public bool IsReadOnly
    {
        get { return _ReadOnly; }
    }

    public bool Remove(T item)
    {
        if (_ReadOnly) throw new NotSupportedException();
        return _Set.Remove(item);
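For readers following along on the Java side, the composition-over-inheritance design Vincent proposes can be sketched as follows. This is a hypothetical illustration (the class name `DelegatingSet` and its members are made up, not Lucene.Net or Lucene code): by delegating to an internal `HashSet` instead of inheriting from a list type, there are no hidden base-class modifiers that can fall out of sync with the internal set.

```java
import java.util.HashSet;
import java.util.Iterator;

// Hypothetical Java analog of the proposed drop-in Set<T>: all mutation goes
// through one internal HashSet, so no base-class storage can desynchronize.
class DelegatingSet<T> implements Iterable<T> {
    private final HashSet<T> set = new HashSet<>();
    private boolean readOnly;

    DelegatingSet() { this(false); }
    DelegatingSet(boolean readOnly) { this.readOnly = readOnly; }

    boolean isReadOnly() { return readOnly; }
    void setReadOnly(boolean readOnly) { this.readOnly = readOnly; }

    void add(T item) {
        if (readOnly) throw new UnsupportedOperationException();
        set.add(item); // HashSet.add is already a no-op for duplicates
    }

    void addAll(Iterable<? extends T> items) {
        if (readOnly) throw new UnsupportedOperationException();
        for (T item : items) set.add(item); // one overload suffices: only the enumerator is used
    }

    void clear() {
        if (readOnly) throw new UnsupportedOperationException();
        set.clear();
    }

    boolean contains(T item) { return set.contains(item); }

    boolean remove(T item) {
        if (readOnly) throw new UnsupportedOperationException();
        return set.remove(item);
    }

    int size() { return set.size(); }

    @Override public Iterator<T> iterator() { return set.iterator(); }
}
```

Because there is only one backing store, every view of the object agrees on its contents, which is exactly the property the `new`-modifier hiding breaks.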
[jira] [Updated] (SOLR-2511) Make it easier to override SolrContentHandler newDocument
[ https://issues.apache.org/jira/browse/SOLR-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated SOLR-2511:
----------------------------------

    Attachment: SOLR-2511.patch

Going to commit this

> Make it easier to override SolrContentHandler newDocument
> ---------------------------------------------------------
>
>          Key: SOLR-2511
>          URL: https://issues.apache.org/jira/browse/SOLR-2511
>      Project: Solr
>   Issue Type: Improvement
>     Reporter: Grant Ingersoll
>     Assignee: Grant Ingersoll
>     Priority: Minor
>  Attachments: SOLR-2511.patch
>
> The SolrContentHandler's newDocument method does a variety of things: it adds
> metadata, literals, content and captured content. We could split this out
> into protected methods for each, making it easier to override.
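The refactor described in the issue is a classic template-method split, which can be sketched roughly as below. This is a simplified stand-in, not the actual SolrContentHandler: the hook names (`addMetadata`, `addLiterals`, `addContent`, `addCapturedContent`) are guessed from the issue description, and the document is modeled as a plain map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: newDocument() becomes a template method delegating to
// protected hooks, so a subclass can override one step instead of the whole method.
class ContentHandlerSketch {
    protected final Map<String, String> doc = new LinkedHashMap<>();

    public Map<String, String> newDocument() {
        addMetadata();
        addLiterals();
        addContent();
        addCapturedContent();
        return doc;
    }

    protected void addMetadata() { doc.put("stream_name", "example.pdf"); }
    protected void addLiterals() { doc.put("literal_id", "doc-1"); }
    protected void addContent() { doc.put("content", "extracted text"); }
    protected void addCapturedContent() { /* no-op by default */ }
}

// A subclass now only overrides the step it cares about.
class CustomHandler extends ContentHandlerSketch {
    @Override protected void addContent() {
        doc.put("content", "extracted text".toUpperCase());
    }
}
```

The win is exactly the one the issue asks for: overriding `addContent` no longer forces a copy of the metadata, literal, and captured-content logic.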
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 8015 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/8015/ No tests ran. Build Log (for compile errors): [...truncated 52 lines...]
[jira] [Commented] (LUCENE-3090) DWFlushControl does not take active DWPT out of the loop on fullFlush
[ https://issues.apache.org/jira/browse/LUCENE-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033024#comment-13033024 ]

Simon Willnauer commented on LUCENE-3090:
-----------------------------------------

bq. Could we add an assert that net flushPending + active RAM never exceeds some multiplier (2X?) of the configured max RAM?

net flush pending means? we only differentiate between flushing RAM and active RAM, so flushing RAM can easily get above such a limit if IO is slow...

> DWFlushControl does not take active DWPT out of the loop on fullFlush
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-3090
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3090
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>            Priority: Critical
>             Fix For: 4.0
>
>         Attachments: LUCENE-3090.patch, LUCENE-3090.patch
>
> We have seen several OOMs on TestNRTThreads, and all of them are caused by
> DWFlushControl missing DWPTs that are marked flushPending but can't flush due
> to a full flush going on. That means those DWPTs keep filling up in the
> background while they should actually be checked out and blocked until the
> full flush finishes. Furthermore, we currently stall on maxNumThreadStates
> while we should stall on the number of active thread states. I will attach a
> patch tomorrow.
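Simon's point about flushing RAM exceeding any fixed multiple of the limit can be illustrated with a toy accounting model. This is not the real DocumentsWriterFlushControl (all names below are hypothetical); it only demonstrates the two buckets discussed in the thread: "active" bytes held by indexing threads, which are capped, and "flushing" bytes held by DWPTs whose flush I/O has not yet completed, which are not.

```java
// Toy model of the RAM accounting discussed in LUCENE-3090. Active RAM is
// bounded by maxRamBytes; flushing RAM only shrinks when flush I/O finishes,
// so with slow I/O the flushing bucket can grow past any fixed multiplier.
class FlushAccounting {
    private final long maxRamBytes;
    private long activeBytes;    // RAM held by documents not yet flushing
    private long flushingBytes;  // RAM held by flushes whose I/O is pending

    FlushAccounting(long maxRamBytes) { this.maxRamBytes = maxRamBytes; }

    /** Called after a document is indexed; may trigger a flush. */
    void afterDocument(long bytesUsed) {
        activeBytes += bytesUsed;
        if (activeBytes >= maxRamBytes) {
            // Flush triggered: the active RAM moves to the flushing bucket
            // and is only released once the (possibly slow) I/O completes.
            flushingBytes += activeBytes;
            activeBytes = 0;
        }
    }

    /** Called when a flush's I/O finishes and its buffers are freed. */
    void flushFinished(long bytesFreed) { flushingBytes -= bytesFreed; }

    long activeBytes() { return activeBytes; }
    long flushingBytes() { return flushingBytes; }
    long totalBytes() { return activeBytes + flushingBytes; }
}
```

With `maxRamBytes = 100`, two flushes triggered before the first one's I/O completes leave more than 200 bytes in the flushing bucket, which is why a blanket "never exceeds 2X the configured max" assertion would fire under slow I/O.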
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 8011 - Still Failing
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/8011/ No tests ran. Build Log (for compile errors): [...truncated 45 lines...]