[jira] Updated: (LUCENE-662) Extendable writer and reader of field data

2006-09-24 Thread JIRA
 [ http://issues.apache.org/jira/browse/LUCENE-662?page=all ]

Nicolas Lalevée updated LUCENE-662:
---

Attachment: generic-fieldIO-2.patch

I think I got it. What was disturbing in the last patch was the notion of 
FieldData I added, so I removed it. Let's summarize the differences between the 
trunk and my patch:

* The concepts:
** an IndexFormat defines which FieldsWriter and FieldsReader to use
** an IndexFormat defines the file extensions in use, so users can add their 
own files
** the format of an index is attached to the Directory
** the whole index format isn't customizable, only part of it. Some functions 
are private or have default (package) visibility, so the Lucene user won't 
have access to them: that is Lucene-internal stuff. Others are public or 
protected and can be redefined.
** Lucene now provides an API to add files that are tables of data, as the 
FieldInfos file is
** it is up to the FieldsWriter implementation to check whether the field to 
write is of the expected format (basically with an instanceof check)
** the user can add information at the document level and provide their own 
implementation of Document
** the user can define how the data of a field is stored and retrieved, and 
provide their own implementation of Fieldable
** the reading of field data is done in the Fieldable
** the writing of field data is done in the FieldsWriter
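
To illustrate the first two concepts, here is a minimal sketch of an index format acting as a factory for a writer/reader pair and declaring its file extensions. Every class and method name below is a hypothetical stand-in, not the API from the patch, and the extension list is an assumption chosen for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; NOT the classes from the patch.
abstract class SketchFieldsWriter {
    abstract void writeField(String name, String value, DataOutputStream out) throws IOException;
}

abstract class SketchFieldsReader {
    // Returns {name, value} for the next stored field.
    abstract String[] readField(DataInputStream in) throws IOException;
}

// The format is a factory for the writer/reader pair and declares its file extensions.
abstract class SketchIndexFormat {
    abstract SketchFieldsWriter newFieldsWriter();
    abstract SketchFieldsReader newFieldsReader();
    abstract List<String> fileExtensions();
}

// Default format: stores each field as two UTF strings (an assumption of this sketch).
class SketchDefaultIndexFormat extends SketchIndexFormat {
    SketchFieldsWriter newFieldsWriter() {
        return new SketchFieldsWriter() {
            void writeField(String name, String value, DataOutputStream out) throws IOException {
                out.writeUTF(name);
                out.writeUTF(value);
            }
        };
    }

    SketchFieldsReader newFieldsReader() {
        return new SketchFieldsReader() {
            String[] readField(DataInputStream in) throws IOException {
                return new String[] { in.readUTF(), in.readUTF() };
            }
        };
    }

    List<String> fileExtensions() {
        return Arrays.asList("fdt", "fdx", "fnm");
    }
}

class IndexFormatSketch {
    // Writes one field with the format's writer and reads it back with its reader.
    static String[] roundTrip(SketchIndexFormat format, String name, String value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            format.newFieldsWriter().writeField(name, value, new DataOutputStream(bytes));
            return format.newFieldsReader()
                    .readField(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Callers in this sketch only ever ask the format for a writer or a reader, so a custom format can swap both without the callers changing.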

* API changes:
** There are new constructors for the directory: constructors that take a 
specified IndexFormat
** new Entry and EntryTable: a generic API for managing a table of data in a file
** FieldInfos now extends EntryTable

* Code changes:
** AbstractField becomes Fieldable (Fieldable is no longer an interface).
** FieldsWriter has been split into the abstract class FieldsWriter and its 
default implementation DefaultFieldsWriter. The same goes for FieldsReader and 
DefaultFieldsReader.
** lazy loading has been moved from FieldsReader to Fieldable
** IndexOutput can now write directly from an input stream
** if a field was loaded lazily, DefaultFieldsWriter copies the source input 
stream directly to the output stream
** IndexFileNameFilter now takes its list of known file extensions from the 
index format
** each time a temporary RAM directory is created, the index format has to be 
passed: see the diff for CompoundFileReader or IndexWriter
** some private and/or final members have been made public
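
The direct stream copy for lazily loaded fields can be sketched roughly as follows. This uses plain java.io streams rather than Lucene's IndexInput/IndexOutput, and the class and method names are hypothetical; it only shows the idea of moving a known number of bytes through a small buffer without materializing the field value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of the patch.
class StreamCopySketch {
    // Copies exactly `length` bytes from in to out through a small reusable buffer,
    // so the field value is never held in memory as a whole.
    static void copyBytes(InputStream in, OutputStream out, long length) throws IOException {
        byte[] buffer = new byte[1024];
        long remaining = length;
        while (remaining > 0) {
            int chunk = (int) Math.min(buffer.length, remaining);
            int read = in.read(buffer, 0, chunk);
            if (read < 0) {
                throw new EOFException("stream ended with " + remaining + " bytes left");
            }
            out.write(buffer, 0, read);
            remaining -= read;
        }
    }

    // Convenience wrapper for in-memory data: copies the first `length` bytes of source.
    static byte[] copyPrefix(byte[] source, int length) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            copyBytes(new ByteArrayInputStream(source), out, length);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```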

* Last worries:
** quite a big one in fact, and I don't know how to handle it: every RMI test 
fails because of:
{noformat}
error unmarshalling return; nested exception is:
[junit] java.io.InvalidClassException: org.apache.lucene.document.Field; no valid constructor
[junit] java.rmi.UnmarshalException: error unmarshalling return; nested exception is:
[junit] java.io.InvalidClassException: org.apache.lucene.document.Field; no valid constructor
[junit] at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:157)
{noformat}
** a function is public but shouldn't be: see Fieldable.setLazyData()
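
One common cause of "no valid constructor" in Java serialization is a Serializable class whose nearest non-serializable superclass has no accessible no-arg constructor. Whether that is what happens with Field in the patch cannot be told from the trace alone, so the following is only a sketch of that failure mode with hypothetical class names, together with one possible fix (making the base class Serializable).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in: a non-serializable base class without a no-arg constructor.
abstract class BaseWithoutDefaultCtor {
    protected final String name;
    protected BaseWithoutDefaultCtor(String name) { this.name = name; }
}

// Serializable subclass of the base above: writing works, reading back fails.
class BrokenField extends BaseWithoutDefaultCtor implements Serializable {
    BrokenField(String name) { super(name); }
}

// Possible fix: the base class itself is Serializable, so no constructor of it is needed.
abstract class SerializableBase implements Serializable {
    protected final String name;
    protected SerializableBase(String name) { this.name = name; }
}

class WorkingField extends SerializableBase {
    WorkingField(String name) { super(name); }
}

class SerializationSketch {
    // Serializes then deserializes the value in memory.
    // Returns null on success, or the InvalidClassException message on failure.
    static String roundTripError(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                in.readObject();
            }
            return null;
        } catch (InvalidClassException e) {
            return e.getMessage();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch, roundTripError(new BrokenField("f")) reports a "no valid constructor" message, while roundTripError(new WorkingField("f")) returns null. An alternative fix would be to give the non-serializable base class an accessible no-arg constructor.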

I have added an example implementation to the patch that uses this feature: 
look at org.apache.lucene.index.rdf

I know this is a big patch but I think the API has not been broken, and I would 
appreciate comments on this.

 Extendable writer and reader of field data
 --

 Key: LUCENE-662
 URL: http://issues.apache.org/jira/browse/LUCENE-662
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Nicolas Lalevée
Priority: Minor
 Attachments: generic-fieldIO-2.patch, generic-fieldIO.patch


 As discussed on the dev mailing list, I have modified Lucene to make it 
 possible to define how the data of a field is written to and read from the 
 index.
 Basically, I have introduced the notion of IndexFormat. It is in fact a 
 factory of FieldsWriter and FieldsReader. So the IndexReader, the IndexWriter 
 and the SegmentMerger use this factory instead of doing a new 
 FieldsReader/Writer().
 I have also introduced the notion of FieldData. It handles all the data of a 
 field, as well as writing it to and reading it from a stream. I have done it 
 this way because in the current design of Lucene, Fieldable is an interface, 
 so methods with protected or package visibility cannot be defined.
 A FieldsWriter just writes data into a stream via the FieldData of the field.
 A FieldsReader instantiates a FieldData depending on the field name. Then it 
 uses the field data to read the stream. Finally it instantiates a Field 
 with the field data.
 About compatibility, I think it is kept, as I have written a 
 DefaultIndexFormat that provides a DefaultFieldsWriter and a 
 DefaultFieldsReader. These implementations do exactly the job that is done 
 today.
 To achieve this modification, some classes and methods had to be changed from 
 private and/or final to public or protected.
 About the lazy fields, I have 

[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2006-09-24 Thread Paul Elschot (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-584?page=comments#action_12437242 ]

Paul Elschot commented on LUCENE-584:
-

I wrote:

 One could add an abstract Scorer.explain() to catch these, or
 provide a default implementation for Scorer.explain().

by mistake. The good news is that the patch leaves the 
existing abstract Scorer.explain() method unaffected.


 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: http://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.0.1
Reporter: Peter Schäfer
Priority: Minor
 Attachments: BitsMatcher.java, Filter-20060628.patch, 
 HitCollector-20060628.patch, IndexSearcher-20060628.patch, 
 MatchCollector.java, Matcher.java, Matcher20060830b.patch, 
 Scorer-20060628.patch, Searchable-20060628.patch, Searcher-20060628.patch, 
 Some Matchers.zip, SortedVIntList.java, TestSortedVIntList.java


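 {code}
 package org.apache.lucene.search;

 import java.io.IOException;
 import org.apache.lucene.index.IndexReader;

 public abstract class Filter implements java.io.Serializable 
 {
   // Returns the documents that pass the filter for the given reader.
   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
 }

 // Minimal read-only view of a bit set; in a source tree this would be its own file.
 public interface AbstractBitSet 
 {
   public boolean get(int index);
 }
 {code}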
 It would be useful if the method =Filter.bits()= returned an abstract 
 interface, instead of =java.util.BitSet=.
 Use case: there is a very large index, and, depending on the user's 
 privileges, only a small portion of the index is actually visible.
 Sparsely populated =java.util.BitSet=s are not efficient and waste lots of 
 memory. It would be desirable to have an alternative BitSet implementation 
 with smaller memory footprint.
 Though it _is_ possible to derive classes from =java.util.BitSet=, it was 
 obviously not designed for that purpose.
 That's why I propose to use an interface instead. The default implementation 
 could still delegate to =java.util.BitSet=.
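
A minimal sketch of the two implementations the description has in mind: a default one delegating to java.util.BitSet, and a sparse one with a smaller footprint. The sparse variant below (a sorted int array with binary search) and both class names are assumptions made for illustration; they are not taken from the attached patches.

```java
import java.util.Arrays;
import java.util.BitSet;

// Interface as proposed in the issue description.
interface AbstractBitSet {
    boolean get(int index);
}

// Default implementation: delegates to java.util.BitSet.
class DelegatingBitSet implements AbstractBitSet {
    private final BitSet bits;

    DelegatingBitSet(BitSet bits) {
        this.bits = bits;
    }

    public boolean get(int index) {
        return bits.get(index);
    }
}

// Hypothetical sparse implementation: keeps only the set indexes, sorted.
// Memory is 4 bytes per set bit, independent of the highest index.
class SortedIntArrayBitSet implements AbstractBitSet {
    private final int[] sortedIndexes;

    SortedIntArrayBitSet(int[] indexes) {
        sortedIndexes = indexes.clone();
        Arrays.sort(sortedIndexes);
    }

    public boolean get(int index) {
        // binarySearch returns a non-negative position only when the key is present.
        return Arrays.binarySearch(sortedIndexes, index) >= 0;
    }
}
```

With N documents and k visible ones, a java.util.BitSet needs roughly N/8 bytes while the sorted array needs 4k bytes, so the array wins when fewer than about N/32 documents are visible, at the cost of a logarithmic lookup.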

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



undefined primitive types

2006-09-24 Thread Greg Colvin

I'm trying to write C++ code following the Lucene File Formats
document, and find that the terms Int, Long, and VLong are left
undefined.



Re: undefined primitive types

2006-09-24 Thread Otis Gospodnetic
Hi Greg,

Are you aware of CLucene?

Otis

- Original Message 
From: Greg Colvin [EMAIL PROTECTED]
To: java-dev@lucene.apache.org
Sent: Sunday, September 24, 2006 9:25:35 PM
Subject: undefined primitive types

I'm trying to write C++ code following the Lucene File Formats
document, and find that the terms Int, Long, and VLong are left
undefined.



Re: undefined primitive types

2006-09-24 Thread Greg Colvin

Just.  I'll look there, thanks.

On Sep 24, 2006, at 10:48 PM, Otis Gospodnetic wrote:

Hi Greg,

Are you aware of CLucene?

Otis

- Original Message 
From: Greg Colvin [EMAIL PROTECTED]
To: java-dev@lucene.apache.org
Sent: Sunday, September 24, 2006 9:25:35 PM
Subject: undefined primitive types

I'm trying to write C++ code following the Lucene File Formats
document, and find that the terms Int, Long, and VLong are left
undefined.



Re: undefined primitive types

2006-09-24 Thread David Balmain

Hi Greg,

I don't know which documentation of the Lucene file format you are
looking at, but you can see UInt32 (Int), UInt64 (Long) and VInt defined
here:

   http://lucene.apache.org/java/docs/fileformats.html
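
As a rough sketch of what those definitions amount to: a UInt32 is four bytes with the high-order byte first, and a VInt stores seven data bits per byte, low-order group first, with the high bit of each byte signalling that more bytes follow. The code below is in Java rather than C++ and reads from a plain InputStream; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not Lucene's own reader classes.
class PrimitiveTypesSketch {
    private static int readByte(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) {
            throw new EOFException("unexpected end of stream");
        }
        return b;
    }

    // UInt32: four bytes, high-order byte first. Returned as a long to stay unsigned.
    static long readUInt32(InputStream in) throws IOException {
        long value = 0;
        for (int i = 0; i < 4; i++) {
            value = (value << 8) | readByte(in);
        }
        return value;
    }

    // VInt: seven data bits per byte, low-order group first;
    // a set high bit means another byte follows.
    static int readVInt(InputStream in) throws IOException {
        int b = readByte(in);
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = readByte(in);
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // In-memory convenience wrappers.
    static int decodeVInt(byte[] bytes) {
        try {
            return readVInt(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static long decodeUInt32(byte[] bytes) {
        try {
            return readUInt32(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, the two bytes 0x80 0x01 decode to the VInt 128, and 0xFF 0x7F decode to 16383.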

Are you at liberty to tell us what you are working on? You may also
like to take a look at Ferret:

   http://ferret.davebalmain.com/trac

Up to version 0.9.6 it follows the Lucene file format quite closely,
apart from the fact that Ferret can't handle modified UTF-8. Also,
it's in C, not C++.

Cheers,
Dave

On 9/25/06, Greg Colvin [EMAIL PROTECTED] wrote:

Just.  I'll look there, thanks.

On Sep 24, 2006, at 10:48 PM, Otis Gospodnetic wrote:
 Hi Greg,

 Are you aware of CLucene?

 Otis

 - Original Message 
 From: Greg Colvin [EMAIL PROTECTED]
 To: java-dev@lucene.apache.org
 Sent: Sunday, September 24, 2006 9:25:35 PM
 Subject: undefined primitive types

 I'm trying to write C++ code following the Lucene File Formats
 document, and find that the terms Int, Long, and VLong are left
 undefined.

