Re: Resolving term vector even when not stored?

2007-03-17 Thread Doron Cohen
Mike Klaas [EMAIL PROTECTED] wrote on 16/03/2007 14:26:46:

 On 3/15/07, karl wettin [EMAIL PROTECTED] wrote:
  I propose a change to the current IndexReader.getTermFreqVector/s
  code so that it /always/ returns the vector space model of a document,
  even when fields are set as Field.TermVector.NO.
 
  Is that crazy? Could be really slow, but except for that.. And if it
  is cached, then that information is known by inspecting the fields.
  People don't go fetching term vectors without knowing what they are
  doing, do they?

 The highlighting contrib code does this: attempt to retrieve the
 termvector, catch InvalidArgumentException, fall back to re-analysis
 of the data.

This way makes more sense to me.  IndexReader.getTermFreqVector() means
"it's there, just bring it," while the fall-back is more of a
computeTermFreqVector(), which takes much more time.  Users would likely
prefer getting an exception from get() (oops, term vectors were not
saved..) rather than silently falling back to an expensive computation.

This functionality seems best packaged as a utility so it can be reused,
perhaps in contrib?
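Something like this, roughly (just a sketch; the helper itself is made up and
not an existing Lucene API, and it assumes the field was stored so it can be
re-analyzed):

  public static Map getTermFrequencies(IndexReader reader, int doc,
                                       String field, Analyzer analyzer)
      throws IOException {
    Map freqs = new HashMap();   // term (String) -> frequency (Integer)
    TermFreqVector tfv = reader.getTermFreqVector(doc, field);
    if (tfv != null) {           // cheap path: vector was stored at index time
      String[] terms = tfv.getTerms();
      int[] counts = tfv.getTermFrequencies();
      for (int i = 0; i < terms.length; i++) {
        freqs.put(terms[i], new Integer(counts[i]));
      }
      return freqs;
    }
    // expensive fall-back: re-analyze the stored field value
    String text = reader.document(doc).get(field);
    if (text == null) {
      return freqs;              // neither vectorized nor stored, nothing to do
    }
    TokenStream stream = analyzer.tokenStream(field, new StringReader(text));
    for (Token t = stream.next(); t != null; t = stream.next()) {
      Integer n = (Integer) freqs.get(t.termText());
      freqs.put(t.termText(), n == null ? new Integer(1)
                                        : new Integer(n.intValue() + 1));
    }
    return freqs;
  }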


 I'm not sure if that is crazy, but that is what is currently implemented.

 -Mike


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Resolving term vector even when not stored?

2007-03-17 Thread karl wettin


On 17 Mar 2007, at 08.15, Doron Cohen wrote:


Mike Klaas [EMAIL PROTECTED] wrote on 16/03/2007 14:26:46:


On 3/15/07, karl wettin [EMAIL PROTECTED] wrote:

I propose a change to the current IndexReader.getTermFreqVector/s
code so that it /always/ returns the vector space model of a document,
even when fields are set as Field.TermVector.NO.

Is that crazy? Could be really slow, but except for that.. And if it
is cached, then that information is known by inspecting the fields.
People don't go fetching term vectors without knowing what they are
doing, do they?


The highlighting contrib code does this: attempt to retrieve the
termvector, catch InvalidArgumentException, fall back to re-analysis
of the data.


This way makes more sense to me.  IndexReader.getTermFreqVector() means
"it's there, just bring it,"


The way I look at it, the vector space model is there all the time, and
Field.TermVector.YES really means Field.TermVector.Level1Cached.

Also, I would not mind a soft-referenced map in IndexReader that keeps
track of all resolved term vectors. Perhaps that should be a decorator.
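Something along these lines, perhaps (just a sketch; the class and its name
are made up, not existing Lucene code):

  public class CachingTermVectorReader {
    private final IndexReader reader;
    private final Map cache = new HashMap();  // "doc/field" -> SoftReference

    public CachingTermVectorReader(IndexReader reader) {
      this.reader = reader;
    }

    public synchronized TermFreqVector getTermFreqVector(int doc, String field)
        throws IOException {
      String key = doc + "/" + field;
      SoftReference ref = (SoftReference) cache.get(key);
      TermFreqVector tfv = (ref == null) ? null : (TermFreqVector) ref.get();
      if (tfv == null) {
        // resolve from the index (or, later, fall back to re-analysis)
        tfv = reader.getTermFreqVector(doc, field);
        if (tfv != null) {
          cache.put(key, new SoftReference(tfv));  // GC may reclaim under memory pressure
        }
      }
      return tfv;
    }
  }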


--
karl

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Indexing time taken is too long - Help Appreciated.

2007-03-17 Thread karl wettin


On 17 Mar 2007, at 06.01, Lokeya wrote:


Help Appreciated.


There are even more helpful people on the java-user list. You have a
greater chance of getting a good answer in time there, as this forum
focuses on development of the actual API rather than on consumer
implementations.


--
karl


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Created: (LUCENE-834) Payload Queries

2007-03-17 Thread Grant Ingersoll (JIRA)
Payload Queries
---

 Key: LUCENE-834
 URL: https://issues.apache.org/jira/browse/LUCENE-834
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Reporter: Grant Ingersoll
 Assigned To: Grant Ingersoll
Priority: Minor


Now that payloads have been implemented, it will be good to make them 
searchable via one or more Query mechanisms.  See 
http://wiki.apache.org/lucene-java/Payload_Planning for some background 
information and https://issues.apache.org/jira/browse/LUCENE-755 for the issue 
that started it all.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Storing whole document in the index

2007-03-17 Thread jafarim

Hello
It's been a while that I am using Lucene and, as most people seemingly do, I
used to save only some important fields of a document in the index. But
recently I thought: why not store the whole document bytes as an untokenized
field in the index, in order to ease the retrieval process? For example,
serialize the PDF file into a byte[] and then save the bytes as a field in
the index (some gzip and base64 encodings may be needed as glue logic). Then
I can delete the original file from the system. Is there any reason against
this idea? Can Lucene bear this large volume of input streamed data?
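Roughly what I have in mind (just an illustration, assuming the binary
stored-field constructor available in recent Lucene versions; field and
method names are only examples):

  void addPdf(IndexWriter writer, File pdf, String extractedText)
      throws IOException {
    byte[] raw = new byte[(int) pdf.length()];
    RandomAccessFile f = new RandomAccessFile(pdf, "r");
    f.readFully(raw);
    f.close();

    Document doc = new Document();
    doc.add(new Field("contents", extractedText,           // searchable text
                      Field.Store.NO, Field.Index.TOKENIZED));
    doc.add(new Field("raw", raw, Field.Store.YES));        // original bytes, stored but not indexed
    writer.addDocument(doc);
  }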


Monster Jobs search

2007-03-17 Thread Eric Cone

Hello Peter,

Now that the Monster Lucene search is live, is performance pretty good? Are
you still running it on a single 8-core server? Can you give us a rough idea
of the number of queries you can handle per second and the number of docs in
the index? Are you using dotLucene or a Java web service tier?

How did you implement your bounding box for the searching? It sounds like
you do this outside of Lucene and return a custom HitCollector. Why not use
a RangeQuery or FunctionQuery for the basic bounding before sorting?
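For instance, I would have expected something roughly like this for the box
part (a sketch only; it assumes lat/lon were indexed as zero-padded strings
so that string order matches numeric order, pad() is a made-up helper, and
minLat/maxLat/minLon/maxLon are the box bounds):

  BooleanQuery box = new BooleanQuery();
  box.add(new RangeQuery(new Term("lat", pad(minLat)),
                         new Term("lat", pad(maxLat)), true),
          BooleanClause.Occur.MUST);
  box.add(new RangeQuery(new Term("lon", pad(minLon)),
                         new Term("lon", pad(maxLon)), true),
          BooleanClause.Occur.MUST);

  BooleanQuery query = new BooleanQuery();
  query.add(keywordQuery, BooleanClause.Occur.MUST);  // the user's search terms
  query.add(box, BooleanClause.Occur.MUST);           // restrict to the bounding box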

Thanks,
Eric


Re: Storing whole document in the index

2007-03-17 Thread Grant Ingersoll
Please ask these types of questions on the user mailing list; you will
get much better responses.  The dev list is for developers of Lucene.


To answer your question: yes, you can do this.  You may find the
FieldSelector API additions and lazy field loading to be helpful
performance-wise, as well.
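For example, roughly (a sketch assuming the 2.1 FieldSelector API; field
names are just illustrations):

  FieldSelector selector = new SetBasedFieldSelector(
      Collections.singleton("title"),   // load these fields eagerly
      Collections.singleton("raw"));    // load these lazily
  Document doc = reader.document(docId, selector);
  Fieldable raw = doc.getFieldable("raw");
  byte[] bytes = raw.binaryValue();     // the large value is only read here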


-Grant

On Mar 17, 2007, at 8:36 AM, jafarim wrote:


Hello
It's been a while that I am using Lucene and, as most people seemingly do, I
used to save only some important fields of a document in the index. But
recently I thought: why not store the whole document bytes as an untokenized
field in the index, in order to ease the retrieval process? For example,
serialize the PDF file into a byte[] and then save the bytes as a field in
the index (some gzip and base64 encodings may be needed as glue logic). Then
I can delete the original file from the system. Is there any reason against
this idea? Can Lucene bear this large volume of input streamed data?


--
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Commit and Review (was Is Lucene Java trunk still stable for production code?)

2007-03-17 Thread Grant Ingersoll

Hoss wrote:
 (or in short: we're moving more towards a *true* commit and review model)


I'm curious as to what you think the practical implications are for
committers under this model.  Do you imagine a change in the workflow
whereby we commit and then review, or do we stick to the patch approach as
committers (contributors will always submit patches)?  It has always been a
gray area, where we all kind of know what we can commit w/o creating patches
for versus what we should put up patches for.  Just curious.  I'm working on
the payloads stuff and I know that as long as it compiles, it isn't going to
break anything, so in some sense I could commit b/c I know it would make it
easier for Michael B. and others to update and review w/o going through the
patch process.  On the other hand, the patch approach makes you take one
extra good look at what you are doing.


What do others think?

-Grant

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [jira] Resolved: (LUCENE-829) StandardBenchmarker#makeDocument does not explicitly close opened files

2007-03-17 Thread karl wettin


On 16 Mar 2007, at 02.23, Doron Cohen (JIRA) wrote:



  [ https://issues.apache.org/jira/browse/LUCENE-829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Doron Cohen resolved LUCENE-829.


   Resolution: Fixed
Fix Version/s: 2.2
Lucene Fields: [Patch Available]  (was: [New])

Committed the fix for this.
There were actually two more cases like this.


Also found this in ReutersQueryMaker:

private void prepareQueries() throws Exception {
  // analyzer (default is standard analyzer)
  Analyzer anlzr = (Analyzer) Class.forName(config.get("analyzer",
      "org.apache.lucene.analysis.StandardAnalyzer")).newInstance();


It should be

      "org.apache.lucene.analysis.standard.StandardAnalyzer")).newInstance();



--
karl

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-550) InstantiatedIndex - faster but memory consuming index

2007-03-17 Thread Karl Wettin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wettin updated LUCENE-550:
---

Attachment: HitCollectionBench.jpg

A graph showing performance of hit collection using InstantiatedIndex, 
RAMDirectory and FSDirectory.

In essence, there is no great win in pure search time when there are more than
7000 documents. However, retrieving documents is still not associated with any
cost whatsoever, so in a 25 sized index that uses Lucene for persistence
of fields, I still see a boost of 6-10x or so compared to RAMDirectory.

documents in corpus     queries per second

[EMAIL PROTECTED]
250     37530,00
500     29610,00
750     22612,50
1000    19267,50
1250    16027,50
1500    14737,50
1750    13230,00
2000    12322,50
2250    11482,50
2500    10125,00
2750    9802,50
3000    8508,25
3250    8469,80
3500    7788,61
3750    5207,29
4000    5484,52
4250    4912,50
4500    4420,58
4750    4006,49
5000    4357,50
5250    3886,67
5500    3573,93
5750    3236,76
6000    3602,10
6250    3420,00
6500    3075,00
6750    2805,00
7000    2680,98
7250    2908,55
7500    2769,46
7750    2644,86
8000    2496,25
8250    2377,50
8500    2578,71
8750    2390,11
9000    2160,00
9250    2037,96
9500    1872,19
9750    2041,38
10000   1959,12
Created 10000 documents

[EMAIL PROTECTED]
250     4845,00
500     3986,01
750     4330,67
1000    4682,82
1250    4148,78
1500    4847,65
1750    4535,23
2000    4192,50
2250    4203,30
2500    3695,65
2750    3742,50
3000    3485,76
3250    3470,76
3500    3525,00
3750    2877,61
4000    3221,78
4250    2983,51
4500    2982,02
4750    2724,55
5000    3092,86
5250    2646,18
5500    2940,00
5750    2709,58
6000    2423,30
6250    2602,50
6500    2305,39
6750    2462,57
7000    1815,00
7250    2431,42
7500    2171,74
7750    2297,90
8000    2134,30
8250    2308,85
8500    2038,98
8750    2231,65
9000    2097,90
9250    2041,38
9500    1819,77
9750    2102,24
10000   1876,87
Created 10000 documents


[EMAIL PROTECTED]
250     3448,28
500     2422,50
750     2677,50
1000    2607,39
1250    2241,92
1500    2486,27
1750    2472,53
2000    1733,52
2250    2325,00
2500    2194,21
2750    1969,55
3000    2125,75
3250    2009,00
3500    1473,08
3750    1858,14
4000    1925,57
4250    1671,66
4500    1786,25
4750    1694,15
5000    1217,63
5250    1595,11
5500    1745,75
5750    1526,18
6000    1431,78
6250    1524,66
6500    1648,35
6750    1544,23
7000    1428,22
7250    1487,29
7500    1494,02
7750    1106,13
8000    1455,00
8250    1284,86
8500    1182,63
8750    1292,33
9000    1399,70
9250    1000,00
9500    1291,04
9750    1359,56
10000   1194,62
Created 10000 documents

 InstantiatedIndex - faster but memory consuming index
 -

 Key: LUCENE-550
 URL: https://issues.apache.org/jira/browse/LUCENE-550
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Affects Versions: 2.0.0
Reporter: Karl Wettin
 Assigned To: Karl Wettin
 Attachments: HitCollectionBench.jpg, lucene-550.jpg, 
 test-reports.zip, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, 
 trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, 
 trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2


 A non-file-centric, all-in-memory index. Consumes some 2x the memory of a 
 RAMDirectory (in a term-saturated index) but is between 3x-60x faster depending 
 on application and how one counts. Average query is about 8x faster. 
 IndexWriter and IndexModifier have been realized in InterfaceIndexWriter and 
 InterfaceIndexModifier. 
 InstantiatedIndex is wrapped in a new top-layer index facade (class Index) 
 that comes with factory methods for writers, readers and searchers for unison 
 index handling. There are decorators with notification handling that can be 
 used for automatically synchronizing searchers on updates, etc. 
 Index also comes with FS/RAMDirectory implementation.

[jira] Updated: (LUCENE-550) InstantiatedIndex - faster but memory consuming index

2007-03-17 Thread Karl Wettin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wettin updated LUCENE-550:
---

Attachment: HitCollectionBench.jpg

made graph more readable

 InstantiatedIndex - faster but memory consuming index
 -

 Key: LUCENE-550
 URL: https://issues.apache.org/jira/browse/LUCENE-550
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Affects Versions: 2.0.0
Reporter: Karl Wettin
 Assigned To: Karl Wettin
 Attachments: HitCollectionBench.jpg, lucene-550.jpg, 
 test-reports.zip, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, 
 trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, 
 trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2


 A non-file-centric, all-in-memory index. Consumes some 2x the memory of a 
 RAMDirectory (in a term-saturated index) but is between 3x-60x faster depending 
 on application and how one counts. Average query is about 8x faster. 
 IndexWriter and IndexModifier have been realized in InterfaceIndexWriter and 
 InterfaceIndexModifier. 
 InstantiatedIndex is wrapped in a new top-layer index facade (class Index) 
 that comes with factory methods for writers, readers and searchers for unison 
 index handling. There are decorators with notification handling that can be 
 used for automatically synchronizing searchers on updates, etc. 
 Index also comes with FS/RAMDirectory implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-550) InstantiatedIndex - faster but memory consuming index

2007-03-17 Thread Karl Wettin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wettin updated LUCENE-550:
---

Attachment: (was: HitCollectionBench.jpg)

 InstantiatedIndex - faster but memory consuming index
 -

 Key: LUCENE-550
 URL: https://issues.apache.org/jira/browse/LUCENE-550
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Affects Versions: 2.0.0
Reporter: Karl Wettin
 Assigned To: Karl Wettin
 Attachments: HitCollectionBench.jpg, lucene-550.jpg, 
 test-reports.zip, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, 
 trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, 
 trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2


 A non-file-centric, all-in-memory index. Consumes some 2x the memory of a 
 RAMDirectory (in a term-saturated index) but is between 3x-60x faster depending 
 on application and how one counts. Average query is about 8x faster. 
 IndexWriter and IndexModifier have been realized in InterfaceIndexWriter and 
 InterfaceIndexModifier. 
 InstantiatedIndex is wrapped in a new top-layer index facade (class Index) 
 that comes with factory methods for writers, readers and searchers for unison 
 index handling. There are decorators with notification handling that can be 
 used for automatically synchronizing searchers on updates, etc. 
 Index also comes with FS/RAMDirectory implementation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Commit and Review (was Is Lucene Java trunk still stable for production code?)

2007-03-17 Thread Grant Ingersoll
And by break, I mean all tests pass with the possible exception of  
those related to the new functionality.


Also, the example I gave about payloads is hypothetical.  I'm still  
going to submit a patch.


On Mar 17, 2007, at 12:02 PM, Grant Ingersoll wrote:


Hoss wrote:
 (or in short: we're moving more towards a *true* commit and review model)


I'm curious as to what you think the practical implications are for
committers under this model.  Do you imagine a change in the workflow
whereby we commit and then review, or do we stick to the patch approach as
committers (contributors will always submit patches)?  It has always been a
gray area, where we all kind of know what we can commit w/o creating
patches for versus what we should put up patches for.  Just curious.  I'm
working on the payloads stuff and I know that as long as it compiles, it
isn't going to break anything, so in some sense I could commit b/c I know
it would make it easier for Michael B. and others to update and review w/o
going through the patch process.  On the other hand, the patch approach
makes you take one extra good look at what you are doing.


What do others think?

-Grant

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



--
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-834) Payload Queries

2007-03-17 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-834:
---

Attachment: boosting.term.query.patch

First draft of a BoostingTermQuery, which is based on SpanTermQuery and can 
be used for boosting the score of a term based on what is in the payload (for 
things like weighting terms higher according to their font size or part of 
speech).

A couple of classes that were previously package level are now public and have 
been marked as public, for derivation purposes only.


See the CHANGES.xml for some more details.

I believe all tests still pass.
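
For anyone who has not played with payloads yet, index-time assignment looks
roughly like the sketch below (against the trunk Token/Payload API from
LUCENE-755; the filter is made up for illustration, and the query line is
only an assumption about the patch, not its exact signature):

  public class BoostPayloadFilter extends TokenFilter {
    private final byte boost;

    public BoostPayloadFilter(TokenStream in, byte boost) {
      super(in);
      this.boost = boost;
    }

    public Token next() throws IOException {
      Token t = input.next();
      if (t != null) {
        // e.g. a value derived from font size or part of speech
        t.setPayload(new Payload(new byte[] { boost }));
      }
      return t;
    }
  }

  // At search time the patch's BoostingTermQuery would read that byte back
  // while scoring the term, roughly:
  // Query q = new BoostingTermQuery(new Term("body", "lucene"));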

 Payload Queries
 ---

 Key: LUCENE-834
 URL: https://issues.apache.org/jira/browse/LUCENE-834
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Reporter: Grant Ingersoll
 Assigned To: Grant Ingersoll
Priority: Minor
 Attachments: boosting.term.query.patch


 Now that payloads have been implemented, it will be good to make them 
 searchable via one or more Query mechanisms.  See 
 http://wiki.apache.org/lucene-java/Payload_Planning for some background 
 information and https://issues.apache.org/jira/browse/LUCENE-755 for the 
 issue that started it all.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Commit and Review (was Is Lucene Java trunk still stable for production code?)

2007-03-17 Thread Chris Hostetter

Personally, I really like having a Jira issue number associated with every
commit ... and if there is a Jira issue open, attaching a patch is trivial
-- and having patches in Jira makes it a little easier for people who
really, Really, REALLY can't upgrade their version of Lucene for some
strange reason to still have an easy reference point for when a feature was
added (and they can try to backport it).

I think the Commit and Review mentality is much more about how timid a
committer needs to be about big changes that *might* have consequences or
change APIs ... there are a lot of patches that don't break any existing
unit tests, but add some new public methods whose existence may be
questionable; there are patches that change code and fix existing unit
tests so that they still pass -- and these fixes might make the tests
logical, but raise the possibility that people were depending on the old
behavior; then there are patches that change the way something works
internally, whose behavior was previously undefined; ... all of these
cases are things that a committer who feels they understand the changes
should be able to go ahead and apply under the Commit and Review model,
because if there are any consequences, they can always be rolled back
before the next release -- this is a luxury we haven't had in the past,
because so many people expected the trunk to be stable, and that they
could always roll forward using new methods and depending on new
behavior without risk that they would go away in a future release.

Ultimately big changes should always be discussed before they are committed
-- but where now we tend to have issues open for a really long time, with
lots of iterations of code patches before anything is ever committed, I
suspect we may eventually reach the point where issues are opened to talk about
*concepts* and hypothetical patches are attached showing off ideas, and as
people say "X makes sense, Y I'm not so sure about", X gets committed and
discussion continues about Y ... but if the game plan changes and X
becomes silly, we yank it back out.

We won't get from here to there overnight ... it's a delicate dance
between the frequency of major releases, the mindsets of committers, and
the comfort level with doing patch releases to get bug fixes out, because
you know the trunk has a lot of X-style things in it that aren't really
stable.


: Hoss wrote:
:   (or in short: we're moving more towards a *true* commit and review
: model)
:
: I'm curious as to what you think the practical implications are
: for committers under this model?  Do you imagine a change in the
: workflow whereby we commit and then review, or do we stick to the
: patch approach as committers (contributors will always submit
: patches)?  It has always been a gray area, where we all kind of know
: what we can commit w/o creating patches for versus what we should put
: up patches for.   Just curious, I'm working on the payloads stuff and
: I know that as long as it compiles, it isn't going to break anything,
: so in some sense I could commit b/c I know it would make it easier
: for Michael B. and others to update and review w/o going through the
: patch process.  On the other hand, the patch approach makes you take
: one extra good look at what you are doing.
:
: What do others think?
:
: -Grant
:
: -
: To unsubscribe, e-mail: [EMAIL PROTECTED]
: For additional commands, e-mail: [EMAIL PROTECTED]
:



-Hoss


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [jira] Updated: (LUCENE-834) Payload Queries

2007-03-17 Thread Andrzej Bialecki

Grant Ingersoll (JIRA) wrote:

 [ 
https://issues.apache.org/jira/browse/LUCENE-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-834:
---

Attachment: boosting.term.query.patch

First draft at a BoostingTermQuery, which is based on the SpanTermQuery and can be used for boosting the score of a term based on what is in the payload (for things like weighting terms higher according to their font size or part of speech).  


A couple of classes that were previously package level are now public and have 
been marked as Public and for derivational purposes only.


See the CHANGES.xml for some more details.

I believe all tests still pass.

  


Grant,

This is great stuff! I know quite a few projects that will love this - 
specifically to boost terms differently based on a POS tag.


We had a discussion recently in Nutch about changing the way typical 
Nutch queries are translated into Lucene queries, and performance 
implications there. If you're looking for a challenge ;) could you 
perhaps take a look at this discussion and see if you could contribute 
something? ;)


http://www.nabble.com/Performance-optimization-for-Nutch-index---query-tf3276316.html

Thanks in advance!

--
Best regards,
Andrzej Bialecki 
___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [jira] Updated: (LUCENE-834) Payload Queries

2007-03-17 Thread Grant Ingersoll


On Mar 17, 2007, at 6:02 PM, Andrzej Bialecki wrote:


Grant Ingersoll (JIRA) wrote:

Grant,

This is great stuff! I know quite a few projects that will love  
this - specifically to boost terms differently based on a POS tag.




Michael B. did a great job on implementing the underlying storage  
mechanisms, so most kudos should go to him.


I/we hope to add several other types of Queries (see
http://wiki.apache.org/lucene-java/Payload_Planning and add your own thoughts).


POS, font weights, information from NLP applications, XPath,
cross-references.  It's all good!


I am planning to have a few slides in my ApacheCon talk come May on  
the subject.


We had a discussion recently in Nutch about changing the way  
typical Nutch queries are translated into Lucene queries, and  
performance implications there. If you're looking for a  
challenge ;) could you perhaps take a look at this discussion and  
see if you could contribute something? ;)


You know, the Nutch Dev mailing list was my last holdout for  
subscriptions to the Lucene mailing lists!  :-)  I barely can keep up  
with Lucene Java!


I will try to have a read soon, but can't promise I can add anything  
meaningful.


Cheers,
Grant



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [jira] Resolved: (LUCENE-829) StandardBenchmarker#makeDocument does not explicitly close opened files

2007-03-17 Thread Doron Cohen
karl wettin [EMAIL PROTECTED] wrote on 17/03/2007 09:43:45:

 Also found this in ReutersQueryMaker:

 private void prepareQueries() throws Exception {
   // analyzer (default is standard analyzer)
   Analyzer anlzr = (Analyzer) Class.forName(config.get("analyzer",
       "org.apache.lucene.analysis.StandardAnalyzer")).newInstance();


 It should be

       "org.apache.lucene.analysis.standard.StandardAnalyzer")).newInstance();


Nice catch, I fixed that (and the 3 more like this).
Thanks Karl!

- Doron


 --
 karl


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]