Re: is the classes ended with PerThread(*PerThread) multithread

2010-12-28 Thread Simon Willnauer
Hey there,

so what you are looking at are classes that are created per thread
rather than shared with other threads. Lucene internally rarely
creates threads or subclasses Thread, Runnable or Callable
(ParallelMultiSearcher is an exception, as is some of the merging code).
Instead, inside the indexer, when you add (update) a document Lucene
utilizes the caller's thread rather than spawning a new one. When you
look at DocumentsWriter.java there should be a method called
getThreadState. Each indexing thread, let's say in updateDocument, gets
its thread-private DocumentsWriterThreadState. This thread state holds
a DocConsumerPerThread obtained from the DocumentsWriter's DocConsumer
(see the indexing chain). DocConsumerPerThread in that case is a
kind of decorator that holds other DocConsumerPerThread instances like
TermsHashPerThread etc.

The general pattern: for each DocConsumer you can get a
DocConsumerPerThread for your indexing thread, which then consumes the
document you are currently processing.
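As a minimal sketch of that pattern (illustrative names only, not Lucene's actual classes or signatures): each consumer hands out one private state object per calling thread, so no locking is needed while a document is being processed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of the per-thread consumer pattern; these are NOT
// Lucene's real classes or method signatures.
class DocConsumerPerThread {
    int docsProcessed = 0;
    void processDocument(String doc) { docsProcessed++; }  // thread-private state
}

class DocConsumer {
    // one private state object per indexing thread, keyed by the caller's thread
    private final Map<Thread, DocConsumerPerThread> perThread = new ConcurrentHashMap<>();

    DocConsumerPerThread getPerThread() {
        // each calling thread lazily gets its own, unshared instance
        return perThread.computeIfAbsent(Thread.currentThread(),
                t -> new DocConsumerPerThread());
    }
}
```

Note the classes never subclass Thread themselves; the mapping to OS threads happens implicitly through whichever caller thread invokes getPerThread().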

I hope that helps

simon


On Tue, Dec 28, 2010 at 4:19 AM, xu cheng xcheng@gmail.com wrote:
 hi all:
 I'm new to dev.
 These days I'm reading the source code in the index package,
 and I was confused.
 There are classes with the suffix PerThread, such as DocFieldProcessorPerThread,
 DocInverterPerThread, TermsHashPerThread, FreqProxTermsWriterPerThread.
 In this mailing list, I was told that they are multithreaded.
 However, it's difficult for me to understand:
 I see no sign that they inherit from Thread, or implement
 Runnable, or anything else.
 How do they map to OS threads?
 thanks ^_^

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: is the classes ended with PerThread(*PerThread) multithread

2010-12-28 Thread xu cheng
hi simon

thanks for replying very much.

after reading the source code with your suggestion, here's my understanding,
and I don't know whether it's right:

the DocumentsWriter doesn't actually create threads, but the code that
uses DocumentsWriter can be multithreaded (say, several threads call
updateDocument). Each thread has its own DocumentsWriterThreadState, and
each DocumentsWriterThreadState has its own objects (the *PerThread classes
such as DocFieldProcessorPerThread, DocInverterPerThread, and so on).

When the methods of DocumentsWriter are called by multiple threads, for
example 4 threads, there are 4 DocumentsWriterThreadState objects and 4
indexing chains (each chain has its own *PerThread objects to process
the document).

Am I right?

thanks for replying again!





Re: is the classes ended with PerThread(*PerThread) multithread

2010-12-28 Thread Simon Willnauer
On Tue, Dec 28, 2010 at 10:57 AM, xu cheng xcheng@gmail.com wrote:
 [...]
 am I right??

that sounds about right

simon



Re: Improving String Distance calculation performance

2010-12-28 Thread Biedermann,S.,Fa. Post Direkt
Hi Robert,

Thanks for your hint about LevenshteinAutomata. Are AutomatonQueries planned
for an upcoming release?

At the moment, we build the reference data to boost, at query time, those 
documents that contain fuzzily matched, seldom-used tokens within a queried 
region; in a manner of speaking, a fuzzified, localised idf(). The boosts are 
injected via payloads. Since the Levenshtein distance must be calculated within 
a (fuzzified) region only, O(mn) applies only within each region. On the 
outside, we have O(#regions).

The problem could equivalently be solved at query time. But this would mean 
counting the matched documents of each fuzzy query within a more complex query. 
In release 3.0.2 it looks quite complicated to me to incorporate a different 
scoring model that first counts the matches of each fuzzy sub-query and then 
applies the boosts to the matched tokens. I haven't seen a Scorer doing this so 
far. Furthermore, we are sensitive about query time. 

Do you have any ideas?



-----Original Message-----
From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: Monday, 27 December 2010 17:11
To: dev@lucene.apache.org
Subject: Re: Improving String Distance calculation performance

On Mon, Dec 27, 2010 at 10:31 AM, Biedermann,S.,Fa. Post Direkt 
s.biederm...@postdirekt.de wrote:

 As for our problem: we are trying to build reference data against which 
 requests shall be matched. In this case we need quite a huge amount of string 
 distance measurements for preparing this reference.


If this is your problem, I wouldn't recommend using the StringDistance 
directly. As I mentioned, it's not designed for your use case: the way it's 
used by the spellchecker, it only needs something like 20-50 comparisons...

If you try to use it the way you describe, it will be very slow: it must do 
O(k) comparisons, where k is the number of strings, and each comparison is 
O(mn), where m and n are the lengths of the input string and the string being 
compared, respectively.

Easier would be to index your terms and simply do FuzzyQuery (with trunk), 
specifying the exact max edit distance you want. Or if you care about getting 
all exact results within Levenshtein distance of some degree N, use 
AutomatonQuery built from LevenshteinAutomata.

This will give you a sublinear number of comparisons, something complicated but 
more like O(sqrt(k)) where k is the number of strings, and each comparison is 
O(n), where n is the length of the target string.
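For reference, the per-pair O(mn) cost Robert mentions is the classic dynamic-programming Levenshtein distance; brute-forcing k stored strings this way costs O(k·mn), which is exactly what the automaton-based queries avoid. A standard two-row implementation sketch:

```java
// Classic O(m*n) dynamic-programming Levenshtein distance -- the per-pair
// cost discussed above. Comparing one query string against k stored strings
// this way costs O(k*m*n) overall.
class Levenshtein {
    static int distance(String a, String b) {
        int m = a.length(), n = b.length();
        int[] prev = new int[n + 1], cur = new int[n + 1];
        for (int j = 0; j <= n; j++) prev[j] = j;   // distance from empty prefix
        for (int i = 1; i <= m; i++) {
            cur[0] = i;
            for (int j = 1; j <= n; j++) {
                int sub = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                cur[j] = Math.min(sub, Math.min(prev[j] + 1, cur[j - 1] + 1));
            }
            int[] tmp = prev; prev = cur; cur = tmp;  // reuse the two rows
        }
        return prev[n];
    }
}
```

For example, distance("kitten", "sitting") is 3 (one substitution at each end plus one insertion).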




[jira] Commented: (LUCENE-2836) FieldCache rewrite method for MultiTermQueries

2010-12-28 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12975477#action_12975477
 ] 

Uwe Schindler commented on LUCENE-2836:
---

Hah, cool!

The question is, does it really work correctly with multivalued fields? I have 
to recapitulate the TermsIndex, but the method fcsi.getOrd(doc) returns only 
the term ord of the first term found in the index for that document? For numeric 
queries on single-valued fields that's fine, but for wildcards on analyzed 
fields? Maybe I'm missing something, but I am not sure it works correctly...

Robert: Help me please :-) *g*
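The ord-based matching under discussion can be sketched without Lucene's APIs (all names here are hypothetical): walk the cached, sorted term dictionary once, mark the ordinals the query accepts, then accept every document whose single cached ord is marked. The sketch also makes Uwe's concern visible: with one ord per document, multivalued fields cannot be represented.

```java
import java.util.BitSet;
import java.util.function.Predicate;

// Illustrative sketch of a FieldCache-style rewrite (not Lucene code):
// one pass over the in-RAM term dictionary to mark accepted term ordinals,
// then a per-document ord lookup. Only works when each document has exactly
// one cached ord, i.e. single-valued fields.
class FieldCacheRewriteSketch {
    static BitSet matchingDocs(String[] sortedTerms, int[] docToOrd,
                               Predicate<String> query) {
        BitSet acceptedOrds = new BitSet(sortedTerms.length);
        for (int ord = 0; ord < sortedTerms.length; ord++) {
            if (query.test(sortedTerms[ord])) acceptedOrds.set(ord); // terms are in RAM
        }
        BitSet docs = new BitSet(docToOrd.length);
        for (int doc = 0; doc < docToOrd.length; doc++) {
            if (acceptedOrds.get(docToOrd[doc])) docs.set(doc);
        }
        return docs;
    }
}
```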

 FieldCache rewrite method for MultiTermQueries
 --

 Key: LUCENE-2836
 URL: https://issues.apache.org/jira/browse/LUCENE-2836
 Project: Lucene - Java
  Issue Type: New Feature
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-2836.patch


 For some MultiTermQueries, like RangeQuery, we have a FieldCacheRangeFilter 
 etc. (in this case it's particularly optimized).
 But in the general case, since LUCENE-2784 we can now have a rewrite method 
 to rewrite any MultiTermQuery using the FieldCache, because MultiTermQuery's 
 getEnum no longer takes IndexReader but Terms, and all the 
 FilteredTermsEnums are now just real TermsEnum decorators.
 In cases like low-frequency queries this is actually slower (I think this has 
 been shown for numeric ranges before too), but for the really high-frequency 
 cases, like especially ugly wildcards, regexes, fuzzies, etc., this can be 
 several times faster using the FieldCache instead, since all the terms are in 
 RAM and the automaton can blast through them quicker.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





Re: Improving String Distance calculation performance

2010-12-28 Thread Robert Muir
On Tue, Dec 28, 2010 at 5:26 AM, Biedermann,S.,Fa. Post Direkt
s.biederm...@postdirekt.de wrote:
 Hi Robert,

 Thanks for your hint about LevensteinAutomata. Are AutomatonQueries planned 
 for an upcoming release?

yes, but its in trunk, so you can use it now...


 At the moment, we build the reference to boost documents those at query time 
 which contain fuzzily seldom used tokens within a queried region, in a manner 
 of speaking a fuzzied localised idf() .The boosts are injected via payloads. 
 Since levenstein must be calculated within a (fuzzied) region only, O(mn) 
 applies only to each region. On the outside, we have O(#region).

 The problem could be equivalently solved query time. But this would mean to 
 count the matched documents of each fuzzy query within a more complex queries.
 In Release 3.0.2. it looks quite complicated to me to incorporate a different 
 scoring model that first count matches of each fuzzy sub-query and then apply 
 the boosts to the matched tokens. I haven't seen a Scorer doing this so far. 
 Furthermore we are sensible about query time.

 Do you have any ideas?

Not sure I fully understand what your app needs to do, but you can
take a look at using a different rewrite method.

for example, it seems like rewriting to span queries (see
SpanMultiTermQueryWrapper) might be close to what you want, except it
suffers from the problem that boosting is completely broken in
Lucene's span queries (since they don't combine with real Scorers but
instead Spans)...

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2836) FieldCache rewrite method for MultiTermQueries

2010-12-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12975496#action_12975496
 ] 

Robert Muir commented on LUCENE-2836:
-

bq. does it really work correctly with multivalued fields?

Of course not; it's no different from any of the other FieldCache*Filter stuff 
we have now. Except that stuff is an awful lot more code... do we really need 
all those specializations in FieldCacheRangeFilter?





[jira] Commented: (SOLR-2276) Support for cologne phonetic

2010-12-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12975510#action_12975510
 ] 

Robert Muir commented on SOLR-2276:
---

bq. Seems ColognePhonetic will be supported in Apache Commons Codec 1.4.1.

Thanks for your patch, Marc. Has there been any discussion of a tentative 
release date for 1.4.1?
When this happens I'll be happy to add it.

{quote}
Besides, do you think it is a good idea to allow a fully qualified class name 
as encoder in PhoneticFilterFactory? Extending solr by a custom phonetic 
filter could be much easier for developers.
{quote}

I think the reason it's not done with reflection might be historical, from 
before all tokenstreams were reused?
But I still think it's a good idea to avoid reflection when possible, so I think 
we should keep the statically built map.

However, if you supply a string that's not in this map, I don't think it would 
hurt to try to reflect on the name before throwing an exception, as in this 
case you would only get an exception anyway.
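That fallback could look roughly like this (hypothetical names and stand-in classes, not Solr's actual PhoneticFilterFactory code): consult the statically built map first, and only reach for reflection when the name is unknown, where the only alternative would have been throwing anyway.

```java
import java.util.Map;

// Sketch: static-map lookup with a reflection fallback for unknown names.
// Names and classes are stand-ins, not Solr's real encoder registry.
class SketchSoundex { /* stand-in for a real encoder implementation */ }

class EncoderResolver {
    // statically built name -> class map (the common, reflection-free path)
    static final Map<String, Class<?>> ENCODERS = Map.of("Soundex", SketchSoundex.class);

    static Class<?> resolve(String name) {
        Class<?> c = ENCODERS.get(name);
        if (c != null) return c;            // fast path: no reflection
        try {
            return Class.forName(name);     // treat unknown names as a fully qualified class name
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown encoder: " + name, e);
        }
    }
}
```

With this shape, a custom phonetic filter can be plugged in by its fully qualified class name without touching the registry.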


 Support for cologne phonetic
 

 Key: SOLR-2276
 URL: https://issues.apache.org/jira/browse/SOLR-2276
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 1.4.1
 Environment: Apache Commons Codec 1.5
Reporter: Marc Pompl
 Fix For: 4.0

 Attachments: ColognePhonetic-patch.txt

   Original Estimate: 2h
  Remaining Estimate: 2h

 As soon as Apache Commons Codec 1.5 is released, please support the new 
 ColognePhonetic encoder.
 See the JIRA issue for CODEC-106.
 It is fundamental for phonetic searches if you are indexing German names. 
 Other encoders are optimized for English (words).




Re: Geospatial search in Lucene/Solr

2010-12-28 Thread Smiley, David W.
(I was emailing Grant RE geospatial search and it occurred to me I should be 
doing this on the Lucene dev list so I've taken it there)

On Dec 28, 2010, at 8:31 AM, Grant Ingersoll wrote:
On Dec 27, 2010, at 11:52 PM, Smiley, David W. wrote:

Hi Grant.

I saw your latest blog post at Lucid Imagination, in which you mentioned you 
were going to work on adding polygon search.  FWIW, I finished polygon search 
last week in my code base, based on 
https://issues.apache.org/jira/browse/SOLR-2155 (the geohash prefix 
implementation).

I've scanned this, but haven't had time to look in depth.

I used the JTS library to do the heavy lifting.  I’d be happy to release the 
code.  I’ve iterated more on SOLR-2155 in my local project code and plan to 
re-integrate it with the patch at some point.  There was a gap of almost 2 
months between releasing SOLR-2155 and having other priorities, but I’m back 
at it.

Cool.  The problem w/ JTS is it is LGPL.

At least it's not GPL.  Can you simply use JTS and make it an optional library, 
like Solr does for some other libs?  There's a lot of expertise in that library 
that's been refined over the last 10 years.  Even if you refuse to touch 
anything non-Apache-licensed, I highly recommend you look through its code to 
see how geospatial point-in-polygon tests are done efficiently.  It has a 
concept of a prepared geometry object that is optimized to make large numbers 
of point-in-polygon queries more efficient.  It is implemented by putting the 
line segments of the polygon in an in-memory R-tree index.  If you'd like me to 
point you at specific classes, I'd be happy to.  Better yet, I could release an 
update to SOLR-2155 and you could step through it in a debugger.  FWIW I used a 
separate class for the polygon search that implements my GeoShape interface.  
If a user doesn't need to do polygon searches (which are not a common 
requirement of geospatial search), then the JTS library need not ship with 
Lucene/Solr.
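For reference, the basic even-odd (ray-casting) point-in-polygon test that libraries like JTS build on looks like this; a prepared geometry speeds it up by indexing the polygon's edges (e.g. in an R-tree) so repeated queries can skip most edge checks. This plain O(#edges) sketch shows only the underlying idea:

```java
// Even-odd (ray casting) point-in-polygon test: cast a horizontal ray from
// the query point and count edge crossings; an odd count means "inside".
// Plain O(#edges) per query -- the cost a prepared geometry amortizes away.
class PointInPolygon {
    static boolean contains(double[] xs, double[] ys, double px, double py) {
        boolean inside = false;
        for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
            // does the horizontal ray from (px, py) cross edge (j -> i)?
            boolean crosses = (ys[i] > py) != (ys[j] > py)
                && px < (xs[j] - xs[i]) * (py - ys[i]) / (ys[j] - ys[i]) + xs[i];
            if (crosses) inside = !inside;
        }
        return inside;
    }
}
```

For a unit square, a point like (0.5, 0.5) crosses one edge and tests inside, while (2, 2) crosses none and tests outside.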

Presently, I’m working on Lucene’s benchmark contrib module to evaluate the 
performance of SOLR-2155 compared to the LatLon type (i.e. a pair of lat-lon 
range queries), and then I’ll work on a more efficient, probably non-geohash, 
implementation based on the same underlying concept of a hierarchical grid.  
I’m using the geonames.org data set.  Unfortunately, the benchmark code seems 
very oriented to a generic title-body document, whereas I’m looking to create 
lat-lon pairs… and furthermore to create documents containing multiple lat-lon 
pairs, and even furthermore a query generator that generates random box queries 
centered on a random location from the data set.  I seem to be stretching the 
benchmark framework beyond the use-case it was designed for, and so perhaps it 
won’t be committable, but at least I’ll have a patch for other geospatial 
birds-of-a-feather like you to use.

Stretch away.  The Title/Body orientation is just a relic of what we have done 
in the past, it doesn't have to stay that way.

I am interested in your thoughts on evaluating the performance of geospatial 
queries.  Since reading LIA2, Lucene's benchmark contrib module seems like the 
ideal way to test Lucene.  I thought about programmatically generating points, 
but I've warmed to the idea of using geonames.org data as the space of possible 
points.  Geonames has 7.5M unique lat-lon pairs[1], including population.  In 
SOLR-2155, multiple points per document is a key feature.  For testing, I want 
to be able to configure 1 point per document for comparison to algorithms that 
only support that, but I must support a random variable number of them too.  
Consequently, each place does NOT correspond 1-1 to a document.  The number of 
documents indexed should be configurable, and documents will be generated by 
randomly picking one or more points from the input data set.  The number of 
points available from the input data set should be configurable too (i.e. 7.5M). 
Assuming that a more populous place is more likely to be referenced than a less 
populous one, I want to skew the choice of place weighted by the place's 
population.  This creates a much more realistic skew of documents mapped to 
points than an evenly distributed one, which is important.  On the query side, 
I'd like to generate random queries centered on a random one of these points, 
with a radius that is either 1km, 10km, 100km, 1000km, or 1km, in order to 
match a wide variety of documents.  For reporting, I'd like to see a chart of 
response time vs number of documents matched.

I'm perhaps half-done with implementing all this.  Because I need to randomly 
choose points from the data set, I can't stream it in; I need to read it all 
into memory as a singleton object used by both the indexing side and the query 
side (since the queries need to pick a random point).
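The population-weighted selection described above can be sketched as follows (place data and class names are hypothetical): build a cumulative-weight array once, then binary-search a uniform random draw, so more populous places are picked proportionally more often.

```java
import java.util.Random;

// Weighted random selection sketch: cumulative sums built once, O(log n) per
// pick. Index i is chosen with probability populations[i] / totalPopulation.
class WeightedPlacePicker {
    final long[] cumulative;   // cumulative[i] = sum of populations 0..i
    final Random rnd;

    WeightedPlacePicker(long[] populations, long seed) {
        cumulative = new long[populations.length];
        long sum = 0;
        for (int i = 0; i < populations.length; i++) {
            sum += populations[i];
            cumulative[i] = sum;
        }
        rnd = new Random(seed);
    }

    int pick() {
        long r = (long) (rnd.nextDouble() * cumulative[cumulative.length - 1]);
        int lo = 0, hi = cumulative.length - 1;
        while (lo < hi) {                 // first index with cumulative > r
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] > r) hi = mid; else lo = mid + 1;
        }
        return lo;
    }
}
```

Places with zero population are never picked, which matches skewing the document-to-point mapping toward populous places.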

[1] http://www.geonames.org/about.html

~ David Smiley


Re: Geospatial search in Lucene/Solr

2010-12-28 Thread Robert Muir
On Tue, Dec 28, 2010 at 10:47 AM, Smiley, David W. dsmi...@mitre.org wrote:
 Presently, I’m working on Lucene’s benchmark contrib module to evaluate the
 performance of SOLR-2155 compared to the LatLon type (i.e. a pair of lat-lon
 range queries), and then I’ll work on a more efficient probably non-geohash
 implementation but based on the same underlying concept of a hierarchical
 grid.  I’m using the geonames.org data set.  Unfortunately, the benchmark
 code seems very oriented to a generic title-body document whereas I’m
 looking to create lat-lon pairs… and furthermore to create documents
 containing multiple lat-lon pairs, and even furthermore a query generator
 that generates random box queries centered on a random location from the
 data set.  I seem to be stretching the benchmark framework beyond the
 use-case it was designed for and so perhaps it won’t be committable but at
 least I’ll have a patch for other geospatial birds-of-a-feather like you to
 use.

 Stretch away.  The Title/Body orientation is just a relic of what we have
 done in the past, it doesn't have to stay that way.

just for reference, a couple of us are using a Python front-end to
contrib/benchmark that Mike developed:

http://code.google.com/p/luceneutil/

This is nice as it's designed for you to just declare 'competitors' (2
checkouts of solrcene), and then you run the Python script and it
gives you the relative comparison... because they are 2 different
checkouts it's simple to compare different approaches, and each
checkout can run with a different index (e.g. different codecs or test
index format changes).

I thought it might be interesting to you, because there's a variety of
queries tested here like numeric range, sorting, primary-key lookup,
span queries etc beyond the standard set of queries. The framework
also ensures that you are bringing back the same results in the same
order, runs multiple iterations (including iterations in new JVMs),
makes it easy to test optimized, optimized with deletions,
multi-segment, multi-segment with deletions, and can output to txt,
html, jira format for convenience.

currently we are generally testing with a line file format from
Wikipedia, but besides geonames I wanted to point out that Wikipedia
does include lat/long information for many articles (this is a major
source for much of geonames' place data!).

it would definitely be cool if we could test spatial queries with this
as well... e.g. by parsing out the lat/long from the Wikipedia XML,
adding it to the line files, and adding some spatial queries to the
default list of queries being tested.




Re: is the classes ended with PerThread(*PerThread) multithread

2010-12-28 Thread Earwin Burrfoot
There is a single index chain, with a single instance of each chain
component, except those ending in -PerThread.

Though that's gonna change with
https://issues.apache.org/jira/browse/LUCENE-2324



-- 
Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com)
Phone: +7 (495) 683-567-4
ICQ: 104465785




Re: Geospatial search in Lucene/Solr

2010-12-28 Thread Smiley, David W.
Thanks for letting me know about this, Rob.  I think geonames is much simpler 
(and much less data) to work with than Wikipedia.  It's plain tab-delimited, and 
I like that it includes the population.  I'll press forward with my 
benchmark-module-based patch.  I can relatively easily switch between the 
lat-lon type and my geohash type since they both conform to the SpatialQueriable 
interface, so consequently I don't need two complete Lucene checkouts.  I had to 
add Solr & spatial as dependencies to the benchmark module, but it's worth it to me.

~ David

On Dec 28, 2010, at 11:18 AM, Robert Muir wrote:

 On Tue, Dec 28, 2010 at 10:47 AM, Smiley, David W. dsmi...@mitre.org wrote:
 Presently, I’m working on Lucene’s benchmark contrib module to evaluate the
 performance of SOLR-2155 compared to the LatLon type (i.e. a pair of lat-lon
 range queries), and then I’ll work on a more efficient probably non-geohash
 implementation but based on the same underlying concept of a hierarchical
 grid.  I’m using the geonames.org data set.  Unfortunately, the benchmark
 code seems very oriented to a generic title-body document whereas I’m
 looking to create lat-lon pairs… and furthermore to create documents
 containing multiple lat-lon pairs, and even furthermore a query generator
 that generates random box queries centered on a random location from the
 data set.  I seem to be stretching the benchmark framework beyond the
 use-case it was designed for and so perhaps it won’t be committable but at
 least I’ll have a patch for other geospatial birds-of-a-feather like you to
 use.
 
 Stretch away.  The Title/Body orientation is just a relic of what we have
 done in the past, it doesn't have to stay that way.
 
 just for reference, a couple of us are using a python front-end to
 contrib/benchmark that Mike developed:
 
 http://code.google.com/p/luceneutil/
 
 This is nice as its designed for you to just declare 'competitors' (2
 checkouts of solrcene), and then you run the python script and it
 gives you the relative comparison... because they are 2 different
 checkouts its simple to compare different approaches, and each
 checkout can run with a different index (e.g. different codecs or test
 index format changes).
 
 I thought it might be interesting to you, because there's a variety of
 queries tested here like numeric range, sorting, primary-key lookup,
 span queries etc beyond the standard set of queries. The framework
 also ensures that you are bringing back the same results in the same
 order, runs multiple iterations (including iterations in new JVMs),
 makes it easy to test optimized, optimized with deletions,
 multi-segment, multi-segment with deletions, and can output to txt,
 html, jira format for convenience.
 
 currently we are generally testing with a line file format from
 wikipedia, but besides geonames i wanted to point out that wikipedia
 does include lat/long information for many articles (this is a major
 source for much of geonames place data!).
 
 it would definitely be cool if we could test spatial queries with this
 as well... e.g by parsing out the lat/long from the wikipedia XML and
 adding to the line files, and adding some spatial queries to the
 default list of queries being tested.
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 





Re: Geospatial search in Lucene/Solr

2010-12-28 Thread Robert Muir
On Tue, Dec 28, 2010 at 11:59 AM, Smiley, David W. dsmi...@mitre.org wrote:
 Thanks for letting me know about this Rob.  I think geonames is much simpler 
 (and much less data) to work with than wikipedia.  It's plain tab-delimited 
 and I like that it includes the population.  I'll press forward with my 
 benchmark module based patch.  I can relatively easily switch between the 
 lat-lon type and my geohash type since they both conform to the 
 SpatialQueriable interface, and so consequently I don't need two complete 
 Lucene checkouts.  I had to add Solr & spatial as dependencies to the 
 benchmark module but it's worth it to me.


well in my opinion there isn't a reason why benchmark couldn't be
moved to /modules and depend on solr... others might disagree but i
would prefer that benchmark be a module where you can benchmark
everything.




[jira] Updated: (SOLR-2299) improve test-running from eclipse

2010-12-28 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-2299:
--

Attachment: SOLR-2299.patch

Same patch; I only fixed a couple of tests that explicitly used File to use 
resources instead.

I looked at all the failing ones; it's a little bit of work, but I think it's 
definitely feasible that
we get the solr tests working so that:
# you put test resources in classpath
# tests are independent of current working directory.

I think this is just a good general simplification (nothing to do with eclipse) 
and is good
for the ant build too, so that tests don't have to run with a CWD of 
src/test/test-files in ant.

The problem with the way we do this now (even in ant) is that I've sporadically 
seen tests actually create files in this CWD, which means we are creating 
leftovers in a src directory that could accidentally be committed, among other 
problems.

I'd like to commit this now and iterate on the remaining individual tests so 
that they open their test resources all as resources... it's not many left, 
but some are tricky.
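The lookup order being argued for (classpath first, then the old CWD-relative file path) can be sketched in plain Java. ResourceFirstLoader and its fallback are hypothetical stand-ins for illustration, not Solr's actual SolrResourceLoader API:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the lookup order SOLR-2299 argues for:
// try the classpath first, then fall back to a CWD-relative file,
// so tests work no matter what the working directory is.
public class ResourceFirstLoader {
    static InputStream open(String name) throws IOException {
        InputStream in = ResourceFirstLoader.class.getClassLoader().getResourceAsStream(name);
        if (in != null) {
            return in; // found on the classpath, e.g. src/test/test-files
        }
        return new FileInputStream(new File(name)); // legacy CWD-relative fallback
    }

    public static void main(String[] args) throws IOException {
        // Exercise the file fallback with a temp file.
        Path p = Files.createTempFile("solr-test", ".txt");
        Files.write(p, "hello".getBytes(StandardCharsets.UTF_8));
        try (InputStream in = open(p.toString())) {
            byte[] buf = new byte[5];
            int n = in.read(buf);
            if (n != 5 || !new String(buf, StandardCharsets.UTF_8).equals("hello")) {
                throw new AssertionError("fallback failed");
            }
        }
        Files.delete(p);
        System.out.println("ok");
    }
}
```

With this order, adding src/test/test-files to the IDE classpath is enough: the classpath branch wins there, while the ant build keeps working through the file fallback.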


 improve test-running from eclipse
 -

 Key: SOLR-2299
 URL: https://issues.apache.org/jira/browse/SOLR-2299
 Project: Solr
  Issue Type: Test
  Components: Build
Reporter: Robert Muir
 Fix For: 3.1, 4.0

 Attachments: SOLR-2299.patch, SOLR-2299.patch


 In eclipse, it's currently difficult to get a solr development environment 
 working.
 One big thing that would help would be to make it easier to run the tests.
 When loading resources, if we checked the config dir + file directory from 
 the resource path,
 then users could simply add src/test/test-files to their eclipse build 
 classpath, and tests would just work from the IDE.
 I gather that this might make things easier for other IDEs too, though I'm 
 aware that ones like IntelliJ
 let you configure the test 'working directory' on a project basis, but 
 eclipse doesn't 
 (you have to make a custom run configuration every time you run the tests)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Commented: (SOLR-2301) RSS Feed URL Breaking

2010-12-28 Thread Adam Estrada (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975618#action_12975618
 ] 

Adam Estrada commented on SOLR-2301:


Thanks Carl,

I heard somewhere that Manifold or the Connector Framework were all going to 
be integrated into Lucene/Solr. Any thoughts on that?

Adam



 RSS Feed URL Breaking
 -

 Key: SOLR-2301
 URL: https://issues.apache.org/jira/browse/SOLR-2301
 Project: Solr
  Issue Type: Bug
  Components: clients - C#
Affects Versions: 1.4.1, 4.0
 Environment: Windows 7
Reporter: Adam Estrada

 This is an odd one... I am trying to index RSS feeds and have come across 
 several issues. Some are more pressing than others. Referring to SOLR-2286 ;-)
 Anyway, the CDC has a list of RSS feeds that the Solr dataimporter can't work 
 with.
 Home page:
 http://emergency.cdc.gov/rss/
 Page to Index:
 http://www2a.cdc.gov/podcasts/createrss.asp?t=r&c=19
 The console reports the following and as you can see it's because it does not 
 like the param c. Any ideas on how to fix this?
 INFO: Processing configuration from solrconfig.xml: {config=./solr/conf/dataimporthandler/rss.xml}
 [Fatal Error] :18:63: The reference to entity c must end with the ';' delimiter.
 Dec 28, 2010 2:39:46 PM org.apache.solr.handler.dataimport.DataImportHandler inform
 SEVERE: Exception while loading DataImporter
 org.apache.solr.handler.dataimport.DataImportHandlerException: Exception occurred while initializing context
     at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:193)
     at org.apache.solr.handler.dataimport.DataImporter.init(DataImporter.java:100)
     at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:112)
     at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:539)
     at org.apache.solr.core.SolrCore.init(SolrCore.java:596)
     at org.apache.solr.core.CoreContainer.create(CoreContainer.java:660)
     at org.apache.solr.core.CoreContainer.load(CoreContainer.java:412)
     at org.apache.solr.core.CoreContainer.load(CoreContainer.java:294)
     at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:243)
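The parse error is the giveaway: the raw '&' in the feed URL inside rss.xml starts an XML entity reference, so in the config it must be escaped as '&amp;'. A hypothetical fragment of the DIH entity definition (the entity name and surrounding attributes are assumed, not taken from the actual rss.xml):

```xml
<!-- In rss.xml: '&' inside attribute values must be written as '&amp;' -->
<entity name="cdcFeed"
        processor="XPathEntityProcessor"
        url="http://www2a.cdc.gov/podcasts/createrss.asp?t=r&amp;c=19"
        forEach="/rss/channel/item">
  ...
</entity>
```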




[jira] Created: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2010-12-28 Thread Michael McCandless (JIRA)
Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS 
into IndexSearcher
---

 Key: LUCENE-2837
 URL: https://issues.apache.org/jira/browse/LUCENE-2837
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
 Fix For: 4.0


We've discussed cleaning up our *Searcher stack for some time... I
think we should try to do this before releasing 4.0.

So I'm attaching an initial patch which:

  * Removes Searcher, Searchable, absorbing all their methods into IndexSearcher

  * Removes contrib/remote

  * Removes MultiSearcher

  * Absorbs ParallelMultiSearcher into IndexSearcher (ie you can now
pass useThreads=true, or a custom ES to the ctor)

The patch is rough -- I just ripped stuff out, did search/replace to
IndexSearcher, etc.  EG nothing is directly testing using threads with
IndexSearcher, but before committing I think we should add a
newSearcher to LuceneTestCase, which randomly chooses whether the
searcher uses threads, and cutover tests to use this instead of making
their own IndexSearcher.

I think MultiSearcher has a useful purpose, but as it is today it's
too low-level, eg it shouldn't be involved in rewriting queries: the
Query.combine method is scary.  Maybe in its place we make a higher
level class, with limited API, that's able to federate search across
multiple IndexSearchers?  It'd also be able to optionally use thread
per IndexSearcher.
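The "absorb ParallelMultiSearcher" idea boils down to: submit one search task per index slice to the caller-supplied ExecutorService, then merge the per-slice top hits. A self-contained sketch with stand-in types (Hit, and float[] slices standing in for segments; none of this is Lucene API):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Stand-in sketch: one task per index slice on the executor passed to
// the searcher, then a merge of the per-slice top hits.
public class SliceSearchSketch {
    static final class Hit {
        final int doc; final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    static Comparator<Hit> byScoreDesc() {
        return new Comparator<Hit>() {
            public int compare(Hit a, Hit b) { return Float.compare(b.score, a.score); }
        };
    }

    // "Search" one slice: score every doc, keep the slice's top N.
    static List<Hit> searchSlice(float[] scores, int docBase, int topN) {
        List<Hit> hits = new ArrayList<Hit>();
        for (int i = 0; i < scores.length; i++) hits.add(new Hit(docBase + i, scores[i]));
        Collections.sort(hits, byScoreDesc());
        return new ArrayList<Hit>(hits.subList(0, Math.min(topN, hits.size())));
    }

    static List<Hit> search(List<float[]> slices, final int topN, ExecutorService es)
            throws Exception {
        List<Future<List<Hit>>> futures = new ArrayList<Future<List<Hit>>>();
        int base = 0;
        for (final float[] slice : slices) {
            final int docBase = base;
            futures.add(es.submit(new Callable<List<Hit>>() {
                public List<Hit> call() { return searchSlice(slice, docBase, topN); }
            }));
            base += slice.length;
        }
        List<Hit> merged = new ArrayList<Hit>();
        for (Future<List<Hit>> f : futures) merged.addAll(f.get()); // join per-slice results
        Collections.sort(merged, byScoreDesc());
        return merged.subList(0, Math.min(topN, merged.size()));
    }

    public static void main(String[] args) throws Exception {
        ExecutorService es = Executors.newFixedThreadPool(2);
        List<float[]> slices = new ArrayList<float[]>();
        slices.add(new float[]{0.1f, 0.9f});
        slices.add(new float[]{0.5f});
        List<Hit> top = search(slices, 2, es);
        es.shutdown();
        if (top.get(0).score != 0.9f || top.get(1).score != 0.5f) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Note the ranking stays identical to a sequential search; only the per-slice work runs in parallel, which is why the randomized newSearcher in LuceneTestCase can flip threading on and off without changing expected results.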





[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2010-12-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975649#action_12975649
 ] 

Robert Muir commented on LUCENE-2837:
-

{quote}
but before committing I think we should add a
newSearcher to LuceneTestCase, which randomly chooses whether the
searcher uses threads, and cutover tests to use this instead of making
their own IndexSearcher.
{quote}

I did this on LUCENE-2751, but the tests won't all pass until we fix the 
FieldCache autodetect 
synchronization bug (the Numerics tests will fail with multiple threads)...


 Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS 
 into IndexSearcher
 ---

 Key: LUCENE-2837
 URL: https://issues.apache.org/jira/browse/LUCENE-2837
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2837.patch


 We've discussed cleaning up our *Searcher stack for some time... I
 think we should try to do this before releasing 4.0.
 So I'm attaching an initial patch which:
   * Removes Searcher, Searchable, absorbing all their methods into 
 IndexSearcher
   * Removes contrib/remote
   * Removes MultiSearcher
   * Absorbs ParallelMultiSearcher into IndexSearcher (ie you can now
 pass useThreads=true, or a custom ES to the ctor)
 The patch is rough -- I just ripped stuff out, did search/replace to
 IndexSearcher, etc.  EG nothing is directly testing using threads with
 IndexSearcher, but before committing I think we should add a
 newSearcher to LuceneTestCase, which randomly chooses whether the
 searcher uses threads, and cutover tests to use this instead of making
 their own IndexSearcher.
 I think MultiSearcher has a useful purpose, but as it is today it's
 too low-level, eg it shouldn't be involved in rewriting queries: the
 Query.combine method is scary.  Maybe in its place we make a higher
 level class, with limited API, that's able to federate search across
 multiple IndexSearchers?  It'd also be able to optionally use thread
 per IndexSearcher.




[jira] Updated: (SOLR-1566) Allow components to add fields to outgoing documents

2010-12-28 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated SOLR-1566:


Attachment: SOLR-1566-rm.patch

Updated patch to work with trunk.

Note: this does not do anything yet, but it points towards a direction.

 Allow components to add fields to outgoing documents
 

 Key: SOLR-1566
 URL: https://issues.apache.org/jira/browse/SOLR-1566
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Noble Paul
Assignee: Grant Ingersoll
 Fix For: Next

 Attachments: SOLR-1566-gsi.patch, SOLR-1566-rm.patch, 
 SOLR-1566-rm.patch, SOLR-1566-rm.patch, SOLR-1566.patch, SOLR-1566.patch, 
 SOLR-1566.patch, SOLR-1566.patch


 Currently it is not possible for components to add fields to outgoing 
 documents which are not in the the stored fields of the document.  This makes 
 it cumbersome to add computed fields/metadata .




[jira] Created: (LUCENE-2838) ConstantScoreQuery should directly support wrapping Query and simply strip off scores

2010-12-28 Thread Uwe Schindler (JIRA)
ConstantScoreQuery should directly support wrapping Query and simply strip off 
scores
-

 Key: LUCENE-2838
 URL: https://issues.apache.org/jira/browse/LUCENE-2838
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1, 4.0


Especially in MultiTermQuery rewrite modes we often simply need to strip off 
scores from Queries and make them constant score. Currently the code to do this 
looks quite ugly: new ConstantScoreQuery(new QueryWrapperFilter(query))

As the name says, ConstantScoreQuery should make any other Query constant score, so 
why does it not take a Query as ctor param? This question was also asked quite 
often by my customers and is simply stupid.

Looking closer into the code, it is clear that this would also speed up MTQs:
- One additional wrapping and method calls can be removed
- Maybe we can even deprecate QueryWrapperFilter in 3.1 now (it's now only used 
in tests and the use-case for this class is not really available) and 
LUCENE-2831 does not need the stupid hack to make Simon's assertions pass
- CSQ now supports out-of-order scoring and topLevel scoring, so a CSQ on 
top-level now directly feeds the Collector. For that a small trick is used: The 
score(Collector) calls are directly delegated and the scores are stripped by 
wrapping the setScorer() method in Collector
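The trick in the last bullet can be illustrated with minimal stand-in interfaces (not Lucene's real Scorer/Collector): the decorator delegates collect() but intercepts setScorer(), so the wrapped Collector only ever observes a constant score:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal stand-ins illustrating the score-stripping trick: delegate
// collect(), but wrap setScorer() so the inner Collector sees a
// constant score regardless of what the real scorer produces.
public class ConstantScoreSketch {
    interface Scorer { float score(); }
    interface Collector { void setScorer(Scorer s); void collect(int doc); }

    static Collector stripScores(final Collector delegate, final float constant) {
        return new Collector() {
            public void setScorer(Scorer ignored) {
                // hand the delegate a constant scorer instead of the real one
                delegate.setScorer(new Scorer() { public float score() { return constant; } });
            }
            public void collect(int doc) { delegate.collect(doc); }
        };
    }

    public static void main(String[] args) {
        final List<Float> seen = new ArrayList<Float>();
        Collector sink = new Collector() {
            private Scorer scorer;
            public void setScorer(Scorer s) { scorer = s; }
            public void collect(int doc) { seen.add(scorer.score()); }
        };
        Collector wrapped = stripScores(sink, 1.0f);
        wrapped.setScorer(new Scorer() { public float score() { return 0.37f; } }); // "real" scores
        wrapped.collect(1);
        wrapped.collect(2);
        if (!seen.equals(Arrays.asList(1.0f, 1.0f))) throw new AssertionError(seen);
        System.out.println("ok");
    }
}
```

Because only setScorer() is wrapped, the document stream and collection order are untouched, which is what lets a top-level CSQ feed the Collector directly instead of going through an extra wrapping layer.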

During that I found a visibility bug in Scorer: the method boolean 
score(Collector collector, int max, int firstDocID) should be public, not 
protected, as it's not solely intended to be overridden by subclasses and is 
called from other classes, too! This causes no compiler error because the other 
class that calls it is mainly BooleanScorer(2), which is in the same package, but 
the visibility is wrong. I will open an issue for that and fix it at least in trunk, 
where we have no backwards-compatibility requirement.




Solr pseudo fields in response

2010-12-28 Thread Ryan McKinley
With Yonik's work on SOLR-2297, I figured I would dive back into
figuring out how we could do this. I updated SOLR-1566 to compile
with trunk, but when I looked into really making it work, there are
some issues.

It looks like the APIs now support extra fields *BUT* they don't do
anything, and there is no clear way how to best make them work.  For
example
  TextResponseWriter.writeSolrDocument() includes a parameter Map pseudoFields
and
  TextResponseWriter.writeSolrDocumentList() includes Map otherFields

Any idea how this is supposed to work?

I see a few problems with this approach:
* it requires each format (XML/JSON/binary/etc) to implement it on its own
* it only supports SolrDocument, not Document

BaseResponseWriter may be a reasonable approach -- it abstracts the
Document creation into one place.  It may have performance issues
since every document would get turned into a SolrDocument before
getting written.  This only applies to the docs that are written so it
may not be a big deal.

BUT BaseResponseWriter does not have any concrete implementation and
converting the XML/TextResponseWriter to use it is not clear to me.
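The "abstract it into one place" option can be sketched with plain maps (FieldAugmenter and the field names below are invented, not Solr API): every outgoing document passes through one augment step before any format-specific writer serializes it, so no writer needs its own pseudo-field handling:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: one shared augment step runs before any
// format-specific writer (XML/JSON/binary) sees the document, so each
// writer stays unaware of pseudo fields.
public class PseudoFieldSketch {
    interface FieldAugmenter { void augment(Map<String, Object> doc); }

    static Map<String, Object> prepare(Map<String, Object> stored, FieldAugmenter... augmenters) {
        Map<String, Object> out = new LinkedHashMap<String, Object>(stored); // copy; don't mutate stored fields
        for (FieldAugmenter a : augmenters) a.augment(out);
        return out; // writers serialize this map, unaware anything was added
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<String, Object>();
        stored.put("id", "doc1");
        Map<String, Object> out = prepare(stored, new FieldAugmenter() {
            public void augment(Map<String, Object> doc) {
                doc.put("_distance_", 12.5); // computed field, not stored in the index
            }
        });
        if (!out.containsKey("_distance_") || stored.containsKey("_distance_"))
            throw new AssertionError(out);
        System.out.println("ok");
    }
}
```

The per-document copy is the cost being debated above: it only applies to documents actually written, which is why it may not be a big deal in practice.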

Thoughts?

ryan




[jira] Updated: (LUCENE-2838) ConstantScoreQuery should directly support wrapping Query and simply strip off scores

2010-12-28 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2838:
--

Attachment: LUCENE-2838.patch

 ConstantScoreQuery should directly support wrapping Query and simply strip 
 off scores
 -

 Key: LUCENE-2838
 URL: https://issues.apache.org/jira/browse/LUCENE-2838
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2838.patch


 Especially in MultiTermQuery rewrite modes we often simply need to strip off 
 scores from Queries and make them constant score. Currently the code to do 
 this looks quite ugly: new ConstantScoreQuery(new QueryWrapperFilter(query))
 As the name says, ConstantScoreQuery should make any other Query constant score, 
 so why does it not take a Query as ctor param? This question was also asked 
 quite often by my customers and is simply stupid.
 Looking closer into the code, it is clear that this would also speed up MTQs:
 - One additional wrapping and method calls can be removed
 - Maybe we can even deprecate QueryWrapperFilter in 3.1 now (it's now only 
 used in tests and the use-case for this class is not really available) and 
 LUCENE-2831 does not need the stupid hack to make Simon's assertions pass
 - CSQ now supports out-of-order scoring and topLevel scoring, so a CSQ on 
 top-level now directly feeds the Collector. For that a small trick is used: 
 The score(Collector) calls are directly delegated and the scores are stripped 
 by wrapping the setScorer() method in Collector
 During that I found a visibility bug in Scorer: the method boolean 
 score(Collector collector, int max, int firstDocID) should be public, not 
 protected, as it's not solely intended to be overridden by subclasses and is 
 called from other classes, too! This causes no compiler error because the other 
 class that calls it is mainly BooleanScorer(2), which is in the same package, 
 but the visibility is wrong. I will open an issue for that and fix it at least 
 in trunk, where we have no backwards-compatibility requirement.




[jira] Created: (LUCENE-2839) Visibility of Scorer.score(Collector, int, int) is wrong

2010-12-28 Thread Uwe Schindler (JIRA)
Visibility of Scorer.score(Collector, int, int) is wrong


 Key: LUCENE-2839
 URL: https://issues.apache.org/jira/browse/LUCENE-2839
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Reporter: Uwe Schindler
 Fix For: 4.0


The method for scoring subsets in Scorer has the wrong visibility: it's marked 
protected, but protected methods should not be called from other classes. 
Protected methods are intended for methods that should be overridden by 
subclasses and are called by (often final) methods of the same class. They 
should never be called from foreign classes.

This method is called from another class out of scope: BooleanScorer(2) - so it 
must be public, but it's protected. This does not lead to a compiler error 
because BS(2) is in the same package, but it may lead to problems if subclasses 
from other packages override it. When implementing LUCENE-2838 I hit a trap, as 
I thought this method should only be called from the class or Scorer itself, but 
in fact it's called from outside, leading to bugs, because I had not overridden 
it. As ConstantScorer did not use it, I overrode it to throw UOE, and suddenly 
BooleanQuery was broken, which made it clear that it's called from outside 
(which is not the intention of protected methods).

We cannot fix this in 3.x, as it would break backwards compatibility for classes 
that override this method, but we can fix the visibility in trunk.
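The trap generalizes beyond Lucene. A plain-Java sketch (stand-in classes, all in one file so it compiles; in Lucene the caller happens to share Scorer's package, which is exactly why javac never complained):

```java
// Stand-ins showing the trap: a method that reads like a subclass-only
// hook is actually invoked by a collaborator, so overriding it to
// throw UnsupportedOperationException breaks callers you never see.
public class VisibilitySketch {
    static class ScorerLike {
        // marked 'protected' in Lucene; looks like an internal hook
        int bulkScore(int max) { return max; }
    }
    static class BooleanScorerLike { // the out-of-scope caller
        int drive(ScorerLike s) { return s.bulkScore(10); }
    }
    static class ConstantScorerLike extends ScorerLike {
        @Override int bulkScore(int max) { // "I never call this myself, so UOE is safe"... it isn't
            throw new UnsupportedOperationException();
        }
    }

    public static void main(String[] args) {
        BooleanScorerLike caller = new BooleanScorerLike();
        if (caller.drive(new ScorerLike()) != 10) throw new AssertionError();
        boolean blewUp = false;
        try {
            caller.drive(new ConstantScorerLike()); // the override is hit from outside
        } catch (UnsupportedOperationException e) {
            blewUp = true;
        }
        if (!blewUp) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Making the method public documents the real contract: it is part of the calling surface, not just an extension point.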






[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2010-12-28 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975670#action_12975670
 ] 

Uwe Schindler commented on LUCENE-2837:
---

{quote}
I think MultiSearcher has a useful purpose, but as it is today it's
too low-level, eg it shouldn't be involved in rewriting queries: the
Query.combine method is scary. Maybe in its place we make a higher
level class, with limited API, that's able to federate search across
multiple IndexSearchers? It'd also be able to optionally use thread
per IndexSearcher.
{quote}

Query.combine is simply broken; that is another issue. It violates De Morgan's 
laws: LUCENE-2756
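For reference, the identity in question, stated over document-id result sets (a self-contained check, not Lucene code): NOT(A AND B) must equal (NOT A) OR (NOT B) relative to the full doc set:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Self-contained check of De Morgan over document-id sets:
// complement(A AND B) == complement(A) OR complement(B).
public class DeMorganCheck {
    static Set<Integer> complement(Set<Integer> s, Set<Integer> universe) {
        Set<Integer> out = new HashSet<Integer>(universe);
        out.removeAll(s);
        return out;
    }

    public static void main(String[] args) {
        Set<Integer> universe = new HashSet<Integer>(Arrays.asList(1, 2, 3, 4, 5)); // all docs
        Set<Integer> a = new HashSet<Integer>(Arrays.asList(1, 2, 3));
        Set<Integer> b = new HashSet<Integer>(Arrays.asList(2, 3, 4));

        Set<Integer> and = new HashSet<Integer>(a);
        and.retainAll(b);                              // A AND B = {2, 3}
        Set<Integer> lhs = complement(and, universe);  // NOT (A AND B)

        Set<Integer> rhs = complement(a, universe);    // NOT A ...
        rhs.addAll(complement(b, universe));           // ... OR NOT B

        if (!lhs.equals(rhs)) throw new AssertionError(lhs + " != " + rhs);
        System.out.println("ok");
    }
}
```

A combine step that merges negated clauses without respecting this identity will return wrong result sets, which is the LUCENE-2756 complaint.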

 Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS 
 into IndexSearcher
 ---

 Key: LUCENE-2837
 URL: https://issues.apache.org/jira/browse/LUCENE-2837
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2837.patch


 We've discussed cleaning up our *Searcher stack for some time... I
 think we should try to do this before releasing 4.0.
 So I'm attaching an initial patch which:
   * Removes Searcher, Searchable, absorbing all their methods into 
 IndexSearcher
   * Removes contrib/remote
   * Removes MultiSearcher
   * Absorbs ParallelMultiSearcher into IndexSearcher (ie you can now
 pass useThreads=true, or a custom ES to the ctor)
 The patch is rough -- I just ripped stuff out, did search/replace to
 IndexSearcher, etc.  EG nothing is directly testing using threads with
 IndexSearcher, but before committing I think we should add a
 newSearcher to LuceneTestCase, which randomly chooses whether the
 searcher uses threads, and cutover tests to use this instead of making
 their own IndexSearcher.
 I think MultiSearcher has a useful purpose, but as it is today it's
 too low-level, eg it shouldn't be involved in rewriting queries: the
 Query.combine method is scary.  Maybe in its place we make a higher
 level class, with limited API, that's able to federate search across
 multiple IndexSearchers?  It'd also be able to optionally use thread
 per IndexSearcher.




[jira] Updated: (SOLR-2299) improve test-running from eclipse

2010-12-28 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-2299:
--

Attachment: SOLR-2299_part2.patch

here's a patch fixing a lot of tests, only a few core tests left.

 improve test-running from eclipse
 -

 Key: SOLR-2299
 URL: https://issues.apache.org/jira/browse/SOLR-2299
 Project: Solr
  Issue Type: Test
  Components: Build
Reporter: Robert Muir
 Fix For: 3.1, 4.0

 Attachments: SOLR-2299.patch, SOLR-2299.patch, SOLR-2299_part2.patch


 In eclipse, it's currently difficult to get a solr development environment 
 working.
 One big thing that would help would be to make it easier to run the tests.
 When loading resources, if we checked the config dir + file directory from 
 the resource path,
 then users could simply add src/test/test-files to their eclipse build 
 classpath, and tests would just work from the IDE.
 I gather that this might make things easier for other IDEs too, though I'm 
 aware that ones like IntelliJ
 let you configure the test 'working directory' on a project basis, but 
 eclipse doesn't 
 (you have to make a custom run configuration every time you run the tests)




[jira] Commented: (LUCENE-2837) Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS into IndexSearcher

2010-12-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975676#action_12975676
 ] 

Robert Muir commented on LUCENE-2837:
-

bq. Query.combine is simply broken, this is another issue.

I agree, but with this issue we don't need Query.combine anymore, so it's then 
fixed.
This method only exists for MultiSearcher (and there is some other dead code in 
Query.java related to it that we could even delete now; it's totally unused today!)


 Collapse Searcher/Searchable/IndexSearcher; remove contrib/remote; merge PMS 
 into IndexSearcher
 ---

 Key: LUCENE-2837
 URL: https://issues.apache.org/jira/browse/LUCENE-2837
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2837.patch


 We've discussed cleaning up our *Searcher stack for some time... I
 think we should try to do this before releasing 4.0.
 So I'm attaching an initial patch which:
   * Removes Searcher, Searchable, absorbing all their methods into 
 IndexSearcher
   * Removes contrib/remote
   * Removes MultiSearcher
   * Absorbs ParallelMultiSearcher into IndexSearcher (ie you can now
 pass useThreads=true, or a custom ES to the ctor)
 The patch is rough -- I just ripped stuff out, did search/replace to
 IndexSearcher, etc.  EG nothing is directly testing using threads with
 IndexSearcher, but before committing I think we should add a
 newSearcher to LuceneTestCase, which randomly chooses whether the
 searcher uses threads, and cutover tests to use this instead of making
 their own IndexSearcher.
 I think MultiSearcher has a useful purpose, but as it is today it's
 too low-level, eg it shouldn't be involved in rewriting queries: the
 Query.combine method is scary.  Maybe in its place we make a higher
 level class, with limited API, that's able to federate search across
 multiple IndexSearchers?  It'd also be able to optionally use thread
 per IndexSearcher.




Lucene-3.x - Build # 225 - Still Failing

2010-12-28 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-3.x/225/

All tests passed

Build Log (for compile errors):
[...truncated 20926 lines...]






Lucene-trunk - Build # 1409 - Failure

2010-12-28 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-trunk/1409/

All tests passed

Build Log (for compile errors):
[...truncated 17903 lines...]






[jira] Updated: (LUCENE-2611) IntelliJ IDEA setup

2010-12-28 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2611:


Attachment: LUCENE-2611_eclipse.patch

here's the eclipse part (following the same conventions of your patch).

basically for eclipse, getting things to work is just setting up the classpath 
and setting the whole project to UTF-8... but this takes a while even if you 
know everything you need to do.


 IntelliJ IDEA setup
 ---

 Key: LUCENE-2611
 URL: https://issues.apache.org/jira/browse/LUCENE-2611
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Build
Affects Versions: 3.1, 4.0
Reporter: Steven Rowe
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2611-branch-3x.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611-branch-3x.patch, 
 LUCENE-2611-branch-3x.patch, LUCENE-2611.patch, LUCENE-2611.patch, 
 LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, LUCENE-2611.patch, 
 LUCENE-2611.patch, LUCENE-2611_eclipse.patch, LUCENE-2611_mkdir.patch, 
 LUCENE-2611_test.patch, LUCENE-2611_test.patch, LUCENE-2611_test.patch, 
 LUCENE-2611_test.patch, LUCENE-2611_test_2.patch


 Setting up Lucene/Solr in IntelliJ IDEA can be time-consuming.
 The attached patch adds a new top level directory {{dev-tools/}} with sub-dir 
 {{idea/}} containing basic setup files for trunk, as well as a top-level ant 
 target named idea that copies these files into the proper locations.  This 
 arrangement avoids the messiness attendant to in-place project configuration 
 files directly checked into source control.
 The IDEA configuration includes modules for Lucene and Solr, each Lucene and 
 Solr contrib, and each analysis module.  A JUnit test run per module is 
 included.
 Once {{ant idea}} has been run, the only configuration that must be performed 
 manually is configuring the project-level JDK.
 If this patch is committed, Subversion svn:ignore properties should be 
 added/modified to ignore the destination module files (*.iml) in each 
 module's directory.
 Iam Jambour has written up on the Lucene wiki a detailed set of instructions 
 for applying the 3.X branch patch: 
 http://wiki.apache.org/lucene-java/HowtoConfigureIntelliJ




Lucene-Solr-tests-only-3.x - Build # 3097 - Failure

2010-12-28 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/3097/

1 tests failed.
REGRESSION:  org.apache.solr.client.solrj.TestLBHttpSolrServer.testReliability

Error Message:
No live SolrServers available to handle this request

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request
        at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:220)
        at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
        at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
        at org.apache.solr.client.solrj.TestLBHttpSolrServer.waitForServer(TestLBHttpSolrServer.java:188)
        at org.apache.solr.client.solrj.TestLBHttpSolrServer.testReliability(TestLBHttpSolrServer.java:181)
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:956)
        at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:894)
Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.SocketTimeoutException: Read timed out
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:204)
Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:146)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
        at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
        at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
        at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
        at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
        at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
        at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
        at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
        at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
        at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)
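The root cause above is an ordinary socket read timeout inside Commons HttpClient. As a minimal, Solr-free sketch of that failure mode (class and method names here are illustrative, not Solr code), a server that accepts a connection but never responds makes the client's SO_TIMEOUT fire on read:

```java
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class TimeoutDemo {
    static String readWithTimeout(int timeoutMillis) throws Exception {
        // A local server that never writes a response, simulating an
        // unresponsive Solr node behind the load balancer.
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("localhost", server.getLocalPort())) {
            client.setSoTimeout(timeoutMillis); // read timeout, the same knob as HttpClient's SO_TIMEOUT
            try (InputStream in = client.getInputStream()) {
                int b = in.read(); // blocks waiting for data, then throws
                return "read byte " + b;
            } catch (SocketTimeoutException e) {
                return "Read timed out";
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readWithTimeout(100)); // prints "Read timed out"
    }
}
```

LBHttpSolrServer wraps this SocketTimeoutException in a SolrServerException and, once every backing server has failed the same way, reports "No live SolrServers available".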




Build Log (for compile errors):
[...truncated 9798 lines...]






any issues about the *perthread classes

2010-12-28 Thread xu cheng
hi all
I noticed that there are plenty of *PerThread classes in the trunk
http://svn.apache.org/repos/asf/lucene/dev/trunk/
while in the realtime_search version
http://svn.apache.org/repos/asf/lucene/dev/branches/realtime_search/
the *PerThread classes are gone!
this just confused me, since I'm new here.

what's the purpose of such a design? what's the advantage? are there any
issues related to this?

any suggestion or references are appreciated!
regards.
xu
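The pattern behind these classes, as described in the reply at the top of this thread, can be sketched roughly as follows. All names here (PerThreadSketch, ThreadState) are hypothetical stand-ins, not Lucene's actual API: a shared writer hands each indexing thread its own private state object, analogous to DocumentsWriter.getThreadState(), so per-document work proceeds without locking.

```java
import java.util.HashMap;
import java.util.Map;

public class PerThreadSketch {
    static class ThreadState {
        int docsProcessed; // thread-private scratch state, no synchronization needed
    }

    private final Map<Thread, ThreadState> states = new HashMap<>();

    // Short synchronized lookup to fetch/create this thread's state;
    // the actual document processing happens outside this lock.
    synchronized ThreadState getThreadState() {
        return states.computeIfAbsent(Thread.currentThread(), t -> new ThreadState());
    }

    void processDocument() {
        ThreadState state = getThreadState();
        state.docsProcessed++; // safe without locks: state belongs to one thread
    }

    public static void main(String[] args) throws Exception {
        PerThreadSketch writer = new PerThreadSketch();
        writer.processDocument();
        writer.processDocument();
        Thread other = new Thread(writer::processDocument);
        other.start();
        other.join();
        // Each thread only ever saw its own ThreadState: the main
        // thread's count is 2, the other thread's is 1.
        System.out.println(writer.getThreadState().docsProcessed); // prints 2
    }
}
```

None of these classes extend Thread or implement Runnable: the caller's thread does the work, and the *PerThread objects are just the per-caller state it works on.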


Solr-3.x - Build # 211 - Still Failing

2010-12-28 Thread Apache Hudson Server
Build: https://hudson.apache.org/hudson/job/Solr-3.x/211/

All tests passed

Build Log (for compile errors):
[...truncated 20482 lines...]


