RE: Coverity scan results of Lucene
What about the "High Impact" defects?If I'm reading the report right, there appear to be only 12 of these. Rishabh, could you copy just the High Impact defects to the list? -Original Message- From: Dawid Weiss [mailto:dawid.we...@gmail.com] Sent: Tuesday, July 14, 2015 11:59 AM To: dev@lucene.apache.org Subject: Re: Coverity scan results of Lucene Yeah, that's exactly what I though. If you look at the code you'll see that these are false positives. For example: 5) There is a loop, the following comments gives you a clue: // Loop until we succeed in calling doBody() without // hitting an IOException. and inside the loop the exc variable is assigned to IOException, should there be any. 2) There is a refcount check before this assignment (and this is using atomic variables); it looks like it's harmless. To make it not complain is not an easy fix; you can't add synchronization there. Didn't look at other places, but it's definitely not an automated pick-and-fix task to address those 444 issues Dawid On Tue, Jul 14, 2015 at 5:48 PM, Rishabh Patel wrote: > org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyWriter.java > > Line 498: Volatile not atomically updated. Updating nextID based on a stale > value. Any intervening update in another thread is overwritten > > org/apache/lucene/index/IndexReader.java > > Line 249: Unguarded write. missing_lock: Accessing closed without holding > lock IndexReader.this. Elsewhere, > "org.apache.lucene.index.IndexReader.closed" is accessed > withIndexReader.this held 2 out of 3 times. > > org/apache/lucene/index/SnapshotDeletionPolicy.java > > Line 116: Unguarded read. missing_lock: Accessing indexCommits without > holding lock SnapshotDeletionPolicy.this. Elsewhere, > "org.apache.lucene.index.SnapshotDeletionPolicy.indexCommits" is accessed > withSnapshotDeletionPolicy.this held 4 out of 5 times. > 'lastCommit' accessed in both synchronized and unsynchronized contexts. 
> > org/apache/lucene/queries/function/valuesource/QueryValueSource.java > > Line 76 and 116: Passing null pointer fcontext to createWeight, which > dereferences it. > > org/apache/lucene/index/SegmentInfos.java > > Line 687: Throwing null exception exc. > > > On Tue, Jul 14, 2015 at 10:59 AM, Rishabh Patel > wrote: >> >> org/apache/lucene/facet/taxonomy/directory/DirectoryTaxonomyWriter.java: >> >> >> On Tue, Jul 14, 2015 at 10:44 AM, Dawid Weiss >> wrote: >>> >>> The 444 defects is an overwhelming number. Most of those automated >>> tools detect things that turn out to be valid code (upon closer >>> inspection). Could you start by listing, say, the first 5 defects that >>> actually make sense and are indeed flawed code that should be fixed? >>> >>> Dawid >>> >>> On Tue, Jul 14, 2015 at 4:33 PM, Rishabh Patel >>> wrote: >>> > Hello! >>> > >>> > I scanned the Lucene project with the Coverity scanner. 444 defects have >>> > been >>> > detected. >>> > Please check the attached report on the breakup of the issues. Some of >>> > the >>> > issues are false positives. >>> > >>> > I would like to volunteer for fixing these defects. >>> > >>> > Before I start, could you please tell me whether I should create a >>> > single >>> > JIRA for each kind of issue (e.g. "Concurrent data access" or "Null >>> > pointer >>> > exception") or should multiple issues be created according to the >>> > module of >>> > the files to be modified? 
>>> > >>> > -- >>> > Sincerely, >>> > Rishabh Patel >>> > >>> > >>> > >>> > >>> > - >>> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>> > For additional commands, e-mail: dev-h...@lucene.apache.org >>> >>> - >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: dev-h...@lucene.apache.org >>> >> >> >> >> -- >> Sincerely, >> Rishabh Patel >> >> > > > > -- > Sincerely, > Rishabh Patel > > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org * This e-mail may contain confidential or privileged information. If you are not the intended recipient, please notify the sender immediately and then delete it. TIAA-CREF *
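The DirectoryTaxonomyWriter finding above ("Volatile not atomically updated. Updating nextID based on a stale value") describes a well-known class of bug: volatile guarantees visibility, not atomicity, so an increment of a volatile field is still a three-step read-modify-write that can lose updates under contention. A minimal stdlib-only sketch of the fix pattern follows; this is not Lucene's actual code, and nextId here is just an illustrative counter:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class NextIdDemo {
    // A plain "volatile int nextId" makes writes visible to other threads,
    // but "nextId++" is still three separate steps (read, add, write), so two
    // threads can read the same stale value and both write it back -- exactly
    // what the Coverity message describes. AtomicInteger.getAndIncrement()
    // performs the read-modify-write as a single atomic operation.
    private static final AtomicInteger nextId = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        final int threads = 8, incrementsPerThread = 10_000;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    nextId.getAndIncrement(); // atomic: no update is ever lost
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        // With a volatile int and "nextId++" this total would usually come up short.
        System.out.println(nextId.get()); // prints 80000
    }
}
```

Whether the Lucene code in question actually needs this change is a separate question; as Dawid notes, several of the flagged sites are intentional and harmless.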
RE: Version field as DV
Shalin, that makes sense. But it also seems like the details of _version_ can and should be handled internally and not be subjected to the vagaries of deployment. Put another way, whenever _version_ is used, shouldn't its storage be determined by the code, not schema.xml? SOLR-5944 is a super important issue with endless applications. Pricing is a huge use case: price field values fluctuate by the minute, hour, day, etc., but docs remain otherwise very stable. But there are many other cases with similar semantics (e.g. share counts, purchase order quantities, assigned resources). So, I guess I'm encouraging you to do whatever it takes with _version_ to make SOLR-5944 work. :-) P.S. Many thanks to Chris Hostetter for his corrections and clarifications. I'm learning a lot from this thread. -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Thursday, June 25, 2015 12:48 AM To: dev@lucene.apache.org Subject: Re: Version field as DV On Tue, Jun 23, 2015 at 6:41 PM, Adrien Grand wrote: > For the record, there is an experimental postings format in > lucene/sandbox called IDVersionPostingsFormat that stores both the ID > and version in a postings format. This way you don't have to perform > additional seeks to look up the version, and it's even optimized for > id look ups with a minimum version for faster optimistic concurrency. Yeah, I have looked at it in the past but in the context of updateable DocValues, I feel that there is no way to support updateable doc values if we use the IDVersionPostingsFormat. This is because we must update a DocValue field together with the version field atomically or else we run into consistency issues. > > On Mon, Jun 22, 2015 at 4:41 PM, Ishan Chattopadhyaya > wrote: >> Hi all, >> I am looking to try out _version_ as a docvalue (SOLR-6337) as a >> precursor to SOLR-5944. Towards that, I want the _version_ field to >> be stored=indexed=false, docValues=true. 
>> >> Does someone know about the performance implications of retrieving >> the _version_ as a docvalue, e.g. accessing docvalue vs. a stored >> field? Is there any known inefficiency when using a docvalue (as >> opposed to a stored >> field) due to random disk seeks, for example? >> Regards, >> Ishan > > > > -- > Adrien
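The atomicity concern Shalin raises (a doc-value field and _version_ must change together, or readers see inconsistent state) is the standard optimistic-concurrency pattern. A stdlib-only sketch of the idea follows, with the "document" reduced to a single value-plus-version pair; the class and method names are hypothetical illustrations, not Solr's implementation:

```java
import java.util.concurrent.atomic.AtomicReference;

public class OptimisticUpdateDemo {
    // Value and version live in one immutable object, so a successful
    // compareAndSet swaps both atomically -- a reader can never observe a
    // new value paired with an old version, which is the consistency issue
    // described in the thread.
    static final class Doc {
        final long version;
        final String value;
        Doc(long version, String value) { this.version = version; this.value = value; }
    }

    private final AtomicReference<Doc> current = new AtomicReference<>(new Doc(0, "initial"));

    /** Update only if the caller's expectedVersion is still current. */
    boolean update(long expectedVersion, String newValue) {
        Doc seen = current.get();
        if (seen.version != expectedVersion) return false; // conflict: caller must re-read
        return current.compareAndSet(seen, new Doc(seen.version + 1, newValue));
    }

    public static void main(String[] args) {
        OptimisticUpdateDemo store = new OptimisticUpdateDemo();
        System.out.println(store.update(0, "price=10")); // true: version matched
        System.out.println(store.update(0, "price=11")); // false: stale version, rejected
        System.out.println(store.current.get().value);   // price=10
    }
}
```

The pricing use case mentioned above maps onto this directly: a client reads the current version, computes the new price, and retries from a fresh read if the conditional update is rejected.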
RE: Version field as DV
I think where Ishan is going with his question is this: 1. _version_ never needs to be searchable, thus, indexed=false makes sense. 2. _version_ typically needs to be evaluated when performing an update and, possibly, a delete, thus stored=true makes sense. 3. _version_ would never be used for either sorting or faceting. 4. Given the above, is using docValues=true for _version_ a good idea? Looking at the documentation: https://cwiki.apache.org/confluence/display/solr/DocValues And a bit more background: http://lucidworks.com/blog/fun-with-docvalues-in-solr-4-2/ My take is a simple “no”. Since docValues is, in essence, column-oriented storage (and can be seen, I think, as an alternate index format), what benefit is to be gained for the _version_ field? The primary benefits of docValues are in the sorting and faceting operations (maybe grouping?). These operations are never performed on the _version_ field, are they? I guess my remaining question is: does it make sense to set indexed=”false” on _version_? The example schemas set indexed=true. Does Solr itself perform searches internally on _version_? If so, then indexed=true is required. But otherwise, it seems like useless overhead. Note, I have been using optimistic concurrency control in one application and, so, am interested in this possible optimization. Any changes in this space between 4.x and 5.x? Thanks, Charlie From: Joel Bernstein [mailto:joels...@gmail.com] Sent: Monday, June 22, 2015 11:55 AM To: lucene dev Subject: Re: Version field as DV In general DocValues were built to support large scale random access use cases such as faceting and sorting. They have similar performance characteristics as the FieldCache. But unlike the FieldCache you can trade off memory and performance by selecting different DocValues formats. 
Joel Bernstein http://joelsolr.blogspot.com/ On Mon, Jun 22, 2015 at 10:41 AM, Ishan Chattopadhyaya mailto:ichattopadhy...@gmail.com>> wrote: Hi all, I am looking to try out _version_ as a docvalue (SOLR-6337) as a precursor to SOLR-5944. Towards that, I want the _version_ field to be stored=indexed=false, docValues=true. Does someone know about the performance implications of retrieving the _version_ as a docvalue, e.g. accessing docvalue vs. a stored field? Is there any known inefficiency when using a docvalue (as opposed to a stored field) due to random disk seeks, for example? Regards, Ishan
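For reference, the field definition Ishan is proposing would look something like the following in schema.xml. This is a sketch only: whether stored=false is actually safe for _version_ is exactly what the rest of the thread debates, and the field type name assumes a conventional long type is defined in the schema.

```xml
<!-- _version_ kept only as a docvalue: not searchable, not stored,
     retrievable through the column-oriented docvalues format. -->
<field name="_version_" type="long" indexed="false" stored="false" docValues="true"/>
```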
RE: [DISCUSS] Change Query API to make queries immutable in 6.0
Unfortunately, since boost is used in hashCode() and equals() calculations, changing the boost will still make the queries "trappy". You will do all that work to make everything-but-boost immutable and still not fix the problem. You can prove it to yourself like so (this test fails!):

public void testMapOrphan() {
    Map<Query, Integer> map = new HashMap<>();
    BooleanQuery booleanAB = new BooleanQuery();
    booleanAB.add(new TermQuery(new Term("contents", "a")), BooleanClause.Occur.SHOULD);
    booleanAB.add(new TermQuery(new Term("contents", "b")), BooleanClause.Occur.SHOULD);
    map.put(booleanAB, 1);
    booleanAB.setBoost(33.3f); // Set boost after map.put()
    assertTrue(map.containsKey(booleanAB)); // fails: the entry is orphaned
}

Seems like the quickest path is to write a failing test before making changes. I realize this is easier said than done. Based on your testing that led you to start this discussion, can you narrow it down to a single Query class and/or IndexSearcher use case? Not that there will be only one case. But, at least, it will be a starting point. Once the first failing test has been written, it should be relatively easy to write test variations to cover the remaining "mutable" Query classes. With the scale of the changes you are proposing, "test first" seems like a reasonable approach. Another compromise approach might be to sub-class the mutable Query classes like so:

class ImmutableBooleanQuery extends BooleanQuery {
    public void add(BooleanClause clause) {
        throw new UnsupportedOperationException("ImmutableBooleanQuery.add(BooleanClause)");
    }
    public void setBoost(float boost) {
        throw new UnsupportedOperationException("ImmutableBooleanQuery.setBoost(float)");
    }
    // etc.
    public static ImmutableBooleanQuery cloneFrom(BooleanQuery original) {
        // Use field-level access to bypass mutator methods.
    }
    // Do NOT override rewrite(IndexReader)!
} In theory, such a proxy class could be generated at runtime to force immutability: https://github.com/verhas/immutator, which could make a lot of sense in JUnit tests, if not production runtime. An immutable Query would be cloned from the original and placed in the cache instead. Any attempt to modify the cache entry should fail quickly. To me, a less invasive approach seems like a faster and easier way to actually find and fix this bug. Once that is done, then it might make sense to perform the exhaustive updates to prevent a "relapse" in the future. -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Thursday, April 02, 2015 9:46 AM To: dev@lucene.apache.org Subject: Re: [DISCUSS] Change Query API to make queries immutable in 6.0 Boosts might not make sense to become immutable, it might make the code too complex. Who is to say until the other stuff is fixed first. The downsides might outweigh the upsides. So yeah, if you want to say "if anyone disagrees with what the future might look like i'm gonna -1 your progress", then i will bite right now. Fixing the rest of Query to be immutable, so filter caching isn't trappy, we should really do that. And we have been doing it already. I remember Uwe suggested this approach when adding automaton and related queries a long time ago. It made things simpler and avoided bugs, we ultimately made as much of it immutable as we could. Queries have to be well-behaved, they need a good hashcode/equals, thread safety, good error checking, etc. It is easier to do this when things are immutable. Someone today can make a patch for FooQuery that nukes setBar and moves it to a ctor parameter named 'bar' and chances are a lot of the time, it probably fixes bugs in FooQuery somehow. That's just what it is. Boosts are the 'long tail'. They are simple primitive floating point values, so susceptible to fewer problems. 
The base class incorporates boosts into equals/hashcode already, which prevents the most common bugs with them. They are trickier because internal things like rewrite() might "shuffle them around" in conjunction with clone(), to do optimizations. They are also only relevant when scores are needed: so we can prevent nasty filter caching bugs as a step, by making everything else immutable. On Thu, Apr 2, 2015 at 9:27 AM, david.w.smi...@gmail.com wrote: > On Thu, Apr 2, 2015 at 3:40 AM, Adrien Grand wrote: >> >> first make queries immutable up to the boost and then discuss >> if/how/when we should go fully immutable with a new API to change >> boosts? > > > The “if” part concerns me; I don’t mind it being a separate issue to > make the changes more manageable (progress not perfection, and all > that). I’m all for the whole shebang. But if others think “no” > then…. will it have been worthwhile to do this big change and not go all the > way? I think not. > Does anyone feel the answer is “no” to make boosts immutable? And if so why? > > If nobody comes up with a dissenting opinion t
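The trap described earlier in the thread (a BooleanQuery whose boost changes after being used as a map key) can be reproduced with nothing but the JDK, which makes clear that the problem is the Map contract rather than anything Lucene-specific. In the sketch below, MutableKey is a hypothetical stand-in for a mutable Query whose boost participates in hashCode/equals:

```java
import java.util.HashMap;
import java.util.Map;

public class MutableKeyDemo {
    // Stand-in for a mutable Query: boost participates in hashCode/equals.
    static final class MutableKey {
        float boost = 1.0f;
        @Override public int hashCode() { return Float.hashCode(boost); }
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).boost == boost;
        }
    }

    public static void main(String[] args) {
        Map<MutableKey, Integer> cache = new HashMap<>();
        MutableKey key = new MutableKey();
        cache.put(key, 1);

        key.boost = 33.3f; // mutate after put(): the entry is filed under a stale hash

        // The entry still consumes memory but can no longer be found by lookup --
        // exactly the invisible-but-retained cache entry the thread describes.
        System.out.println(cache.containsKey(key)); // false
        System.out.println(cache.size());           // 1
    }
}
```

This is why making everything-but-boost immutable only narrows the window: any field that feeds hashCode/equals and remains mutable can orphan a cache entry.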
RE: [DISCUSS] Change Query API to make queries immutable in 6.0
Am I missing something? Across the project, I’m seeing over 1,000 references to BooleanQuery.add(). Already, this seems like a pretty major refactoring. And I haven’t checked the other types of queries: DisjunctionMax, Phrase, and MultiPhrase. At that scale, bugs will be introduced. I’m not disagreeing with the concept. At all. It’s part of the Collections contract that anything used in hashCode() and equals() be kept immutable. Just wondering if the cost is worth the principle this time? In the spirit of discussion, an alternate approach might be to: a. Locate the places in the code where a query is taken from the cache and modified after the fact. b. Remove the query object from the cache before modifying it, then put it back. Easier said than done, I realize. Note, changing the constructors and removing modifiers would force all of these changes anyway. It's just that they would be lost in a forest of other minor modifications. So, even if folks are ok with the larger scale changes, it might make sense to start with the problematic places first and then move on to the bulk of "syntax changes". Please ignore this if I am missing something here. From: Terry Smith [mailto:sheb...@gmail.com] Sent: Tuesday, March 31, 2015 9:38 AM To: dev@lucene.apache.org Subject: Re: [DISCUSS] Change Query API to make queries immutable in 6.0 Adrien, I missed the reason that boost is going to stay mutable. Is this to support query rewriting? --Terry On Tue, Mar 31, 2015 at 7:21 AM, Robert Muir wrote: Same with BooleanQuery. the go-to ctor should just take 'clauses' On Tue, Mar 31, 2015 at 5:18 AM, Michael McCandless wrote: > +1 > > For PhraseQuery we could also have a common-case ctor that just takes > the terms (and assumes sequential positions)? 
> > Mike McCandless > > http://blog.mikemccandless.com > > > On Tue, Mar 31, 2015 at 5:10 AM, Adrien Grand wrote: >> Recent changes that added automatic filter caching to IndexSearcher >> uncovered some traps with our queries when it comes to using them as >> cache keys. The problem comes from the fact that some of our main >> queries are mutable, and modifying them while they are used as cache >> keys makes the entry that they are caching invisible (because the hash >> code changed too) yet still using memory. >> >> While I think most users would be unaffected as it is rather uncommon >> to modify queries after having passed them to IndexSearcher, I would >> like to remove this trap by making queries immutable: everything >> should be set at construction time except the boost parameter that >> could still be changed with the same clone()/setBoost() mechanism as >> today. >> >> First I would like to make sure that it sounds good to everyone and >> then to discuss what the API should look like. Most of our queries >> happen to be immutable already (NumericRangeQuery, TermsQuery, >> SpanNearQuery, etc.) but some aren't and the main exceptions are: >> - BooleanQuery, >> - DisjunctionMaxQuery, >> - PhraseQuery, >> - MultiPhraseQuery. >> >> We could take all parameters that are set as setters and move them to >> constructor arguments. For the above queries, this would mean (using >> varargs for ease of use): >> >> BooleanQuery(boolean disableCoord, int minShouldMatch, >> BooleanClause... clauses) >> DisjunctionMaxQuery(float tieBreakMul, Query... clauses) >> >> For PhraseQuery and MultiPhraseQuery, the closest to what we have >> today would require adding new classes to wrap terms and positions >> together, for instance: >> >> class TermAndPosition { >> public final BytesRef term; >> public final int position; >> } >> >> so that eg. PhraseQuery would look like: >> >> PhraseQuery(int slop, String field, TermAndPosition... 
terms) >> >> MultiPhraseQuery would be the same with several terms at the same position. >> >> Comments/ideas/concerns are highly welcome. >> >> -- >> Adrien
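Adrien's constructor-based proposal above, reduced to a stdlib-only sketch: everything is fixed at construction time, so hashCode/equals can never change after the object has been used as a cache key. BooleanishQuery and Clause here are hypothetical illustrations of the shape of the API, not Lucene's actual classes:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

public class ImmutableQueryDemo {
    // Stand-in for a BooleanClause: term plus an occur-like flag.
    static final class Clause {
        final String term; final boolean required;
        Clause(String term, boolean required) { this.term = term; this.required = required; }
        @Override public int hashCode() { return Objects.hash(term, required); }
        @Override public boolean equals(Object o) {
            return o instanceof Clause && ((Clause) o).term.equals(term)
                && ((Clause) o).required == required;
        }
    }

    // All state set in the varargs constructor, as in the proposal:
    // BooleanQuery(boolean disableCoord, int minShouldMatch, BooleanClause... clauses)
    static final class BooleanishQuery {
        final int minShouldMatch;
        final List<Clause> clauses; // defensively copied, never mutated

        BooleanishQuery(int minShouldMatch, Clause... clauses) {
            this.minShouldMatch = minShouldMatch;
            this.clauses = Collections.unmodifiableList(Arrays.asList(clauses.clone()));
        }
        @Override public int hashCode() { return Objects.hash(minShouldMatch, clauses); }
        @Override public boolean equals(Object o) {
            return o instanceof BooleanishQuery
                && ((BooleanishQuery) o).minShouldMatch == minShouldMatch
                && ((BooleanishQuery) o).clauses.equals(clauses);
        }
    }

    public static void main(String[] args) {
        BooleanishQuery a = new BooleanishQuery(1, new Clause("a", false), new Clause("b", false));
        BooleanishQuery b = new BooleanishQuery(1, new Clause("a", false), new Clause("b", false));
        // Value semantics hold, and nothing can invalidate a cached key later.
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

The defensive copy in the constructor matters: without it, the caller's array would remain a back door for mutating the "immutable" query.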
RE: Make Solr's core admin API internal-only/implementation detail?
As a Solr user, my feeling is that the proposal would be a significant improvement in the product design. A significant piece of related work, imo, would be to update all of the examples and "getting started" docs. Likewise, isn't the plan for migration from 4.x to 5.x (wrt maintenance, monitoring, ETL, etc.) affected by this issue? I feel clarity and stability in these areas would help adoption of 5.x. Re: SOLR-6278, My feeling is that the term "core" could just go away in the public API. There should be a single, clear cut way to address individual replicas, of any given shard, within any collection. Even if there is only one replica (the leader) and only one shard, the addressing scheme and terminology remain the same. If there are functional gaps in the Collections API, fill them as needed - perhaps by delegating to the internal "Admin" service at the node level. But please keep the public parameters and terminology consistent. There should be only one way to do it ... On Saturday, March 28, 2015 9:17 PM, Yonik Seeley [mailto:ysee...@gmail.com] wrote: > > On Sat, Mar 28, 2015 at 2:27 PM, Erick Erickson > wrote: > > Fold any functionality we still want > > to support at a user level into the collections API. I mean a core on > > a machine is really just a single-node collection sans Zookeeper, > > right? > > +1, this is the position I've advocated in the past as well. > > -Yonik On Sunday, March 29, 2015 7:53 PM, Ramkumar R. Aiyengar [mailto:andyetitmo...@gmail.com] wrote: > > Sounds good to me, except that we have to reconcile some of the objections in > the past > to collection API additions, like with > https://issues.apache.org/jira/browse/SOLR-6278. In short, > collection API provides you a way to operate on collections. Operationally > you would often > also want functionality based off physical location (e.g. I need to > decommission this machine, > so boot and delete everything on it), core admin appeared to be the place for > it. 
RE: reuseAddress default in Solr jetty.xml
My bad. Too long away from sockets since cleaning up those shutdown handlers. Your point is well taken: on the server side the risks of consuming a stray echo packet are fairly low (but non-zero, if you’ve ever spent any quality time with tcpdump/wireshark). Still, in a production setting, SIGKILL (aka “kill -9”) should be a last resort after more reasonable methods (e.g. SIGINT, SIGTERM, SIGQUIT) have failed. From: Ramkumar R. Aiyengar [mailto:andyetitmo...@gmail.com] Sent: Monday, March 02, 2015 7:00 PM To: dev@lucene.apache.org Subject: RE: reuseAddress default in Solr jetty.xml No, reuseAddress doesn't allow you to have two processes, old and new, listen to the same port. There's no option which allows you to do that. TL;DR: This can happen when you have a connection to a server which gets killed hard and comes back up immediately. So here's what happens. When a server normally shuts down, it triggers an active close on all open TCP connections it has. That triggers a three-way message exchange with the remote recipient (FIN, FIN+ACK, ACK), at the end of which the socket is closed and the kernel puts it in a TIME_WAIT state for a few minutes in the background (depends on the OS, maximum tends to be 4 mins). This is needed to allow for reordered older packets to reach the machine just in case. Now typically if the server restarts within that period and tries to bind again to the same port, the kernel is smart enough to not complain that there is an existing socket in TIME_WAIT, because it knows the last sequence number it used for the final message in the previous process, and since sequence numbers are always increasing, it can reject any messages before that sequence number as a new process has now taken the port. Trouble is with abnormal shutdown. There's no time for a proper goodbye, so the kernel marks the socket to respond to remote packets with a rude RST (reset). 
Since there has been no goodbye with the remote end, it also doesn't know the last sequence number to delineate if a new process binds to the same port. Hence by default it denies binding to the new port for the TIME_WAIT period to avoid the off chance a stray packet gets picked up by the new process and utterly confuses it. By setting reuseAddress, you are essentially waiving off this protection. Note that this possibility of confusion is unbelievably minuscule in the first place (both the source and destination host:port should be the same and the client port is generally randomly allocated). If the port we are talking of is a local port, it's almost impossible -- you have bigger problems if a TCP packet is lost or delayed within the same machine! As to Shawn's point, for Solr's stop port, you essentially need to be trying to actively shutdown the server using the stop port, or be within a few minutes of such an attempt while the server is killed. Just the server being killed without any active connection to it is not going to cause this issue. Hi Ram, It appears the problem is that the old solr/jetty process is actually still running when the new solr/jetty process is started. That’s the problem that needs fixing. This is not a rare problem in systems with worker threads dedicated to different tasks. These threads need to wake up in response to the shutdown signal/command, as well as the normal inputs. It’s a bug I’ve created and fixed a couple times over the years … :-) I wouldn’t know where to start with Solr. But, as I say, re-using the port is a band-aid. I’ve yet to see a case where it is the best solution. best, Charlie From: Ramkumar R. Aiyengar [mailto:andyetitmo...@gmail.com<mailto:andyetitmo...@gmail.com>] Sent: Saturday, February 28, 2015 8:15 PM To: dev@lucene.apache.org<mailto:dev@lucene.apache.org> Subject: Re: reuseAddress default in Solr jetty.xml Hey Charles, see my explanation above on why this is needed. 
If Solr has to be killed, it would generally be immediately restarted. This would normally not be the case, except when things are potentially misconfigured or if there is a bug, but not doing so makes the impact worse.. In any case, turns out really that reuseAddress is true by default for the connectors we use, so that really isn't the issue. The issue more specifically is that the stop port doesn't do it, so the actual port by itself starts just fine on a restart, but the stop port fails to bind -- and there's no way currently in Jetty to configure that. Based on my question in the jetty mailing list, I have now created an issue for them.. https://bugs.eclipse.org/bugs/show_bug.cgi?id=461133 On Fri, Feb 27, 2015 at 3:03 PM, Reitzel, Charles mailto:charles.reit...@tiaa-cref.org>> wrote: Disclaimer: I’m not a Solr committer. But, as a developer, I’ve never seen a good case for reusing the listening port. Better to find and fix the root cause on the zombie state (or
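For concreteness, the reuseAddress setting under discussion corresponds to the SO_REUSEADDR socket option, which in Java must be set on an unbound socket before bind(). A minimal sketch (this is not Jetty's connector code; port 0 is used so the demo grabs any free port):

```java
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class ReuseAddressDemo {
    public static void main(String[] args) throws Exception {
        // Create an unbound socket so SO_REUSEADDR can be set *before* bind().
        ServerSocket server = new ServerSocket();
        server.setReuseAddress(true); // waive the TIME_WAIT bind protection
        server.bind(new InetSocketAddress(0)); // port 0 = pick any free port
        int port = server.getLocalPort();
        server.close();

        // Rebind to the same port immediately. With SO_REUSEADDR this is allowed
        // even when the previous listener left connections in TIME_WAIT -- the
        // scenario described above for a killed-and-respawned Solr.
        ServerSocket restarted = new ServerSocket();
        restarted.setReuseAddress(true);
        restarted.bind(new InetSocketAddress(port));
        System.out.println("rebound to " + restarted.getLocalPort());
        restarted.close();
    }
}
```

In this toy run there are no TIME_WAIT connections, so the rebind would succeed either way; the point is where in the socket lifecycle the option has to be applied, which is also why the Jetty stop port (which does not expose this setting) is the part that fails to bind.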
RE: reuseAddress default in Solr jetty.xml
Hi Ram, It appears the problem is that the old solr/jetty process is actually still running when the new solr/jetty process is started. That’s the problem that needs fixing. This is not a rare problem in systems with worker threads dedicated to different tasks. These threads need to wake up in response to the shutdown signal/command, as well as the normal inputs. It’s a bug I’ve created and fixed a couple times over the years … :-) I wouldn’t know where to start with Solr. But, as I say, re-using the port is a band-aid. I’ve yet to see a case where it is the best solution. best, Charlie From: Ramkumar R. Aiyengar [mailto:andyetitmo...@gmail.com] Sent: Saturday, February 28, 2015 8:15 PM To: dev@lucene.apache.org Subject: Re: reuseAddress default in Solr jetty.xml Hey Charles, see my explanation above on why this is needed. If Solr has to be killed, it would generally be immediately restarted. This would normally not be the case, except when things are potentially misconfigured or if there is a bug, but not doing so makes the impact worse.. In any case, turns out really that reuseAddress is true by default for the connectors we use, so that really isn't the issue. The issue more specifically is that the stop port doesn't do it, so the actual port by itself starts just fine on a restart, but the stop port fails to bind -- and there's no way currently in Jetty to configure that. Based on my question in the jetty mailing list, I have now created an issue for them.. https://bugs.eclipse.org/bugs/show_bug.cgi?id=461133 On Fri, Feb 27, 2015 at 3:03 PM, Reitzel, Charles mailto:charles.reit...@tiaa-cref.org>> wrote: Disclaimer: I’m not a Solr committer. But, as a developer, I’ve never seen a good case for reusing the listening port. Better to find and fix the root cause on the zombie state (or just slow shutdown, sometimes) and release the port. 
From: Mark Miller [mailto:markrmil...@gmail.com<mailto:markrmil...@gmail.com>] Sent: Thursday, February 26, 2015 5:28 PM To: dev@lucene.apache.org<mailto:dev@lucene.apache.org> Subject: Re: reuseAddress default in Solr jetty.xml +1 - Mark On Thu, Feb 26, 2015 at 1:54 PM Ramkumar R. Aiyengar mailto:andyetitmo...@gmail.com>> wrote: The jetty.xml we currently ship by default doesn't set reuseAddress=true. If you are having a bad GC day with things going OOM and resulting in Solr not even being able to shutdown cleanly (or the oom_solr.sh script killing it), whatever external service management mechanism you have is probably going to try respawn it and fail with the default config because the ports will be in TIME_WAIT. I guess there's the usual disclaimer with reuseAddress causing stray packets to reach the restarted server, but sounds like at least the default should be true.. I can raise a JIRA, but just wanted to check if anyone has any opinions either way.. -- Not sent from my iPhone or my Blackberry or anyone else's
RE: reuseAddress default in Solr jetty.xml
Disclaimer: I’m not a Solr committer. But, as a developer, I’ve never seen a good case for reusing the listening port. Better to find and fix the root cause on the zombie state (or just slow shutdown, sometimes) and release the port. From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Thursday, February 26, 2015 5:28 PM To: dev@lucene.apache.org Subject: Re: reuseAddress default in Solr jetty.xml +1 - Mark On Thu, Feb 26, 2015 at 1:54 PM Ramkumar R. Aiyengar mailto:andyetitmo...@gmail.com>> wrote: The jetty.xml we currently ship by default doesn't set reuseAddress=true. If you are having a bad GC day with things going OOM and resulting in Solr not even being able to shutdown cleanly (or the oom_solr.sh script killing it), whatever external service management mechanism you have is probably going to try respawn it and fail with the default config because the ports will be in TIME_WAIT. I guess there's the usual disclaimer with reuseAddress causing stray packets to reach the restarted server, but sounds like at least the default should be true.. I can raise a JIRA, but just wanted to check if anyone has any opinions either way..
SOLR-7144 - Suggester dictionary does not set origFreq
Hi All, With apologies for jumping the gun and submitting the issue prior to this email ... I have created SOLR-7144 - "Suggester dictionary does not set origFreq" here: https://issues.apache.org/jira/browse/SOLR-7144 The short version is that Suggester never sets the original frequency. We're trying to use Suggester as a spellcheck implementation (per the Spellcheck docs until very recently), and, for a variety of reasons, we want to use the extended spellcheck response to make search suggestions. Without term-by-term frequency information, our code cannot distinguish which terms are good to use in a subsequent query and which are not. This patch fixes that problem. Of course, if there is a better way to handle the issue, avoiding a code change is preferable. I did check this out on the user list and did not get a solution there. Any thoughts on this patch? Is this a good approach? It's only a few lines of product changes, and it includes a couple of unit tests. Also, "ant test" still runs fine at the top level. Any feedback appreciated. Thanks, Charlie Reitzel
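For context, the Suggester-as-spellcheck setup in question is configured through the spellcheck component along these lines. This is a sketch: the component name, field name, and lookup implementation are illustrative choices, not a prescription, and the origFreq values only appear in responses once the fix in SOLR-7144 is applied.

```xml
<!-- Suggester plugged in as a spellchecker implementation (solrconfig.xml). -->
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
    <str name="field">suggest_field</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

The per-term frequency information the patch restores is the origFreq element of the extended results, requested with spellcheck=true and spellcheck.extendedResults=true on the query.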