Re: Rule-based replication or sharing

2018-10-01 Thread Varun Thacker
Hi Chuck,

I was chatting with Noble offline and he suggested we could use this
starting with 7.5:

{replica: '#EQUAL', shard: '#EACH', sysprop.az: '#EACH'}

where "az" is a sysprop while starting each solr instance ( -Daz=us-east-1 )

It's documented
https://lucene.apache.org/solr/guide/7_5/solrcloud-autoscaling-policy-preferences.html

Let me know if this works for you.

( Looks like my previous email had some formatting issues )
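If it helps, a rule like that is normally installed as a cluster policy through the autoscaling API described at that link. A sketch of the request body based on my reading of the 7.5 guide (verify the endpoint, e.g. /api/cluster/autoscaling, against your version):

```json
{
  "set-cluster-policy": [
    {"replica": "#EQUAL", "shard": "#EACH", "sysprop.az": "#EACH"}
  ]
}
```

This asks Solr to spread the replicas of each shard equally across the distinct values of the az system property.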

On Mon, Oct 1, 2018 at 10:17 PM Varun Thacker  wrote:

> Hi Chuck,
>
> I was chatting with Noble offline and he suggested we could use this
> starting 7.5
>
> *{replica: '#EQUAL', shard: '#EACH', sysprop.az: '#EACH'}*
>
> where "az" is a sysprop while starting each solr instance ( -Daz=us-east-1
> )
>
> It's documented
> https://lucene.apache.org/solr/guide/7_5/solrcloud-autoscaling-policy-preferences.html
>
> Let me know if this works for you.
>
>
> On Wed, Sep 26, 2018 at 9:11 AM Chuck Reynolds 
> wrote:
>
>> Noble,
>>
>> Are you saying in the latest version of Solr that this would work with
>> three instances of Solr running on each server?
>>
>> If so how?
>>
>> Thanks again for your help.
>>
>> On 9/26/18, 9:11 AM, "Noble Paul"  wrote:
>>
>> I'm not sure if it is pertinent to ask you to move to the latest Solr
>> which has the policy based replica placement. Unfortunately, I don't
>> have any other solution I can think of
>>
>> On Wed, Sep 26, 2018 at 11:46 PM Chuck Reynolds <
>> creyno...@ancestry.com> wrote:
>> >
>> > Noble,
>> >
>> > So other than manually moving replicas of a shard, do you have a
>> suggestion of how one might accomplish the multiple availability zone setup
>> with multiple instances of Solr running on each server?
>> >
>> > Thanks
>> >
>> > On 9/26/18, 12:56 AM, "Noble Paul"  wrote:
>> >
>> > The rules suggested by Steve are correct. I tested it locally
>> and I got
>> > the same errors. That probably means a bug exists.
>> > All the new development efforts are invested in the new policy
>> feature:
>> https://lucene.apache.org/solr/guide/7_4/solrcloud-autoscaling-policy-preferences.html
>> >
>> > The old one is going to be deprecated pretty soon. So, I'm not
>> sure if
>> > we should be investing our resources here
>> > On Wed, Sep 26, 2018 at 1:23 PM Chuck Reynolds <
>> creyno...@ancestry.com> wrote:
>> > >
>> > > Shawn,
>> > >
>> > > Thanks for the info. We’ve been running this way for the past
>> 4 years.
>> > >
>> > > We were running on very large hardware, 20 physical cores
>> with 256 gigs of RAM and 3 billion documents, and it was the only way we
>> could take advantage of the hardware.
>> > >
>> > > Running 1 Solr instance per server never gave us the
>> throughput we needed.
>> > >
>> > > So I somewhat disagree with your statement because our test
>> proved otherwise.
>> > >
>> > > Thanks for the info.
>> > >
>> > > Sent from my iPhone
>> > >
>> > > > On Sep 25, 2018, at 4:19 PM, Shawn Heisey <
>> apa...@elyograg.org> wrote:
>> > > >
>> > > >> On 9/25/2018 9:21 AM, Chuck Reynolds wrote:
>> > > >> Each server has three instances of Solr running on it so
>> every instance on the server has to be in the same replica set.
>> > > >
>> > > > You should be running exactly one Solr instance per
>> server.  When evaluating rules for replica placement, SolrCloud will treat
>> each instance as completely separate from all others, including others on
>> the same machine.  It will not know that those three instances are on the
>> same machine.  One Solr instance can handle MANY indexes.
>> > > >
>> > > > There is only ONE situation where it makes sense to run
>> multiple instances per machine, and in my strong opinion, even that
>> situation should not be handled with multiple instances. That situation is
>> this:  When running one instance would require a REALLY large heap.
>> Garbage collection pauses can become extreme in that situation, so some
>> people will run multiple instances that each have a smaller heap, and
>> divide their indexes between them. In my opinion, when you have enough
>> index data on an instance that it requires a huge heap, instead of running
>> two or more instances on one server, it's time to add more servers.
>> > > >
>> > > > Thanks,
>> > > > Shawn
>> > > >
>> >
>> >
>> >
>> > --
>> > -
>> > Noble Paul
>> >
>> >
>>
>>
>> --
>> 


Re: Solr edismax multi-word match issue

2018-10-01 Thread Zheng Lin Edwin Yeo
Sorry, I couldn't quite follow your issue. Are you trying to search for "viet
nam" and expecting to find a match for "Vietnam" in your index, but you
could not find it?
Also, which version of Solr are you using?

Regards,
Edwin

On Thu, 20 Sep 2018 at 15:09, Simon Bloch  wrote:

> Hi,
>
> I'm having issues getting an edismax query to match a certain document via
> a particular field ("name_c"). I believe this issue is related to
> whitespace removal and field/edismax configuration.
>
> *Search term:* "viet nam"
> *Document name:* "Vietnam"
>
> *Field Type: *
>   
>omitNorms="true"
>  positionIncrementGap="0" omitTermFreqAndPositions="true">
> 
>  pattern="([^a-z0-9])" replacement=""/>
>   
>replacement="" replace="all" />
>preserveOriginal="false"/>
>   
> 
>   
>
> *Field: *
>  indexed="true" required="false" stored="false"/>
>
> *Raw Query (from Solr Admin Console):*
> q=viet nam&
> defType=edismax&
> sow=false&
> qf=name^1.0 name_c^10.0 ancestor_name^1.25&
> sort=score desc, name_c asc&
> wt=json&debugQuery=true
>
> *Issue Explanation:*
> When I execute the query in my local admin console (with debugQuery
> enabled) I don't see a match or score for "Vietnam" for the field "name_c".
>
>- I have this field boosted extra high so any match will take
> precedence.
>- I'm confident that this isn't being caused by any other fields I have;
>  the ones not listed here were removed for clarity
>- I believe this is caused by whitespace interpretation
>- Interestingly, the space is removed for the "name_c" field in the
>parsedquery:
>
> 
> "parsedquery":"(+DisjunctionMaxQuery(((name_c:vietnam)^10.0 |
>   (ancestor_name:viet nam)^1.25 |
>   (name:viet name_ps:nam)^1.0)"
>
> "parsedquery_toString":"+((name_c:vietnam)^10.0 |
>   (ancestor_name:viet nam)^1.25 |
>   (name:viet nam)^1.0)
> 
>
> I would really appreciate any support or debugging advice in this matter!
> -Simon Bloch
>
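A side note on the parsedquery output above: the PatternReplaceCharFilter in the quoted field type deletes every character outside [a-z0-9], including the space, before tokenization, which appears to be why name_c sees the single token "vietnam". The same pattern in plain Java regex, outside Solr's analysis chain, purely to illustrate:

```java
public class PatternDemo {
    public static void main(String[] args) {
        // Same regex as the char filter's pattern attribute: any character
        // that is not a lowercase letter or digit is replaced with "".
        String query = "viet nam".replaceAll("([^a-z0-9])", "");
        System.out.println(query); // prints "vietnam"
    }
}
```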


Re: what's in cursorMark

2018-10-01 Thread Li, Yi
Hi,

Did you just do base64 decoding?

Thanks,
Yi

On 10/1/18, 9:41 AM, "Vincenzo D'Amore"  wrote:

Hi Yi,

have you tried to decode the string?

AoE/E2Zhdm9yaXRlUGxhY2UvZjg1MzMzYzEtYzQ0NC00Y2ZiLWFmZDctMzcyODFhMDdiMGY3

seems to be only:

? favoritePlace/f85333c1-c444-4cfb-afd7-37281a07b0f7



On Mon, Oct 1, 2018 at 3:37 PM Li, Yi  wrote:

> Hi,
>
> cursorMark appears as something like
> AoE/E2Zhdm9yaXRlUGxhY2UvZjg1MzMzYzEtYzQ0NC00Y2ZiLWFmZDctMzcyODFhMDdiMGY3
>
> and the document says it is “Base64 encoded serialized representation of
> the sort values encapsulated by this object”
>
> I like to know if I can decode and what content I will see in there.
>
> For example, If there is an object as a json:
> {
> “id”:”123”,
> “name”:”objectname”,
> “secret”:”my secret”
> }
> if I search id:123, and only that object returned with a cursorMark, will
> I be able to decode the cursorMark and get that secret?
>
> Thanks,
> Yi
>


-- 
Vincenzo D'Amore




Re: autoAddReplicas – what am I missing?

2018-10-01 Thread Shawn Heisey

On 10/1/2018 11:49 AM, Michael B. Klein wrote:

Then I try my experiment.

1) I bring up a 4th node (.4) and wait for it to join the cluster. I now
see .1, .2, .3, and .4 in live_nodes, and .1, .2, and .3 on the graph,
still as expected.
2) I kill .2. Predictably, it falls off the list of live_nodes and turns
gray on the cloud diagram.

Expected: 3 fully green collections replicated on .1, .3, and .4, and .2
dropped from the cloud.
Actual: 3 collections replicated on .1 (green), .2 (gray), and .3 (green),
and .4 nowhere to be seen (except in live_nodes).


In older versions, autoAddReplicas only worked when Solr was using HDFS 
for index storage.  This includes the 6.6.5 version you're running.


In the latest versions, it also works with standard indexes.  This 
capability is part of autoscaling.
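For what it's worth, in the autoscaling framework that behavior is driven by a nodeLost trigger. A sketch of the trigger definition from my reading of the 7.x docs (the name and waitFor values are arbitrary; POST to the autoscaling endpoint):

```json
{
  "set-trigger": {
    "name": "node_lost_trigger",
    "event": "nodeLost",
    "waitFor": "120s",
    "enabled": true,
    "actions": [
      {"name": "compute_plan", "class": "solr.ComputePlanAction"},
      {"name": "execute_plan", "class": "solr.ExecutePlanAction"}
    ]
  }
}
```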


Thanks,
Shawn



autoAddReplicas – what am I missing?

2018-10-01 Thread Michael B. Klein
Hi,

I have a SolrCloud 6.6.5 cluster with three nodes (.1, .2, .3). It has 3
collections, all of which are configured with replicationFactor=3 and
autoAddReplicas=true. When I go to the Cloud/Graph page of the admin
interface, I see what I expect – 3 collections, 3 nodes each, all green.

Then I try my experiment.

1) I bring up a 4th node (.4) and wait for it to join the cluster. I now
see .1, .2, .3, and .4 in live_nodes, and .1, .2, and .3 on the graph,
still as expected.
2) I kill .2. Predictably, it falls off the list of live_nodes and turns
gray on the cloud diagram.

Expected: 3 fully green collections replicated on .1, .3, and .4, and .2
dropped from the cloud.
Actual: 3 collections replicated on .1 (green), .2 (gray), and .3 (green),
and .4 nowhere to be seen (except in live_nodes).

I don't have any special rules or snitches or anything configured.

What gives? What else should I be looking at?

Thanks,
Michael


Re: what's in cursorMark

2018-10-01 Thread Shawn Heisey

On 10/1/2018 7:36 AM, Li, Yi wrote:

cursorMark appears as something like 
AoE/E2Zhdm9yaXRlUGxhY2UvZjg1MzMzYzEtYzQ0NC00Y2ZiLWFmZDctMzcyODFhMDdiMGY3

and the document says it is “Base64 encoded serialized representation of the 
sort values encapsulated by this object”

I like to know if I can decode and what content I will see in there.


I did a test with the techproducts example, with a sort of "id asc".  
The "nextCursorMark" value was AoEjR0JQ which is base64 encoding for a 
value that includes the text "GBP" which was the value in the uniqueKey 
field (id) for the last entry on the page.


Then I tried it again with a different sort -- "cat desc,id asc" ... and 
the decoded nextCursorMark value included the values for *both* sort 
fields found in the last document on the page.


The value(s) in nextCursorMark are used to build range filter(s) in the 
query.  The rest of the query can likely be satisfied from Solr's 
caches.  This is how cursorMark gets good performance.


Vincenzo gave you the decoded value of your nextCursorMark string.  You 
can get this yourself by running it through a base64 decoder.
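For anyone who wants to try it, a minimal sketch with the plain JDK decoder, using the two marker strings quoted in this thread. Only the printable bytes are kept; the remaining bytes belong to Solr's internal serialization of the sort values, and the token is not meant to be parsed by clients:

```java
import java.util.Base64;

public class CursorMarkPeek {

    // Decode a cursorMark and keep only printable ASCII, so the embedded
    // sort values become visible for inspection.
    static String printable(String cursorMark) {
        StringBuilder sb = new StringBuilder();
        for (byte b : Base64.getDecoder().decode(cursorMark)) {
            if (b >= 0x20 && b < 0x7f) {
                sb.append((char) b);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(printable(
            "AoE/E2Zhdm9yaXRlUGxhY2UvZjg1MzMzYzEtYzQ0NC00Y2ZiLWFmZDctMzcyODFhMDdiMGY3"));
        // -> ?favoritePlace/f85333c1-c444-4cfb-afd7-37281a07b0f7
        System.out.println(printable("AoEjR0JQ"));
        // -> #GBP
    }
}
```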


Thanks,
Shawn



Re: what's in cursorMark

2018-10-01 Thread Stefan Matheis
The question might be: what for?


-Stefan

On Mon, Oct 1, 2018, 3:36 PM Li, Yi  wrote:

> Hi,
>
> cursorMark appears as something like
> AoE/E2Zhdm9yaXRlUGxhY2UvZjg1MzMzYzEtYzQ0NC00Y2ZiLWFmZDctMzcyODFhMDdiMGY3
>
> and the document says it is “Base64 encoded serialized representation of
> the sort values encapsulated by this object”
>
> I like to know if I can decode and what content I will see in there.
>
> For example, If there is an object as a json:
> {
> “id”:”123”,
> “name”:”objectname”,
> “secret”:”my secret”
> }
> if I search id:123, and only that object returned with a cursorMark, will
> I be able to decode the cursorMark and get that secret?
>
> Thanks,
> Yi
>


Re: Realtime get not always returning existing data

2018-10-01 Thread Erick Erickson
Thanks. I'll be away for the rest of the week, so won't be able to try
anything more
On Mon, Oct 1, 2018 at 5:10 AM Chris Ulicny  wrote:
>
> In our case, we are heavily indexing in the collection while the /get
> requests are happening which is what we assumed was causing this very rare
> behavior. However, we have experienced the problem for a collection where
> the following happens in sequence with minutes in between them.
>
> 1. Document id=1 is indexed
> 2. Document successfully retrieved with /get?id=1
> 3. Document failed to be retrieved with /get?id=1
> 4. Document successfully retrieved with /get?id=1
>
> We haven't looked at the issue in a while, so I don't have the exact
> timing of that sequence on hand right now. I'll try to find an actual
> example, although I'm relatively certain it was multiple minutes in between
> each of those requests. However our autocommit (and soft commit) times are
> 60s for both collections.
>
> I think the following two are probably the biggest differences for our
> setup, besides the version difference (v6.3.0):
>
> > index to this collection, perhaps not at a high rate
> > separate the machines running solr from the one doing any querying or
> indexing
>
> The clients are on 3 hosts separate from the solr instances. The total
> number of threads that are making updates and making /get requests is
> around 120-150. About 40-50 per host. Each of our two collections gets an
> average of 500 requests per second constantly for ~5 minutes, and then the
> number slowly tapers off to essentially 0 after ~15 minutes.
>
> Every thread attempts to make the same series of requests.
>
> -- Update with "_version_=-1". If successful, no other requests are made.
> -- On 409 Conflict failure, it makes a /get request for the id
> -- On doc:null failure, the client handles the error and moves on
>
> Combining this with the previous series of /get requests, we end up with
> situations where an update fails as expected, but the subsequent /get
> request fails to retrieve the existing document:
>
> 1. Thread 1 updates id=1 successfully
> 2. Thread 2 tries to update id=1, fails (409)
> 3. Thread 2 tries to get id=1 succeeds.
>
> ...Minutes later...
>
> 4. Thread 3 tries to update id=1, fails (409)
> 5. Thread 3 tries to get id=1, fails (doc:null)
>
> ...Minutes later...
>
> 6. Thread 4 tries to update id=1, fails (409)
> 7. Thread 4 tries to get id=1 succeeds.
>
> As Steven mentioned, it happens very, very rarely. We tried to recreate it
> in a more controlled environment, but ran into the same issue that you are,
> Erick. Every simplified situation we ran produced no problems. Since it's
> not a large issue for us and happens very rarely, we stopped trying to
> recreate it.
>
>
> On Sun, Sep 30, 2018 at 9:16 PM Erick Erickson 
> wrote:
>
> > 57 million queries later, with constant indexing going on and 9 dummy
> > collections in the mix and the main collection I'm querying having 2
> > shards, 2 replicas each, I have no errors.
> >
> > So unless the code doesn't look like it exercises any similar path,
> > I'm not sure what more I can test. "It works on my machine" ;)
> >
> > Here's my querying code, does it look like it what you're seeing?
> >
> >   while (Main.allStop.get() == false) {
> > try (SolrClient client = new HttpSolrClient.Builder()
> > // .withBaseSolrUrl("http://my-solr-server:8981/solr/eoe_shard1_replica_n4")
> > .withBaseSolrUrl("http://localhost:8981/solr/eoe").build()) {
> >
> >   //SolrQuery query = new SolrQuery();
> >   String lower = Integer.toString(rand.nextInt(1_000_000));
> >   SolrDocument rsp = client.getById(lower);
> >   if (rsp == null) {
> > System.out.println("Got a null response!");
> > Main.allStop.set(true);
> >   }
> >
> >   rsp = client.getById(lower);
> >
> >   if (rsp.get("id").equals(lower) == false) {
> > System.out.println("Got an invalid response, looking for "
> > + lower + " got: " + rsp.get("id"));
> > Main.allStop.set(true);
> >   }
> >   long queries = Main.eoeCounter.incrementAndGet();
> >   if ((queries % 100_000) == 0) {
> > long seconds = (System.currentTimeMillis() - Main.start) /
> > 1000;
> > System.out.println("Query count: " +
> > numFormatter.format(queries) + ", rate is " +
> > numFormatter.format(queries / seconds) + " QPS");
> >   }
> > } catch (Exception cle) {
> >   cle.printStackTrace();
> >   Main.allStop.set(true);
> > }
> >   }
> >   }
> > On Sat, Sep 29, 2018 at 12:46 PM Erick Erickson
> >  wrote:
> > >
> > > Steve:
> > >
> > > bq.  Basically, one core had data in it that should belong to another
> > > core. Here's my question about this: Is it possible that two requests to
> > the
> > > /get API coming in at the same time would get confused and either both
> > get
> > > the same result or results get inverted?
> 

Re: Creating CJK bigram tokens with ClassicTokenizer

2018-10-01 Thread Shawn Heisey

On 9/30/2018 10:14 PM, Yasufumi Mizoguchi wrote:

I am looking for the way to create CJK bigram tokens with ClassicTokenizer.
I tried this by using CJKBigramFilter, but it only supports for
StandardTokenizer...


CJKBigramFilter shouldn't care what tokenizer you're using.  It should 
work with any tokenizer.  What problem are you seeing that you're trying 
to solve?  What version of Solr, what configuration, and what does it do 
that you're not expecting, and what do you want it to do?


I don't have access to the systems where I was using that filter, but if 
I recall correctly, I was using the whitespace tokenizer.


Thanks,
Shawn



Re: what's in cursorMark

2018-10-01 Thread Vincenzo D'Amore
Hi Yi,

have you tried to decode the string?

AoE/E2Zhdm9yaXRlUGxhY2UvZjg1MzMzYzEtYzQ0NC00Y2ZiLWFmZDctMzcyODFhMDdiMGY3

seems to be only:

? favoritePlace/f85333c1-c444-4cfb-afd7-37281a07b0f7



On Mon, Oct 1, 2018 at 3:37 PM Li, Yi  wrote:

> Hi,
>
> cursorMark appears as something like
> AoE/E2Zhdm9yaXRlUGxhY2UvZjg1MzMzYzEtYzQ0NC00Y2ZiLWFmZDctMzcyODFhMDdiMGY3
>
> and the document says it is “Base64 encoded serialized representation of
> the sort values encapsulated by this object”
>
> I like to know if I can decode and what content I will see in there.
>
> For example, If there is an object as a json:
> {
> “id”:”123”,
> “name”:”objectname”,
> “secret”:”my secret”
> }
> if I search id:123, and only that object returned with a cursorMark, will
> I be able to decode the cursorMark and get that secret?
>
> Thanks,
> Yi
>


-- 
Vincenzo D'Amore


what's in cursorMark

2018-10-01 Thread Li, Yi
Hi,

cursorMark appears as something like 
AoE/E2Zhdm9yaXRlUGxhY2UvZjg1MzMzYzEtYzQ0NC00Y2ZiLWFmZDctMzcyODFhMDdiMGY3

and the document says it is “Base64 encoded serialized representation of the 
sort values encapsulated by this object”

I'd like to know if I can decode it and what content I will see in there.

For example, If there is an object as a json:
{
“id”:”123”,
“name”:”objectname”,
“secret”:”my secret”
}
if I search id:123, and only that object returned with a cursorMark, will I be 
able to decode the cursorMark and get that secret?

Thanks,
Yi


SolrJ does not use HTTP proxy anymore in 7.5.0 after update from 6.6.5

2018-10-01 Thread Andreas Hubold

Hi,

SolrJ 6.6.5 used org.apache.http.impl.client.SystemDefaultHttpClient 
under the hood, which took system properties for HTTP proxy config into 
account (http.proxyHost and http.proxyPort).


The deprecated SystemDefaultHttpClient class was replaced as part of 
SOLR-4509. And with Solr 7.5.0 I'm now unable to use an HTTP proxy with 
SolrJ at all (not using Solr Cloud here). SolrJ 7.5 uses 
org.apache.http.impl.client.HttpClientBuilder#create to create an 
HttpClient, but it does not call #useSystemProperties on the builder. 
Because of that, the proxy configuration from system properties is ignored.


Is there some other way to configure an HTTP proxy, e.g. with 
HttpSolrClient.Builder? I don't want to create an Apache HttpClient 
instance myself but the builder from Solrj (HttpSolrClient.Builder).
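Until that's addressed, one workaround sketch, under the assumption that wrapping Apache's builder once is tolerable after all: HttpClientBuilder.useSystemProperties() restores the old SystemDefaultHttpClient behavior (http.proxyHost / http.proxyPort and friends are honored again), and HttpSolrClient.Builder accepts the resulting client via withHttpClient, as far as I can tell from the 7.5 javadocs:

```java
import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class ProxyAwareSolrClient {

    // Build an HttpSolrClient whose underlying HttpClient reads the proxy
    // configuration from system properties, as SolrJ 6.x used to do.
    public static HttpSolrClient create(String baseUrl) {
        HttpClient http = HttpClientBuilder.create()
                .useSystemProperties()
                .build();
        return new HttpSolrClient.Builder(baseUrl)
                .withHttpClient(http)
                .build();
    }
}
```

The downside is that SolrJ no longer manages the HttpClient's lifecycle for you, so you need to close it yourself when done.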


Thanks in advance,
Andreas



Re: matches missing highlight information

2018-10-01 Thread Kudrettin Güleryüz
The query "g12312" matches 7 documents. The requested fields are returned for
all 7 documents that match the query. For three of the matching documents, no
highlight snippet is generated. Can you explain why highlighting snippets
could be absent for some of the documents?

".../sources/eproc.c":{},
".../sources/cro.c":{},
".../sources/cmd.c":{}}}


Thank you,
Kudret



On Sun, Sep 30, 2018 at 10:10 PM Zheng Lin Edwin Yeo 
wrote:

> Hi Kudret,
>
> Do you mean there are 7 documents that matches the query term, but the
> result only return 4 of them?
> Or is it that there are 4 documents that matches the query term, but there
> are 7 occurrences of the query term in these 4 documents?
>
> Regards,
> Edwin
>
> On Fri, 28 Sep 2018 at 22:47, Kudrettin Güleryüz 
> wrote:
>
> > Hi Edwin,
> >
> > I do not have any modifications in solrconfig.xml for highlighting. Here
> is
> > the query:
> >
> >
> http://test-51:8983/solr/mycollection/select?hl.fl=bodync&hl.simple.post=%25highlightpost%25&hl.simple.pre=%25highlightpre%25&hl=on&q=bodync:g12312
> >
> > Kudret
> >
> >
> > On Fri, Sep 28, 2018 at 2:09 AM Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> > wrote:
> >
> > > Hi Kudret,
> > >
> > > What is your configuration for your /highlight requestHandler in
> > > solrconfig.xml?
> > > And also the query that you used when you get your above output?
> > >
> > > Regards,
> > > Edwin
> > >
> > > On Fri, 28 Sep 2018 at 07:33, Kudrettin Güleryüz 
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > For some queries, response object returns matches without any
> highlight
> > > > information. Solr node doesn't report any errors in Solr log.
> > > > The query term is g12312; it matches 7 documents, but only 4 of them get
> > highlight
> > > > snippets. Any suggestions?
> > > >
> > > >  "highlighting":{
> > > > ".../sources/test.cpp":{
> > > >   "bodync":[" \n\n**/\n\n\n/* Repro definition
> problem
> > > > */\n/* %highlightpre%g12312%highlightpost% example of how to"]},
> > > > ".../sources/database.cpp":{
> > > >   "bodync":[" 12/01/2017 %highlightpre%g12312%highlightpost%\n
> > > >  // was done in here because"]},
> > > > ".../sources/predef.c":{
> > > >   "bodync":[" create_definitions(a_boolean inst)\n/*
> > > > %highlightpre%g12312%highlightpost% Replace _definitions.h by
> > > > built-in"]},
> > > > ".../sources/init.c":{
> > > >   "bodync":[") {\ncreate_descs();\n  }\n  /*T
> > > > %highlightpre%g12312%highlightpost% 13/01/2017. Replace\n
> > > > prev_definitions.h by G"]},
> > > > ".../sources/eproc.c":{},
> > > > ".../sources/cro.c":{},
> > > > ".../sources/cmd.c":{}}}
> > > >
> > > > 7.3.1 here.
> > > >
> > > > Thank you,
> > > > Kudret
> > > >
> > >
> >
>


Re: Realtime get not always returning existing data

2018-10-01 Thread Chris Ulicny
In our case, we are heavily indexing in the collection while the /get
requests are happening which is what we assumed was causing this very rare
behavior. However, we have experienced the problem for a collection where
the following happens in sequence with minutes in between them.

1. Document id=1 is indexed
2. Document successfully retrieved with /get?id=1
3. Document failed to be retrieved with /get?id=1
4. Document successfully retrieved with /get?id=1

We haven't looked at the issue in a while, so I don't have the exact
timing of that sequence on hand right now. I'll try to find an actual
example, although I'm relatively certain it was multiple minutes in between
each of those requests. However our autocommit (and soft commit) times are
60s for both collections.

I think the following two are probably the biggest differences for our
setup, besides the version difference (v6.3.0):

> index to this collection, perhaps not at a high rate
> separate the machines running solr from the one doing any querying or
indexing

The clients are on 3 hosts separate from the solr instances. The total
number of threads that are making updates and making /get requests is
around 120-150. About 40-50 per host. Each of our two collections gets an
average of 500 requests per second constantly for ~5 minutes, and then the
number slowly tapers off to essentially 0 after ~15 minutes.

Every thread attempts to make the same series of requests.

-- Update with "_version_=-1". If successful, no other requests are made.
-- On 409 Conflict failure, it makes a /get request for the id
-- On doc:null failure, the client handles the error and moves on
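The version-guarded update described above would look roughly like this in Solr's JSON update format, POSTed to /solr/<collection>/update (the value_s field is hypothetical). A _version_ of -1 asserts that the document must not already exist, which is what produces the 409 when another thread wins the race; the losing thread then falls back to /get?id=1:

```json
[
  {
    "id": "1",
    "value_s": "hypothetical payload",
    "_version_": -1
  }
]
```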

Combining this with the previous series of /get requests, we end up with
situations where an update fails as expected, but the subsequent /get
request fails to retrieve the existing document:

1. Thread 1 updates id=1 successfully
2. Thread 2 tries to update id=1, fails (409)
3. Thread 2 tries to get id=1 succeeds.

...Minutes later...

4. Thread 3 tries to update id=1, fails (409)
5. Thread 3 tries to get id=1, fails (doc:null)

...Minutes later...

6. Thread 4 tries to update id=1, fails (409)
7. Thread 4 tries to get id=1 succeeds.

As Steven mentioned, it happens very, very rarely. We tried to recreate it
in a more controlled environment, but ran into the same issue that you are,
Erick. Every simplified situation we ran produced no problems. Since it's
not a large issue for us and happens very rarely, we stopped trying to
recreate it.


On Sun, Sep 30, 2018 at 9:16 PM Erick Erickson 
wrote:

> 57 million queries later, with constant indexing going on and 9 dummy
> collections in the mix and the main collection I'm querying having 2
> shards, 2 replicas each, I have no errors.
>
> So unless the code doesn't look like it exercises any similar path,
> I'm not sure what more I can test. "It works on my machine" ;)
>
> Here's my querying code, does it look like it what you're seeing?
>
>   while (Main.allStop.get() == false) {
> try (SolrClient client = new HttpSolrClient.Builder()
> // .withBaseSolrUrl("http://my-solr-server:8981/solr/eoe_shard1_replica_n4")
> .withBaseSolrUrl("http://localhost:8981/solr/eoe").build()) {
>
>   //SolrQuery query = new SolrQuery();
>   String lower = Integer.toString(rand.nextInt(1_000_000));
>   SolrDocument rsp = client.getById(lower);
>   if (rsp == null) {
> System.out.println("Got a null response!");
> Main.allStop.set(true);
>   }
>
>   rsp = client.getById(lower);
>
>   if (rsp.get("id").equals(lower) == false) {
> System.out.println("Got an invalid response, looking for "
> + lower + " got: " + rsp.get("id"));
> Main.allStop.set(true);
>   }
>   long queries = Main.eoeCounter.incrementAndGet();
>   if ((queries % 100_000) == 0) {
> long seconds = (System.currentTimeMillis() - Main.start) /
> 1000;
> System.out.println("Query count: " +
> numFormatter.format(queries) + ", rate is " +
> numFormatter.format(queries / seconds) + " QPS");
>   }
> } catch (Exception cle) {
>   cle.printStackTrace();
>   Main.allStop.set(true);
> }
>   }
>   }
> On Sat, Sep 29, 2018 at 12:46 PM Erick Erickson
>  wrote:
> >
> > Steve:
> >
> > bq.  Basically, one core had data in it that should belong to another
> > core. Here's my question about this: Is it possible that two requests to
> the
> > /get API coming in at the same time would get confused and either both
> get
> > the same result or results get inverted?
> >
> > Well, that shouldn't be happening, these are all supposed to be
> thread-safe
> > calls. All things are possible, of course ;)
> >
> > If two replicas of the same shard have different documents, that could
> account
> > for what you're seeing, meanwhile begging the question of why that is
> the case
> > since it should never be true for a quiescent