Expected Suggestions Are Phrases Instead of the Whole Contents of a Field

2018-02-02 Thread stiven.z...@swisoft.cn
Hi dear Solr folks,

I configured Solr to try to get search suggestions, but what I get is not
the suggestions that I want.

Here are the search component and request handler:


  
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">title</str>  <!-- the contents are from "title" -->
    <str name="suggestAnalyzerFieldType">string</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">mySuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

http://localhost:8983/solr/articles/suggest?hl=on&suggest.build=true&suggest.dictionary=mySuggester&suggest.q=hello&suggest=true
 

the result is:

{
  "responseHeader":{
    "status":0,
    "QTime":14},
  "command":"build",
  "suggest":{"mySuggester":{
    "hello":{
      "numFound":3,
      "suggestions":[{
          "term":"hello every one, im rebot",  // the whole contents from field "title"
          "weight":0,
          "payload":""},
        {
          "term":"hello the autosuggest feature to satisfy two main requirements",  // the whole contents from field "title"
          "weight":0,
          "payload":""},
        {
          "term":"hello world, Im program",  // the whole contents from field "title"
          "weight":0,
          "payload":""}]}}}}

BUT the expected result should be a phrase instead of the whole contents of
the field "title":

"suggestions":[{
    "term":"hello every one",  // expected result should be a phrase like this
    "weight":0,
    "payload":""},
  {
    "term":"hello autosuggest",  // expected result should be a phrase like this
    "weight":0,
    "payload":""},
  {
    "term":"hello world",  // expected result should be a phrase like this
    "weight":0,
    "payload":""}]

So what is wrong? Did I configure something incorrectly?

Thanks a lot for the help.



Stiven.Zhou
stiven.z...@swisoft.cn 


Re: facet.method=uif not working in solr cloud?

2018-02-02 Thread Wei
I tried to debug a bit and saw that when executing on a cloud Solr server,
although I put facet.field=color&q=*:*&facet.method=uif&facet.mincount=1 in
the request URL, by the time it reaches SimpleFacets, req.params has somehow
been rewritten to f.color.facet.mincount=0, so no wonder the method chosen
becomes FC. So one mystery is solved; but the new mystery is why
facet.mincount is overridden to 0 in the Solr request.
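A side note for readers: per-field facet parameters take precedence over the
global ones, so explicitly pinning the per-field mincount may sidestep
whatever rewrites the global value. A sketch only, not a root-cause fix
(host and collection name are placeholders):

# f.color.facet.mincount overrides the global facet.mincount for "color",
# and the uif method requires a mincount of at least 1:
curl "http://localhost:8983/solr/collection/select?q=*:*&facet=true&facet.field=color&facet.method=uif&f.color.facet.mincount=1"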

Cheers,
Wei

On Thu, Feb 1, 2018 at 2:01 AM, Alessandro Benedetti 
wrote:

> " Looks like when using the json facet api,
> SimpleFacets is not used, replaced by FacetFieldProcessorByArrayUIF "
>
> That is expected; I remember Yonik stressing the fact that it is a
> completely different approach to faceting (different components and
> classes are involved).
>
> But your first case may be worth an investigation.
> If you have the tools and are used to them, I would encourage you to
> reproduce the issue and remote debug it from a Solr server.
> Putting a breakpoint in the SimpleFacets method, you should be able to
> solve the mystery (a bug maybe? I am very curious about it.)
>
> Cheers
>
>
>
>
> -
> ---
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> Sease Ltd. - www.sease.io
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: 9000+ CLOSE_WAIT connections in solr v6.2.2 causing it to "die"

2018-02-02 Thread Arcadius Ahouansou
I have seen a lot of CLOSE_WAIT in the past.
In many cases, the cause was that the client application was not releasing,
closing, or pooling connections properly.

I would suggest you double check the client code first.
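If it helps with the diagnosis, a quick sketch for seeing who owns the
half-closed sockets (port 8983 assumed, as in the netstat output below):

# Show CLOSE_WAIT sockets on the Solr port along with the owning process:
ss -tnp state close-wait '( sport = :8983 )' | head

# Group by peer address to spot a single misbehaving client:
netstat -an | grep 8983 | grep CLOSE_WAIT | awk '{print $5}' | sort | uniq -c | sort -rn | head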

Arcadius.

On 2 February 2018 at 23:52, mmb1234  wrote:

> > You said that you're running Solr 6.2.2, but there is no 6.2.2 version.
> > but the JVM argument list includes "-Xmx512m" which is a 512MB heap
>
> My typos. They're 6.6.2 and -Xmx30g respectively.
>
> > many open connections causes is a large number of open file handles,
>
> solr [ /opt/solr/server/logs ]$ sysctl -a | grep vm.max_map_count
> vm.max_map_count = 262144
>
> The only thing I notice right before the Solr shutdown messages in solr.log is
> that the /update QTime goes from ~500ms to ~25.
>
> There is an automated health check that issues a kill on the  due
> to http connection timeout.
>
>
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>



-- 
Arcadius Ahouansou
Menelic Ltd | Applied Knowledge Is Power
Office : +441444702101
Mobile: +447908761999
Menelic Ltd: menelic.com
Visitor Management System: menelicvisitor.com
---


Re: 9000+ CLOSE_WAIT connections in solr v6.2.2 causing it to "die"

2018-02-02 Thread mmb1234
> You said that you're running Solr 6.2.2, but there is no 6.2.2 version.
> but the JVM argument list includes "-Xmx512m" which is a 512MB heap

My typos. They're 6.6.2 and -Xmx30g respectively.

> many open connections causes is a large number of open file handles,

solr [ /opt/solr/server/logs ]$ sysctl -a | grep vm.max_map_count
vm.max_map_count = 262144

The only thing I notice right before the Solr shutdown messages in solr.log
is that the /update QTime goes from ~500ms to ~25.

There is an automated health check that issues a kill on the  due
to http connection timeout. 





--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Expected response from master when replication is disabled?

2018-02-02 Thread Elizabeth Haubert
What is the intended use case and behavior of disablereplication?

My expectation was that it would cause the master to stop responding to
slave requests.

I am working with a master/slave Solr 7.1 setup; the slave is set up to poll
every 60s. I would prefer to remain on M/S for the immediate future, while we
resolve instability in the ingestion pipeline external to Solr, which causes
bad data to be ingested.

Testing failure scenarios related to data corruption, I disabled replication
at the master:

curl "http://$SOLR_NODE/solr/$COLLECTION/replication?command=disablereplication"

But this did not disable polling at the slaves.
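For reference, polling can also be switched off on the slave side, which
should stop the fetches entirely; a sketch, with $SLAVE_NODE a placeholder
like the variables above:

# Stop the slave's poll loop:
curl "http://$SLAVE_NODE/solr/$COLLECTION/replication?command=disablepoll"
# Resume it later:
curl "http://$SLAVE_NODE/solr/$COLLECTION/replication?command=enablepoll"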

2018-02-02 14:05:13.775 INFO  (indexFetcher-16-thread-1) [   x:collection1]
o.a.s.h.IndexFetcher Master's generation: 0
2018-02-02 14:05:13.776 INFO  (indexFetcher-16-thread-1) [   x:collection1]
o.a.s.h.IndexFetcher Master's version: 0
2018-02-02 14:05:13.776 INFO  (indexFetcher-16-thread-1) [   x:collection1]
o.a.s.h.IndexFetcher Slave's generation: 12254
2018-02-02 14:05:13.776 INFO  (indexFetcher-16-thread-1) [   x:collection1]
o.a.s.h.IndexFetcher Slave's version: 1517493295318
2018-02-02 14:05:13.776 INFO  (indexFetcher-16-thread-1) [   x:collection1]
o.a.s.h.IndexFetcher New index in Master. Deleting mine...

And the slave did indeed delete all its documents.
Master's generation at the time was also 12254.

I re-enabled replication at the master, and the slave caught back up again.
Normal behavior when master and slave are in sync, as the slave polls:
2018-02-02 16:08:31.489 INFO  (indexFetcher-24-thread-1) [   x:collection1]
o.a.s.h.IndexFetcher Master's generation: 12258
2018-02-02 16:08:31.489 INFO  (indexFetcher-24-thread-1) [   x:collection1]
o.a.s.h.IndexFetcher Master's version: 1517595712709
2018-02-02 16:08:31.489 INFO  (indexFetcher-24-thread-1) [   x:collection1]
o.a.s.h.IndexFetcher Slave's generation: 12258
2018-02-02 16:08:31.489 INFO  (indexFetcher-24-thread-1) [   x:collection1]
o.a.s.h.IndexFetcher Slave's version: 1517595712709

Things that delete the index are big glaring problems, but I'm not clear if
this is a Solr bug or user error.

Liz


Re: 9000+ CLOSE_WAIT connections in solr v6.2.2 causing it to "die"

2018-02-02 Thread Shawn Heisey

On 2/2/2018 10:00 AM, mmb1234 wrote:

In our Solr non-cloud env., we are seeing lots of CLOSE_WAIT, causing the JVM
to stop "working" within 3 mins of Solr start.

solr [ /opt/solr ]$ netstat -anp | grep 8983 | grep CLOSE_WAIT | grep
10.xxx.xxx.xxx | wc -l
9453


Solr isn't handling network services.  That's being done by the servlet 
container.  The servlet container included with Solr is Jetty.  Since 
version 5.0, Jetty is the only option with official support.


There are a number of issues against Jetty where people are seeing many 
CLOSE_WAIT problems.  At least one of them does acknowledge that there 
is a bug in Jetty, but in discussion with them on IRC, they have 
apparently never been able to actually reproduce the problem.  One issue 
says that they don't think Jetty 9.4.x will have the problem, but Solr 
won't have Jetty 9.4.x until Solr 7.3.0 is released.


You said that you're running Solr 6.2.2, but there is no 6.2.2 version.  
There are 6.4.2 and 6.6.2.  Solr 6.6.x and 6.4.x include Jetty 9.3.14, 
while 6.2.x includes Jetty 9.3.8.


Another oddity: Your info says that Solr has a 30GB heap, but the JVM 
argument list includes "-Xmx512m" which is a 512MB heap.


One issue that many open connections cause is a large number of open 
file handles, and that can apparently cause software to become 
unresponsive.  Are you seeing any errors in Solr's logfile before the 
process becomes unresponsive?
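A quick sketch for checking file-handle pressure (the pgrep pattern is an
assumption about how Solr was started):

# Compare the Solr process's open-descriptor count against its limit:
SOLR_PID=$(pgrep -f start.jar | head -1)
grep 'open files' /proc/$SOLR_PID/limits
ls /proc/$SOLR_PID/fd | wc -l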


Thanks,
Shawn



Re: 7.2.1 cluster dies within minutes after restart

2018-02-02 Thread S G
Our 3.4.6 ZK nodes were unable to rejoin the cluster unless the quorum got
broken.
So if there was a 5-node ZooKeeper ensemble and it lost 2 nodes, those 2
would not rejoin because ZK still had its quorum.
To make them join, you had to break the quorum by restarting a node that was
in the quorum.
Only when the quorum broke did ZK realize that something was wrong and
recognize the other two nodes trying to rejoin.
Also this problem happened only when ZK had been running for a long time,
like several weeks (perhaps DNS caching or something, not sure really).


On Fri, Feb 2, 2018 at 11:32 AM, Tomas Fernandez Lobbe 
wrote:

> Hi Markus,
> If the same code that runs OK in 7.1 breaks in 7.2.1, it is clear to me that
> there is some bug in Solr introduced between those releases (maybe an
> increase in memory utilization? or maybe some decrease in query throughput
> making threads pile up?). I’d hate to have this issue lost in the users
> list, could you create a Jira? Maybe next time you have this issue you can
> post thread/heap dumps, that would be useful.
>
> Tomás
>
> > On Feb 2, 2018, at 9:38 AM, Walter Underwood 
> wrote:
> >
> > Zookeeper 3.4.6 is not good? That was the version recommended by Solr
> docs when I installed 6.2.0.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >> On Feb 2, 2018, at 9:30 AM, Markus Jelsma 
> wrote:
> >>
> >> Hello S.G.
> >>
> >> We have relied on Trie* fields ever since they became available; I
> don't think reverting to the old fieldTypes will do us any good, as we have a
> very recent problem.
> >>
> >> Regarding our heap, the cluster ran fine for years with just 1.5 GB; we
> only recently increased it because our data keeps on growing. Heap rarely
> goes higher than 50%, except when this specific problem occurs. The nodes
> have no problem processing a few hundred QPS continuously and can go on for
> days, sometimes even a few weeks.
> >>
> >> I will keep my eye open for other clues when the problem strikes again!
> >>
> >> Thanks,
> >> Markus
> >>
> >> -Original message-
> >>> From:S G 
> >>> Sent: Friday 2nd February 2018 18:20
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Re: 7.2.1 cluster dies within minutes after restart
> >>>
> >>> Yeah, definitely check the zookeeper version.
> >>> 3.4.6 is not a good one I know and you can say the same for all the
> >>> versions below it too.
> >>> We have used 3.4.9 with no issues.
> >>> While Solr 7.x uses 3.4.10
> >>>
> >>> Another dimension could be the use or (dis-use) of p-fields like pint,
> >>> plong etc.
> >>> If you are using them, try to revert back to tint, tlong etc
> >>> And if you are not using them, try to use them (Although doing this
> means a
> >>> change from your older config and less likely to help).
> >>>
> >>> Lastly, did I read 2 GB for JVM heap?
> >>> That seems really too little to me for any version of Solr
> >>> We run with 10-16 gb of heap with G1GC collector and new-gen capped at
> 3-4gb
> >>>
> >>>
> >>> On Fri, Feb 2, 2018 at 4:27 AM, Markus Jelsma <
> markus.jel...@openindex.io>
> >>> wrote:
> >>>
>  Hello Ere,
> 
>  It appears that my initial e-mail [1] got lost in the thread. We don't
>  have GC issues, the cluster that dies occasionally runs, in general,
> smooth
>  and quick with just 2 GB allocated.
> 
>  Thanks,
>  Markus
> 
>  [1]: http://lucene.472066.n3.nabble.com/7-2-1-cluster-dies-
>  within-minutes-after-restart-td4372615.html
> 
>  -Original message-
> > From:Ere Maijala 
> > Sent: Friday 2nd February 2018 8:49
> > To: solr-user@lucene.apache.org
> > Subject: Re: 7.2.1 cluster dies within minutes after restart
> >
> > Markus,
> >
> > I may be stating the obvious here, but I didn't notice garbage
> > collection mentioned in any of the previous messages, so here goes.
> In
> > our experience almost all of the Zookeeper timeouts etc. have been
> > caused by too long garbage collection pauses. I've summed up my
> > observations here:
> >  msg135857.html
> >
> >
> > So, in my experience it's relatively easy to cause heavy memory usage
> > with SolrCloud with seemingly innocent queries, and GC can become a
> > problem really quickly even if everything seems to be running
> smoothly
> > otherwise.
> >
> > Regards,
> > Ere
> >
> Markus Jelsma wrote on 31.1.2018 at 23.56:
> >> Hello S.G.
> >>
> >> We do not complain about speed improvements at all, it is clear 7.x
> is
>  faster than its predecessor. The problem is stability and not
> recovering
>  from weird circumstances. In general, it is our high load cluster
>  containing user interaction logs that suffers the most. Our main text
>  search cluster - receiving much fewer queries - seems mostly
> unaffected,
>  except last Sunday. After very

Re: Help with Boolean search using Solr parser edismax

2018-02-02 Thread Wendy2
Hi Erick,

Yes. Currently I re-index the database on a weekly basis because we only
have a weekly release.
As part of the Solr weekly re-index, the batch job will delete the
/solr/core/data folder, restart Solr server, then re-index.
We use Luigi to build/control pipelines of Solr re-index batch jobs.

Thanks for all your help and support!

All the best,

Wendy 
a happy Solr user :-) 



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: External file fields

2018-02-02 Thread Chris Hostetter

: Interesting. I will definitely explore this. Just so I'm clear, we can 
: sort on docValues, but not filter? Is there any situation where external 
: file fields would work better than docValues?

For most field types that support docValues, you can still filter on it 
even if it's indexed="false" -- but the filtering may not be as efficient 
as using indexed values.  For numeric fields you certainly can.

One situation where ExternalFileField would probably be preferable to doing 
in-place updates on docValues is when you know you need to update the value 
for *every* document in your collection in batch -- for large 
collections, looping over every doc and sending an atomic update would 
probably be slower than just replacing the external file.

Another example when I would probably choose external file field over 
docValues is if the "keyField" was not the same as my uniqueKey field ... 
i.e., if I have millions of documents, each with a category_id that has a 
cardinality of ~100 categories.  I could use 
the category_id field as the keyField to associate every doc w/some 
numeric "category_rank" value (that varies only per category).  If I 
need/want to tweak 1 of those 100 category_rank values, updating the 
entire external file just to change that 1 value is still probably much 
easier than redundantly putting that category_rank field in every 
doc and sending an atomic update to all ~10K docs that have the 
category_id whose category_rank I want to change.
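For reference, the external file itself is just one "key=value" line per
key, named external_<fieldname> in the core's data directory. A sketch of
the category_rank example (paths are placeholders, and it assumes the
reload-on-new-searcher listener is configured):

# external_category_rank -- one line per category_id:
#   100=1.5
#   101=0.8
#   102=2.0
# Swap the file in atomically, then open a new searcher so it is re-read:
mv external_category_rank.new /var/solr/data/mycore/data/external_category_rank
curl "http://localhost:8983/solr/mycore/update?commit=true"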


: 
: -Original Message-
: From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
: Sent: Friday, February 2, 2018 12:24 PM
: To: solr-user@lucene.apache.org
: Subject: RE: External file fields
: 
: 
: : I did look into updatable docValues, but my understanding is that the
: : field has to be non-indexed (indexed="false"). I need to be able to sort
: on these values. External file fields are sortable.
: 
: You can absolutely sort on a field that is docValues="true" 
: indexed="false" ... that is much more efficient than sorting on a field that 
is docValues="false" indexed="true" -- in the latter case Solr has to build a 
fieldcache (aka: run-time-mock-docvalues) from the indexed values the first 
time you try to sort on the field after a searcher is opened.
: 
: 
: 
: -Hoss
: http://www.lucidworks.com/
: 

-Hoss
http://www.lucidworks.com/


Re: 7.2.1 cluster dies within minutes after restart

2018-02-02 Thread Tomas Fernandez Lobbe
Hi Markus, 
If the same code that runs OK in 7.1 breaks in 7.2.1, it is clear to me that there 
is some bug in Solr introduced between those releases (maybe an increase in 
memory utilization? or maybe some decrease in query throughput making threads 
pile up?). I’d hate to have this issue lost in the users list, could you 
create a Jira? Maybe next time you have this issue you can post thread/heap 
dumps, that would be useful.

Tomás

> On Feb 2, 2018, at 9:38 AM, Walter Underwood  wrote:
> 
> Zookeeper 3.4.6 is not good? That was the version recommended by Solr docs 
> when I installed 6.2.0.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
>> On Feb 2, 2018, at 9:30 AM, Markus Jelsma  wrote:
>> 
>> Hello S.G.
>> 
>> We have relied on Trie* fields ever since they became available; I don't 
>> think reverting to the old fieldTypes will do us any good, as we have a very 
>> recent problem.
>> 
>> Regarding our heap, the cluster ran fine for years with just 1.5 GB; we only 
>> recently increased it because our data keeps on growing. Heap rarely goes 
>> higher than 50%, except when this specific problem occurs. The nodes have 
>> no problem processing a few hundred QPS continuously and can go on for days, 
>> sometimes even a few weeks.
>> 
>> I will keep my eye open for other clues when the problem strikes again!
>> 
>> Thanks,
>> Markus
>> 
>> -Original message-
>>> From:S G 
>>> Sent: Friday 2nd February 2018 18:20
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: 7.2.1 cluster dies within minutes after restart
>>> 
>>> Yeah, definitely check the zookeeper version.
>>> 3.4.6 is not a good one I know and you can say the same for all the
>>> versions below it too.
>>> We have used 3.4.9 with no issues.
>>> While Solr 7.x uses 3.4.10
>>> 
>>> Another dimension could be the use or (dis-use) of p-fields like pint,
>>> plong etc.
>>> If you are using them, try to revert back to tint, tlong etc
>>> And if you are not using them, try to use them (Although doing this means a
>>> change from your older config and less likely to help).
>>> 
>>> Lastly, did I read 2 GB for JVM heap?
>>> That seems really too little to me for any version of Solr
>>> We run with 10-16 gb of heap with G1GC collector and new-gen capped at 3-4gb
>>> 
>>> 
>>> On Fri, Feb 2, 2018 at 4:27 AM, Markus Jelsma 
>>> wrote:
>>> 
 Hello Ere,
 
 It appears that my initial e-mail [1] got lost in the thread. We don't
 have GC issues, the cluster that dies occasionally runs, in general, smooth
 and quick with just 2 GB allocated.
 
 Thanks,
 Markus
 
 [1]: http://lucene.472066.n3.nabble.com/7-2-1-cluster-dies-
 within-minutes-after-restart-td4372615.html
 
 -Original message-
> From:Ere Maijala 
> Sent: Friday 2nd February 2018 8:49
> To: solr-user@lucene.apache.org
> Subject: Re: 7.2.1 cluster dies within minutes after restart
> 
> Markus,
> 
> I may be stating the obvious here, but I didn't notice garbage
> collection mentioned in any of the previous messages, so here goes. In
> our experience almost all of the Zookeeper timeouts etc. have been
> caused by too long garbage collection pauses. I've summed up my
> observations here:
>  
> 
> So, in my experience it's relatively easy to cause heavy memory usage
> with SolrCloud with seemingly innocent queries, and GC can become a
> problem really quickly even if everything seems to be running smoothly
> otherwise.
> 
> Regards,
> Ere
> 
> Markus Jelsma wrote on 31.1.2018 at 23.56:
>> Hello S.G.
>> 
>> We do not complain about speed improvements at all, it is clear 7.x is
 faster than its predecessor. The problem is stability and not recovering
 from weird circumstances. In general, it is our high load cluster
 containing user interaction logs that suffers the most. Our main text
 search cluster - receiving much fewer queries - seems mostly unaffected,
 except last Sunday. After very short but high burst of queries it entered
 the same catatonic state the logs cluster usually dies from.
>> 
>> The query burst immediately caused ZK timeouts and high heap
 consumption (not sure which came first of the latter two). The query burst
 lasted for 30 minutes, the excessive heap consumption continued for more
 than 8 hours, before Solr finally realized it could relax. Most remarkable
 was that Solr recovered on its own, ZK timeouts stopped, heap went back to
 normal.
>> 
>> There seems to be a causality between high load and this state.
>> 
>> We really want to get this fixed for ourselves and everyone else that
 may encounter this problem, but I don't know how, so I need much more
 feedback and hints from those who have deep understanding of the inner
 workings of Solrcloud and the changes since 6.x.

Re: Query fields with data of certain length

2018-02-02 Thread Chris Hostetter

: Have you managed to get the regex for this string in Chinese: 预支款管理及账务处理办法 ?
...
: > An example of the string in Chinese is 预支款管理及账务处理办法
: >
: > The number of characters is 12, but the expected length should be 36.
...
: >> > So this would likely be different from what the operating system
: >> counts, as
: >> > the operating system may consider each Chinese characters as 3 to 4
: >> bytes.
: >> > Which is probably why I could not find any record with
: >> subject:/.{255,}.*/

Java regexes operate on Unicode strings, so "." matches any *character*.
There is no regex syntax to match a "byte", so a regex-based approach 
is never going to be viable.

Your best bet is to check the byte count when indexing -- but even then 
you'd need some custom code since things like 
FieldLengthUpdateProcessorFactory are well behaved and count the 
*characters* of the unicode strings.
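As a quick illustration of that character-vs-byte gap (in UTF-8, each of
those CJK characters is 3 bytes; this assumes a UTF-8 locale):

echo -n "预支款管理及账务处理办法" | wc -m   # 12 characters
echo -n "预支款管理及账务处理办法" | wc -c   # 36 bytes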

If you absolutely can't reindex, then you'd need a custom QParser that 
produced a custom Query object that iterated over the TermEnum looking at 
the buffers and counting the bytes in each term -- matching each doc 
associated with those terms.



-Hoss
http://www.lucidworks.com/

RE: External file fields

2018-02-02 Thread Brian Yee
Interesting. I will definitely explore this. Just so I'm clear, we can sort on 
docValues, but not filter? Is there any situation where external file fields 
would work better than docValues?

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Friday, February 2, 2018 12:24 PM
To: solr-user@lucene.apache.org
Subject: RE: External file fields


: I did look into updatable docValues, but my understanding is that the
: field has to be non-indexed (indexed="false"). I need to be able to sort
: on these values. External file fields are sortable.

You can absolutely sort on a field that is docValues="true" 
indexed="false" ... that is much more efficient than sorting on a field that is 
docValues="false" indexed="true" -- in the latter case Solr has to build a 
fieldcache (aka: run-time-mock-docvalues) from the indexed values the first 
time you try to sort on the field after a searcher is opened.



-Hoss
http://www.lucidworks.com/


Re: Help with Boolean search using Solr parser edismax

2018-02-02 Thread Wendy2
Hi Erick,

Thank you very much for the clarification. I will keep it in mind since
we are now in the process of migrating our MySQL database to MongoDB.

Best Regards,

Wendy 
a happy Solr user 



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Help with Boolean search using Solr parser edismax

2018-02-02 Thread Erick Erickson
From the ref guide:

"Field names should consist of alphanumeric or underscore characters
only and not start with a digit. This is not currently strictly
enforced, but other field names will not have first class support from
all components and back compatibility is not guaranteed."

You need to _completely_ blow away your index and re-index from
scratch though. By "completely blow away" I mean
1> shut down your Solrs and "rm -rf each_core/data"
or
2> create a new collection and index into that
or
3> delete your collection and re-create it.

If you just re-index your entire corpus after changing the field
names, the metadata for the old fields will be preserved. Likely you
won't notice, it's not very much data but


Erick

On Fri, Feb 2, 2018 at 6:14 AM, Wendy2  wrote:
> Good morning, Emir,
>
> Thanks for letting me know that. I used dots to add tableName as a field
> prefix because several columns from different tables have the same names.
> In your opinion, what would be the best way to replace the dots?
>
> Happy Friday!
>
> Wendy
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: 7.2.1 cluster dies within minutes after restart

2018-02-02 Thread Walter Underwood
Zookeeper 3.4.6 is not good? That was the version recommended by Solr docs when 
I installed 6.2.0.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Feb 2, 2018, at 9:30 AM, Markus Jelsma  wrote:
> 
> Hello S.G.
> 
> We have relied on Trie* fields ever since they became available; I don't 
> think reverting to the old fieldTypes will do us any good, as we have a very 
> recent problem.
> 
> Regarding our heap, the cluster ran fine for years with just 1.5 GB; we only 
> recently increased it because our data keeps on growing. Heap rarely goes 
> higher than 50%, except when this specific problem occurs. The nodes have no 
> problem processing a few hundred QPS continuously and can go on for days, 
> sometimes even a few weeks.
> 
> I will keep my eye open for other clues when the problem strikes again!
> 
> Thanks,
> Markus
> 
> -Original message-
>> From:S G 
>> Sent: Friday 2nd February 2018 18:20
>> To: solr-user@lucene.apache.org
>> Subject: Re: 7.2.1 cluster dies within minutes after restart
>> 
>> Yeah, definitely check the zookeeper version.
>> 3.4.6 is not a good one I know and you can say the same for all the
>> versions below it too.
>> We have used 3.4.9 with no issues.
>> While Solr 7.x uses 3.4.10
>> 
>> Another dimension could be the use or (dis-use) of p-fields like pint,
>> plong etc.
>> If you are using them, try to revert back to tint, tlong etc
>> And if you are not using them, try to use them (Although doing this means a
>> change from your older config and less likely to help).
>> 
>> Lastly, did I read 2 GB for JVM heap?
>> That seems really too little to me for any version of Solr
>> We run with 10-16 gb of heap with G1GC collector and new-gen capped at 3-4gb
>> 
>> 
>> On Fri, Feb 2, 2018 at 4:27 AM, Markus Jelsma 
>> wrote:
>> 
>>> Hello Ere,
>>> 
>>> It appears that my initial e-mail [1] got lost in the thread. We don't
>>> have GC issues, the cluster that dies occasionally runs, in general, smooth
>>> and quick with just 2 GB allocated.
>>> 
>>> Thanks,
>>> Markus
>>> 
>>> [1]: http://lucene.472066.n3.nabble.com/7-2-1-cluster-dies-
>>> within-minutes-after-restart-td4372615.html
>>> 
>>> -Original message-
 From:Ere Maijala 
 Sent: Friday 2nd February 2018 8:49
 To: solr-user@lucene.apache.org
 Subject: Re: 7.2.1 cluster dies within minutes after restart
 
 Markus,
 
 I may be stating the obvious here, but I didn't notice garbage
 collection mentioned in any of the previous messages, so here goes. In
 our experience almost all of the Zookeeper timeouts etc. have been
 caused by too long garbage collection pauses. I've summed up my
 observations here:
  Hello S.G.
> 
> We do not complain about speed improvements at all, it is clear 7.x is
>>> faster than its predecessor. The problem is stability and not recovering
>>> from weird circumstances. In general, it is our high load cluster
>>> containing user interaction logs that suffers the most. Our main text
>>> search cluster - receiving much fewer queries - seems mostly unaffected,
>>> except last Sunday. After very short but high burst of queries it entered
>>> the same catatonic state the logs cluster usually dies from.
> 
> The query burst immediately caused ZK timeouts and high heap
>>> consumption (not sure which came first of the latter two). The query burst
>>> lasted for 30 minutes, the excessive heap consumption continued for more
>>> than 8 hours, before Solr finally realized it could relax. Most remarkable
>>> was that Solr recovered on its own, ZK timeouts stopped, heap went back to
>>> normal.
> 
> There seems to be a causality between high load and this state.
> 
> We really want to get this fixed for ourselves and everyone else that
>>> may encounter this problem, but i don't know how, so i need much more
>>> feedback and hints from those who have deep understanding of inner working
>>> of Solrcloud and changes since 6.x.
> 
> To be clear, we don't have the problem of 15 second ZK timeout, we use
>>> 30. Is 30 too low still? Is it even remotely related to this problem? What
>>> does load have to do with it?
> 
> We are not able to reproduce it in lab environments. It can take
>>> minutes after cluster startup for it to occur, but also days.
> 
> I've been slightly annoyed by problems that can occur in a broad time
>>> span, it is always bad luck for reproduction.
> 
> Any help getting further is much appreciated.
> 
> Many thanks,

RE: 7.2.1 cluster dies within minutes after restart

2018-02-02 Thread Markus Jelsma
Hello S.G.

We have relied on Trie* fields ever since they became available; I don't think 
reverting to the old fieldTypes will do us any good, as we have a very recent 
problem.

Regarding our heap, the cluster ran fine for years with just 1.5 GB; we only 
recently increased it because our data keeps on growing. Heap rarely goes higher 
than 50%, except when this specific problem occurs. The nodes have no problem 
processing a few hundred QPS continuously and can go on for days, sometimes 
even a few weeks.

I will keep my eye open for other clues when the problem strikes again!

Thanks,
Markus

-Original message-
> From:S G 
> Sent: Friday 2nd February 2018 18:20
> To: solr-user@lucene.apache.org
> Subject: Re: 7.2.1 cluster dies within minutes after restart
> 
> Yeah, definitely check the zookeeper version.
> 3.4.6 is not a good one I know and you can say the same for all the
> versions below it too.
> We have used 3.4.9 with no issues.
> While Solr 7.x uses 3.4.10
> 
> Another dimension could be the use or (dis-use) of p-fields like pint,
> plong etc.
> If you are using them, try to revert back to tint, tlong etc
> And if you are not using them, try to use them (Although doing this means a
> change from your older config and less likely to help).
> 
> Lastly, did I read 2 GB for JVM heap?
> That seems really too little to me for any version of Solr
> We run with 10-16 gb of heap with G1GC collector and new-gen capped at 3-4gb
> 
> 
> On Fri, Feb 2, 2018 at 4:27 AM, Markus Jelsma 
> wrote:
> 
> > Hello Ere,
> >
> > It appears that my initial e-mail [1] got lost in the thread. We don't
> > have GC issues, the cluster that dies occasionally runs, in general, smooth
> > and quick with just 2 GB allocated.
> >
> > Thanks,
> > Markus
> >
> > [1]: http://lucene.472066.n3.nabble.com/7-2-1-cluster-dies-
> > within-minutes-after-restart-td4372615.html
> >
> > -Original message-
> > > From:Ere Maijala 
> > > Sent: Friday 2nd February 2018 8:49
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: 7.2.1 cluster dies within minutes after restart
> > >
> > > Markus,
> > >
> > > I may be stating the obvious here, but I didn't notice garbage
> > > collection mentioned in any of the previous messages, so here goes. In
> > > our experience almost all of the Zookeeper timeouts etc. have been
> > > caused by too long garbage collection pauses. I've summed up my
> > > observations here:
> > >  > >
> > >
> > > So, in my experience it's relatively easy to cause heavy memory usage
> > > with SolrCloud with seemingly innocent queries, and GC can become a
> > > problem really quickly even if everything seems to be running smoothly
> > > otherwise.
> > >
> > > Regards,
> > > Ere
> > >
> > > Markus Jelsma wrote on 31.1.2018 at 23.56:
> > > > Hello S.G.
> > > >
> > > > We do not complain about speed improvements at all, it is clear 7.x is
> > faster than its predecessor. The problem is stability and not recovering
> > from weird circumstances. In general, it is our high load cluster
> > containing user interaction logs that suffers the most. Our main text
> > search cluster - receiving much fewer queries - seems mostly unaffected,
> > except last Sunday. After very short but high burst of queries it entered
> > the same catatonic state the logs cluster usually dies from.
> > > >
> > > > The query burst immediately caused ZK timeouts and high heap
> > consumption (not sure which came first of the latter two). The query burst
> > lasted for 30 minutes, the excessive heap consumption continued for more
> > than 8 hours, before Solr finally realized it could relax. Most remarkable
> > was that Solr recovered on its own, ZK timeouts stopped, heap went back to
> > normal.
> > > >
> > > > There seems to be a causality between high load and this state.
> > > >
> > > > We really want to get this fixed for ourselves and everyone else that
> > may encounter this problem, but i don't know how, so i need much more
> > feedback and hints from those who have deep understanding of inner working
> > of Solrcloud and changes since 6.x.
> > > >
> > > > To be clear, we don't have the problem of 15 second ZK timeout, we use
> > 30. Is 30 too low still? Is it even remotely related to this problem? What
> > does load have to do with it?
> > > >
> > > > We are not able to reproduce it in lab environments. It can take
> > minutes after cluster startup for it to occur, but also days.
> > > >
> > > > I've been slightly annoyed by problems that can occur in a broad time
> > span, it is always bad luck for reproduction.
> > > >
> > > > Any help getting further is much appreciated.
> > > >
> > > > Many thanks,
> > > > Markus
> > > >
> > > > -Original message-
> > > >> From:S G 
> > > >> Sent: Wednesday 31st January 2018 21:48
> > > >> To: solr-user@lucene.apache.org
> > > >> Subject: Re: 7.2.1 cluster dies within minutes after restart
> > > >>
> > >> We did some basic load testing on our 7.1.0 and 7.2.1 clusters.

RE: External file fields

2018-02-02 Thread Chris Hostetter

: I did look into updatable docValues, but my understanding is that the 
: field has to be non-indexed (indexed="false"). I need to be able to sort 
: on these values. External file fields are sortable.

You can absolutely sort on a field that is docValues="true" 
indexed="false" ... that is much more efficient than sorting on a field 
that is docValues="false" indexed="true" -- in the latter case Solr has to 
build a fieldcache (aka: run-time-mock-docvalues) from the indexed values 
the first time you try to sort on the field after a searcher is opened.



-Hoss
http://www.lucidworks.com/


Re: SolrCloudClient multi-threading

2018-02-02 Thread Joel Bernstein
CloudSolrClient determines which shard each document is going to in
advance. It then separates the documents based on the route and uses a
thread for each group to send the documents.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Feb 2, 2018 at 11:09 AM, Steve Pruitt  wrote:

> The 7.2.1 SolrCloudClientBuilder has the method
> withParallelUpdates(Boolean).  It's my understanding the SolrCloudClient
> does not manage multiple threads like the ConcurrentUpdateSolrClient.
> Curious what the withParallelUpdates setting on SolrCloudClientBuilder
> does.  It hints at multi-threaded updates.
>
> Thanks in advance.
>
> -S
>


Re: 7.2.1 cluster dies within minutes after restart

2018-02-02 Thread S G
Yeah, definitely check the zookeeper version.
3.4.6 is not a good one I know and you can say the same for all the
versions below it too.
We have used 3.4.9 with no issues.
While Solr 7.x uses 3.4.10

Another dimension could be the use or (dis-use) of p-fields like pint,
plong etc.
If you are using them, try to revert back to tint, tlong etc
And if you are not using them, try to use them (Although doing this means a
change from your older config and less likely to help).

Lastly, did I read 2 GB for JVM heap?
That seems really too little to me for any version of Solr
We run with 10-16 gb of heap with G1GC collector and new-gen capped at 3-4gb
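As a concrete sketch of that sizing in solr.in.sh (SOLR_HEAP and GC_TUNE are
the standard knobs there; the exact values below are just illustrative
picks from the ranges in this message, not a recommendation):

SOLR_HEAP="12g"
GC_TUNE="-XX:+UseG1GC -XX:MaxNewSize=4g"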


On Fri, Feb 2, 2018 at 4:27 AM, Markus Jelsma 
wrote:

> Hello Ere,
>
> It appears that my initial e-mail [1] got lost in the thread. We don't
> have GC issues, the cluster that dies occasionally runs, in general, smooth
> and quick with just 2 GB allocated.
>
> Thanks,
> Markus
>
> [1]: http://lucene.472066.n3.nabble.com/7-2-1-cluster-dies-
> within-minutes-after-restart-td4372615.html
>
> -Original message-
> > From:Ere Maijala 
> > Sent: Friday 2nd February 2018 8:49
> > To: solr-user@lucene.apache.org
> > Subject: Re: 7.2.1 cluster dies within minutes after restart
> >
> > Markus,
> >
> > I may be stating the obvious here, but I didn't notice garbage
> > collection mentioned in any of the previous messages, so here goes. In
> > our experience almost all of the Zookeeper timeouts etc. have been
> > caused by too long garbage collection pauses. I've summed up my
> > observations here:
> >  >
> >
> > So, in my experience it's relatively easy to cause heavy memory usage
> > with SolrCloud with seemingly innocent queries, and GC can become a
> > problem really quickly even if everything seems to be running smoothly
> > otherwise.
> >
> > Regards,
> > Ere
> >
> > Markus Jelsma wrote on 31.1.2018 at 23.56:
> > > Hello S.G.
> > >
> > > We do not complain about speed improvements at all, it is clear 7.x is
> faster than its predecessor. The problem is stability and not recovering
> from weird circumstances. In general, it is our high load cluster
> containing user interaction logs that suffers the most. Our main text
> search cluster - receiving much fewer queries - seems mostly unaffected,
> except last Sunday. After very short but high burst of queries it entered
> the same catatonic state the logs cluster usually dies from.
> > >
> > > The query burst immediately caused ZK timeouts and high heap
> consumption (not sure which came first of the latter two). The query burst
> lasted for 30 minutes, the excessive heap consumption continued for more
> than 8 hours, before Solr finally realized it could relax. Most remarkable
> was that Solr recovered on its own, ZK timeouts stopped, heap went back to
> normal.
> > >
> > > There seems to be a causality between high load and this state.
> > >
> > > We really want to get this fixed for ourselves and everyone else that
> may encounter this problem, but i don't know how, so i need much more
> feedback and hints from those who have deep understanding of inner working
> of Solrcloud and changes since 6.x.
> > >
> > > To be clear, we don't have the problem of 15 second ZK timeout, we use
> 30. Is 30 too low still? Is it even remotely related to this problem? What
> does load have to do with it?
> > >
> > > We are not able to reproduce it in lab environments. It can take
> minutes after cluster startup for it to occur, but also days.
> > >
> > > I've been slightly annoyed by problems that can occur in a broad time
> span, it is always bad luck for reproduction.
> > >
> > > Any help getting further is much appreciated.
> > >
> > > Many thanks,
> > > Markus
> > >
> > > -Original message-
> > >> From:S G 
> > >> Sent: Wednesday 31st January 2018 21:48
> > >> To: solr-user@lucene.apache.org
> > >> Subject: Re: 7.2.1 cluster dies within minutes after restart
> > >>
> > >> We did some basic load testing on our 7.1.0 and 7.2.1 clusters.
> > >> And that came out all right.
> > >> We saw a performance increase of about 30% in read latencies between
> 6.6.0
> > >> and 7.1.0
> > >> And then we saw a performance degradation of about 10% between 7.1.0
> and
> > >> 7.2.1 in many metrics.
> > >> But overall, it still seems better than 6.6.0.
> > >>
> > >> I will check for the errors too in the logs but the nodes were
> responsive
> > >> for all the 23+ hours we did the load test.
> > >>
> > >> Disclaimer: We do not test facets and pivots or block-joins. And will
> add
> > >> those features to our load-testing tool sometime this year.
> > >>
> > >> Thanks
> > >> SG
> > >>
> > >>
> > >> On Wed, Jan 31, 2018 at 3:12 AM, Markus Jelsma <
> markus.jel...@openindex.io>
> > >> wrote:
> > >>
> > >>> Ah thanks, i just submitted a patch fixing it.
> > >>>
> > >>> Anyway, in the end it appears this is not the problem we are seeing
> as our
> > >>> timeouts were already at 30 seconds.

Re: skip slow tests?

2018-02-02 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
ant -Dtests.slow=false

From: solr-user@lucene.apache.org At: 02/02/18 17:07:14  To: solr-user@lucene.apache.org
Subject: skip slow tests?

Hi *, 
Some (slow) tests in Solr are annotated with @Slow. Is there a way to run ant 
test skipping them?

thanks,
Diego



skip slow tests?

2018-02-02 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi *, 
Some (slow) tests in Solr are annotated with @Slow. Is there a way to run ant 
test skipping them?

thanks,
Diego

9000+ CLOSE_WAIT connections in solr v6.2.2 causing it to "die"

2018-02-02 Thread mmb1234
Hello,

In our Solr non-cloud env., we are seeing lots of CLOSE_WAIT, causing the JVM
to stop "working" within 3 mins of Solr start.

solr [ /opt/solr ]$ netstat -anp | grep 8983 | grep CLOSE_WAIT | grep
10.xxx.xxx.xxx | wc -l
9453

The only option then is `kill -9` because even `jcmd  Thread.print` is
unable to connect to the JVM. The problem can be reproduced at will.

Any suggestions what could be causing this or the fix ?

Details of the system are as follows; it has been set up for "bulk indexing".

-

Solr / server:
v6.2.2 non-solrcloud in a docker with kubernetes
java: 1.8.0_151 25.151-b12 HotSpot 64bit | Oracle
jvm: heap 30GB
os: Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19) x86_64
GNU/Linux
os memory: 230GB | no swap configured
os cpu: 32vCPU
jvm:  "-XX:+UseLargePages",
"-XX:LargePageSizeInBytes=2m",
"-Xms512m",
"-Xmx512m",
"-XX:NewRatio=3",
"-XX:SurvivorRatio=4",
"-XX:TargetSurvivorRatio=90",
"-XX:MaxTenuringThreshold=8",
"-XX:+UseConcMarkSweepGC",
"-XX:+UseParNewGC",
"-XX:ConcGCThreads=4",
"-XX:ParallelGCThreads=4",
"-XX:+CMSScavengeBeforeRemark",
"-XX:PretenureSizeThreshold=64m",
"-XX:+UseCMSInitiatingOccupancyOnly",
"-XX:CMSInitiatingOccupancyFraction=50",
"-XX:CMSMaxAbortablePrecleanTime=6000",
"-XX:+CMSParallelRemarkEnabled",
"-XX:+ParallelRefProcEnabled",

non-cloud solr.xml:
transientCacheSize = 30
shareSchema = true
Also only 4 cores are POSTed to.

Client / java8 app:
An AsyncHTTPClient POST-ing gzip payloads.
PoolingNHttpClientConnectionManager maxtotal=10,000 and maxperroute=1000)
ConnectionRequestTimeout = ConnectTimeout = SocketTimeout = 4000 (4 secs)

Gzip payloads:
About 800 JSON messages like this:
[
  { "id": "abcdefx", "datetimestamp": "xx", "key1": "xx", "key2": "z", ... },
  ...
]

POST rate:
Each of the 4 Solr cores receives ~32 payloads per second from the custom Java
app (the plugin handler metrics in Solr report the same).
Approx 102,400 docs per sec in total (32 payloads x 800 docs x 4 solr cores)

Document uniqueness:
No doc or id is ever repeated or concurrently sent.
No atomic updates needed (overwrite=false in AddUpdateCommand was set in
solr handler)

Solrconfig.xml
For bulk indexing requirement, updatelog and softcommit were minimized /
removed.
  
<indexConfig>
  <lockType>none</lockType>
  <ramBufferSizeMB>200</ramBufferSizeMB>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">1</int>
    <int name="maxMergeCount">6</int>
  </mergeScheduler>
</indexConfig>

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:1}</maxTime>
    <openSearcher>true</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
    <maxDocs>${solr.autoSoftCommit.maxDocs:-1}</maxDocs>
  </autoSoftCommit>
  <commitWithin>
    <softCommit>false</softCommit>
  </commitWithin>
</updateHandler>

-M



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Help with Boolean search using Solr parser edismax

2018-02-02 Thread Wendy2
Good morning, Emir,

Thanks for letting me know that. I used dots to add tableName as a field
prefix because several columns from different tables have the same names.
In your opinion, what would be the best way to replace the dots?

Happy Friday!

Wendy



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Help with Boolean search using Solr parser edismax

2018-02-02 Thread Erick Erickson
From the reference guide:

Field names should consist of alphanumeric or underscore characters
only and not start with a digit. This is not currently strictly
enforced, but other field names will not have first class support from
all components and back compatibility is not guaranteed.

Best,
Erick

On Fri, Feb 2, 2018 at 1:07 AM, Emir Arnautović
 wrote:
> Hi Wendy,
> A bit off-topic, but I forgot to mention in my previous mail: dots in field names 
> are not recommended. Even though it obviously works for you, I think I’ve seen 
> people reporting issues caused by dots in field names (I cannot find the 
> reference now). So, if you plan a system upgrade in the future, you might 
> want to get rid of field names with dots - you can safely use underscores.
>
> Regards,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
>> On 1 Feb 2018, at 17:19, Wendy2  wrote:
>>
>> And the coupon has no expiration date on it (LOL).  Thank you again, Emir!
>>
>> Best Regards,
>>
>> Wendy
>>
>>
>>
>> --
>> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>


Re: Wiki contributor

2018-02-02 Thread Erick Erickson
Done.


On Fri, Feb 2, 2018 at 4:08 AM, David Backstein  wrote:
> Hi there,
>
> It would be great if you could add me to the Wiki contributors group.
>
> My username is: dbackstein
>
> Thank you very much!
>
> Best regards,
> D
>
> --
> David Backstein
> CEO
>
> https://pixolution.org
>
> Phone: +49(0)30/60984960 | Fax: +49(0)30/60984962
> gpg-key: BEB26BF5
>
> Read us  : http://pixolution.org/blog/
> Follow us: http://twitter.com/pixolution
> Try us   : http://demo.pixolution.org
>
> pixolution GmbH | c/o Büro 2.0 | Weigandufer 45 | 12059 Berlin
> HRB 120049 | Amtsgericht Charlottenburg, Berlin
> Geschäftsführer / executive board: David Backstein | Prof. Dr. Kai-Uwe
> Barthel
>
> Confidentiality: This e-mail contains confidential information intended
> only for the addressee.
> If you are not the intended recipient you may not disclose, copy, use or
> otherwise distribute the content of this email.


Wiki contributor

2018-02-02 Thread David Backstein
Hi there,

It would be great if you could add me to the Wiki contributors group.

My username is: dbackstein

Thank you very much!

Best regards,
D

-- 
David Backstein
CEO

https://pixolution.org

Phone: +49(0)30/60984960 | Fax: +49(0)30/60984962
gpg-key: BEB26BF5

Read us  : http://pixolution.org/blog/
Follow us: http://twitter.com/pixolution
Try us   : http://demo.pixolution.org

pixolution GmbH | c/o Büro 2.0 | Weigandufer 45 | 12059 Berlin
HRB 120049 | Amtsgericht Charlottenburg, Berlin
Geschäftsführer / executive board: David Backstein | Prof. Dr. Kai-Uwe
Barthel

Confidentiality: This e-mail contains confidential information intended
only for the addressee.
If you are not the intended recipient you may not disclose, copy, use or
otherwise distribute the content of this email.


Edismax, stopword query and non-existent fields

2018-02-02 Thread Craig Smiles
I've recently had a requirement from a user to allow queries that search for
stopwords alone. I discovered that edismax will already do this for me; all I
had to do was remove our StopFilterFactory from the index analyzer so that
the stopwords actually exist in our index.
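For context, that analyzer split (stop filter at query time only) can be
sketched via the Schema API like this; the rest of the text_en definition is
simplified and assumed:

curl -X POST -H 'Content-type:application/json' \
  http://localhost:8983/solr/test/schema -d '{
  "replace-field-type": {
    "name": "text_en",
    "class": "solr.TextField",
    "indexAnalyzer": {
      "tokenizer": {"class": "solr.StandardTokenizerFactory"}
    },
    "queryAnalyzer": {
      "tokenizer": {"class": "solr.StandardTokenizerFactory"},
      "filters": [{"class": "solr.StopFilterFactory", "words": "stopwords.txt", "ignoreCase": "true"}]
    }
  }
}'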

So suppose I have a collection with 2 fields: an id with type string and a
field called "test" with type text_en. I remove the StopFilterFactory
from the index analyzer but add it to the query analyzer. I then create 2
documents as below:

{ "id": 1, "test": "and" }
{ "id": 2, "test": "foo" }

"and" is a stopword. The below query only returns the document with id: 2
as expected:

http://localhost:8983/solr/test/select?defType=edismax&qf=test&q=and+foo

The below query returns only our document with id: 1 as expected:

http://localhost:8983/solr/test/select?defType=edismax&qf=test&q=and

So far so good.

We also have a requirement to allow partial searching on only some fields.
So some of our fields have an ngram equivalent, we postfix these fields
with "_ngram". So we'd sometimes end up with two fields: fieldname and
fieldname_ngram. We do this for all our fields, even the ones that don't
have an ngram equivalent, so the query will sometimes contain a qf with a
non-existent field:

http://localhost:8983/solr/test/select?defType=edismax&qf=test&qf=test_ngram&q=foo

This returns our id: 2 document, great! Notice that there's the
non-existent test_ngram added to a qf parameter. However, if I run a query
with a stopword:

http://localhost:8983/solr/test/select?defType=edismax&qf=test&qf=test_ngram&q=and

Then no documents are returned. This is inconvenient for us. It would be
better if edismax were liberal with non-existent fields. It also seems
inconsistent with the query below, which will return the document with id: 1:

http://localhost:8983/solr/test/select?defType=edismax&qf=test&qf=test_ngram&q=and&stopwords=false

Would it be possible to ignore non-existent fields when a query only
contains stopwords? Or is there a good reason why this can't be implemented?

Regards,
Craig


SolrCloudClient multi-threading

2018-02-02 Thread Steve Pruitt
The 7.2.1 SolrCloudClientBuilder has the method withParallelUpdates(Boolean).  
It's my understanding the SolrCloudClient does not manage multiple threads like 
the ConcurrentUpdateSolrClient.
Curious what the withParallelUpdates setting on SolrCloudClientBuilder does.  
It hints at multi-threaded updates.

Thanks in advance.

-S


Re: External file fields

2018-02-02 Thread Emir Arnautović
Hi Brian,
You should be able to sort on a field with only sorted values.

Regards,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 2 Feb 2018, at 16:53, Brian Yee  wrote:
> 
> Hello Erick,
> 
> I did look into updatable docValues, but my understanding is that the field 
> has to be non-indexed (indexed="false"). I need to be able to sort on these 
> values. External file fields are sortable.
> https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-In-PlaceUpdates
> 
> 
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com] 
> Sent: Thursday, February 1, 2018 5:00 PM
> To: solr-user 
> Subject: Re: External file fields
> 
> Have you considered updateable docValues?
> 
> Best,
> Erick
> 
> On Thu, Feb 1, 2018 at 10:55 AM, Brian Yee  wrote:
>> Hello,
>> 
>> I want to use external file field to store frequently changing inventory and 
>> price data. I got a proof of concept working with a mock text file and this 
>> will suit my needs.
>> 
>> What is the best way to keep this file updated in a fast way. Ideally I 
>> would like to read changes from a Kafka queue and write to the file. But it 
>> seems like I would have to open the whole file, read the whole file, find 
>> the line I want to change, and write the whole file for every change. Is 
>> there a better way to do that? That approach seems like it would be 
>> difficult/slow if the file is several million lines long.
>> 
>> Also, once I come up with a way to update the file quickly, what is the best 
>> way to distribute the file to all the different solrcloud nodes in the 
>> correct directory?



RE: External file fields

2018-02-02 Thread Brian Yee
Hello Erick,

I did look into updatable docValues, but my understanding is that the field has 
to be non-indexed (indexed="false"). I need to be able to sort on these values. 
External file fields are sortable.
https://lucene.apache.org/solr/guide/6_6/updating-parts-of-documents.html#UpdatingPartsofDocuments-In-PlaceUpdates


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, February 1, 2018 5:00 PM
To: solr-user 
Subject: Re: External file fields

Have you considered updateable docValues?

Best,
Erick

On Thu, Feb 1, 2018 at 10:55 AM, Brian Yee  wrote:
> Hello,
>
> I want to use external file field to store frequently changing inventory and 
> price data. I got a proof of concept working with a mock text file and this 
> will suit my needs.
>
> What is the best way to keep this file updated in a fast way. Ideally I would 
> like to read changes from a Kafka queue and write to the file. But it seems 
> like I would have to open the whole file, read the whole file, find the line 
> I want to change, and write the whole file for every change. Is there a 
> better way to do that? That approach seems like it would be difficult/slow if 
> the file is several million lines long.
>
> Also, once I come up with a way to update the file quickly, what is the best 
> way to distribute the file to all the different solrcloud nodes in the 
> correct directory?
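One possible shape for the update-and-distribute step being asked about, as a
sketch (hosts, paths, and the generator script are placeholders): rebuild the
file out of band, then push it and rename it atomically on each node:

generate_external_file > /tmp/external_price.new   # e.g. fed by the Kafka consumer
for host in solr1 solr2 solr3; do
  scp /tmp/external_price.new $host:/var/solr/data/core/data/external_price.tmp
  ssh $host mv /var/solr/data/core/data/external_price.tmp /var/solr/data/core/data/external_price
done
# The new values are picked up when a new searcher opens (e.g. on commit).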


RE: 7.2.1 cluster dies within minutes after restart

2018-02-02 Thread Markus Jelsma
Hello Ere,

It appears that my initial e-mail [1] got lost in the thread. We don't have GC 
issues, the cluster that dies occasionally runs, in general, smooth and quick 
with just 2 GB allocated.

Thanks,
Markus

[1]: 
http://lucene.472066.n3.nabble.com/7-2-1-cluster-dies-within-minutes-after-restart-td4372615.html

-Original message-
> From:Ere Maijala 
> Sent: Friday 2nd February 2018 8:49
> To: solr-user@lucene.apache.org
> Subject: Re: 7.2.1 cluster dies within minutes after restart
> 
> Markus,
> 
> I may be stating the obvious here, but I didn't notice garbage 
> collection mentioned in any of the previous messages, so here goes. In 
> our experience almost all of the Zookeeper timeouts etc. have been 
> caused by too long garbage collection pauses. I've summed up my 
> observations here: 
> 
> 
> So, in my experience it's relatively easy to cause heavy memory usage 
> with SolrCloud with seemingly innocent queries, and GC can become a 
> problem really quickly even if everything seems to be running smoothly 
> otherwise.
> 
> Regards,
> Ere
> 
> Markus Jelsma wrote on 31.1.2018 at 23.56:
> > Hello S.G.
> > 
> > We do not complain about speed improvements at all, it is clear 7.x is 
> > faster than its predecessor. The problem is stability and not recovering 
> > from weird circumstances. In general, it is our high load cluster 
> > containing user interaction logs that suffers the most. Our main text 
> > search cluster - receiving much fewer queries - seems mostly unaffected, 
> > except last Sunday. After very short but high burst of queries it entered 
> > the same catatonic state the logs cluster usually dies from.
> > 
> > The query burst immediately caused ZK timeouts and high heap consumption 
> > (not sure which came first of the latter two). The query burst lasted for 
> > 30 minutes, the excessive heap consumption continued for more than 8 hours, 
> > before Solr finally realized it could relax. Most remarkable was that Solr 
> > recovered on its own, ZK timeouts stopped, heap went back to normal.
> > 
> > There seems to be a causality between high load and this state.
> > 
> > We really want to get this fixed for ourselves and everyone else that may 
> > encounter this problem, but i don't know how, so i need much more feedback 
> > and hints from those who have deep understanding of inner working of 
> > Solrcloud and changes since 6.x.
> > 
> > To be clear, we don't have the problem of 15 second ZK timeout, we use 30. 
> > Is 30 too low still? Is it even remotely related to this problem? What does 
> > load have to do with it?
> > 
> > We are not able to reproduce it in lab environments. It can take minutes 
> > after cluster startup for it to occur, but also days.
> > 
> > I've been slightly annoyed by problems that can occur in a broad time
> > it is always bad luck for reproduction.
> > 
> > Any help getting further is much appreciated.
> > 
> > Many thanks,
> > Markus
> >   
> > -Original message-
> >> From:S G 
> >> Sent: Wednesday 31st January 2018 21:48
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: 7.2.1 cluster dies within minutes after restart
> >>
> >> We did some basic load testing on our 7.1.0 and 7.2.1 clusters.
> >> And that came out all right.
> >> We saw a performance increase of about 30% in read latencies between 6.6.0
> >> and 7.1.0
> >> And then we saw a performance degradation of about 10% between 7.1.0 and
> >> 7.2.1 in many metrics.
> >> But overall, it still seems better than 6.6.0.
> >>
> >> I will check for the errors too in the logs but the nodes were responsive
> >> for all the 23+ hours we did the load test.
> >>
> >> Disclaimer: We do not test facets and pivots or block-joins. And will add
> >> those features to our load-testing tool sometime this year.
> >>
> >> Thanks
> >> SG
> >>
> >>
> >> On Wed, Jan 31, 2018 at 3:12 AM, Markus Jelsma 
> >> wrote:
> >>
> >>> Ah thanks, i just submitted a patch fixing it.
> >>>
> >>> Anyway, in the end it appears this is not the problem we are seeing as our
> >>> timeouts were already at 30 seconds.
> >>>
> >>> All i know is that at some point nodes start to lose ZK connections due to
> >>> timeouts (logs say so, but all within 30 seconds), the logs are flooded
> >>> with those messages:
> >>> o.a.z.ClientCnxn Client session timed out, have not heard from server in
> >>> 10359ms for sessionid 0x160f9e723c12122
> >>> o.a.z.ClientCnxn Unable to reconnect to ZooKeeper service, session
> >>> 0x60f9e7234f05bb has expired
> >>>
> >>> Then there is a doubling in heap usage and nodes become unresponsive, die
> >>> etc.
> >>>
> >>> We also see those messages in other collections, but not so frequently and
> >>> they don't cause failure in those less loaded clusters.
> >>>
> >>> Ideas?
> >>>
> >>> Thanks,
> >>> Markus
> >>>
> >>> -Original message-
>  From:Michael Braun 
>  Sent: Monday 29th January 2018 21:0

RE: 7.2.1 cluster dies within minutes after restart

2018-02-02 Thread Markus Jelsma
Hello S.G, see inline.

Thanks,
Markus
 
-Original message-
> From:S G 
> Sent: Thursday 1st February 2018 17:42
> To: solr-user@lucene.apache.org
> Subject: Re: 7.2.1 cluster dies within minutes after restart
> 
> ok, good to know that 7.x shows good performance for you too.
> 
> 1) Regarding the zookeeper problem, do you know for sure that it does not
> occur in 6.x ?
>  I would suggest to write a small load-test that can send a similar
> kind of load to 6.x and 7.x clusters and see which one breaks.
>  I know that these kind of problems can take days to occur but without
> a reproducible pattern, it may be hard to fix.

It is not reproducible in controlled environments. I have not seen Solr's 
cluster stability deteriorate this badly since 1.2.

> 
> 2) Another thing is the zookeeper version.
> 7.x uses 3.4.10 version of zookeeper (See
> https://github.com/apache/lucene-solr/blob/branch_7_2/lucene/ivy-versions.properties#L192
> )
> If you are using 3.4.10, try using 3.4.9 or vice versa.
> Do not use zookeeper versions lower than 3.4.9 - they have some nasty
> bugs.

I am not sure which version we run. I'll check with my colleague, and upgrade 
if necessary. A note: all other Solr collections and all Hadoop daemons 
have no trouble with Zookeeper.

> 
> 3) Do take a look at zookeeper cluster too.
> ZK has 4-letter commands like ruok, srvr etc that reveal a lot of its
> internal activity.

Zookeeper logs have also shown messages of clients disconnecting and stuff, but 
it is hard to find the problem there. The problem is that one very specific 
Solr cluster dies, and others don't.
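
For what it's worth, the four-letter commands are easy to fire off with netcat.
A minimal sketch, assuming ZK listens on the default client port 2181 (the host
name is just a placeholder):

  echo ruok | nc zk-host 2181   # a healthy node answers "imok"
  echo srvr | nc zk-host 2181   # prints ZK version, latency and connection stats

The srvr output also shows the server version, which would answer point 2 below.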

> 
> 4) Hopefully, you are not doing anything cross-DC as that could cause
> network delays and cause such problems.

No.

> 
> 5) As far as I can remember, we have seen some zookeeper issues, but they
> were generally related to version 3.4.6, or to
> VMs getting replaced in a cloud environment and the IPs not getting
> refreshed in the ZK configs.

I've seen 3.4.6 somewhere in our Salt files, so I think we may still be running 
that version, but I'll check.

> 
> That's all I could think of from a user's perspective  --\_(0.0)_/--
> 
> Thanks
> SG
> 
> 
> 
> On Wed, Jan 31, 2018 at 1:56 PM, Markus Jelsma 
> wrote:
> 
> > Hello S.G.
> >
> > We do not complain about speed improvements at all, it is clear 7.x is
> > faster than its predecessor. The problem is stability and not recovering
> > from weird circumstances. In general, it is our high load cluster
> > containing user interaction logs that suffers the most. Our main text
> > search cluster - receiving much fewer queries - seems mostly unaffected,
> > except last Sunday. After a very short but high burst of queries it entered
> > the same catatonic state the logs cluster usually dies from.
> >
> > The query burst immediately caused ZK timeouts and high heap consumption
> > (not sure which came first of the latter two). The query burst lasted for
> > 30 minutes, the excessive heap consumption continued for more than 8 hours,
> > before Solr finally realized it could relax. Most remarkable was that Solr
> > recovered on its own, ZK timeouts stopped, heap went back to normal.
> >
> > There seems to be a causality between high load and this state.
> >
> > We really want to get this fixed for ourselves and everyone else that may
> > encounter this problem, but I don't know how, so I need much more feedback
> > and hints from those who have a deep understanding of the inner workings of
> > Solrcloud and the changes since 6.x.
> >
> > To be clear, we don't have the problem of 15 second ZK timeout, we use 30.
> > Is 30 too low still? Is it even remotely related to this problem? What does
> > load have to do with it?
> >
> > We are not able to reproduce it in lab environments. It can take minutes
> > after cluster startup for it to occur, but also days.
> >
> > I've been slightly annoyed by problems that can occur over a broad time
> > span; it is always bad luck for reproduction.
> >
> > Any help getting further is much appreciated.
> >
> > Many thanks,
> > Markus
> >
> > -Original message-
> > > From:S G 
> > > Sent: Wednesday 31st January 2018 21:48
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: 7.2.1 cluster dies within minutes after restart
> > >
> > > We did some basic load testing on our 7.1.0 and 7.2.1 clusters.
> > > And that came out all right.
> > > We saw a performance increase of about 30% in read latencies between
> > 6.6.0
> > > and 7.1.0
> > > And then we saw a performance degradation of about 10% between 7.1.0 and
> > > 7.2.1 in many metrics.
> > > But overall, it still seems better than 6.6.0.
> > >
> > > I will check for the errors too in the logs but the nodes were responsive
> > > for all the 23+ hours we did the load test.
> > >
> > > Disclaimer: We do not test facets and pivots or block-joins. And will add
> > > those features to our load-testing tool sometime this year.
> > >
> > > Thanks
> > > SG
> > >
> > >

Re: Searching for an efficient and scalable way to filter query results using non-indexed and dynamic range values

2018-02-02 Thread Alessandro Benedetti
1) Diego's observation about IDF is absolutely correct here, but I don't
think he meant it as a negative aspect of your new approach.
I think he just wanted to warn you about it.

BM25 uses the IDF of a term to estimate how important the term is in
context (given its document frequency in the corpus).
I don't think you should remove IDF from your similarity function; actually,
the IDF value coming from the bigger index is closer to reality (your domain
being the web, the ideal IDF would be the one calculated over the entire
internet...)
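
For reference, a sketch of the IDF component that Lucene's default BM25
similarity computes per term, where N is the number of documents in the index
and df(t) is the number of documents containing term t:

  idf(t) = log(1 + (N - df(t) + 0.5) / (df(t) + 0.5))

The larger the corpus behind N and df(t), the closer this estimate gets to
the "real" importance of the term in your domain.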

Of course this is valid if you like BM25 as a similarity function (and if
it is fit for purpose).

2) Regarding the way to evaluate the experiments based on experiment and
crawling cycle, the quickest way may be to make the crawlingCycle field a
dynamic field whose name depends on the experimentId, such as:
*_crawling_cycle.
For experimentId=exp01 you will have the field exp01_crawling_cycle;
for experimentId=exp02 you will have the field exp02_crawling_cycle.
If I understood your evaluation-time queries correctly, you will be able to
check the right field depending on the experiment you are interested in.
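
A minimal sketch of the schema entry (the "pint" field type is an assumption -
use whatever numeric type matches your cycle ids):

  <dynamicField name="*_crawling_cycle" type="pint" indexed="true" stored="true"/>

At evaluation time you can then filter per experiment, e.g.
fq=exp01_crawling_cycle:1
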
My doubt with the incremental approach is that a query such as
"I want to know, for experiment 1, which pages were crawled in the first cycle"
will not work, as you only store the last cycle that involved each page, so
the exact cycle ids assigned to the pages will not be known.
But I am not sure I fully understood your use case, so ignore this observation
if it is useless.

Regards





-
---
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: External file fields

2018-02-02 Thread Charlie Hull

On 01/02/2018 18:55, Brian Yee wrote:

> Hello,
> 
> I want to use external file field to store frequently changing inventory and 
> price data. I got a proof of concept working with a mock text file and this 
> will suit my needs.
> 
> What is the best way to keep this file updated quickly? Ideally I would 
> like to read changes from a Kafka queue and write to the file. But it seems 
> like I would have to open the whole file, read the whole file, find the line 
> I want to change, and write the whole file for every change. Is there a 
> better way to do that? That approach seems like it would be difficult/slow 
> if the file is several million lines long.
> 
> Also, once I come up with a way to update the file quickly, what is the best 
> way to distribute the file to all the different solrcloud nodes in the 
> correct directory?

Another approach would be the XJoin plugin we wrote - if you wait a few 
days we should have an updated patch for Solr v6.5 and possibly v7. 
XJoin lets you filter/join/rank Solr results using an external data source.


http://www.flax.co.uk/blog/2016/01/25/xjoin-solr-part-1-filtering-using-price-discount-data/
http://www.flax.co.uk/blog/2016/01/29/xjoin-solr-part-2-click-example/


Cheers

Charlie


--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: External file fields

2018-02-02 Thread Emir Arnautović
Maybe you can try or extend Sematext’s Redis parser: 
https://github.com/sematext/solr-redis. The downside of this approach is 
another moving part - Redis.
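
Going from memory of the project README (so please double-check there for the
exact syntax), a filter query using the parser looks roughly like this - the
key and field names are made up for the example:

  q=*:*&fq={!redis command=smembers key=in_stock_skus}sku_id

i.e. the filter keeps documents whose sku_id appears in the Redis set, so the
fast-changing inventory data can live in Redis instead of a file you rewrite.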

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 1 Feb 2018, at 19:55, Brian Yee  wrote:
> 
> Hello,
> 
> I want to use external file field to store frequently changing inventory and 
> price data. I got a proof of concept working with a mock text file and this 
> will suit my needs.
> 
> What is the best way to keep this file updated quickly? Ideally I would 
> like to read changes from a Kafka queue and write to the file. But it seems 
> like I would have to open the whole file, read the whole file, find the line 
> I want to change, and write the whole file for every change. Is there a 
> better way to do that? That approach seems like it would be difficult/slow if 
> the file is several million lines long.
> 
> Also, once I come up with a way to update the file quickly, what is the best 
> way to distribute the file to all the different solrcloud nodes in the 
> correct directory?



Re: Help with Boolean search using Solr parser edismax

2018-02-02 Thread Emir Arnautović
Hi Wendy,
A bit off-topic, but I forgot to mention in my previous mail: dots in field 
names are not recommended. Even though it obviously works for you, I think 
I’ve seen people reporting issues caused by dots in field names (I cannot find 
the reference now). So, if you plan a system upgrade in the future, you might 
want to get rid of field names with dots - you can safely use underscores.

Regards,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 1 Feb 2018, at 17:19, Wendy2  wrote:
> 
> And the coupon has no expiration date on it (LOL).  Thank you again, Emir!
> 
> Best Regards,
> 
> Wendy
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html