Re: Solr query help

2017-08-18 Thread Tim Casey
You can add a ~3 to the query to allow the order to be reversed, but you
will get extra hits.  Maybe it is a ~4; I can never remember on phrases and
reversals.  I usually just try it.
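For illustration, a phrase query with slop might look like this (the field
name is an assumption, and as noted above the exact slop value is worth
testing):

    q=name:"fourth tuesday"~3

With enough slop, the phrase also matches documents where the terms appear
in reversed order, at the cost of extra hits.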

Alternatively, you can create a custom query field for what you need from
dates.  For example, if you want to search by queries like "fourth
tuesday", you need to have "tuesday" in the query, and it is better to have
" 4 tuesday " as part of the field.

Instead of a phrase query, you do +2017 +(04 03) +(01 02 03 04 05 06 07 08
09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31),
which matches all the days in March and April.  A more complicated nested
query would handle more complicated date ranges.
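For illustration, a sketch of how such a field and query could fit together
(the field name date_tokens and the indexed value layout are assumptions,
not from the original schema):

    indexed value:  date_tokens = "2017 04 4 tuesday 25"
    query:          date_tokens:(+2017 +(03 04) +tuesday)

Given a whitespace-tokenized field like this, the query above would match
any Tuesday in March or April 2017.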

I don't know if there is a way to get repeating date range queries, like
the fourth tuesday for all months in a year.  The date support is usually
about querying a specified range at a time.

tim

On Fri, Aug 18, 2017 at 11:19 AM, Webster Homer 
wrote:

> What field types are you using for your dates?
> Have a look at:
> https://cwiki.apache.org/confluence/display/solr/Working+with+Dates
>
> On Thu, Aug 17, 2017 at 10:08 AM, Nawab Zada Asad Iqbal 
> wrote:
>
> > Hi Krishna
> >
> > I haven't used date range queries myself. But if Solr only supports a
> > particular date format, you can write a thin client for queries, which
> > will convert the date to solr's format and query solr.
> >
> > Nawab
> >
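A thin client along those lines might look like this -- a minimal sketch,
assuming the dates are indexed with Solr's ISO-8601 date type; the class
name is made up for illustration:

    import java.time.LocalDate;
    import java.time.format.DateTimeFormatter;

    public class DateQueryHelper {
        private static final DateTimeFormatter INPUT =
                DateTimeFormatter.ofPattern("yyyy/MM/dd");   // e.g. 2017/03/15
        private static final DateTimeFormatter DISPLAY =
                DateTimeFormatter.ofPattern("MM/dd/yyyy");   // e.g. 03/15/2017

        // Convert an input date to Solr's ISO-8601 format for the query.
        public static String toSolr(String in) {
            return LocalDate.parse(in, INPUT) + "T00:00:00Z"; // 2017-03-15T00:00:00Z
        }

        // Convert a date returned by Solr back to the display format.
        public static String toDisplay(String solrDate) {
            return LocalDate.parse(solrDate.substring(0, 10)).format(DISPLAY);
        }
    }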
> > On Thu, Aug 17, 2017 at 7:36 AM, chiru s  wrote:
> >
> > > Hello guys
> > >
> > > I am working on Apache solr and I am stuck with a use case.
> > >
> > >
> > > The input data will be in the documents like 2017/03/15 in 1st
> > > document,
> > >
> > > 2017/04/15 in 2nd doc,
> > >
> > > 2017/05/15 in 3rd doc,
> > >
> > > 2017/06/15 in 4th doc so on
> > >
> > > But while fetching the data it should fetch like 03/15/2017 for the
> > > first doc and so on.
> > >
> > > My requirement is like this ..
> > >
> > >
> > > The data is like above, and when I do an fq with name:[2017/03/15 TO
> > > 2017/05/15] it fetches me the 1st three documents.. but I need the
> > > data as 03/15/2017 instead of 2017/03/15.
> > >
> > >
> > > I tried solr.PatternReplaceCharFilterFactory but it doesn't seem to be
> > > working..
> > >
> > > Can you please help on the above.
> > >
> > >
> > > Thanks in advance
> > >
> > >
> > > Krishna...
> > >
> >
>


Re: Get results in multiple orders (multiple boosts)

2017-08-18 Thread Rick Leir
Luca
Walter has the best word on this: you should use SQL for sorting (maybe 
MySQL or Postgres). If you also need searching, you can create a Solr index by 
ingesting from the SQL database. The Solr index would then be used just for 
searching. Cheers -- Rick
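One common way to do that ingestion is the DataImportHandler; a minimal
sketch, assuming a Postgres table named products (all names here are
placeholders):

    <dataConfig>
      <dataSource driver="org.postgresql.Driver"
                  url="jdbc:postgresql://dbhost:5432/mydb"
                  user="solr" password="secret"/>
      <document>
        <entity name="product"
                query="SELECT id, name, description FROM products">
          <field column="id" name="id"/>
          <field column="name" name="name"/>
          <field column="description" name="description"/>
        </entity>
      </document>
    </dataConfig>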
-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Solr 6.6.0 - High CPU during indexing

2017-08-18 Thread Joe Obernberger
Thank you for the detailed response Shawn!  I've read it several times.  
Yes, that particular machine has 12 cores that are hyper-threaded.  Does 
Solr do something special when not running in HDFS to allocate memory 
that would result in VIRT showing memory required for index data size?


In my experience the VIRT shows (for java anyway) what the JVM wanted to 
allocate.  If I specify -Xms75G, VIRT will show 75G, but RES may show 
much less if the program doesn't do anything.
For example, I wrote a program that sleeps and then exits.  If I run it 
with java -Xms75G -jar blah.jar, top reports a VIRT of ~80G (notice PID 
29566)
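A minimal sketch of such a program (the original code was not posted, so 
the class name and sleep duration here are made up):

    // Sleep.java -- allocates nothing, just parks the JVM for a minute.
    public class Sleep {
        public static void main(String[] args) throws InterruptedException {
            Thread.sleep(60_000);
        }
    }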


top - 17:09:05 up 50 days,  4:24,  2 users,  load average: 9.82, 11.31, 12.41
Tasks: 410 total,   1 running, 409 sleeping,   0 stopped,   0 zombie
Cpu(s): 39.8%us,  0.7%sy, 20.1%ni, 39.3%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:  82505408k total, 76160560k used,  6344848k free,   356212k buffers
Swap: 33554428k total,   115756k used, 33438672k free, 14011992k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM TIME+  COMMAND
14260 solr  20   0 41.6g  33g  19m S 629.4 42.3 57412:59 java
29566 joeo  20   0 80.2g 275m  12m S  0.3  0.3 0:00.93 java

Note that the OS didn't actually give PID 29566 80G of memory; it 
actually gave it 275m.  Right?  Thanks again!


-Joe

On 8/18/2017 4:15 PM, Shawn Heisey wrote:

On 8/18/2017 1:05 PM, Joe Obernberger wrote:

Thank you Shawn.  Please see:
http://www.lovehorsepower.com/Vesta
for screen shots of top
(http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_top.jpg) and
several screen shots over various times of jvisualvm.

There is also the GC log and the regular solr.log for one server
(named Vesta).  Please note that we are using HDFS for storage.  I
love top, but also use htop and atop as they show additional
information.  In general we are RAM limited and therefore do not have
much cache for OS/disk as we would like, but this issue is CPU
related.  After restarting the one node, the CPU usage stayed low for
a while, but then eventually comes up to ~800% where it will stay.

Your GC log does not show any evidence of extreme GC activity.  The
longest pause in the whole thing is 1.4 seconds, and the average pause
is only seven milliseconds.  Looking at percentile statistics, GC
performance is amazing, especially given the rather large heap size.

Problems with insufficient disk caching memory do frequently manifest as
high CPU usage, because that situation will require waiting on I/O.
When the CPU spends a lot of time in iowait, total CPU usage tends to be
very high.  The iowait CPU percentage on the top output when that
screenshot was taken was 8.5.  This sounds like a small number, but in
fact it is quite high.  Very healthy Solr installs will have an
extremely low iowait percentage -- possibly zero -- because they will
rarely read off the disk.  I can see that on the atop screenshot, iowait
percentage is 172.

The load average on the system is well above 11. The atop output shows
24 CPU cores (which might actually be 12 if the CPUs have
hypherthreading).  Even with all those CPUs, that load average is high
enough to be concerned.

I can see that the system has about 70GB of memory directly allocated to
various Java processes, leaving about 30GB for disk caching purposes.
Walter has noted that those same java processes have allocated over
200GB of virtual memory.  If we subtract the 70GB of allocated heap,
this would tend to indicate that those processes, one of which is Solr,
are accessing about 130GB of data.

I have no idea how the memory situation works with HDFS, or how this
screenshot should look on a healthy system.  Having 30GB of memory to
cache the 130GB of data opened by these Java processes might be enough,
or it might not.  If this were a system NOT running HDFS, then I would
say that there isn't enough memory.  Putting HDFS into this mix makes it
difficult for me to say anything useful, simply because I do not know
much about it.  You should consult with an HDFS expert and ask them how
to make sure that actual disk accesses are rare -- you want as much of
the index data sitting in RAM on the Solr server as you can possibly get.

Addressing a message later in the thread: The concern with high virtual
memory is actually NOT swapping.  It's effective use of disk caching
memory.  Let's examine a hypothetical situation with a machine running
nothing but Solr, using a standard filesystem for data storage.

The "top" output in this hypothetical situation indicates that total
system memory is 128GB and there is no swap usage.  The Solr process has
a RES memory size of 25GB, a SHR size of a few megabytes, and a VIRT
size of 1000GB.  This tells me that their heap is approximately 25 GB,
and that Solr is accessing 975GB of index data.  At that point, I know
that they have about 103GB of memory to cache nearly a terabyte of index
data.  This is a situation where there is nowhere near enough memory for
good performance.

Thanks,
Shawn

Re: Solr 6.6.0 - High CPU during indexing

2017-08-18 Thread Shawn Heisey
On 8/18/2017 1:05 PM, Joe Obernberger wrote:
> Thank you Shawn.  Please see:
> http://www.lovehorsepower.com/Vesta
> for screen shots of top
> (http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_top.jpg) and
> several screen shots over various times of jvisualvm.
>
> There is also the GC log and the regular solr.log for one server
> (named Vesta).  Please note that we are using HDFS for storage.  I
> love top, but also use htop and atop as they show additional
> information.  In general we are RAM limited and therefore do not have
> much cache for OS/disk as we would like, but this issue is CPU
> related.  After restarting the one node, the CPU usage stayed low for
> a while, but then eventually comes up to ~800% where it will stay. 

Your GC log does not show any evidence of extreme GC activity.  The
longest pause in the whole thing is 1.4 seconds, and the average pause
is only seven milliseconds.  Looking at percentile statistics, GC
performance is amazing, especially given the rather large heap size.

Problems with insufficient disk caching memory do frequently manifest as
high CPU usage, because that situation will require waiting on I/O. 
When the CPU spends a lot of time in iowait, total CPU usage tends to be
very high.  The iowait CPU percentage on the top output when that
screenshot was taken was 8.5.  This sounds like a small number, but in
fact it is quite high.  Very healthy Solr installs will have an
extremely low iowait percentage -- possibly zero -- because they will
rarely read off the disk.  I can see that on the atop screenshot, iowait
percentage is 172.

The load average on the system is well above 11. The atop output shows
24 CPU cores (which might actually be 12 if the CPUs have
hyperthreading).  Even with all those CPUs, that load average is high
enough to be concerned.

I can see that the system has about 70GB of memory directly allocated to
various Java processes, leaving about 30GB for disk caching purposes. 
Walter has noted that those same java processes have allocated over
200GB of virtual memory.  If we subtract the 70GB of allocated heap,
this would tend to indicate that those processes, one of which is Solr,
are accessing about 130GB of data.

I have no idea how the memory situation works with HDFS, or how this
screenshot should look on a healthy system.  Having 30GB of memory to
cache the 130GB of data opened by these Java processes might be enough,
or it might not.  If this were a system NOT running HDFS, then I would
say that there isn't enough memory.  Putting HDFS into this mix makes it
difficult for me to say anything useful, simply because I do not know
much about it.  You should consult with an HDFS expert and ask them how
to make sure that actual disk accesses are rare -- you want as much of
the index data sitting in RAM on the Solr server as you can possibly get.

Addressing a message later in the thread: The concern with high virtual
memory is actually NOT swapping.  It's effective use of disk caching
memory.  Let's examine a hypothetical situation with a machine running
nothing but Solr, using a standard filesystem for data storage.

The "top" output in this hypothetical situation indicates that total
system memory is 128GB and there is no swap usage.  The Solr process has
a RES memory size of 25GB, a SHR size of a few megabytes, and a VIRT
size of 1000GB.  This tells me that their heap is approximately 25 GB,
and that Solr is accessing 975GB of index data.  At that point, I know
that they have about 103GB of memory to cache nearly a terabyte of index
data.  This is a situation where there is nowhere near enough memory for
good performance.

Thanks,
Shawn



Re: Solr 6.6.0 - High CPU during indexing

2017-08-18 Thread Joe Obernberger

Ah!  Yes - that makes much more sense:

CPU: http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_CPU.jpg
Mem: http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_Mem.jpg

-Joe


On 8/18/2017 3:35 PM, Michael Braun wrote:
When I recommended JVisualVM, specifically the "Sampling" portion of 
the app - just using it to see the monitoring isn't half as useful.


On Fri, Aug 18, 2017 at 3:31 PM, Joe Obernberger
<joseph.obernber...@gmail.com> wrote:


Hi Walter - I see what you are saying, but the machine is not
actively swapping (that would be the concern - right?) It's the
CPU usage that I'm trying to figure out.  Htop reports that there
is about 20G of disk cache in use, and about 76G of RAM in use by
programs.  VIRT memory is what was requested to be allocated, not
what was actually allocated; that is RES.  Unless my understanding
of top is wrong.

http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_htop.jpg


atop:
http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_atop.jpg


-Joe


On 8/18/2017 3:12 PM, Walter Underwood wrote:

I see a server with 100GB of memory and processes (java and
jsvc) using 203GB of virtual memory. Hmm.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)


On Aug 18, 2017, at 12:05 PM, Joe Obernberger
<joseph.obernber...@gmail.com> wrote:

Thank you Shawn.  Please see:
http://www.lovehorsepower.com/Vesta

for screen shots of top
(http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_top.jpg)
and several screen shots over various times of jvisualvm.

There is also the GC log and the regular solr.log for one
server (named Vesta).  Please note that we are using HDFS
for storage.  I love top, but also use htop and atop as
they show additional information.  In general we are RAM
limited and therefore do not have much cache for OS/disk
as we would like, but this issue is CPU related.  After
restarting the one node, the CPU usage stayed low for a
while, but then eventually comes up to ~800% where it will
stay.

Please let me know if there is other information that I
can provide, or what I should be looking for in the GC
logs.  Thanks!

-Joe


On 8/18/2017 2:25 PM, Shawn Heisey wrote:

On 8/18/2017 10:37 AM, Joe Obernberger wrote:

Indexing about 15 million documents per day across 100 shards on 45
servers.  Up until about 350 million documents, each of the solr
instances was taking up about 1 core (100% CPU).  Recently, they all
jumped to 700%.  Is this normal?  Anything that I can check for?

I don't see anything unusual in the solr logs.  Sample from the GC logs:

A sample from GC logs won't reveal anything.  We would need the entire
GC log.  To share something like that, you need a file sharing site,
something like dropbox.  With the full log, we can analyze it for
indications of GC problems.

There are many things that can cause a sudden massive increase in CPU
usage.  In this case, it is likely due to increased requirements because
indexing 15 million documents per day has made the index larger, and now
it probably needs additional resources on each server that are not
available.

The most common need for additional resources is unallocated system
memory for the operating system to cache the index.  Something else that
sometimes happens is that the index outgrows the max heap size, which we
would be able to learn from the full GC log.

These problems are discussed here:

https://wiki.apache.org/solr/SolrPerformanceProblems

Another useful piece of information is obtained by running the "top"
utility on the commandline, pressing shift-M to sort by memory, and
taking a screenshot of that display.
Re: Solr 6.6.0 - High CPU during indexing

2017-08-18 Thread Joe Obernberger
Hi Walter - I see what you are saying, but the machine is not actively 
swapping (that would be the concern - right?) It's the CPU usage that 
I'm trying to figure out.  Htop reports that there is about 20G of disk 
cache in use, and about 76G of RAM in use by programs.  VIRT memory is 
what was requested to be allocated, not what was actually allocated; 
that is RES.  Unless my understanding of top is wrong.


http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_htop.jpg

atop:
http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_atop.jpg

-Joe

On 8/18/2017 3:12 PM, Walter Underwood wrote:

I see a server with 100GB of memory and processes (java and jsvc) using 203GB 
of virtual memory. Hmm.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Aug 18, 2017, at 12:05 PM, Joe Obernberger  
wrote:

Thank you Shawn.  Please see:
http://www.lovehorsepower.com/Vesta
for screen shots of top 
(http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_top.jpg) and several screen 
shots over various times of jvisualvm.

There is also the GC log and the regular solr.log for one server (named Vesta). 
 Please note that we are using HDFS for storage.  I love top, but also use htop 
and atop as they show additional information.  In general we are RAM limited 
and therefore do not have much cache for OS/disk as we would like, but this 
issue is CPU related.  After restarting the one node, the CPU usage stayed low 
for a while, but then eventually comes up to ~800% where it will stay.

Please let me know if there is other information that I can provide, or what I 
should be looking for in the GC logs.  Thanks!

-Joe


On 8/18/2017 2:25 PM, Shawn Heisey wrote:

On 8/18/2017 10:37 AM, Joe Obernberger wrote:

Indexing about 15 million documents per day across 100 shards on 45
servers.  Up until about 350 million documents, each of the solr
instances was taking up about 1 core (100% CPU).  Recently, they all
jumped to 700%.  Is this normal?  Anything that I can check for?

I don't see anything unusual in the solr logs.  Sample from the GC logs:

A sample from GC logs won't reveal anything.  We would need the entire
GC log.  To share something like that, you need a file sharing site,
something like dropbox.  With the full log, we can analyze it for
indications of GC problems.

There are many things that can cause a sudden massive increase in CPU
usage.  In this case, it is likely due to increased requirements because
indexing 15 million documents per day has made the index larger, and now
it probably needs additional resources on each server that are not
available.

The most common need for additional resources is unallocated system
memory for the operating system to cache the index.  Something else that
sometimes happens is that the index outgrows the max heap size, which we
would be able to learn from the full GC log.

These problems are discussed here:

https://wiki.apache.org/solr/SolrPerformanceProblems

Another useful piece of information is obtained by running the "top"
utility on the commandline, pressing shift-M to sort by memory, and
taking a screenshot of that display.  Then you would need a file-sharing
website to share the image.

Thanks,
Shawn








Re: Solr 6.6.0 - High CPU during indexing

2017-08-18 Thread Walter Underwood
I see a server with 100GB of memory and processes (java and jsvc) using 203GB 
of virtual memory. Hmm.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Aug 18, 2017, at 12:05 PM, Joe Obernberger  
> wrote:
> 
> Thank you Shawn.  Please see:
> http://www.lovehorsepower.com/Vesta
> for screen shots of top 
> (http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_top.jpg) and several 
> screen shots over various times of jvisualvm.
> 
> There is also the GC log and the regular solr.log for one server (named 
> Vesta).  Please note that we are using HDFS for storage.  I love top, but 
> also use htop and atop as they show additional information.  In general we 
> are RAM limited and therefore do not have much cache for OS/disk as we would 
> like, but this issue is CPU related.  After restarting the one node, the CPU 
> usage stayed low for a while, but then eventually comes up to ~800% where it 
> will stay.
> 
> Please let me know if there is other information that I can provide, or what 
> I should be looking for in the GC logs.  Thanks!
> 
> -Joe
> 
> 
> On 8/18/2017 2:25 PM, Shawn Heisey wrote:
>> On 8/18/2017 10:37 AM, Joe Obernberger wrote:
>>> Indexing about 15 million documents per day across 100 shards on 45
>>> servers.  Up until about 350 million documents, each of the solr
>>> instances was taking up about 1 core (100% CPU).  Recently, they all
>>> jumped to 700%.  Is this normal?  Anything that I can check for?
>>> 
>>> I don't see anything unusual in the solr logs.  Sample from the GC logs:
>> A sample from GC logs won't reveal anything.  We would need the entire
>> GC log.  To share something like that, you need a file sharing site,
>> something like dropbox.  With the full log, we can analyze it for
>> indications of GC problems.
>> 
>> There are many things that can cause a sudden massive increase in CPU
>> usage.  In this case, it is likely due to increased requirements because
>> indexing 15 million documents per day has made the index larger, and now
>> it probably needs additional resources on each server that are not
>> available.
>> 
>> The most common need for additional resources is unallocated system
>> memory for the operating system to cache the index.  Something else that
>> sometimes happens is that the index outgrows the max heap size, which we
>> would be able to learn from the full GC log.
>> 
>> These problems are discussed here:
>> 
>> https://wiki.apache.org/solr/SolrPerformanceProblems
>> 
>> Another useful piece of information is obtained by running the "top"
>> utility on the commandline, pressing shift-M to sort by memory, and
>> taking a screenshot of that display.  Then you would need a file-sharing
>> website to share the image.
>> 
>> Thanks,
>> Shawn
>> 
>> 
> 



Re: Solr 6.6.0 - High CPU during indexing

2017-08-18 Thread Joe Obernberger

Thank you Shawn.  Please see:
http://www.lovehorsepower.com/Vesta
for screen shots of top 
(http://www.lovehorsepower.com/Vesta/VestaSolr6.6.0_top.jpg) and several 
screen shots over various times of jvisualvm.


There is also the GC log and the regular solr.log for one server (named 
Vesta).  Please note that we are using HDFS for storage.  I love top, 
but also use htop and atop as they show additional information.  In 
general we are RAM limited and therefore do not have much cache for 
OS/disk as we would like, but this issue is CPU related.  After 
restarting the one node, the CPU usage stayed low for a while, but then 
eventually comes up to ~800% where it will stay.


Please let me know if there is other information that I can provide, or 
what I should be looking for in the GC logs.  Thanks!


-Joe


On 8/18/2017 2:25 PM, Shawn Heisey wrote:

On 8/18/2017 10:37 AM, Joe Obernberger wrote:

Indexing about 15 million documents per day across 100 shards on 45
servers.  Up until about 350 million documents, each of the solr
instances was taking up about 1 core (100% CPU).  Recently, they all
jumped to 700%.  Is this normal?  Anything that I can check for?

I don't see anything unusual in the solr logs.  Sample from the GC logs:

A sample from GC logs won't reveal anything.  We would need the entire
GC log.  To share something like that, you need a file sharing site,
something like dropbox.  With the full log, we can analyze it for
indications of GC problems.

There are many things that can cause a sudden massive increase in CPU
usage.  In this case, it is likely due to increased requirements because
indexing 15 million documents per day has made the index larger, and now
it probably needs additional resources on each server that are not
available.

The most common need for additional resources is unallocated system
memory for the operating system to cache the index.  Something else that
sometimes happens is that the index outgrows the max heap size, which we
would be able to learn from the full GC log.

These problems are discussed here:

https://wiki.apache.org/solr/SolrPerformanceProblems

Another useful piece of information is obtained by running the "top"
utility on the commandline, pressing shift-M to sort by memory, and
taking a screenshot of that display.  Then you would need a file-sharing
website to share the image.

Thanks,
Shawn






Re: Solr 6.6.0 - High CPU during indexing

2017-08-18 Thread Shawn Heisey
On 8/18/2017 10:37 AM, Joe Obernberger wrote:
> Indexing about 15 million documents per day across 100 shards on 45
> servers.  Up until about 350 million documents, each of the solr
> instances was taking up about 1 core (100% CPU).  Recently, they all
> jumped to 700%.  Is this normal?  Anything that I can check for?
> 
> I don't see anything unusual in the solr logs.  Sample from the GC logs:

A sample from GC logs won't reveal anything.  We would need the entire
GC log.  To share something like that, you need a file sharing site,
something like dropbox.  With the full log, we can analyze it for
indications of GC problems.

There are many things that can cause a sudden massive increase in CPU
usage.  In this case, it is likely due to increased requirements because
indexing 15 million documents per day has made the index larger, and now
it probably needs additional resources on each server that are not
available.

The most common need for additional resources is unallocated system
memory for the operating system to cache the index.  Something else that
sometimes happens is that the index outgrows the max heap size, which we
would be able to learn from the full GC log.

These problems are discussed here:

https://wiki.apache.org/solr/SolrPerformanceProblems

Another useful piece of information is obtained by running the "top"
utility on the commandline, pressing shift-M to sort by memory, and
taking a screenshot of that display.  Then you would need a file-sharing
website to share the image.

Thanks,
Shawn


Re: Solr query help

2017-08-18 Thread Webster Homer
What field types are you using for your dates?
Have a look at:
https://cwiki.apache.org/confluence/display/solr/Working+with+Dates

On Thu, Aug 17, 2017 at 10:08 AM, Nawab Zada Asad Iqbal 
wrote:

> Hi Krishna
>
> I haven't used date range queries myself. But if Solr only supports a
> particular date format, you can write a thin client for queries, which will
> convert the date to solr's format and query solr.
>
> Nawab
>
> On Thu, Aug 17, 2017 at 7:36 AM, chiru s  wrote:
>
> > Hello guys
> >
> > I am working on Apache solr and I am stuck with a use case.
> >
> >
> > The input data will be in the documents like 2017/03/15 in 1st document,
> >
> > 2017/04/15 in 2nd doc,
> >
> > 2017/05/15 in 3rd doc,
> >
> > 2017/06/15 in 4th doc so on
> >
> > But while fetching the data it should fetch like 03/15/2017 for the first
> > doc and so on.
> >
> > My requirement is like this ..
> >
> >
> > The data is like above and when I do an fq with name:[2017/03/15 TO
> > 2017/05/15] it fetches me the 1st three documents.. but I need the data
> > as 03/15/2017 instead of 2017/03/15.
> >
> >
> > I tried solr.PatternReplaceCharFilterFactory but it doesn't seem to be
> > working..
> >
> > Can you please help on the above.
> >
> >
> > Thanks in advance
> >
> >
> > Krishna...
> >
>


Re: Request Highlighting only for the final set of rows

2017-08-18 Thread Nawab Zada Asad Iqbal
Actually, part of me is thinking that there are valid use cases for having
fl and hl.fl with different values: e.g., receive name etc. in "clean" form
via the fl field, and receive both name and address in HTML-formatted form
(by specifying them in hl.fl).
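For illustration, such a request might look like this (a sketch; the field
names are assumptions):

    q=nawab&fl=id,name&hl=on&hl.fl=name,address

fl controls which plain stored fields come back, while hl.fl controls which
fields come back with highlighting markup.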


On Fri, Aug 18, 2017 at 10:57 AM, Nawab Zada Asad Iqbal 
wrote:

> Actually, I realize that it is an incorrect use on my part to pass only
> id+score in fl and specify more fields in the hl.fl fields. This was
> somehow supported in older versions, but the new behavior is actually a
> performance improvement for the scenario when the user is asking for only ids.
>
>
> Nawab
>
> On Fri, Aug 18, 2017 at 8:33 AM, Nawab Zada Asad Iqbal 
> wrote:
>
>> Thanks Erick for pointing to the better option. I will explore that.
>> After your email, I found that if I have specified 'fl=*' in the query then
>> it is doing the right thing (a 2-pass process). However, my queries had
>> 'fl=id+score' (or sometimes fl=id&fl=score); in both of these cases I found
>> that the shards are asked for highlighting all the results on the first
>> request (and there is no second request).
>>
>> The fl=* query is (in my sample case) finishing in 100 msec while same
>> query with fl=id+score finishes in 1200 msec.
>>
>> Here are the two queries;
>>
>> http://solrdev.test.net:8984/solr/filesearch/select?&hl=on&fl=*&start=200&rows=200&q=nawab&shards=solrdev.test.net:8984/solr/filesearch,solrdev.test.net:8985/solr/filesearch,solrdev.test.net:8986/solr/filesearch&wt=json
>>
>> http://solrdev.test.net:8984/solr/filesearch/select?&hl=on&fl=id&fl=score&start=200&rows=200&q=nawab&shards=solrdev.test.net:8984/solr/filesearch,solrdev.test.net:8985/solr/filesearch,solrdev.test.net:8986/solr/filesearch&wt=json
>>
>>
>> Thanks
>> Nawab
>>
>>
>>
>>
>> On Fri, Aug 18, 2017 at 7:23 AM, Erick Erickson 
>> wrote:
>>
>>> I don't think you're reading it correctly. First of all, if you're
>>> going to be doing deep paging you should be using cursorMark, see:
>>> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results.
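A cursorMark request has this general shape (a sketch, not from the original
thread; the sort must end with the uniqueKey field, and start must be 0):

    q=nawab&sort=score desc,id asc&rows=200&cursorMark=*

Each response includes a nextCursorMark value, which is passed as the
cursorMark parameter of the next request instead of an increasing start.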
>>>
>>> Second, it's a two-pass process if you don't use cursormark. The first
>>> pass gets the candidate docs from each shard. But all it returns is
>>> the ID and sort criteria. Then the aggregator node gets the _true_ top
>>> N after sorting all the lists from each shard and issues a second
>>> request for _only_ those docs that have made the top N from each sub
>>> shard, and those should be the only ones highlighted.
>>>
>>> Do you have any evidence to the contrary that they're all being
>>> highlighted? Or are you misinterpreting the log message for the first
>>> pass?
>>>
>>> Best,
>>> Erick
>>>
>>> On Thu, Aug 17, 2017 at 5:43 PM, Nawab Zada Asad Iqbal 
>>> wrote:
>>> > Hi,
>>> >
>>> > In a multi-node solr installation (without SolrCloud), during a paging
>>> > scenario (e.g., start=1000, rows=200), the primary node asks for 1200
>>> rows
>>> > from each shard. If highlighting is ON, then the primary node is
>>> asking for
>>> > highlighting all the 1200 results from each shard, which doesn't scale
>>> > well. Is there a way to break the shard query in two steps e.g. ask
>>> for the
>>> > 1200 rows and after sorting the 1200 responses from each shard and
>>> finding
>>> > final rows to return (1001 to 1200) , issue another query to shards for
>>> > asking highlighted response for the relevant docs?
>>> >
>>> >
>>> >
>>> > Thanks
>>> > Nawab
>>>
>>
>>
>


Re: Tlogs not being deleted/truncated

2017-08-18 Thread Webster Homer
I have an update on this. While I was on vacation, there were a number of
alerts.
Our autoCommit settings were (and are) the following:

  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:60}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

The startup script was NOT setting solr.autoCommit.maxTime. It seemed that
autoCommits were sporadic at best. Our autoSoftCommit was working.
Our administrators changed the Solr startup script to set
solr.autoCommit.maxTime, which they set as follows in the script:
SOLR_OPTS="$SOLR_OPTS -Dsolr.autoCommit.maxTime=6"

They claim that this has fixed our tlog problems across the board. Commits
appear to be reliable now. As a developer I don't have visibility into our
production systems. I find it odd that explicitly setting the value in the
Solr startup script fixed the issue. We had wanted to have this value
determined per collection, but it does seem to address the problem.
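For what it's worth, a per-collection value should be possible through
property substitution, since solrconfig.xml can read user-defined properties
from each core's core.properties file -- a sketch (the value is just an
example):

    # core.properties for one core of the collection
    solr.autoCommit.maxTime=60000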

This seems like a bug in solr to have it behave like this!

We are running Solr 6.2.0 with our production systems in Google Cloud. We
use CDCR to replicate from our on-prem systems to the Google Cloud.





On Wed, Jul 12, 2017 at 9:19 AM, Webster Homer 
wrote:

> We have buffers disabled as described in the CDCR documentation. We also
> have autoCommit set for hard commits, but openSearcher false. We also have
> autoSoftCommit set.
>
>
> On Tue, Jul 11, 2017 at 5:00 PM, Xie, Sean  wrote:
>
>> Please see my previous thread. I had to disable the buffer on the source
>> cluster and use a scheduled hard commit with the scheduled log scheduler
>> to make it work.
>>
>>
>> -- Thank you
>> Sean
>>
>> From: jmyatt <jmy...@wayfair.com>
>> Date: Tuesday, Jul 11, 2017, 1:56 PM
>> To: solr-user@lucene.apache.org
>> Subject: [EXTERNAL] Re: Tlogs not being deleted/truncated
>>
>> another interesting clue in my case (different from what WebsterHomer is
>> seeing): the response from /cdcr?action=QUEUES reflects what I would
>> expect
>> to see in the tlog directory but it's not accurate.  By that I mean
>> tlogTotalSize shows 1500271 (bytes) and tlogTotalCount shows 2.  This
>> changes as more updates come in and autoCommit runs - sometimes
>> tlogTotalCount is 1 instead of 2, and the tlogTotalSize changes but stays
>> in
>> that low range.
>>
>> But on the filesystem, all the tlogs are still there.  Perhaps the ignored
>> exception noted above is in fact a problem?
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Tlogs-not-being-deleted-truncated-tp4341958p4345477.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
>



Re: Request Highlighting only for the final set of rows

2017-08-18 Thread Nawab Zada Asad Iqbal
Actually, I realize that it is an incorrect use on my part to pass only
id+score in fl and specify more fields in the hl.fl fields. This was
somehow supported in older versions, but the new behavior is actually a
performance improvement for the scenario when the user is asking for only ids.


Nawab

On Fri, Aug 18, 2017 at 8:33 AM, Nawab Zada Asad Iqbal 
wrote:

> Thanks Erick for pointing to the better option. I will explore that. After
> your email, I found that if I have specified 'fl=*' in the query then it is
> doing the right thing (a 2-pass process). However, my queries had
> 'fl=id+score' (or sometimes fl=id&fl=score); in both of these cases I found
> that the shards are asked for highlighting all the results on the first
> request (and there is no second request).
>
> The fl=* query is (in my sample case) finishing in 100 msec while same
> query with fl=id+score finishes in 1200 msec.
>
> Here are the two queries;
>
> http://solrdev.test.net:8984/solr/filesearch/select?&hl=on&fl=*&start=200&rows=200&q=nawab&shards=solrdev.test.net:8984/solr/filesearch,solrdev.test.net:8985/solr/filesearch,solrdev.test.net:8986/solr/filesearch&wt=json
>
> http://solrdev.test.net:8984/solr/filesearch/select?&hl=on&fl=id&fl=score&start=200&rows=200&q=nawab&shards=solrdev.test.net:8984/solr/filesearch,solrdev.test.net:8985/solr/filesearch,solrdev.test.net:8986/solr/filesearch&wt=json
>
>
> Thanks
> Nawab
>
>
>
>
> On Fri, Aug 18, 2017 at 7:23 AM, Erick Erickson 
> wrote:
>
>> I don't think you're reading it correctly. First of all, if you're
>> going to be doing deep paging you should be using cursorMark, see:
>> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results.
>>
>> Second, it's a two-pass process if you don't use cursormark. The first
>> pass gets the candidate docs from each shard. But all it returns is
>> the ID and sort criteria. Then the aggregator node gets the _true_ top
>> N after sorting all the lists from each shard and issues a second
>> request for _only_ those docs that have made the top N from each sub
>> shard, and those should be the only ones highlighted.
>>
>> Do you have any evidence to the contrary that they're all being
>> highlighted? Or are you misinterpreting the log message for the first
>> pass?
>>
>> Best,
>> Erick
>>
>> On Thu, Aug 17, 2017 at 5:43 PM, Nawab Zada Asad Iqbal 
>> wrote:
>> > Hi,
>> >
>> > In a multi-node solr installation (without SolrCloud), during a paging
>> > scenario (e.g., start=1000, rows=200), the primary node asks for 1200
>> rows
>> > from each shard. If highlighting is ON, then the primary node is asking
>> for
>> > highlighting all the 1200 results from each shard, which doesn't scale
>> > well. Is there a way to break the shard query in two steps e.g. ask for
>> the
>> > 1200 rows and after sorting the 1200 responses from each shard and
>> finding
>> > final rows to return (1001 to 1200) , issue another query to shards for
>> > asking highlighted response for the relevant docs?
>> >
>> >
>> >
>> > Thanks
>> > Nawab
>>
>
>


Re: Solr 6.6.0 - High CPU during indexing

2017-08-18 Thread Joe Obernberger

I was able to attach to one server by changing the startup and adding:

-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.local.only=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.port=18983 \
-Dcom.sun.management.jmxremote.rmi.port=18983
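With those flags in place, jvisualvm can attach remotely; something like
this should work (the host name is a placeholder):

    jvisualvm --openjmx solrhost:18983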


I will let you know the results.

On 8/18/2017 12:43 PM, Michael Braun wrote:

Have you attached JVisualVM or a similar application to the process to
sample where the time is being spent? It can be very helpful for debugging
this sort of problem.

On Fri, Aug 18, 2017 at 12:37 PM, Joe Obernberger <
joseph.obernber...@gmail.com> wrote:


Indexing about 15 million documents per day across 100 shards on 45
servers.  Up until about 350 million documents, each of the solr instances
was taking up about 1 core (100% CPU).  Recently, they all jumped to 700%.
Is this normal?  Anything that I can check for?

I don't see anything unusual in the solr logs.  Sample from the GC logs:

---

2017-08-18 11:53:15 GC log file created /opt/solr6/server/logs/solr_gc
.log.2
OpenJDK 64-Bit Server VM (25.141-b16) for linux-amd64 JRE (1.8.0_141-b16),
built on Jul 20 2017 11:14:57 by "mockbuild" with gcc 4.4.7 20120313 (Red
Hat 4.4.7-18)
Memory: 4k page, physical 99016188k(796940k free), swap
33554428k(32614048k free)
CommandLine flags: -XX:+AggressiveOpts -XX:CICompilerCount=12
-XX:ConcGCThreads=4 -XX:G1HeapRegionSize=16777216
-XX:GCLogFileSize=20971520 -XX:InitialHeapSize=17179869184
-XX:InitiatingHeapOccupancyPercent=75 -XX:MarkStackSize=4194304
-XX:MaxDirectMemorySize=3221225472 -XX:MaxGCPauseMillis=300
-XX:MaxHeapSize=30064771072 -XX:MaxNewSize=18035507200
-XX:MinHeapDeltaBytes=16777216 -XX:NumberOfGCLogFiles=9
-XX:OnOutOfMemoryError=/opt/solr6/bin/oom_solr.sh 9100
/opt/solr6/server/logs -XX:ParallelGCThreads=16 -XX:+ParallelRefProcEnabled
-XX:+PerfDisableSharedMem -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:-ResizePLAB
-XX:ThreadStackSize=256 -XX:+UseCompressedClassPointers
-XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseGCLogFileRotation
-XX:+UseLargePages
{Heap before GC invocations=559440 (full 0):
  garbage-first heap   total 29360128K, used 24944705K [0xc000,
0xc1003800, 0x0007c000)
   region size 16384K, 1075 young (17612800K), 13 survivors (212992K)
  Metaspace   used 95460K, capacity 97248K, committed 97744K, reserved
1134592K
   class spaceused 11616K, capacity 12104K, committed 12240K, reserved
1048576K
2017-08-18T11:53:15.985-0400: 522594.835: [GC pause (G1 Evacuation Pause)
(young)
Desired survivor size 1132462080 bytes, new threshold 15 (max 15)
- age   1:   23419920 bytes,   23419920 total
- age   2:9355296 bytes,   32775216 total
- age   3:2455384 bytes,   35230600 total
- age   4:   38246704 bytes,   73477304 total
- age   5:   47064408 bytes,  120541712 total
- age   6:   13228864 bytes,  133770576 total
- age   7:   23990800 bytes,  157761376 total
- age   8:1031416 bytes,  158792792 total
- age   9:   17011128 bytes,  175803920 total
- age  10:7371888 bytes,  183175808 total
- age  11:6226576 bytes,  189402384 total
- age  12: 637184 bytes,  190039568 total
- age  13:   11577864 bytes,  201617432 total
- age  14:9519224 bytes,  211136656 total
- age  15: 672304 bytes,  211808960 total
, 0.0391210 secs]
[Parallel Time: 32.1 ms, GC Workers: 16]
   [GC Worker Start (ms): Min: 522594835.0, Avg: 522594835.1, Max:
522594835.2, Diff: 0.2]
   [Ext Root Scanning (ms): Min: 0.5, Avg: 0.8, Max: 2.2, Diff: 1.7,
Sum: 12.2]
   [Update RS (ms): Min: 0.9, Avg: 2.3, Max: 3.2, Diff: 2.2, Sum: 36.6]
  [Processed Buffers: Min: 3, Avg: 4.7, Max: 8, Diff: 5, Sum: 75]
   [Scan RS (ms): Min: 0.1, Avg: 0.2, Max: 0.4, Diff: 0.3, Sum: 3.0]
   [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
Sum: 0.1]
   [Object Copy (ms): Min: 27.7, Avg: 28.3, Max: 28.6, Diff: 0.8, Sum:
453.5]
   [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
  [Termination Attempts: Min: 1, Avg: 1.3, Max: 2, Diff: 1, Sum: 21]
   [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3, Sum:
2.4]
   [GC Worker Total (ms): Min: 31.6, Avg: 31.7, Max: 32.0, Diff: 0.4,
Sum: 507.9]
   [GC Worker End (ms): Min: 522594866.7, Avg: 522594866.8, Max:
522594867.0, Diff: 0.2]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 1.7 ms]
[Other: 5.2 ms]
   [Choose CSet: 0.0 ms]
   [Ref Proc: 2.9 ms]
   [Ref Enq: 0.1 ms]
   [Redirty Cards: 0.2 ms]
   [Humongous Register: 0.1 ms]
   [Humongous Reclaim: 0.0 ms]
   [Free CSet: 1.6 ms]
[Eden: 16.6G(16.6G)->0.0B(16.6G) Survivors: 208.0M->208.0M Heap:
23.8G(28.0G)->7371.4M(28.0G)]
Heap after GC invocations=559441 (full 0):

Re: Solr 6.6.0 - High CPU during indexing

2017-08-18 Thread Joe Obernberger
Thank you Michael.  Oddly when I start jstatd on one of the servers, I 
see all the JVM processes in jvisualvm except the solr one!  Any idea why?



On 8/18/2017 12:43 PM, Michael Braun wrote:

Have you attached JVisualVM or a similar application to the process to
sample where the time is being spent? It can be very helpful for debugging
this sort of problem.

On Fri, Aug 18, 2017 at 12:37 PM, Joe Obernberger <
joseph.obernber...@gmail.com> wrote:


Indexing about 15 million documents per day across 100 shards on 45
servers.  Up until about 350 million documents, each of the solr instances
was taking up about 1 core (100% CPU).  Recently, they all jumped to 700%.
Is this normal?  Anything that I can check for?

I don't see anything unusual in the solr logs.  Sample from the GC logs:

---

2017-08-18 11:53:15 GC log file created /opt/solr6/server/logs/solr_gc
.log.2
OpenJDK 64-Bit Server VM (25.141-b16) for linux-amd64 JRE (1.8.0_141-b16),
built on Jul 20 2017 11:14:57 by "mockbuild" with gcc 4.4.7 20120313 (Red
Hat 4.4.7-18)
Memory: 4k page, physical 99016188k(796940k free), swap
33554428k(32614048k free)
CommandLine flags: -XX:+AggressiveOpts -XX:CICompilerCount=12
-XX:ConcGCThreads=4 -XX:G1HeapRegionSize=16777216
-XX:GCLogFileSize=20971520 -XX:InitialHeapSize=17179869184
-XX:InitiatingHeapOccupancyPercent=75 -XX:MarkStackSize=4194304
-XX:MaxDirectMemorySize=3221225472 -XX:MaxGCPauseMillis=300
-XX:MaxHeapSize=30064771072 -XX:MaxNewSize=18035507200
-XX:MinHeapDeltaBytes=16777216 -XX:NumberOfGCLogFiles=9
-XX:OnOutOfMemoryError=/opt/solr6/bin/oom_solr.sh 9100
/opt/solr6/server/logs -XX:ParallelGCThreads=16 -XX:+ParallelRefProcEnabled
-XX:+PerfDisableSharedMem -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:-ResizePLAB
-XX:ThreadStackSize=256 -XX:+UseCompressedClassPointers
-XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseGCLogFileRotation
-XX:+UseLargePages
{Heap before GC invocations=559440 (full 0):
  garbage-first heap   total 29360128K, used 24944705K [0xc000,
0xc1003800, 0x0007c000)
   region size 16384K, 1075 young (17612800K), 13 survivors (212992K)
  Metaspace   used 95460K, capacity 97248K, committed 97744K, reserved
1134592K
   class spaceused 11616K, capacity 12104K, committed 12240K, reserved
1048576K
2017-08-18T11:53:15.985-0400: 522594.835: [GC pause (G1 Evacuation Pause)
(young)
Desired survivor size 1132462080 bytes, new threshold 15 (max 15)
- age   1:   23419920 bytes,   23419920 total
- age   2:9355296 bytes,   32775216 total
- age   3:2455384 bytes,   35230600 total
- age   4:   38246704 bytes,   73477304 total
- age   5:   47064408 bytes,  120541712 total
- age   6:   13228864 bytes,  133770576 total
- age   7:   23990800 bytes,  157761376 total
- age   8:1031416 bytes,  158792792 total
- age   9:   17011128 bytes,  175803920 total
- age  10:7371888 bytes,  183175808 total
- age  11:6226576 bytes,  189402384 total
- age  12: 637184 bytes,  190039568 total
- age  13:   11577864 bytes,  201617432 total
- age  14:9519224 bytes,  211136656 total
- age  15: 672304 bytes,  211808960 total
, 0.0391210 secs]
[Parallel Time: 32.1 ms, GC Workers: 16]
   [GC Worker Start (ms): Min: 522594835.0, Avg: 522594835.1, Max:
522594835.2, Diff: 0.2]
   [Ext Root Scanning (ms): Min: 0.5, Avg: 0.8, Max: 2.2, Diff: 1.7,
Sum: 12.2]
   [Update RS (ms): Min: 0.9, Avg: 2.3, Max: 3.2, Diff: 2.2, Sum: 36.6]
  [Processed Buffers: Min: 3, Avg: 4.7, Max: 8, Diff: 5, Sum: 75]
   [Scan RS (ms): Min: 0.1, Avg: 0.2, Max: 0.4, Diff: 0.3, Sum: 3.0]
   [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
Sum: 0.1]
   [Object Copy (ms): Min: 27.7, Avg: 28.3, Max: 28.6, Diff: 0.8, Sum:
453.5]
   [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
  [Termination Attempts: Min: 1, Avg: 1.3, Max: 2, Diff: 1, Sum: 21]
   [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3, Sum:
2.4]
   [GC Worker Total (ms): Min: 31.6, Avg: 31.7, Max: 32.0, Diff: 0.4,
Sum: 507.9]
   [GC Worker End (ms): Min: 522594866.7, Avg: 522594866.8, Max:
522594867.0, Diff: 0.2]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 1.7 ms]
[Other: 5.2 ms]
   [Choose CSet: 0.0 ms]
   [Ref Proc: 2.9 ms]
   [Ref Enq: 0.1 ms]
   [Redirty Cards: 0.2 ms]
   [Humongous Register: 0.1 ms]
   [Humongous Reclaim: 0.0 ms]
   [Free CSet: 1.6 ms]
[Eden: 16.6G(16.6G)->0.0B(16.6G) Survivors: 208.0M->208.0M Heap:
23.8G(28.0G)->7371.4M(28.0G)]
Heap after GC invocations=559441 (full 0):
  garbage-first heap   total 29360128K, used 7548353K [0xc000,
0xc1003800, 0x0007c000)
   region size 16384K, 13 young (212992K), 13 survivors (212992K)
  Metaspace   used 95460K, capacity 97248K, committed 97744K, reserved 1134592K

Re: Solr 6.6.0 - High CPU during indexing

2017-08-18 Thread Michael Braun
Have you attached JVisualVM or a similar application to the process to
sample where the time is being spent? It can be very helpful for debugging
this sort of problem.

On Fri, Aug 18, 2017 at 12:37 PM, Joe Obernberger <
joseph.obernber...@gmail.com> wrote:

> Indexing about 15 million documents per day across 100 shards on 45
> servers.  Up until about 350 million documents, each of the solr instances
> was taking up about 1 core (100% CPU).  Recently, they all jumped to 700%.
> Is this normal?  Anything that I can check for?
>
> I don't see anything unusual in the solr logs.  Sample from the GC logs:
>
> ---
>
> 2017-08-18 11:53:15 GC log file created /opt/solr6/server/logs/solr_gc
> .log.2
> OpenJDK 64-Bit Server VM (25.141-b16) for linux-amd64 JRE (1.8.0_141-b16),
> built on Jul 20 2017 11:14:57 by "mockbuild" with gcc 4.4.7 20120313 (Red
> Hat 4.4.7-18)
> Memory: 4k page, physical 99016188k(796940k free), swap
> 33554428k(32614048k free)
> CommandLine flags: -XX:+AggressiveOpts -XX:CICompilerCount=12
> -XX:ConcGCThreads=4 -XX:G1HeapRegionSize=16777216
> -XX:GCLogFileSize=20971520 -XX:InitialHeapSize=17179869184
> -XX:InitiatingHeapOccupancyPercent=75 -XX:MarkStackSize=4194304
> -XX:MaxDirectMemorySize=3221225472 -XX:MaxGCPauseMillis=300
> -XX:MaxHeapSize=30064771072 -XX:MaxNewSize=18035507200
> -XX:MinHeapDeltaBytes=16777216 -XX:NumberOfGCLogFiles=9
> -XX:OnOutOfMemoryError=/opt/solr6/bin/oom_solr.sh 9100
> /opt/solr6/server/logs -XX:ParallelGCThreads=16 -XX:+ParallelRefProcEnabled
> -XX:+PerfDisableSharedMem -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime
> -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
> -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:-ResizePLAB
> -XX:ThreadStackSize=256 -XX:+UseCompressedClassPointers
> -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseGCLogFileRotation
> -XX:+UseLargePages
> {Heap before GC invocations=559440 (full 0):
>  garbage-first heap   total 29360128K, used 24944705K [0xc000,
> 0xc1003800, 0x0007c000)
>   region size 16384K, 1075 young (17612800K), 13 survivors (212992K)
>  Metaspace   used 95460K, capacity 97248K, committed 97744K, reserved
> 1134592K
>   class spaceused 11616K, capacity 12104K, committed 12240K, reserved
> 1048576K
> 2017-08-18T11:53:15.985-0400: 522594.835: [GC pause (G1 Evacuation Pause)
> (young)
> Desired survivor size 1132462080 bytes, new threshold 15 (max 15)
> - age   1:   23419920 bytes,   23419920 total
> - age   2:9355296 bytes,   32775216 total
> - age   3:2455384 bytes,   35230600 total
> - age   4:   38246704 bytes,   73477304 total
> - age   5:   47064408 bytes,  120541712 total
> - age   6:   13228864 bytes,  133770576 total
> - age   7:   23990800 bytes,  157761376 total
> - age   8:1031416 bytes,  158792792 total
> - age   9:   17011128 bytes,  175803920 total
> - age  10:7371888 bytes,  183175808 total
> - age  11:6226576 bytes,  189402384 total
> - age  12: 637184 bytes,  190039568 total
> - age  13:   11577864 bytes,  201617432 total
> - age  14:9519224 bytes,  211136656 total
> - age  15: 672304 bytes,  211808960 total
> , 0.0391210 secs]
>[Parallel Time: 32.1 ms, GC Workers: 16]
>   [GC Worker Start (ms): Min: 522594835.0, Avg: 522594835.1, Max:
> 522594835.2, Diff: 0.2]
>   [Ext Root Scanning (ms): Min: 0.5, Avg: 0.8, Max: 2.2, Diff: 1.7,
> Sum: 12.2]
>   [Update RS (ms): Min: 0.9, Avg: 2.3, Max: 3.2, Diff: 2.2, Sum: 36.6]
>  [Processed Buffers: Min: 3, Avg: 4.7, Max: 8, Diff: 5, Sum: 75]
>   [Scan RS (ms): Min: 0.1, Avg: 0.2, Max: 0.4, Diff: 0.3, Sum: 3.0]
>   [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
> Sum: 0.1]
>   [Object Copy (ms): Min: 27.7, Avg: 28.3, Max: 28.6, Diff: 0.8, Sum:
> 453.5]
>   [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
>  [Termination Attempts: Min: 1, Avg: 1.3, Max: 2, Diff: 1, Sum: 21]
>   [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3, Sum:
> 2.4]
>   [GC Worker Total (ms): Min: 31.6, Avg: 31.7, Max: 32.0, Diff: 0.4,
> Sum: 507.9]
>   [GC Worker End (ms): Min: 522594866.7, Avg: 522594866.8, Max:
> 522594867.0, Diff: 0.2]
>[Code Root Fixup: 0.1 ms]
>[Code Root Purge: 0.0 ms]
>[Clear CT: 1.7 ms]
>[Other: 5.2 ms]
>   [Choose CSet: 0.0 ms]
>   [Ref Proc: 2.9 ms]
>   [Ref Enq: 0.1 ms]
>   [Redirty Cards: 0.2 ms]
>   [Humongous Register: 0.1 ms]
>   [Humongous Reclaim: 0.0 ms]
>   [Free CSet: 1.6 ms]
>[Eden: 16.6G(16.6G)->0.0B(16.6G) Survivors: 208.0M->208.0M Heap:
> 23.8G(28.0G)->7371.4M(28.0G)]
> Heap after GC invocations=559441 (full 0):
>  garbage-first heap   total 29360128K, used 7548353K [0xc000,
> 0xc1003800, 0x0007c000)
>   region size 16384K, 13 young (212992K), 13 survivors (212992K)
>  Metaspace   used 95460K, capacity 97248K, committed 97744K, reserved
> 1134592K

Solr 6.6.0 - High CPU during indexing

2017-08-18 Thread Joe Obernberger
Indexing about 15 million documents per day across 100 shards on 45 
servers.  Up until about 350 million documents, each of the solr 
instances was taking up about 1 core (100% CPU).  Recently, they all 
jumped to 700%.  Is this normal?  Anything that I can check for?


I don't see anything unusual in the solr logs.  Sample from the GC logs:

---

2017-08-18 11:53:15 GC log file created /opt/solr6/server/logs/solr_gc.log.2
OpenJDK 64-Bit Server VM (25.141-b16) for linux-amd64 JRE 
(1.8.0_141-b16), built on Jul 20 2017 11:14:57 by "mockbuild" with gcc 
4.4.7 20120313 (Red Hat 4.4.7-18)
Memory: 4k page, physical 99016188k(796940k free), swap 
33554428k(32614048k free)
CommandLine flags: -XX:+AggressiveOpts -XX:CICompilerCount=12 
-XX:ConcGCThreads=4 -XX:G1HeapRegionSize=16777216 
-XX:GCLogFileSize=20971520 -XX:InitialHeapSize=17179869184 
-XX:InitiatingHeapOccupancyPercent=75 -XX:MarkStackSize=4194304 
-XX:MaxDirectMemorySize=3221225472 -XX:MaxGCPauseMillis=300 
-XX:MaxHeapSize=30064771072 -XX:MaxNewSize=18035507200 
-XX:MinHeapDeltaBytes=16777216 -XX:NumberOfGCLogFiles=9 
-XX:OnOutOfMemoryError=/opt/solr6/bin/oom_solr.sh 9100 
/opt/solr6/server/logs -XX:ParallelGCThreads=16 
-XX:+ParallelRefProcEnabled -XX:+PerfDisableSharedMem -XX:+PrintGC 
-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps 
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC 
-XX:+PrintTenuringDistribution -XX:-ResizePLAB -XX:ThreadStackSize=256 
-XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC 
-XX:+UseGCLogFileRotation -XX:+UseLargePages

{Heap before GC invocations=559440 (full 0):
 garbage-first heap   total 29360128K, used 24944705K 
[0xc000, 0xc1003800, 0x0007c000)

  region size 16384K, 1075 young (17612800K), 13 survivors (212992K)
 Metaspace   used 95460K, capacity 97248K, committed 97744K, 
reserved 1134592K
  class spaceused 11616K, capacity 12104K, committed 12240K, 
reserved 1048576K
2017-08-18T11:53:15.985-0400: 522594.835: [GC pause (G1 Evacuation 
Pause) (young)

Desired survivor size 1132462080 bytes, new threshold 15 (max 15)
- age   1:   23419920 bytes,   23419920 total
- age   2:9355296 bytes,   32775216 total
- age   3:2455384 bytes,   35230600 total
- age   4:   38246704 bytes,   73477304 total
- age   5:   47064408 bytes,  120541712 total
- age   6:   13228864 bytes,  133770576 total
- age   7:   23990800 bytes,  157761376 total
- age   8:1031416 bytes,  158792792 total
- age   9:   17011128 bytes,  175803920 total
- age  10:7371888 bytes,  183175808 total
- age  11:6226576 bytes,  189402384 total
- age  12: 637184 bytes,  190039568 total
- age  13:   11577864 bytes,  201617432 total
- age  14:9519224 bytes,  211136656 total
- age  15: 672304 bytes,  211808960 total
, 0.0391210 secs]
   [Parallel Time: 32.1 ms, GC Workers: 16]
  [GC Worker Start (ms): Min: 522594835.0, Avg: 522594835.1, Max: 
522594835.2, Diff: 0.2]
  [Ext Root Scanning (ms): Min: 0.5, Avg: 0.8, Max: 2.2, Diff: 1.7, 
Sum: 12.2]

  [Update RS (ms): Min: 0.9, Avg: 2.3, Max: 3.2, Diff: 2.2, Sum: 36.6]
 [Processed Buffers: Min: 3, Avg: 4.7, Max: 8, Diff: 5, Sum: 75]
  [Scan RS (ms): Min: 0.1, Avg: 0.2, Max: 0.4, Diff: 0.3, Sum: 3.0]
  [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 
0.0, Sum: 0.1]
  [Object Copy (ms): Min: 27.7, Avg: 28.3, Max: 28.6, Diff: 0.8, 
Sum: 453.5]

  [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
 [Termination Attempts: Min: 1, Avg: 1.3, Max: 2, Diff: 1, Sum: 21]
  [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3, 
Sum: 2.4]
  [GC Worker Total (ms): Min: 31.6, Avg: 31.7, Max: 32.0, Diff: 
0.4, Sum: 507.9]
  [GC Worker End (ms): Min: 522594866.7, Avg: 522594866.8, Max: 
522594867.0, Diff: 0.2]

   [Code Root Fixup: 0.1 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 1.7 ms]
   [Other: 5.2 ms]
  [Choose CSet: 0.0 ms]
  [Ref Proc: 2.9 ms]
  [Ref Enq: 0.1 ms]
  [Redirty Cards: 0.2 ms]
  [Humongous Register: 0.1 ms]
  [Humongous Reclaim: 0.0 ms]
  [Free CSet: 1.6 ms]
   [Eden: 16.6G(16.6G)->0.0B(16.6G) Survivors: 208.0M->208.0M Heap: 
23.8G(28.0G)->7371.4M(28.0G)]

Heap after GC invocations=559441 (full 0):
 garbage-first heap   total 29360128K, used 7548353K 
[0xc000, 0xc1003800, 0x0007c000)

  region size 16384K, 13 young (212992K), 13 survivors (212992K)
 Metaspace   used 95460K, capacity 97248K, committed 97744K, 
reserved 1134592K
  class spaceused 11616K, capacity 12104K, committed 12240K, 
reserved 1048576K

}
 [Times: user=0.54 sys=0.00, real=0.04 secs]
2017-08-18T11:53:16.024-0400: 522594.874: Total time for which 
application threads were stopped: 0.0471187 seconds, Stopping threads 
took: 0.0001739 seconds
2017-08-18T11:53:16.811-0400: 522595.661: Total time for which 
application threads were stopped: 0.0019163 seconds, Stopping threads 
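
A minimal sketch to summarize the stop-the-world pauses from a solr_gc.log 
like the excerpt above (the path is an assumption), as a first check on 
whether GC actually accounts for the extra CPU:

import re
import sys

PAUSE_RE = re.compile(r"application threads were stopped: ([0-9.]+) seconds")

def summarize(path):
    # Collect every safepoint pause the log reports.
    pauses = sorted(float(m.group(1))
                    for line in open(path)
                    for m in PAUSE_RE.finditer(line))
    if not pauses:
        print("no pause lines found")
        return
    p99 = pauses[min(len(pauses) - 1, int(len(pauses) * 0.99))]
    print(f"pauses={len(pauses)} total={sum(pauses):.2f}s "
          f"max={pauses[-1] * 1000:.1f}ms p99={p99 * 1000:.1f}ms")

if __name__ == "__main__":
    summarize(sys.argv[1])  # e.g. /opt/solr6/server/logs/solr_gc.log.2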

Re: Request Highlighting only for the final set of rows

2017-08-18 Thread Nawab Zada Asad Iqbal
Thanks Erick for pointing to the better option. I will explore that. After
your email, I found that if I specify 'fl=*' in the query then it does
the right thing (a two-pass process). However, my queries had
'fl=id+score' (or sometimes fl=id&fl=score), and in both of these cases I
found that the shards are asked to highlight all the results on the first
request (and there is no second request).

The fl=* query (in my sample case) finishes in 100 msec while the same
query with fl=id+score finishes in 1200 msec.

Here are the two queries;

http://solrdev.test.net:8984/solr/filesearch/select?&hl=on&fl=*&start=200&rows=200&q=nawab&shards=solrdev.test.net:8984/solr/filesearch,solrdev.test.net:8985/solr/filesearch,solrdev.test.net:8986/solr/filesearch&wt=json


http://solrdev.test.net:8984/solr/filesearch/select?&hl=on&fl=id&fl=score&start=200&rows=200&q=nawab&shards=solrdev.test.net:8984/solr/filesearch,solrdev.test.net:8985/solr/filesearch,solrdev.test.net:8986/solr/filesearch&wt=json
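
A minimal sketch of the same comparison, assuming the hosts and collection 
from the URLs above (they are specific to this test environment); it prints 
Solr's own QTime next to wall-clock time:

import time
import requests

BASE = "http://solrdev.test.net:8984/solr/filesearch/select"
SHARDS = ",".join(f"solrdev.test.net:{p}/solr/filesearch"
                  for p in (8984, 8985, 8986))

for fl in ("*", "id,score"):
    params = {"q": "nawab", "hl": "on", "start": 200, "rows": 200,
              "fl": fl, "shards": SHARDS, "wt": "json"}
    t0 = time.time()
    body = requests.get(BASE, params=params).json()
    print(f"fl={fl}: wall={(time.time() - t0) * 1000:.0f} ms, "
          f"QTime={body['responseHeader']['QTime']} ms")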


Thanks
Nawab




On Fri, Aug 18, 2017 at 7:23 AM, Erick Erickson 
wrote:

> I don't think you're reading it correctly. First of all, if you're
> going to be doing deep paging you should be using cursorMark, see:
> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results.
>
> Second, it's a two-pass process if you don't use cursorMark. The first
> pass gets the candidate docs from each shard. But all it returns is
> the ID and sort criteria. Then the aggregator node gets the _true_ top
> N after sorting all the lists from each shard and issues a second
> request for _only_ those docs that have made the top N from each sub
> shard, and those should be the only ones highlighted.
>
> Do you have any evidence to the contrary that they're all being
> highlighted? Or are you misinterpreting the log message for the first
> pass?
>
> Best,
> Erick
>
> On Thu, Aug 17, 2017 at 5:43 PM, Nawab Zada Asad Iqbal 
> wrote:
> > Hi,
> >
> > In a multi-node solr installation (without SolrCloud), during a paging
> > scenario (e.g., start=1000, rows=200), the primary node asks for 1200
> rows
> > from each shard. If highlighting is ON, then the primary node is asking
> for
> > highlighting all the 1200 results from each shard, which doesn't scale
> > well. Is there a way to break the shard query into two steps, e.g. ask for
> > the 1200 rows, and after sorting the 1200 responses from each shard and
> > finding the final rows to return (1001 to 1200), issue another query to the
> > shards asking for highlighted responses for the relevant docs?
> >
> >
> >
> > Thanks
> > Nawab
>


Re: Request Highlighting only for the final set of rows

2017-08-18 Thread Erick Erickson
I don't think you're reading it correctly. First of all, if you're
going to be doing deep paging you should be using cursorMark, see:
https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results.
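
A minimal sketch of the cursorMark flow those docs describe (the URL and 
field names are placeholders; the uniqueKey tiebreaker in the sort is 
mandatory, and start must not be used):

import requests

URL = "http://localhost:8983/solr/filesearch/select"
params = {"q": "nawab", "rows": 200, "wt": "json",
          "sort": "score desc, id asc",  # uniqueKey tiebreaker required
          "cursorMark": "*"}

while True:
    body = requests.get(URL, params=params).json()
    for doc in body["response"]["docs"]:
        print(doc["id"])
    next_mark = body["nextCursorMark"]
    if next_mark == params["cursorMark"]:  # unchanged mark: done
        break
    params["cursorMark"] = next_mark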

Second, it's a two-pass process if you don't use cursorMark. The first
pass gets the candidate docs from each shard. But all it returns is
the ID and sort criteria. Then the aggregator node gets the _true_ top
N after sorting all the lists from each shard and issues a second
request for _only_ those docs that have made the top N from each sub
shard, and those should be the only ones highlighted.

Do you have any evidence to the contrary that they're all being
highlighted? Or are you misinterpreting the log message for the first
pass?

Best,
Erick

On Thu, Aug 17, 2017 at 5:43 PM, Nawab Zada Asad Iqbal  wrote:
> Hi,
>
> In a multi-node solr installation (without SolrCloud), during a paging
> scenario (e.g., start=1000, rows=200), the primary node asks for 1200 rows
> from each shard. If highlighting is ON, then the primary node is asking for
> highlighting all the 1200 results from each shard, which doesn't scale
> well. Is there a way to break the shard query into two steps, e.g. ask for the
> 1200 rows, and after sorting the 1200 responses from each shard and finding
> the final rows to return (1001 to 1200), issue another query to the shards
> asking for highlighted responses for the relevant docs?
>
>
>
> Thanks
> Nawab


Re: Match with AND across multiple fields

2017-08-18 Thread Erick Erickson
Solr does not implement pure boolean logic, see:
https://lucidworks.com/2011/12/28/why-not-and-or-and-not/

As for your particular query, parenthesize, something like:
name AND (dimension1 OR dimension1x2 OR dimension1x2x3)
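
A minimal sketch of building such a grouped query from the user's input; 
the field names are placeholders (the schema snippet in the quoted message 
lost its tags), and every dimension clause is made required to satisfy the 
"all should be AND" requirement:

def build_query(name, dim_string):
    # "10x20x30" -> one required clause per (hypothetical) dimension field.
    dims = dim_string.split("x")
    clauses = " AND ".join(f"dimension{i + 1}:{d}"
                           for i, d in enumerate(dims))
    return f'name:"{name}" AND ({clauses})'

print(build_query("ProductX", "10x20x30"))
# name:"ProductX" AND (dimension1:10 AND dimension2:20 AND dimension3:30)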

Best,
Erick

On Fri, Aug 18, 2017 at 2:12 AM, jesseqper  wrote:
> In my index I have products that have multiple dimensions. I want the user to
> be able to search with /name/ plus up to 3 /dimensions/. So a query can
> occur like: /ProductX 10x20/, or: /ProductX 10x20x30/. Now I get too many
> results back, because matches are like: /name/ AND /dimension/ OR
> /dimension/ OR /dimension/. All should be AND.
>
> I'm quite new to Solr. Is this something I should configure in "dismax"?
>
> The fields I'm using:
>
>  type="double"/>
>  stored="true" type="double"/>
>  stored="true" type="double"/>
>  
>
> SearchHandler:
>
> 
> 
>  dismax
>  explicit
>  0.3
>
> 
> supplierArticleId_Prefix
>  
> 
> UUID,score
>  
>  
> 2<-1 5<-2 6<90%
>  
>  100
>  *:*
> 
> 
>spellcheck
> 
>   
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Match-with-AND-across-multiple-fields-tp4351043.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Get results in multiple orders (multiple boosts)

2017-08-18 Thread Walter Underwood
Why do you want to do this in Solr? This would be pretty easy in SQL. If you 
want to sort, use a relational database.
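
For illustration, a minimal sketch of that idea with SQLite and the example 
documents from this thread; each per-user order becomes a CASE expression 
(a join against a user-preference table would scale better):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INT, category INT, source INT, date TEXT)")
conn.executemany("INSERT INTO docs VALUES (?, ?, ?, ?)", [
    (100, 10, 3, "2017-08-17"), (101, 50, 1, "2017-08-17"),
    (102, 10, 5, "2017-08-17"), (103, 10, 5, "2017-07-23"),
    (104, 4, 3, "2017-08-17"),
])
rows = conn.execute("""
    SELECT id FROM docs ORDER BY
      CASE category WHEN 10 THEN 0 WHEN 4 THEN 1 WHEN 50 THEN 2 ELSE 99 END,
      CASE source WHEN 5 THEN 0 WHEN 3 THEN 1 WHEN 1000 THEN 2 ELSE 99 END,
      date DESC
""").fetchall()
print([r[0] for r in rows])  # [102, 103, 100, 104, 101]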

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Aug 18, 2017, at 2:52 AM, Luca Dall'Osto  
> wrote:
> 
> Hello Tom,
> thanks for your reply.
> As I said in my last email, I made the custom function in JS. I posted it on
> Pastebin just now: https://pastebin.com/faXNi0fR
> 
> 
> What I have to do is create the same function in Solr... I will take a look
> at your link and try to create the custom function.
> Thanks
> 
> 
> Luca
> 
> 
> 
> 
>On Friday, August 18, 2017 10:58 AM, Tom Evans  
> wrote:
> 
> 
> On Fri, Aug 18, 2017 at 8:21 AM, Luca Dall'Osto
>  wrote:
>> 
>> Yes, of course, and excuse me for the misunderstanding.
>> 
>> 
>> In my scenario I have to display a list with hundreds of documents.
>> A user can show these documents in a particular order; this order is decided 
>> by the user in a settings view.
>> 
>> 
>> Order levels are for example:
>> 1) Order by category, as most important.
>> 2) Order by source, as second level.
>> 3) Order by date (ascending or descending).
>> 4) Order by title (ascending or descending).
>> 
>> 
>> For category order, in the settings view, the user has a box with a list of all 
>> categories available to him/her.
>> The user drags & drops elements of the list to set their favorite order.
>> Same thing for sources.
>> 
> 
> Solr can only sort by indexed fields, it needs to be able to compare
> one document to another document, and the only information available
> at that point are the indexed fields.
> 
> This would be untenable in your scenario, because you cannot add a
> category..sort_order field to every document for every user.
> 
> If this custom sorting is a hard requirement, the only feasible
> solution I see is to write a custom sorting plugin, that provides a
> function that you can sort on. This blog post describes how this can
> be achieved:
> 
> https://medium.com/culture-wavelabs/sorting-based-on-a-custom-function-in-solr-c94ddae99a12
> 
> I would imagine that you would need one sort function, maybe called
> usersortorder(), to which you would provide the users preferred sort
> ordering (which you would retrieve from wherever you store such
> information) and the field that you want sorted. It would look
> something like this:
> 
> usersortorder("category_id", "3,5,1,7,2,12,14,58") DESC,
> usersortorder("source_id", "5,2,1,4,3") DESC, date DESC, title DESC
> 
> Cheers
> 
> Tom
> 
> 



Re: response time degradation with matchall queries / changing from SOLR 4.10 -> 6.x

2017-08-18 Thread Günter Hipler

Hi Erik,

thanks for your reply. I made some deeper investigations to find the 
reason for the behavior, but wasn't successful so far.

Answer to your questions:
- yes I completely re-indexed the data
- yes, I'm running a collection of around 5,000 queries coming from our 
production logs


Now my current state of investigation:
1) a query on our current system (4.10) takes around 200 ms to 
process facets on a larger result set (here is just one example):

http://search.swissbib.ch/solr/sb-biblio/select?debugQuery=true&q=*:*&indent=on&q.alt=*:*&ps=2&hl=true&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&q.op=AND&hl.simple.pre=START_HILITE&qf=title_short^1000+title_alt^200+title^200+title_sub^200+title_old^200+title_new^200+author^750+author_additional^100+author_additional_dsv11_txt_mv^100+title_additional_dsv11_txt_mv^100+author_additional_gnd_txt_mv^100+title_additional_gnd_txt_mv^100+publplace_additional_gnd_txt_mv^100+series^200+topic^500+addfields_txt_mv^50+publplace_txt_mv^25+publplace_dsv11_txt_mv^25+fulltext+callnumber^1000+ctrlnum^1000+publishDate+isbn+cancisbn_isn_mv+variant_isbn_isn_mv+issn+incoissn_isn_mv+localcode+id&hl.fl=fulltext&wt=xml&mm=100%25&facet.field={!ex%3Dunion_filter}union&facet.field={!ex%3DnavAuthor_full_filter}navAuthor_full&facet.field={!ex%3Dformat_hierarchy_str_mv_filter}format_hierarchy_str_mv&facet.field={!ex%3Dlanguage_filter}language&facet.field=navSub_green&facet.field={!ex%3DnavSubform_filter}navSubform&facet.field=publishDate&qt=edismax&json.nl=arrarr&start=0&sort=score+desc&rows=0&facet.limit=100&hl.simple.post=END_HILITE&spellcheck=false&pf=title_short^1000&facet.mincount=1&facet=true&facet.sort=count

while the same query on 6.x takes more than 4000 ms, and not uncommonly 
much longer

https://gist.github.com/guenterh/8032bddd9bfce31324d1a8651b8d282b
(server is publicly not available)

2) I used several Solr 6 versions (6.3 through 6.6) because other 
(library) networks running big indexes reported that they too had faceting 
problems, and one solved it with 6.3


3) I tried the way we built our old index schema (facet fields based on 
text types) as well as a schema with string fields for docvalues (the 
way we want to go in the future) but had the same problems


4) I played around with the new facet.method options 
(https://lucene.apache.org/solr/guide/6_6/faceting.html#Faceting-Thefacet.methodParameter 
- not available in version 4) but wasn't able to improve the results.


I have the impression that the way facets are processed changed 
significantly, but unfortunately I can't figure out how to keep our use 
case from being affected as badly as it is now.
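
A minimal sketch of that experiment, assuming the host and facet fields 
from the query above (the debug output layout may vary slightly across 6.x 
releases): run the same match-all query with each facet.method and compare 
the per-component timings.

import requests

BASE = "http://localhost:8080/solr/sb-biblio/select"

for method in ("enum", "fc", "uif"):  # uif needs docValues and Solr 6.x
    params = {"q": "*:*", "rows": 0, "wt": "json", "debug": "timing",
              "facet": "true", "facet.limit": 100, "facet.mincount": 1,
              "facet.field": ["union", "language", "publishDate"],
              "facet.method": method}
    body = requests.get(BASE, params=params).json()
    process = body["debug"]["timing"]["process"]
    print(f"{method}: QTime={body['responseHeader']['QTime']} ms, "
          f"facet={process['facet']['time']} ms")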


Thanks for any hints!

Günter


On 09.08.2017 17:22, Erick Erickson wrote:

Two questions:

1> did you completely re-index under 6x? My guess is "yes", since you
jumped two major versions and 6x won't read a 4x index. If not you may
be getting some performance degradation due to back-compat.

2> Try turning on &debug=timing; that breaks down the time spent in each
component and may give a clue. Highlighting has changed significantly,
so that's one place I'd look.

And I'm assuming you're running a suite of tests; trying just a few
queries is uninformative due to loading parts of the index into
memory.

Best,
Erick

On Wed, Aug 9, 2017 at 1:09 AM, guenterh.li...@bluewin.ch
 wrote:

Hi,
we are updating our SOLR infrastructure from version 4.10.2 to the latest
6.6.

We see a significant degradation of the response time while running
match-all queries with facets (query in [1]). With version 4.x these kinds of
queries never took longer than 2000 ms.

Now all of these queries need more than 9000 ms.

Our index [2] [3] contains around 30 million docs. Because we want to use
doc-values for facets and sort functions, we changed our doc-processing
significantly, replacing all text types with string fields.

The behavior of normal term queries is acceptable, although it's a little bit
slower compared with the current production environment. Yesterday I ran a
couple of performance tests.

I looked around and came across this (older) issue [4] which is partially
related to our observations but actually I cannot find a solution for our
behavior.

Did we miss something in the development from version 4 / 5 / 6
which might be the reason for the degradation, meaning we should change our
queries?

Thanks a lot for any hints

Günter



[1]
http://localhost:8080/solr/sb-biblio/select?rows=0&q=*:*&facet.field=union&facet.field=navAuthor_full&facet.field=format&facet.field=language&facet.field=navSub_green&facet.field=navSubform&facet.field=publishDate&qt=edismax&ps=2&hl=true&json.nl=arrarr&bf=recip(abs(ms(NOW/DAY,freshness)),3.16e-10,100,100)&fl=*,score&hl.fragsize=250&start=0&q.op=AND&sort=score+desc&rows=20&hl.simple.pre=START_HILITE&facet.limit=100&hl.simple.post=END_HILITE&spellcheck=false&qf=title_short^1000+title_alt^200+title_sub^200+title_old^2

SplitShard Replica Placement

2017-08-18 Thread Chris Ulicny
Hi all,

I've run into an issue with solr 6.3.0 where the splitshard command placed
both replicas of the new smaller shard on the same node, and I was curious
as to whether the behavior should be expected or not.

Without having dug into the source code, this is what I've observed
splitshard doing until now for a 2 replica setup (1 leader, 1 follower).
The leader replica splits into the requested number of new shards on the
same node. Then the second copy of the new shards are created on other
nodes which may or may not be the same node that the original follower
replica was located on.

How does solr determine where to put the new follower replicas, and is
there a way to prevent it from colocating the leader and the follower of
the new shards?
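
A minimal sketch for spotting that situation after a split, via the 
Collections API CLUSTERSTATUS action (host and collection name are 
placeholders):

from collections import Counter
import requests

body = requests.get(
    "http://localhost:8983/solr/admin/collections",
    params={"action": "CLUSTERSTATUS", "collection": "mycoll", "wt": "json"},
).json()

shards = body["cluster"]["collections"]["mycoll"]["shards"]
for shard_name, shard in shards.items():
    # More than one replica per node means leader and follower are colocated.
    per_node = Counter(r["node_name"] for r in shard["replicas"].values())
    for node, count in per_node.items():
        if count > 1:
            print(f"{shard_name}: {count} replicas on {node}")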

Thanks,
Chris



Re: Solr Logs to ELK / AWS Firestream

2017-08-18 Thread Sebastian Klemke
Hey

On Do, 2017-08-17 at 10:15 -0600, John Bickerstaff wrote:
> I'm trying to get Solr logs into AWS Firestream.
> 
> Not having a lot of luck.
> 
> Does anyone out there have any experience getting Solr logs into an ELK
> stack?  Or, better yet, getting Solr Logs into AWS Firestream?
> 
> We direct logs to SLF4J and use logback as our SLF4j implementation.
> 
> I have a number of issues, but won't go into them here - since they won't
> mean much except in context.  If you have some knowledge here - let me know
> and I'll ask my specific questions.

We're using net.logstash.log4j.JSONEventLayoutV1 to output JSON logs and
have Logstash collect them. The jsonevent-layout dependency and its
transitive dependencies have to be added to the system classloader classpath
by putting them in the lib/ext folder. A GELF appender would probably also
work, but we'd like to keep a local backup.
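
For reference, a sketch of the corresponding log4j 1.2 configuration in 
Solr's log4j.properties (appender name, path, and sizes are illustrative):

log4j.rootLogger=INFO, jsonfile
log4j.appender.jsonfile=org.apache.log4j.RollingFileAppender
log4j.appender.jsonfile.File=${solr.log}/solr.json.log
log4j.appender.jsonfile.layout=net.logstash.log4j.JSONEventLayoutV1
log4j.appender.jsonfile.MaxFileSize=100MB
log4j.appender.jsonfile.MaxBackupIndex=9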


Regards,

Sebastian


-- 
Sebastian Klemke
Senior Software Engineer
  
ResearchGate GmbH
Invalidenstr. 115, 10115 Berlin, Germany
  
www.researchgate.net
  
Registered Seat: Hannover, HR B 202837
Managing Directors: Dr Ijad Madisch, Dr Sören Hofmayer VAT-ID:
DE258434568
A proud affiliate of: ResearchGate Corporation, 350 Townsend St #754,
San Francisco, CA 94107



Re: Get results in multiple orders (multiple boosts)

2017-08-18 Thread Luca Dall'Osto
Hello Tom,
thanks for your reply.
As I said in my last email, I made the custom function in JS. I posted it on
Pastebin just now: https://pastebin.com/faXNi0fR


What I have to do is create the same function in Solr... I will take a look at
your link and try to create the custom function.
Thanks


Luca


 

On Friday, August 18, 2017 10:58 AM, Tom Evans  
wrote:
 

 On Fri, Aug 18, 2017 at 8:21 AM, Luca Dall'Osto
 wrote:
>
> Yes, of course, and excuse me for the misunderstanding.
>
>
> In my scenario I have to display a list with hundreds of documents.
> A user can show these documents in a particular order; this order is decided 
> by the user in a settings view.
>
>
> Order levels are for example:
> 1) Order by category, as most important.
> 2) Order by source, as second level.
> 3) Order by date (ascending or descending).
> 4) Order by title (ascending or descending).
>
>
> For category order, in the settings view, the user has a box with a list of all 
> categories available to him/her.
> The user drags & drops elements of the list to set their favorite order.
> Same thing for sources.
>

Solr can only sort by indexed fields, it needs to be able to compare
one document to another document, and the only information available
at that point are the indexed fields.

This would be untenable in your scenario, because you cannot add a
category..sort_order field to every document for every user.

If this custom sorting is a hard requirement, the only feasible
solution I see is to write a custom sorting plugin, that provides a
function that you can sort on. This blog post describes how this can
be achieved:

https://medium.com/culture-wavelabs/sorting-based-on-a-custom-function-in-solr-c94ddae99a12

I would imagine that you would need one sort function, maybe called
usersortorder(), to which you would provide the users preferred sort
ordering (which you would retrieve from wherever you store such
information) and the field that you want sorted. It would look
something like this:

usersortorder("category_id", "3,5,1,7,2,12,14,58") DESC,
usersortorder("source_id", "5,2,1,4,3") DESC, date DESC, title DESC

Cheers

Tom


   

Match with AND across multiple fields

2017-08-18 Thread jesseqper
In my index I have products that have multiple dimensions. I want the user to
be able to search with /name/ plus up to 3 /dimensions/. So a query can
occur like: /ProductX 10x20/, or: /ProductX 10x20x30/. Now I get too many
results back, because matches are like: /name/ AND /dimension/ OR
/dimension/ OR /dimension/. All should be AND.

I'm quite new to Solr. Is this something I should configure in "dismax"? 

The fields I'm using: 




 

SearchHandler:



 dismax
 explicit
 0.3


supplierArticleId_Prefix
  

UUID,score
 
 
2<-1 5<-2 6<90%
 
 100
 *:*


   spellcheck

  



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Match-with-AND-across-multiple-fields-tp4351043.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Get results in multiple orders (multiple boosts)

2017-08-18 Thread Tom Evans
On Fri, Aug 18, 2017 at 8:21 AM, Luca Dall'Osto
 wrote:
>
> Yes, of course, and excuse me for the misunderstanding.
>
>
> In my scenario I have to display a list with hundreds of documents.
> A user can show these documents in a particular order; this order is decided 
> by the user in a settings view.
>
>
> Order levels are for example:
> 1) Order by category, as most important.
> 2) Order by source, as second level.
> 3) Order by date (ascending or descending).
> 4) Order by title (ascending or descending).
>
>
> For category order, in the settings view, the user has a box with a list of all 
> categories available to him/her.
> The user drags & drops elements of the list to set their favorite order.
> Same thing for sources.
>

Solr can only sort by indexed fields, it needs to be able to compare
one document to another document, and the only information available
at that point are the indexed fields.

This would be untenable in your scenario, because you cannot add a
category..sort_order field to every document for every user.

If this custom sorting is a hard requirement, the only feasible
solution I see is to write a custom sorting plugin, that provides a
function that you can sort on. This blog post describes how this can
be achieved:

https://medium.com/culture-wavelabs/sorting-based-on-a-custom-function-in-solr-c94ddae99a12

I would imagine that you would need one sort function, maybe called
usersortorder(), to which you would provide the users preferred sort
ordering (which you would retrieve from wherever you store such
information) and the field that you want sorted. It would look
something like this:

usersortorder("category_id", "3,5,1,7,2,12,14,58") DESC,
usersortorder("source_id", "5,2,1,4,3") DESC, date DESC, title DESC

Cheers

Tom


Solr cloud replica nodes missing some documents

2017-08-18 Thread Sanjay Lokhande


Hello  guys,

  I have a 5-node Solr Cloud setup with a single shard. The Solr version
is 5.2.1.
  server1 (http://146.XXX.com:4001/solr/contracts_shard1_replica4) is the
leader.
  A document with id '43e14a86cbdd422880cac22d9a15d3c0' was not replicated
to 3 nodes.
  The log shows that the "{add=[43e14a86cbdd422880cac22d9a15d3c0
(1573510697298427904)]}" request was received only by the leader and the
server5 node.
  The server2, server3 & server4 nodes did not receive the request, and
hence the document is missing on these nodes.
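
One way to confirm this from outside, as a minimal sketch: query each 
replica core directly with distrib=false and compare numFound (only the 
leader URL below is taken from the logs; the other replica URLs would need 
to be filled in):

import requests

DOC_ID = "43e14a86cbdd422880cac22d9a15d3c0"
replica_urls = [
    "http://146.XXX.com:4001/solr/contracts_shard1_replica4",  # leader
    # ...add the other four replica core URLs here
]
for url in replica_urls:
    body = requests.get(f"{url}/select", params={
        "q": f"id:{DOC_ID}", "distrib": "false", "rows": 0, "wt": "json",
    }).json()
    print(url, "numFound =", body["response"]["numFound"])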

 Search "43e14a86cbdd422880cac22d9a15d3c0 "
  C:\solrIssue\solr_server1.log
INFO  - 2017-07-21 05:54:59.853; [contracts shard1 core_node2
contracts_shard1_replica4]
org.apache.solr.update.processor.LogUpdateProcessor;
[contracts_shard1_replica4] webapp=/solr path=/update params=
{wt=javabin&version=2} {deleteByQuery=id:(9467353f398448788c261aa347d75b8b
93332ab7f7ff4141a371713871ab65ad 8568e0eab8364bfc89c876aadfa01022
43e14a86cbdd422880cac22d9a15d3c0 a0af8cb24ef94d25b9691eee1f7024ca 8ad...
INFO  - 2017-07-21 05:54:59.853; [contracts shard1 core_node2
contracts_shard1_replica4]
org.apache.solr.update.processor.LogUpdateProcessor;
[contracts_shard1_replica4] webapp=/solr path=/update params=
{wt=javabin&version=2} {deleteByQuery=id:(9467353f398448788c261aa347d75b8b
93332ab7f7ff4141a371713871ab65ad 8568e0eab8364bfc89c876aadfa01022
43e14a86cbdd422880cac22d9a15d3c0 a0af8cb24ef94d25b9691eee1f7024ca 8ad...
INFO  - 2017-07-21 05:59:23.845; [contracts shard1 core_node2
contracts_shard1_replica4]
org.apache.solr.update.processor.LogUpdateProcessor;
[contracts_shard1_replica4] webapp=/solr path=/update params=
{wt=javabin&version=2} {add=[43e14a86cbdd422880cac22d9a15d3c0
(1573510697298427904)]} 0 26582
  C:\solrIssue\solr_server2\solr.log.1
INFO  - 2017-07-21 05:54:59.595; [contracts shard1 core_node4
contracts_shard1_replica5]
org.apache.solr.update.processor.LogUpdateProcessor;
[contracts_shard1_replica5] webapp=/solr path=/update params=
{update.distrib=FROMLEADER&_version_=-1573510446380482560&distrib.from=http://146.XXX.com:4001/solr/contracts_shard1_replica4/&wt=javabin&version=2}
 {deleteByQuery=id:(9467353f398448788c261aa347d75b8b
93332ab7f7ff4141a371713871ab65ad 8568e0eab8364bfc89c876aadfa01022
43e14a86cbdd422880cac22d9a15d3c0 a0af8cb24ef94d25b9691eee1f7024ca 8ad...
INFO  - 2017-07-21 05:54:59.595; [contracts shard1 core_node4
contracts_shard1_replica5]
org.apache.solr.update.processor.LogUpdateProcessor;
[contracts_shard1_replica5] webapp=/solr path=/update params=
{update.distrib=FROMLEADER&_version_=-1573510446380482560&distrib.from=http://146.XXX.com:4001/solr/contracts_shard1_replica4/&wt=javabin&version=2}
 {deleteByQuery=id:(9467353f398448788c261aa347d75b8b
93332ab7f7ff4141a371713871ab65ad 8568e0eab8364bfc89c876aadfa01022
43e14a86cbdd422880cac22d9a15d3c0 a0af8cb24ef94d25b9691eee1f7024ca 8ad...
  C:\solrIssue\solr_server3.log
INFO  - 2017-07-21 05:54:59.844; [contracts shard1 core_node1
contracts_shard1_replica3]
org.apache.solr.update.processor.LogUpdateProcessor;
[contracts_shard1_replica3] webapp=/solr path=/update params=
{update.distrib=FROMLEADER&_version_=-1573510446380482560&distrib.from=http://146.XXX.com:4001/solr/contracts_shard1_replica4/&wt=javabin&version=2}
 {deleteByQuery=id:(9467353f398448788c261aa347d75b8b
93332ab7f7ff4141a371713871ab65ad 8568e0eab8364bfc89c876aadfa01022
43e14a86cbdd422880cac22d9a15d3c0 a0af8cb24ef94d25b9691eee1f7024ca 8ad...
INFO  - 2017-07-21 05:54:59.844; [contracts shard1 core_node1
contracts_shard1_replica3]
org.apache.solr.update.processor.LogUpdateProcessor;
[contracts_shard1_replica3] webapp=/solr path=/update params=
{update.distrib=FROMLEADER&_version_=-1573510446380482560&distrib.from=http://146.XXX.com:4001/solr/contracts_shard1_replica4/&wt=javabin&version=2}
 {deleteByQuery=id:(9467353f398448788c261aa347d75b8b
93332ab7f7ff4141a371713871ab65ad 8568e0eab8364bfc89c876aadfa01022
43e14a86cbdd422880cac22d9a15d3c0 a0af8cb24ef94d25b9691eee1f7024ca 8ad...
  C:\solrIssue\solr_server4\solr.log.1
INFO  - 2017-07-21 05:54:59.734; [contracts shard1 core_node3
contracts_shard1_replica1]
org.apache.solr.update.processor.LogUpdateProcessor;
[contracts_shard1_replica1] webapp=/solr path=/update params=
{update.distrib=FROMLEADER&_version_=-1573510446380482560&distrib.from=http://146.XXX.com:4001/solr/contracts_shard1_replica4/&wt=javabin&version=2}
 {deleteByQuery=id:(9467353f398448788c261aa347d75b8b
93332ab7f7ff4141a371713871ab65ad 8568e0eab8364bfc89c876aadfa01022
43e14a86cbdd422880cac22d9a15d3c0 a0af8cb24ef94d25b9691eee1f7024ca 8ad...
INFO  - 2017-07-21 05:54:59.734; [contracts shard1 core_node3
contracts_shard1_replica1]
org.apache.solr.update.processor.LogUpdateProcessor;
[contracts_shard1_replica1] webapp=/solr path=/update params=
{update.distrib=FROMLEADER&_version_=-1573510446380482560&distrib.from=http://146.XXX.com:4001/solr/contracts_shard1_replica4/&wt=javabin&version=2}
 {delet

Re: Get results in multiple orders (multiple boosts)

2017-08-18 Thread Luca Dall'Osto

Yes, of course, and excuse me for the misunderstanding.


In my scenario I have to display a list with hundreds of documents.
A user can show these documents in a particular order; this order is decided by 
the user in a settings view.


Order levels are for example:
1) Order by category, as most important.
2) Order by source, as second level.
3) Order by date (ascending or descending).
4) Order by title (ascending or descending).


For category order, in the settings view, the user has a box with a list of all 
categories available to him/her.
The user drags & drops elements of the list to set their favorite order.
Same thing for sources.


Let me show you an example of the list:
User show home page with a list of that documents:


documents : [
    0 => {
        "id" : 100,
        "title" : "Title A",
        "category" : 10,
        "source" : 3,
        "date" : "2017-08-17",
    },
    1 => {
        "id" : 101,
        "title" : "Title B",
        "category" : 50,
        "source" : 1,
        "date" : "2017-08-17",
    },
    2 => {
        "id" : 102,
        "title" : "Title A",
        "category" : 10,
        "source" : 5,
        "date" : "2017-08-17",
    },
    3 => {
        "id" : 103,
        "title" : "Title C",
        "category" : 10,
        "source" : 5,
        "date" : "2017-07-23",
    },
    4 => {
        "id" : 104,
        "title" : "Title C",
        "category" : 4,
        "source" : 3,
        "date" : "2017-08-17",
    },
];




This user has the category order like:
category_order: [
    0 => 10,
    1 => 4,
    2 => 50,
]


... and source order like:
source_order: [
    0 => 5,
    1 => 3,
    2 => 1000,
    3 => 1
]


Now, this user has specified in settings that he/she wants to see:
1) documents ordered by category first,
2) then ordered by source,
3) then ordered by date DESCENDING (newest first),
4) and finally ordered by name ASCENDING (A to Z).


I have to query documents from Solr with these orders.
The results would be:
1) document with id = 102,
2) document with id = 103,
3) document with id = 100,
4) document with id = 104,
5) document with id = 101,


I hope I explained it well.
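
A minimal sketch that reproduces this expected ordering client-side in 
plain Python, using the data above; it is only a reference for validating a 
Solr-side solution, since sorting on the client cannot work with Solr 
pagination:

documents = [
    {"id": 100, "title": "Title A", "category": 10, "source": 3, "date": "2017-08-17"},
    {"id": 101, "title": "Title B", "category": 50, "source": 1, "date": "2017-08-17"},
    {"id": 102, "title": "Title A", "category": 10, "source": 5, "date": "2017-08-17"},
    {"id": 103, "title": "Title C", "category": 10, "source": 5, "date": "2017-07-23"},
    {"id": 104, "title": "Title C", "category": 4, "source": 3, "date": "2017-08-17"},
]
category_order = [10, 4, 50]
source_order = [5, 3, 1000, 1]

def rank(order):
    pos = {v: i for i, v in enumerate(order)}
    return lambda v: pos.get(v, len(order))  # unknown ids sort last

cat_rank, src_rank = rank(category_order), rank(source_order)
documents.sort(key=lambda d: (
    cat_rank(d["category"]),
    src_rank(d["source"]),
    tuple(-int(part) for part in d["date"].split("-")),  # date descending
    d["title"],                                          # title ascending
))
print([d["id"] for d in documents])  # [102, 103, 100, 104, 101]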


PS: If you need it, I wrote a custom sort function that sorts the elements of a 
list (an instance of the jQuery plugin List.js) in the correct order I need.
It's JavaScript code, written before I implemented Solr pagination.
With Solr pagination it doesn't work.
If you need it, let me know and I will post it on Pastebin.


Thank you



Luca





On Wednesday, August 16, 2017 12:39 PM, Rick Leir  wrote:


 

 Luca
Can you give me an example? If category_id is 9500, what would you want to sort 
on? -- Rick

On August 16, 2017 5:19:56 AM EDT, Luca Dall'Osto 
 wrote:
>Hello Rick,
>I have no algorithm: the user chooses the order of the categories and the
>sources. For example:
>
>
>For Category:
>- at position 0 category_id 9500.
>- at position 1 category_id 10.
>- at position 2 category_ud 555.
>(etc...)
>
>
>For Source:
>
>- at position 0 source_id 12.
>- at position 1 source_id 30.
>
>- at position 2 source_id 3.
>
>(etc...)
>
>
>After that, the user decides what kind of sort to apply.
>For example: fist by DATE, then by CATEGORY, then by SOURCE and then by
>NAME.
>I have one array with all category ids in the correct order decided by the user,
>and another one with source ids in the correct order decided by the user.
>Then I have another array that specifies the sort type (for example, the
>user could ask for documents ordered by DATE first, then by CATEGORY,
>etc...).
>Natural sort order could be fine only for DATE and NAME, but for
>CATEGORY and SOURCE I have to use the array with ids sorted by the user.
>Thanks!
>
>Luca 
>
>On Tuesday, August 8, 2017 6:54 PM, Rick Leir 
>wrote:
> 
>
> Luca,
>What is the algorithm for the custom sort order?  -- Rick
>
>On August 7, 2017 6:38:49 AM EDT, Luca Dall'Osto
> wrote:
>>Hello Rick,
>>thanks for your answer.
>>Yes, I compose the Solr query from the frontend request, but I'm not able to
>>sort by a custom order, only by natural order (for example:
>>sort=category desc, source desc, /*...*/ ).
>>How do you set a custom sort order in solr?
>>Thanks
>>
>>Luca
>>
>>
>> 
>>
>>On Friday, August 4, 2017 7:41 PM, Rick Leir 
>>wrote:
>> 
>>
>> Luca
>>I hope you have a web app in front of Solr. It could accept parameters
>>from the browser, then construct the query as necessary to do your
>>sorting. Cheers -- Rick
>>
>>On August 4, 2017 5:32:31 AM EDT, Luca Dall'Osto
>> wrote:
>>>Hello,
>>>sorry for the delay, I was away from home.
>>>
>>>
>>>In response to Rick: 
>>>I can't do that because: 
>>>1) each user can have multiple sorts (for example user "A" can sort
>>>by date and then by category and then by name ...).
>>>2) the sort is not a natural sort: the user has a custom order for a field
>>>(for example, sorting the category field of user "A" could be category 10 at
>>>position 1, category 2 at position 2, category 9500 at position 3,
>>>category 40 at position 5 ...).
>>>
>>>
>>>In response to Susheel:h