Re: Best practices for Solr (how to update jar files safely)

2016-02-22 Thread Ramkumar R. Aiyengar
I side with Toke on this. Enterprise bare-metal machines often have
hundreds of gigs of memory and tens of CPU cores -- you would have to fit
multiple instances on a machine to make full use of them while avoiding
huge heaps.

If this is not a common case now, it could well be in the future the way
hardware evolves -- so I would rather mention the factors which need
multiple instances than discourage them.
On 20 Feb 2016 14:55, "Toke Eskildsen"  wrote:

> Shawn Heisey  wrote:
> > I've updated the "Taking Solr to Production" reference guide page with
> > what I feel is an appropriate caution against running multiple instances
> > in a typical installation.  I'd actually like to use stronger language,
>
> And I would like you to use softer language.
>
> Machines get bigger all the time and, as you state yourself, GC can
> (easily) become a problem as the heap grows. Given the 32GB JVM
> limit for small pointers, a max Xmx just below 32GB looks like a practical
> choice for a Solr installation (if possible, of course): running 2 instances
> of 31GB each will provide more usable memory than a single instance of 64GB.
>
> https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/
>
> Caveat: I have not done any testing on this with Solr, so I do not know
> how large the effect is. Some things, such as String faceting, DocValues
> structures and some of the field caches, are array-of-atomics oriented and
> will not suffer from larger pointers. Other things, such as numeric
> faceting, large rows settings and grouping, use a lot of objects and will
> require more memory. The overhead will differ depending on usage.
>
> We tend to use separate Solr installations on the same machines. For some
> machines we do it to allow for independent upgrades (long story), for
> others because a heap of 200GB is not something we are ready to experiment
> with.
>
> - Toke Eskildsen
>


Possible reasons for multiple searchers for same core in Solr 4.6.1

2016-02-22 Thread Dhritiman Das
Hi,

We are using Solr 4.6.1 in our application and have written custom
plugins/components in many places. We are facing an issue and need your
views on debugging it.

Issue: Some time after the application starts up, we see multiple searchers
open for the same core (we have seen 3, 4, 5 and more searchers as well).
Each searcher consumes its own memory for the caches, and ultimately the app
runs out of memory.

We have initially reviewed our custom code and seen that in almost all
places (like SolrEventListeners) where we get a SolrIndexSearcher reference
from Solr code, we get it as a raw object rather than as a RefCounted, so the
chances of this being a case of forgetting to decrement a RefCounted after
use are low.
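
For reference, a minimal sketch of the leak-safe pattern for plugin code that
does obtain its own reference via SolrCore.getSearcher() (the usual 4.x entry
point); a missing decref() is the classic way an old searcher and its caches
stay pinned in memory:

import org.apache.solr.core.SolrCore;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;

void useSearcher( SolrCore core )
{
  RefCounted<SolrIndexSearcher> ref = core.getSearcher(); // increments the count
  try
  {
    SolrIndexSearcher searcher = ref.get();
    // ... use the searcher ...
  }
  finally
  {
    ref.decref(); // forgetting this keeps the searcher (and its caches) alive
  }
}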

I wanted to check with you if you can help with ideas for debugging the
issue
(possible points where a searcher might have leaked from Solr to user code)
as it is badly affecting our production systems.

Thanks,
Dhritiman


Index time or query time boost, and help with boost syntax

2016-02-22 Thread jimi.hullegard
Hi,

We have a use case where we want to influence the score of the documents based 
on the document type, and I am a bit unsure what the best way to achieve this 
is. In essence we have about 100,000 documents of about 15 different document 
types, and we more or less want to tweak the score differently for each 
document type (i.e. it is not just one document type that should be boosted 
over all the others).

How would you suggest that we do this? First I thought that query-time boosting 
would be perfect for this, because that way we can tweak and fine-tune the 
boost levels without having to reindex everything each time. But to be honest, 
I really don't understand how I would put such a query together using the 
edismax parser. I can't seem to find a single example for edismax, using the 
multiplicative boost, that boosts like this: documentType:person^1.8 
documentType:publication^1.5 documentType:news^1.5 documentType:event^1.3 
etc. Can someone help me out with the syntax?
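
For reference, a minimal SolrJ sketch of what the two query-time variants
could look like with edismax (bq is additive, while the boost parameter is
multiplicative and takes a function query); untested, with the field and
values taken from the example above:

SolrQuery q = new SolrQuery( "the user's search terms" );
q.set( "defType", "edismax" );
// additive boosts: each matching clause adds to the document's score
q.set( "bq", "documentType:person^1.8 documentType:publication^1.5"
    + " documentType:news^1.5 documentType:event^1.3" );
// or multiplicative: the score is multiplied by the function's value
q.set( "boost", "if(termfreq(documentType,'person'),1.8,"
    + "if(termfreq(documentType,'publication'),1.5,1))" );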

Another approach could be to use index-time boosts. That would simplify the 
queries, and to be honest I don't think we need to modify the boosting 
factors much after the initial tweaking is done; also, our indexing process 
is fairly quick and lightweight, so it isn't a big deal to perform a full 
reindex.
But here I am also unsure how to set that up properly. Basically we want to 
boost the documents based on document type, regardless of the query. According 
to the documentation, this is what happens when one uses the boost attribute on 
the doc element in the XML. However, the documentation also mentions that this 
is just "a convenience mechanism equivalent to specifying a boost attribute on 
each of the individual fields that support norms". This leaves me wondering:

1. If boost is defined on both the doc and the field level, how is that 
interpreted? Are the values merged using 
add/multiply/max/some-other-math-function? Or is the doc boost just used as a 
default value for fields that don't define their own boost?
2. What about fields that don't have norms? If a query matches such a field, 
wouldn't that affect the score, without me being able to affect that score?
3. On a general note: is the score I'm boosting really the 
total/outermost/final score of the document, so that a boost of 2.0 would 
double the final score of that document, all else equal? Or am I simply 
boosting one "inner score" that in turn is used in some complex math 
expression, so that in some circumstances it might not influence the final 
score at all, and at other times might only influence the score in a much 
smaller way?

An alternative, I guess, could be to start out with query-time boosting like 
above, to find the appropriate boosting levels, and then convert this to some 
kind of hybrid solution afterwards, where the boost factor is stored in a field 
in the document (thus being set at index time) and then used in a boost 
function in the query. With this solution, I guess it would also be 
possible to have multiple "boost fields" in the documents, each with different 
relative boost values based on document type, and then be able to choose at 
query time which boost field we want. Would that be a good solution, you think? 
But would it be possible to go from a query boost of the type 
"documentType:person^1.8 ..." to a function-query boost that uses a document 
field with that value? I.e., would the resulting scores be the same for 
"documentType:person^1.8 ..." on one hand, and a function boost query with a 
field that has the value 1.8 for documents of type person on the other? Or 
could the boost values from these different boost styles result in different 
final scores?
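
A sketch of that hybrid variant, assuming a hypothetical float field named
typeBoost that holds 1.8 for person documents, 1.5 for publications, and so
on; the edismax boost parameter multiplies the relevance score by the value
of the function:

SolrQuery q = new SolrQuery( "the user's search terms" );
q.set( "defType", "edismax" );
q.set( "boost", "typeBoost" ); // a bare field name is a valid function query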

Regards
/Jimi


Bug? Cannot use rule with free disk because solr is getting wrong free disk size

2016-02-22 Thread Robert C Delorimier
Environment: Centos 6
Solr version: 5.2.1
Java Version: 7


Adding rules at collection-creation time does not work, because Solr does not 
return the correct value for freedisk.

Example: 
http://server2:18983/solr/admin/collections?action=CREATE&rule=replica:*,shard:*,freedisk:%3E24&name=search_create_test&numShards=4&replicationFactor=2&maxShardsPerNode=10&collection.configName=log_search

Result (XML tags were stripped by the mail archive; the response header shows 
status 400 and QTime 128, and the same error message is repeated in the msg, 
trace and failure sections of the response):

org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
Could not identify nodes matching the rules [{ "shard":"*", "replica":"*", 
"freedisk":">24"}] tag values{ "server256:18983_solr":{"freedisk":24}, 
"server262:18983_solr":{"freedisk":24}, "server260:18983_solr":{"freedisk":24}, 
"server261:28983_solr":{"freedisk":24}, "server262:8983_solr":{"freedisk":24}, 
"server260:28983_solr":{"freedisk":24}, "server256:28983_solr":{"freedisk":24}, 
"server261:8983_solr":{"freedisk":24}, "server260:8983_solr":{"freedisk":24}, 
"server261:18983_solr":{"freedisk":24}, "server262:28983_solr":{"freedisk":24}, 
"server256:8983_solr":{"freedisk":24}}

The result claims there are 24GB of free disk on every server, but the actual 
values differ from node to node and are all higher than that.
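
As a rough cross-check of what the machine itself reports -- assuming (and it
is only an assumption) that the snitch derives freedisk from the usable space
of the partition holding the Solr data -- something like this can be run on
each node:

import java.io.File;

public class FreeDisk
{
  public static void main( String[] args )
  {
    // print the usable space of the given path in whole gigabytes
    long freeGb = new File( args[0] ).getUsableSpace() / (1024L * 1024 * 1024);
    System.out.println( "usable space: " + freeGb + " GB" );
  }
}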

Any ideas?

Robert



CLOSE_WAIT and high search latency

2016-02-22 Thread Niraj Aswani
Hi,

I am on Solr 4.8.1, running a master-slave setup with lots of cores (>3K).
Internally I maintain an instance of HttpSolrServer for each core, which is
reused for querying the respective core. A request is received by an
intermediary Tomcat and forwarded to another Tomcat running Solr.

Over time we see high search latency. Some requests start to take too
long and eventually result in timeouts.

Investigating this, I see that, over time, a high number of
CLOSE_WAIT sockets (>3300) builds up. Running `netstat -p` seems to
suggest that these sockets were initiated by the intermediary Tomcats when
communicating with Solr.

Questions are:

- Why do we see such a high number of CLOSE_WAIT sockets? Shouldn't
HttpSolrServer take care of closing these connections after communicating
with the Solr server? (See the sketch after these questions.)

- Does the high number of CLOSE_WAIT sockets have anything to do with the
search latency?
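
Regarding the first question, a minimal sketch of sharing one pooled
HttpClient across all per-core HttpSolrServer instances instead of letting
each build its own (SolrJ 4.x API; the pool sizes are illustrative):

import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.params.ModifiableSolrParams;

ModifiableSolrParams params = new ModifiableSolrParams();
params.set( HttpClientUtil.PROP_MAX_CONNECTIONS, 256 );
params.set( HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 64 );
HttpClient shared = HttpClientUtil.createClient( params );
// every per-core client reuses the same connection pool:
HttpSolrServer core1 = new HttpSolrServer( "http://solrhost:8983/solr/core1", shared );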

Any suggestion on the matter is highly appreciated!

Regards,
Niraj


Re: Exception SolrServerException: No live SolrServers available to handle this request:

2016-02-22 Thread Binoy Dalal
In the cloud section in the admin console, do you see all your shards in a
live state?

On Tue, 23 Feb 2016, 10:25 Mugeesh Husain  wrote:

> Hi,
>  The Solr servers are up and listening, and I also checked the ZooKeeper
> clusterstate.json, as below:
>
> get /clusterstate.json
> {}
> cZxid = 0x10013
> ctime = Fri Jan 15 01:40:37 IST 2016
> mZxid = 0x11799
> mtime = Fri Feb 19 18:59:37 IST 2016
> pZxid = 0x10013
> cversion = 0
> dataVersion = 58
> aclVersion = 0
> ephemeralOwner = 0x0
> dataLength = 2
> numChildren = 0
>
>
> It seems to be complaining about something docValues-related. Actually, I am
> trying to fetch grouped records on the ro field.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Exception-SolrServerException-No-live-SolrServers-available-to-handle-this-request-tp4258898p4259026.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Regards,
Binoy Dalal


Re: SOLR cloud startup - zookeeper ensemble

2016-02-22 Thread bbarani
OK, when I run the below command it looks like it's ignoring the double quotes.

solr start -c -z "localhost:2181,localhost:2182,localhost:2183" -e cloud


This interactive session will help you launch a SolrCloud cluster on your local
workstation.
To begin, how many Solr nodes would you like to run in your local cluster?
(specify 1-4 nodes) [2]:
2
Ok, let's start up 2 Solr nodes for your example SolrCloud cluster.
Please enter the port for node1 [8983]:
8983
Please enter the port for node2 [7574]:
7573
Solr home directory
C:\Users\bb728a\Downloads\solr-5.5.0\solr-5.5.0\example\cloud\node1\solr
already exists.
C:\Users\bb728a\Downloads\solr-5.5.0\solr-5.5.0\example\cloud\node2 already
exists.

Starting up Solr on port 8983 using command:
C:\Users\bb728a\Downloads\solr-5.5.0\solr-5.5.0\bin\solr.cmd start -cloud
-p 8983 -s
"C:\Users\bb728a\Downloads\solr-5.5.0\solr-5.5.0\example\cloud\node1\solr"
-z *localhost:2181,localhost:2182,localhost:2183*


Invalid command-line option: localhost:2182


Usage: solr start [-f] [-c] [-h hostname] [-p port] [-d directory] [-z zkHost]
[-m memory] [-e example] [-s solr.solr.home] [-a "additional-options"] [-V]




--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-cloud-startup-zookeeper-ensemble-tp4259023p4259028.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Exception SolrServerException: No live SolrServers available to handle this request:

2016-02-22 Thread Mugeesh Husain
Hi,
 The Solr servers are up and listening, and I also checked the ZooKeeper
clusterstate.json, as below:

get /clusterstate.json
{}
cZxid = 0x10013
ctime = Fri Jan 15 01:40:37 IST 2016
mZxid = 0x11799
mtime = Fri Feb 19 18:59:37 IST 2016
pZxid = 0x10013
cversion = 0
dataVersion = 58
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 2
numChildren = 0


It seems to be complaining about something docValues-related. Actually, I am
trying to fetch grouped records on the ro field.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exception-SolrServerException-No-live-SolrServers-available-to-handle-this-request-tp4258898p4259026.html
Sent from the Solr - User mailing list archive at Nabble.com.


SOLR cloud startup pointing to zookeeper ensemble

2016-02-22 Thread bbarani
I downloaded the latest version of Solr (5.5.0) and also installed ZooKeeper
on ports 2181, 2182 and 2183, and it's running fine.

Now when I try to start the Solr instance using the below command, it just
shows the help content rather than executing the command.

bin/solr start -e cloud -z localhost:2181,localhost:2182,localhost:2183
-noprompt

The below command works with one zookeeper host.
solr start -e cloud -z localhost:2181 -noprompt

Am I missing anything?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-cloud-startup-poniting-to-zookeeper-ensemble-tp4259023.html
Sent from the Solr - User mailing list archive at Nabble.com.


Fwd: Best Practice to Design Solr Schema in the case of Multistore market place and frequent update

2016-02-22 Thread Sumit Agarwal
Dear All,

Please help by providing input on the design requirement sent in the
previous mail. Is there any way to search past months of the
"lucene-solr-user mailing list archives"?

Is it possible to achieve the solution below, merging results from Solr and
MySQL? How do we search if one field exists in Solr and the other in MySQL?

E.g. suppose we have indexed product information that does not change
frequently (e.g. brand, size, location) in Solr, and quantity & price in
MySQL, as this data changes frequently (one document for each store).
1. If the user selects a brand, then we can search Solr first on the basis
of brand & user location, and then get price and quantity from MySQL on the
basis of the ids retrieved from Solr (maximum 50 products per page). This
seems fine from a performance point of view (see the sketch below).
2. If the user selects a price and a brand, then we first need to search
MySQL on the basis of the price range to retrieve product ids. These product
ids then need to be passed to Solr to filter on brand and location. This
might not be a good idea, because the ids may number in the millions. Please
suggest the best way to handle these scenarios.
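
A minimal sketch of scenario 1, assuming SolrJ plus a JDBC step afterwards;
the field names (brand, location, id) and the 50-per-page limit come from
the description above, everything else is illustrative:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

List<String> searchIds( SolrClient solr, String brand, String location )
    throws Exception
{
  SolrQuery q = new SolrQuery( "brand:\"" + brand + "\"" );
  q.addFilterQuery( "location:\"" + location + "\"" );
  q.setFields( "id" );
  q.setRows( 50 ); // one page of products
  QueryResponse rsp = solr.query( q );
  List<String> ids = new ArrayList<String>();
  for ( SolrDocument d : rsp.getResults() )
    ids.add( (String) d.getFieldValue( "id" ) );
  // then: SELECT id, price, quantity FROM store_product WHERE id IN (...ids...)
  return ids;
}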

Looking forward for your help.

Thank you..
Sumit Agarwal

On Mon, Feb 22, 2016 at 6:35 PM, Sumit Agarwal <2005.su...@gmail.com> wrote:

> Dear All,
>
>
> Please share your input on designing a schema on the basis of the input below.
>
>
> How to design a Solr schema in the case of a multi-store marketplace
> application like Amazon?
>
>
> I have been asked to design a Solr schema for an application in which a
> product is sold by multiple stores and each store has its own quantity and
> price. Keeping this requirement in view,
>
> I can think of the schema designs below.
>
>
> 1.  Denormalize the data from the store and product tables. In this case
> there is a lot of redundancy: if 7M products are sold by 100 stores then it
> will be 100 x 7M records. I don't think this is a good design, because there
> will be a lot of redundancy in each document, and 7M documents will become
> 700M documents in Solr. Apart from this, on the product listing page I need
> to show each product along with the minimum price among the stores. To get
> this information I need to group the store products and run a group query
> sorted by price ascending, so that the minimum-price document is retrieved
> per group. According to a Google Groups discussion, performance with
> faceting is bad in Solr.
>
>  2.  If I keep product documents separate from store documents, then I
> need to execute more than one query to get the product attributes and the
> store product having the minimum price. This may be a performance issue.
>
>  3.   Solr has introduced new block-indexing functionality. As price and
> quantity change frequently, it is difficult to modify these fields in a
> block, because the whole block needs to be re-indexed every time.
>
> 4.  How to handle fields, such as seller price and offer price, that change
> frequently?
>
> Please suggest the best practice for designing a schema in this scenario.
>


Re: words with spaces within

2016-02-22 Thread Walter Underwood
This happens for fonts where Tika does not have font metrics. Open the document 
in Adobe Reader, then use document info to find the list of fonts.

Then post this question to the Tika list.

Fix it in Tika, don’t patch it in Solr.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Feb 22, 2016, at 6:40 PM, Binoy Dalal  wrote:
> 
> Is there some set pattern to how these words occur or do they occur
> randomly in the text, i.e., somewhere it'll be "subtitle" and somewhere "s
> u b t i t l e"?
> 
> On Tue, 23 Feb 2016, 05:01 Francisco Andrés Fernández 
> wrote:
> 
>> Hi all,
>> I'm extracting some text from a PDF. As a result, some important words end up
>> with spaces between the characters. I know they are words but don't know how
>> to make Solr detect and index them.
>> For example, I could have the word "Subtitle" that I want to detect,
>> written like "S u b t i t l e". If I parse the text with a standard
>> tokenizer, the word will be lost.
>> How could I make Solr detect this type of word occurrence?
>> Many thanks,
>> 
>> Francisco
>> 
> -- 
> Regards,
> Binoy Dalal



Re: words with spaces within

2016-02-22 Thread Binoy Dalal
Is there some set pattern to how these words occur or do they occur
randomly in the text, i.e., somewhere it'll be "subtitle" and somewhere "s
u b t i t l e"?

On Tue, 23 Feb 2016, 05:01 Francisco Andrés Fernández 
wrote:

> Hi all,
> I'm extracting some text from a PDF. As a result, some important words end up
> with spaces between the characters. I know they are words but don't know how
> to make Solr detect and index them.
> For example, I could have the word "Subtitle" that I want to detect,
> written like "S u b t i t l e". If I parse the text with a standard
> tokenizer, the word will be lost.
> How could I make Solr detect this type of word occurrence?
> Many thanks,
>
> Francisco
>
-- 
Regards,
Binoy Dalal


Re: Exception SolrServerException: No live SolrServers available to handle this request:

2016-02-22 Thread Binoy Dalal
Are you sure all your Solr servers are up and listening?
If you're using ZooKeeper, check whether ZooKeeper has your nodes listed in the
cluster state.

On Tue, 23 Feb 2016, 00:45 Mugeesh Husain  wrote:

> I am getting a "no live SolrServers" exception and I don't know why.
>
> In my schema I define the ro field as follows:
>
> (field and fieldType definitions were stripped by the mailing list archive)
>
>
>
>
> {
>   "responseHeader":{
> "status":500,
> "QTime":9,
> "params":{
>   "q":"_id:(1 3 2)",
>   "indent":"true",
>   "fl":"ro",
>   "group.ngroups":"true",
>   "wt":"json",
>   "group.field":"ro",
>   "group":"true"}},
>   "error":{
> "msg":"org.apache.solr.client.solrj.SolrServerException: No live
> SolrServers available to handle this
> request:[http://localhost:8983/solr/Restaurant_Restaurant_2_replica2,
> http://localhost:8984/solr/Restaurant_Restaurant_1_replica1,
> http://localhost:8983/solr/Restaurant_Restaurant_1_replica2]";,
> "trace":"org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this
> request:[http://localhost:8983/solr/Restaurant_Restaurant_2_replica2,
> http://localhost:8984/solr/Restaurant_Restaurant_1_replica1,
> http://localhost:8983/solr/Restaurant_Restaurant_1_replica2]\n\tat
>
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:397)\n\tat
>
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)\n\tat
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)\n\tat
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Exception-SolrServerException-No-live-SolrServers-available-to-handle-this-request-tp4258898.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
-- 
Regards,
Binoy Dalal


Slow HTTP responses Solr on CDH 5.3 release, solr server behind NAT

2016-02-22 Thread Wyatt Rivers
Hey all,
  I am using OpenStack Sahara to launch a CDH 5.3 cluster. The
cluster has an internal network and each node has a floating IP attached to
it. Essentially it is sitting behind a router.

Here is my problem: any HTTP request coming from outside the internal
network takes about 10 seconds to get a response. A long time. I mean,
browsing to the admin page takes 5 minutes to load.

For a request initiated inside the network, i.e. from a 192.168 address,
the response is as expected, sub-second.

So I tried turning on all the logs to debug, to get some idea, and snooped
the network; here is what I can say.

The initial HTTP request comes through to the server immediately, as
expected. But then there is a roughly 5 to 10 second delay until I see any
log activity in Solr mentioning the resource (a servlet warning log
statement). I receive the response in my browser less than a second later.
This happens for every resource the web page requires.

Any ideas where I could start digging, or what the issue might be? I
should note that if I install Solr standalone, and not with CDH, I do not
have this issue.


Re: SolrCloud, Best performance directly from C

2016-02-22 Thread Shawn Heisey
On 2/22/2016 1:40 PM, Robert Brown wrote:
> As a pure C user, without wishing to use Java, what's my best approach
> for managing the SolrCloud environment?

The most responsive client you would be able to write would use the C
binding for zookeeper, to keep track of clusterstate like
CloudSolrClient does:

http://zookeeper.apache.org/doc/r3.4.8/zookeeperProgrammers.html#C+Binding

If you use this client, the C code would get information from Zookeeper
and use that to manage a list of active servers, collections, and other
information.

Using the HTTP endpoints (Collections API and the zookeeper HTTP
endpoint used by the admin UI) is an option, but then you would need to
regularly poll for status.  The zookeeper client is nearly instantly
notified of changes in zookeeper locations that it is watching, so it is
much more responsive.

Using the zookeeper client would very likely *not* be a trivial
project.  I believe you could use the CloudSolrClient java source as a
model.
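
A minimal sketch of the watch pattern CloudSolrClient builds on, shown here
with the ZooKeeper Java API (the C binding exposes the equivalent call,
zoo_wget); note that a watch fires only once and must be re-registered after
each notification:

import org.apache.zookeeper.ZooKeeper;

public class StateWatch
{
  public static void main( String[] args ) throws Exception
  {
    ZooKeeper zk = new ZooKeeper( "localhost:2181", 15000, event -> {} );
    // read the cluster state and register a one-shot watch on it
    byte[] state = zk.getData( "/clusterstate.json",
        event -> System.out.println( "cluster state changed: " + event ),
        null );
    System.out.println( new String( state, "UTF-8" ) );
  }
}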

Thanks,
Shawn



words with spaces within

2016-02-22 Thread Francisco Andrés Fernández
Hi all,
I'm extracting some text from a PDF. As a result, some important words end up
with spaces between the characters. I know they are words but don't know how
to make Solr detect and index them.
For example, I could have the word "Subtitle" that I want to detect,
written like "S u b t i t l e". If I parse the text with a standard
tokenizer, the word will be lost.
How could I make Solr detect this type of word occurrence?
Many thanks,

Francisco


SolrCloud, Best performance directly from C

2016-02-22 Thread Robert Brown

Hi,

As a pure C user, without wishing to use Java, what's my best approach 
for managing the SolrCloud environment?


I operate a FastCGI environment, so I have the persistence to cache the 
state of the "cloud".


So far, good utilisation of the collections API looks like my best bet?

Any other thoughts or experiences?

Thanks,
Rob





very slow frequent updates

2016-02-22 Thread Roland Szűcs
Hi folks,

We use Solr 5.2.1. We have ebooks stored in Solr. The majority of the
fields do not change at all (content, author, publisher, ...). Only the
price field changes frequently.

We let the customers do full-text searches, so we indexed the content
field. Due to the frequency of the price updates we use the atomic update
feature. As a requirement of atomic updates we have to store all the
fields, even the content field, which is 1MB/document and which we did not
want to store, just index.
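
For reference, a minimal SolrJ sketch of the atomic "set" update described
above ("id" and "price" are the schema fields; bookId and newPrice are
placeholders). The catch is that Solr re-reads every stored field of the
document -- including the 1MB content -- to apply it:

import java.util.Collections;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

void updatePrice( SolrClient client, String bookId, double newPrice )
    throws Exception
{
  SolrInputDocument doc = new SolrInputDocument();
  doc.addField( "id", bookId );
  // a map with a "set" key turns this into an atomic update of one field
  doc.addField( "price", Collections.singletonMap( "set", newPrice ) );
  client.add( doc );
}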

When we updated 100 documents with atomic updates, it took about 3
minutes. Taking into account that our metadata per document is 1KB and our
content field per document is 1MB, we use 1000x the storage just to
support the update process.

I am almost 100% sure that we make something wrong.

What is the best practice for frequent updates when 99% of a given
document is constant forever?

Thank in advance

-- 
 Roland Szűcs
 Connect with
me on Linkedin 

CEO Phone: +36 1 210 81 13
Bookandwalk.hu 


Exception SolrServerException: No live SolrServers available to handle this request:

2016-02-22 Thread Mugeesh Husain
I am getting a "no live SolrServers" exception and I don't know why.

In my schema I define the ro field as follows:

(field and fieldType definitions were stripped by the mailing list archive)




{
  "responseHeader":{
"status":500,
"QTime":9,
"params":{
  "q":"_id:(1 3 2)",
  "indent":"true",
  "fl":"ro",
  "group.ngroups":"true",
  "wt":"json",
  "group.field":"ro",
  "group":"true"}},
  "error":{
"msg":"org.apache.solr.client.solrj.SolrServerException: No live
SolrServers available to handle this
request:[http://localhost:8983/solr/Restaurant_Restaurant_2_replica2,
http://localhost:8984/solr/Restaurant_Restaurant_1_replica1,
http://localhost:8983/solr/Restaurant_Restaurant_1_replica2]";,
"trace":"org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this
request:[http://localhost:8983/solr/Restaurant_Restaurant_2_replica2,
http://localhost:8984/solr/Restaurant_Restaurant_1_replica1,
http://localhost:8983/solr/Restaurant_Restaurant_1_replica2]\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:397)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:2068)\n\tat 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exception-SolrServerException-No-live-SolrServers-available-to-handle-this-request-tp4258898.html
Sent from the Solr - User mailing list archive at Nabble.com.


AW: AW: AW: OutOfMemory when batchupdating from SolrJ

2016-02-22 Thread Clemens Wyss DEV
Hi Shawn,
important note ahead: I appreciate your help very much!

> it's too proprietary for public eyes
That's not the reason for not posting all the code. I just tried to extract the 
"relevant parts" in order to avoid missing the forest for the trees. 
And yes, "addMultipleDocuments" ends up in solrClient.add( batch ). There are 
no other substitutions/modifications, except for it "being executed in a 
lambda". If you like, I can send you the code of our SolrUtil class...?

> For efficiency reasons, you might try "batch.clear();" instead of creating a 
> new ArrayList
Will do that, although I am not hunting for nanos, at least not at the moment 
;)

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Monday, 22 February 2016 15:57
To: solr-user@lucene.apache.org
Subject: Re: AW: AW: OutOfMemory when batchupdating from SolrJ

On 2/22/2016 1:55 AM, Clemens Wyss DEV wrote:
> SolrClient solrClient = getSolrClient( coreName, true ); 
> Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
> while ( elements.hasNext() ) {
>   IIndexableElement elem = elements.next();
>   SolrInputDocument doc = createSolrDocForElement( elem, provider, locale ); 
> // [1]
>   if ( doc != null )
>   {
> batch.add( doc );
> if ( batch.size() == 100 )
> {
>   solrClient.add( documents ); // [2]
>   batch = new ArrayList<SolrInputDocument>(); // [3]
> }
>   }
> }
> if ( !batch.isEmpty() )
> {
>   addMultipleDocuments( uniqueProviderName, solrClient, batch );
>   batch = null;
> }

Did you type the above code as a paraphrase of the actual code, or is that a 
copy/paste of actual code? I'm guessing that it was typed or hand-edited to 
something more simple, not pasted as-is, because in the loop you have an add() 
call on the client, but outside the loop you have something entirely different 
-- a locally defined method called addMultipleDocuments.  I would expect these 
calls to be identical.  Your followup, where you pointed out an error in the 
add statement, suggests even more that you did not provide actual code.

I understand the desire to protect your investment in your work, but partial 
information makes it difficult to offer help.  You could send the code to me 
unicast if you think it's too proprietary for public eyes, but if you do that, 
I will keep the discussion on the list and only talk about the code in general 
terms.

The mailing list will generally eat any attachments, other means are necessary 
to relay code.

For efficiency reasons, you might try "batch.clear();" instead of creating a 
new ArrayList.  It doesn't seem likely that this would actually fix the 
problem, but since the true code is probably different, I can't say for sure.

Thanks,
Shawn



Re: Facet Filter

2016-02-22 Thread Toke Eskildsen
Anil  wrote:
> it means we need to create two fields with the same content to support
> faceting and case-insensitive term search on a field. Agree?

As things are now, yes.

- Toke Eskildsen


RE: Slow commits

2016-02-22 Thread Adam Neal [Extranet]
I have also tried it with the fields just indexed and not stored; the 
performance is the same, so I doubt it is related to stored-field compression.

I can file a JIRA. Unfortunately I won't be able to provide example files from 
this system, but I will try to reproduce it with some test data and include 
that.

From: Yonik Seeley [ysee...@gmail.com]
Sent: 22 February 2016 15:43
To: solr-user@lucene.apache.org
Subject: Re: Slow commits


On Mon, Feb 22, 2016 at 10:22 AM, Adam Neal [Extranet]  wrote:
> Highest count is fairly evenly split between string and text. They are not
> indexed, only stored, and no docValues are used.

Ah, that's a big clue...
I wonder if it's related to stored-field compression.  In Solr 5.5
there is a way to tune compression, but the default is already
BEST_SPEED.
I don't know of an easy way to turn compression off to see if that is
the culprit.
the culprit.

Whatever the reasons, this seems to be a large performance regression,
and it's not clear if it's expected or not.
Could you file a JIRA issue for this?

-Yonik

> 
> From: Yonik Seeley [ysee...@gmail.com]
> Sent: 22 February 2016 14:40
> To: solr-user@lucene.apache.org
> Subject: Re: Slow commits
>
>
> What are the types of the fields with the highest count?  I assume
> they are indexed.  Are they stored, and do they have docValues?
>
> -Yonik
>
>
> On Mon, Feb 22, 2016 at 4:37 AM, Adam Neal [Extranet]  
> wrote:
>> Well, I got the numbers wrong; there are actually around 66000 fields in the 
>> index. I have restructured the index and there are now around 1500 fields. 
>> This has resulted in the commit taking 34 seconds, which is acceptable for my 
>> usage; however, it is still significantly slower than the 4.10.2 commit on the 
>> original 66000 fields, which was around 1 second.
>> 
>> From: Adam Neal [Extranet] [an...@mass.co.uk]
>> Sent: 19 February 2016 17:43
>> To: solr-user@lucene.apache.org
>> Subject: RE: Slow commits
>>
>> I'm out of the office now so I don't have the numbers to hand but from 
>> memory I think there are probably around 800-1000 fields or so. I will 
>> confirm on Monday.
>>
>> If I have time over the weekend I will try to recreate the problem at home 
>> and see if I can post up a sample.
>> 
>> From: Yonik Seeley [ysee...@gmail.com]
>> Sent: 19 February 2016 16:25
>> To: solr-user@lucene.apache.org
>> Subject: Re: Slow commits
>>
>> On Fri, Feb 19, 2016 at 8:51 AM, Adam Neal [Extranet]  
>> wrote:
>>> I've recently upgraded from 4.10.2 to 5.3.1 and I've hit an issue with slow 
>>> commits on one of my cores. The core in question is relatively small (56k 
>>> docs) and the issue only shows when committing after a number of deletes; 
>>> committing after additions is fine. As an example, committing after deleting 
>>> approximately 10% of the documents takes around 25 minutes. The same test on 
>>> the 4.10.2 instance takes around 1 second.
>>>
>>> I have done some investigation and the problem appears to be caused by 
>>> having dynamic fields; the core in question has a large number. Performing 
>>> the same operation on this core with the dynamic fields removed sees a big 
>>> improvement in performance, with the commit taking 11 seconds (still not 
>>> quite on a par with 4.10.2).
>>
>> Dynamic fields is a Solr schema concept, and does not translate to any
>> differences in Lucene.
>> You may be hitting something due to a large number of fields (at the
>> lucene level, each field name is a different field).  How many
>> different fields (i.e. fieldnames) do you have across the entire
>> index?
>>
>> -Yonik
>>

Re: Sort vs boost

2016-02-22 Thread Anil
Thanks Emir
On Feb 22, 2016 7:31 PM, "Emir Arnautovic" 
wrote:

> Hi Anil,
> The decision also depends on your use case - if you are sure that there will be
> no cases where document matches have different scores, or you don't care
> about how well a document matches the query (e.g. all queries will be single-term
> queries), then sorting by time is the way to go. But if there is a chance that some
> doc will end up being first because it was the latest, even though it poorly
> matches the query, then sorting by time is not an option.
> In cases like this I usually go to the extreme - imagine you forgot to
> remove a frequent "X" from the stopwords and you search for "A X B". If you are
> OR-ing in queries, the top document will be the last one added that contains X,
> regardless of whether it has A or B. If there is a chance of such a scenario, you
> should use a boost - it may be slightly more expensive, but it is much safer.
>
> Regards,
> Emir
>
> On 22.02.2016 11:39, Anil wrote:
>
>> Hi,
>>
>> we would like to display recent records on top.
>>
>> two ways
>>
>> 1. boost by create time desc
>> 2. sort create time by desc
>>
>> I tried both; both seem to look good.
>>
>> Which one is better in terms of performance?
>>
>> I noticed sort performs better than boost.
>>
>> Please correct me if I am wrong.
>>
>> Regards,
>> Anil
>>
>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


Solr InputFormat Exist?

2016-02-22 Thread Jamie Johnson
Is there an equivalent of the ESInputFormat (
https://github.com/elastic/elasticsearch-hadoop/blob/03c056142a5ab7422b81bb1f519fd67a9581405f/mr/src/main/java/org/elasticsearch/hadoop/mr/EsInputFormat.java)
in Solr or is there any work that is planned in this regard?

-Jamie


Re: Slow commits

2016-02-22 Thread Yonik Seeley
On Mon, Feb 22, 2016 at 10:22 AM, Adam Neal [Extranet]  wrote:
> Highest count is fairly evenly split between string and text. They are not
> indexed, only stored, and no docValues are used.

Ah, that's a big clue...
I wonder if it's related to stored-field compression.  In Solr 5.5
there is a way to tune compression, but the default is already
BEST_SPEED.
I don't know of an easy way to turn compression off to see if that is
the culprit.
the culprit.

Whatever the reasons, this seems to be a large performance regression,
and it's not clear if it's expected or not.
Could you file a JIRA issue for this?

-Yonik

> 
> From: Yonik Seeley [ysee...@gmail.com]
> Sent: 22 February 2016 14:40
> To: solr-user@lucene.apache.org
> Subject: Re: Slow commits
>
>
> What are the types of the fields with the highest count?  I assume
> they are indexed.  Are they stored, and do they have docValues?
>
> -Yonik
>
>
> On Mon, Feb 22, 2016 at 4:37 AM, Adam Neal [Extranet]  
> wrote:
>> Well, I got the numbers wrong; there are actually around 66000 fields in the 
>> index. I have restructured the index and there are now around 1500 fields. 
>> This has resulted in the commit taking 34 seconds, which is acceptable for my 
>> usage; however, it is still significantly slower than the 4.10.2 commit on the 
>> original 66000 fields, which was around 1 second.
>> 
>> From: Adam Neal [Extranet] [an...@mass.co.uk]
>> Sent: 19 February 2016 17:43
>> To: solr-user@lucene.apache.org
>> Subject: RE: Slow commits
>>
>> I'm out of the office now so I don't have the numbers to hand but from 
>> memory I think there are probably around 800-1000 fields or so. I will 
>> confirm on Monday.
>>
>> If I have time over the weekend I will try to recreate the problem at home 
>> and see if I can post up a sample.
>> 
>> From: Yonik Seeley [ysee...@gmail.com]
>> Sent: 19 February 2016 16:25
>> To: solr-user@lucene.apache.org
>> Subject: Re: Slow commits
>>
>> On Fri, Feb 19, 2016 at 8:51 AM, Adam Neal [Extranet]  
>> wrote:
>>> I've recently upgraded from 4.10.2 to 5.3.1 and I've hit an issue with slow 
>>> commits on one of my cores. The core in question is relatively small (56k 
>>> docs) and the issue only shows when committing after a number of deletes; 
>>> committing after additions is fine. As an example, committing after deleting 
>>> approximately 10% of the documents takes around 25 minutes. The same test on 
>>> the 4.10.2 instance takes around 1 second.
>>>
>>> I have done some investigation and the problem appears to be caused by 
>>> having dynamic fields; the core in question has a large number. Performing 
>>> the same operation on this core with the dynamic fields removed sees a big 
>>> improvement in performance, with the commit taking 11 seconds (still not 
>>> quite on a par with 4.10.2).
>>
>> Dynamic fields is a Solr schema concept, and does not translate to any
>> differences in Lucene.
>> You may be hitting something due to a large number of fields (at the
>> lucene level, each field name is a different field).  How many
>> different fields (i.e. fieldnames) do you have across the entire
>> index?
>>
>> -Yonik
>>

RE: Slow commits

2016-02-22 Thread Adam Neal [Extranet]
Highest count is fairly evenly split between string and text. They are not 
indexed, only stored, and no docValues are used.

From: Yonik Seeley [ysee...@gmail.com]
Sent: 22 February 2016 14:40
To: solr-user@lucene.apache.org
Subject: Re: Slow commits


What are the types of the fields with the highest count?  I assume
they are indexed.  Are they stored, and do they have docValues?

-Yonik


On Mon, Feb 22, 2016 at 4:37 AM, Adam Neal [Extranet]  wrote:
> Well, I got the numbers wrong; there are actually around 66000 fields in the 
> index. I have restructured the index and there are now around 1500 fields. 
> This has resulted in the commit taking 34 seconds, which is acceptable for my 
> usage; however, it is still significantly slower than the 4.10.2 commit on the 
> original 66000 fields, which was around 1 second.
> 
> From: Adam Neal [Extranet] [an...@mass.co.uk]
> Sent: 19 February 2016 17:43
> To: solr-user@lucene.apache.org
> Subject: RE: Slow commits
>
> I'm out of the office now so I don't have the numbers to hand but from memory 
> I think there are probably around 800-1000 fields or so. I will confirm on 
> Monday.
>
> If I have time over the weekend I will try to recreate the problem at home 
> and see if I can post up a sample.
> 
> From: Yonik Seeley [ysee...@gmail.com]
> Sent: 19 February 2016 16:25
> To: solr-user@lucene.apache.org
> Subject: Re: Slow commits
>
> On Fri, Feb 19, 2016 at 8:51 AM, Adam Neal [Extranet]  
> wrote:
>> I've recently upgraded from 4.10.2 to 5.3.1 and I've hit an issue with slow 
>> commits on one of my cores. The core in question is relatively small (56k 
>> docs) and the issue only shows when committing after a number of deletes; 
>> committing after additions is fine. As an example, committing after deleting 
>> approximately 10% of the documents takes around 25 minutes. The same test on 
>> the 4.10.2 instance takes around 1 second.
>>
>> I have done some investigation and the problem appears to be caused by 
>> having dynamic fields; the core in question has a large number. Performing 
>> the same operation on this core with the dynamic fields removed sees a big 
>> improvement in performance, with the commit taking 11 seconds (still not 
>> quite on a par with 4.10.2).
>
> Dynamic fields is a Solr schema concept, and does not translate to any
> differences in Lucene.
> You may be hitting something due to a large number of fields (at the
> lucene level, each field name is a different field).  How many
> different fields (i.e. fieldnames) do you have across the entire
> index?
>
> -Yonik
>

Re: AW: AW: OutOfMemory when batchupdating from SolrJ

2016-02-22 Thread Shawn Heisey
On 2/22/2016 1:55 AM, Clemens Wyss DEV wrote:
> SolrClient solrClient = getSolrClient( coreName, true );
> Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
> while ( elements.hasNext() )
> {
>   IIndexableElement elem = elements.next();
>   SolrInputDocument doc = createSolrDocForElement( elem, provider, locale ); 
> // [1]
>   if ( doc != null )
>   {
> batch.add( doc );
> if ( batch.size() == 100 )
> {
>   solrClient.add( documents ); // [2]
>   batch = new ArrayList<SolrInputDocument>(); // [3]
> }
>   }
> }
> if ( !batch.isEmpty() )
> {
>   addMultipleDocuments( uniqueProviderName, solrClient, batch );
>   batch = null;
> }

Did you type the above code as a paraphrase of the actual code, or is
that a copy/paste of actual code? I'm guessing that it was typed or
hand-edited to something more simple, not pasted as-is, because in the
loop you have an add() call on the client, but outside the loop you have
something entirely different -- a locally defined method called
addMultipleDocuments.  I would expect these calls to be identical.  Your
followup, where you pointed out an error in the add statement, suggests
even more that you did not provide actual code.

I understand the desire to protect your investment in your work, but
partial information makes it difficult to offer help.  You could send
the code to me unicast if you think it's too proprietary for public
eyes, but if you do that, I will keep the discussion on the list and
only talk about the code in general terms.

The mailing list will generally eat any attachments, other means are
necessary to relay code.

For efficiency reasons, you might try "batch.clear();" instead of
creating a new ArrayList.  It doesn't seem likely that this would
actually fix the problem, but since the true code is probably different,
I can't say for sure.

Thanks,
Shawn



Re: Slow commits

2016-02-22 Thread Yonik Seeley
What are the types of the fields with the highest count?  I assume
they are indexed.  Are they stored, and do they have docValues?

-Yonik


On Mon, Feb 22, 2016 at 4:37 AM, Adam Neal [Extranet]  wrote:
> Well, I got the numbers wrong; there are actually around 66000 fields in the 
> index. I have restructured the index and there are now around 1500 fields. 
> This has resulted in the commit taking 34 seconds, which is acceptable for my 
> usage; however, it is still significantly slower than the 4.10.2 commit on the 
> original 66000 fields, which was around 1 second.
> 
> From: Adam Neal [Extranet] [an...@mass.co.uk]
> Sent: 19 February 2016 17:43
> To: solr-user@lucene.apache.org
> Subject: RE: Slow commits
>
> I'm out of the office now so I don't have the numbers to hand but from memory 
> I think there are probably around 800-1000 fields or so. I will confirm on 
> Monday.
>
> If I have time over the weekend I will try to recreate the problem at home 
> and see if I can post up a sample.
> 
> From: Yonik Seeley [ysee...@gmail.com]
> Sent: 19 February 2016 16:25
> To: solr-user@lucene.apache.org
> Subject: Re: Slow commits
>
> On Fri, Feb 19, 2016 at 8:51 AM, Adam Neal [Extranet]  
> wrote:
>> I've recently upgraded from 4.10.2 to 5.3.1 and I've hit an issue with slow 
>> commits on one of my cores. The core in question is relatively small (56k 
>> docs) and the issue only shows when committing after a number of deletes; 
>> committing after additions is fine. As an example, committing after deleting 
>> approximately 10% of the documents takes around 25 minutes. The same test on 
>> the 4.10.2 instance takes around 1 second.
>>
>> I have done some investigation and the problem appears to be caused by 
>> having dynamic fields; the core in question has a large number. Performing 
>> the same operation on this core with the dynamic fields removed sees a big 
>> improvement in performance, with the commit taking 11 seconds (still not 
>> quite on a par with 4.10.2).
>
> Dynamic fields is a Solr schema concept, and does not translate to any
> differences in Lucene.
> You may be hitting something due to a large number of fields (at the
> lucene level, each field name is a different field).  How many
> different fields (i.e. fieldnames) do you have across the entire
> index?
>
> -Yonik
>

Re: Sort vs boost

2016-02-22 Thread Emir Arnautovic

Hi Anil,
The decision also depends on your use case - if you are sure that there will 
be no cases where document matches have different scores, or you don't 
care about how well a document matches the query (e.g. all queries will be 
single-term queries), then sorting by time is the way to go. But, if there is 
a chance that some doc will end up being first because it was the latest, 
even though it poorly matches the query, then sorting by time is not an option.
In cases like this I usually go to the extreme - imagine you forgot to 
remove a frequent "X" from the stopwords and you search for "A X B". If you 
are OR-ing in queries, the top document will be the last one added that 
contains X, regardless of whether it has A or B. If there is a chance of such 
a scenario, you should use a boost - it may be slightly more expensive, but 
much safer.
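
A minimal SolrJ sketch of the two options, assuming a hypothetical createTime
date field; the recip(ms(NOW,...)) expression is the commonly used recency
boost function:

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery sorted = new SolrQuery( "user query" );
sorted.setSort( "createTime", SolrQuery.ORDER.desc ); // pure recency, ignores score

SolrQuery boosted = new SolrQuery( "user query" );
boosted.set( "defType", "edismax" );
// newer documents multiply the relevance score by a larger factor
boosted.set( "boost", "recip(ms(NOW,createTime),3.16e-11,1,1)" );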


Regards,
Emir

On 22.02.2016 11:39, Anil wrote:

Hi,

we would like to display recent records on top.

two ways

1. boost by create time desc
2. sort create time by desc

I tried both; both seem to look good.

Which one is better in terms of performance?

I noticed sort performs better than boost.

Please correct me if I am wrong.

Regards,
Anil



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



AW: AW: OutOfMemory when batchupdating from SolrJ

2016-02-22 Thread Clemens Wyss DEV
> solrClient.add( documents ); // [2]
is of course:
solrClient.add( batch ); // [2]

-----Original Message-----
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
Sent: Monday, 22 February 2016 09:55
To: solr-user@lucene.apache.org
Subject: AW: AW: OutOfMemory when batchupdating from SolrJ

Find attached the relevant part of the batch-update:
...
SolrClient solrClient = getSolrClient( coreName, true ); 
Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
while ( elements.hasNext() ) {
  IIndexableElement elem = elements.next();
  SolrInputDocument doc = createSolrDocForElement( elem, provider, locale ); // 
[1]
  if ( doc != null )
  {
batch.add( doc );
if ( batch.size() == 100 )
{
  solrClient.add( documents ); // [2]
  batch = new ArrayList<SolrInputDocument>(); // [3]
}
  }
}
if ( !batch.isEmpty() )
{
  addMultipleDocuments( uniqueProviderName, solrClient, batch );
  batch = null;
}
...

IIndexableElement is part of our index/search framework.

[1] creating a single SolrInputDocument

[2] handing the SIDs to SolrJ/SolrClient

[3] creating a new batch, i.e. releasing the SolrInputDocuments

The above code is being executed in an ExecutorService handed in as a lambda. 
I.e.:
executorService.submit( () -> {

} );

Thanks for any advice. If needed, I can also provide the OOM heap dump ...

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Friday, 19 February 2016 18:59
To: solr-user@lucene.apache.org
Subject: Re: AW: OutOfMemory when batchupdating from SolrJ

On 2/19/2016 3:08 AM, Clemens Wyss DEV wrote:
> The logic is somewhat this:
>
> SolrClient solrClient = new HttpSolrClient( coreUrl );
> while ( got more elements to index ) {
>   batch = create 100 SolrInputDocuments
>   solrClient.add( batch )
> }

How much data is going into each of those SolrInputDocument objects?

If the amount of data is very small (a few kilobytes), then this sounds like 
your program has a memory leak.  Can you provide more code detail? 
Ideally, you would make the entire code available by placing it on the Internet 
somewhere and providing a URL.  If there's anything sensitive in the code, like 
passwords or public IP addresses, feel free to redact it, but try not to remove 
anything that affects how the code operates.

Thanks,
Shawn



RE: Slow commits

2016-02-22 Thread Adam Neal [Extranet]
Yup, that's correct. Not talking massive amounts of data really. The commit 
performance difference between 4.10.2 and 5.3.1 is huge in this case.

From: Susheel Kumar [susheel2...@gmail.com]
Sent: 22 February 2016 13:31
To: solr-user@lucene.apache.org
Subject: Re: Slow commits

Sorry, I see now you mentioned 56K docs which is pretty small.


Re: Slow commits

2016-02-22 Thread Susheel Kumar
Sorry, I see now you mentioned 56K docs which is pretty small.

On Mon, Feb 22, 2016 at 8:30 AM, Susheel Kumar 
wrote:

> Adam - how many documents do you have in your index?
>
> Thanks,
> Susheel

Re: Slow commits

2016-02-22 Thread Susheel Kumar
Adam - how many documents do you have in your index?

Thanks,
Susheel

On Mon, Feb 22, 2016 at 4:37 AM, Adam Neal [Extranet] 
wrote:

> Well, I got the numbers wrong: there are actually around 66000 fields on
> the index. I have restructured the index and there are now around 1500
> fields. This has brought the commit down to 34 seconds, which is acceptable
> for my usage; however, it is still significantly slower than the 4.10.2
> commit on the original 66000 fields, which was around 1 second.

Re: Facet Filter

2016-02-22 Thread Anil
Thank you.

It means we need to create two fields with the same content to support faceting
and case-insensitive term search on a field. Agree?
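
Something like this, I guess (untested sketch for schema.xml; the field names
are just examples):

<field name="category" type="string" indexed="true" stored="true" docValues="true"/>
<field name="category_search" type="text_general" indexed="true" stored="false"/>
<copyField source="category" dest="category_search"/>

i.e. facet on "category" (a docValues string field with verbatim values) and
run the case-insensitive term searches against "category_search".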

Thanks again.

Regards,
Anil

On 22 February 2016 at 16:07, Toke Eskildsen  wrote:

> On Mon, 2016-02-22 at 11:48 +0530, Anil wrote:
> > the Solr documentation says docValues=true/false works for only a few field types.
> > Will that work on a Text field?
>
> No. It might at some point, but so far it is just a feature request:
> https://issues.apache.org/jira/browse/SOLR-8362
>
>
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>


Sort vs boost

2016-02-22 Thread Anil
Hi,

We would like to display recent records on top.

Two ways:

1. boost by create time desc
2. sort by create time desc

I tried both, and both seem to work well.

Which one is better in terms of performance?

I noticed that sort performs better than boost.

Please correct me if I am wrong.

Regards,
Anil


RE: Frequent connection reset in AbstractFullDistribZkTestBase

2016-02-22 Thread Markus Jelsma
Hi - we have quite a few unit tests implementing AbstractFullDistribZkTestBase.
Since the upgrade to 5.4.1 we frequently see tests failing due to connection
reset problems. Is there a known issue related to this? Is there something else
I can do?

Thanks,
Markus

 
 
-Original message-
> From:Markus Jelsma 
> Sent: Monday 22nd February 2016 11:38
> To: solr-user 
> Subject: Frequent connection reset in AbstractFullDistribZkTestBase
> 
> q
> 


Frequent connection reset in AbstractFullDistribZkTestBase

2016-02-22 Thread Markus Jelsma
q


Re: Facet Filter

2016-02-22 Thread Toke Eskildsen
On Mon, 2016-02-22 at 11:48 +0530, Anil wrote:
> the Solr documentation says docValues=true/false works for only a few field types.
> Will that work on a Text field?

No. It might at some point, but so far it is just a feature request:
https://issues.apache.org/jira/browse/SOLR-8362



- Toke Eskildsen, State and University Library, Denmark




Re: both way synonyms with ManagedSynonymFilterFactory

2016-02-22 Thread Jan Høydahl
Hi

Did you get any further with this?
I reproduced your situation with Solr 5.5.

I think the issue here is that when the SynonymFilter is created based on the
managed map, the option “expand” is always set to “false”, while the default
for a file-based synonym dictionary is “true”.

So with expand=false, what happens is that the input word (e.g. “mb”) is 
*replaced* with the synonym “megabytes”. Confusingly enough, when synonyms are 
applied both on index and query side, your document will contain “megabytes” 
instead of “mb”, but when you query for “mb”, the same happens on query side, 
so you will actually match :-)

I think what we need is to switch the default to expand=true, and make it
configurable in the managed factory as well.
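
Until that is fixed, a possible workaround (untested) could be to map every
term to the full list, so that the "replacement" also emits the original token:

$ curl -X PUT -H 'Content-type:application/json' --data-binary \
  '{"mb":["mb","megabytes"],"megabytes":["mb","megabytes"]}' \
  'http://localhost:8983/solr/test/schema/analysis/synonyms/english'

(and then RELOAD the core, as in the recipe below).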

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 11 Feb 2016 at 10:16, Bjørn Hjelle  wrote:
> 
> Hi,
> 
> one-way managed synonyms seem to work fine, but I cannot make both-way
> synonyms work.
> 
> Steps to reproduce with Solr 5.4.1:
> 
> 1. create a core:
> $ bin/solr create_core -c test -d server/solr/configsets/basic_configs
> 
> 2. edit schema.xml so fieldType text_general looks like this:
> 
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.ManagedSynonymFilterFactory" managed="english"/>
>   </analyzer>
> </fieldType>
> 
> 3. reload the core:
> 
> $ curl -X GET "
> http://localhost:8983/solr/admin/cores?action=RELOAD&core=test";
> 
> 4. add synonyms, one one-way synonym, one two-way, reload the core again:
> 
> $ curl -X PUT -H 'Content-type:application/json' --data-binary
> '{"mad":["angry","upset"]}' "
> http://localhost:8983/solr/test/schema/analysis/synonyms/english";
> $ curl -X PUT -H 'Content-type:application/json' --data-binary
> '["mb","megabytes"]' "
> http://localhost:8983/solr/test/schema/analysis/synonyms/english";
> $ curl -X GET "
> http://localhost:8983/solr/admin/cores?action=RELOAD&core=test";
> 
> 5. list the synonyms:
> {
>  "responseHeader":{
>"status":0,
>"QTime":0},
>  "synonymMappings":{
>"initArgs":{"ignoreCase":false},
>"initializedOn":"2016-02-11T09:00:50.354Z",
>"managedMap":{
>  "mad":["angry",
>"upset"],
>  "mb":["megabytes"],
>  "megabytes":["mb"]}}}
> 
> 
> 6. add two documents:
> 
> $ bin/post -c test -type 'application/json' -d '[{"id" : "1", "title_t" :
> "10 megabytes makes me mad" },{"id" : "2", "title_t" : "100 mb should be
> sufficient" }]'
> $ bin/post -c test -type 'application/json' -d '[{"id" : "2", "title_t" :
> "100 mb should be sufficient" }]'
> 
> 7. search for the documents:
> 
> - all these return the first document, so one-way synonyms work:
> $ curl -X GET "
> http://localhost:8983/solr/test/select?q=title_t:angry&indent=true";
> $ curl -X GET "
> http://localhost:8983/solr/test/select?q=title_t:upset&indent=true";
> $ curl -X GET "
> http://localhost:8983/solr/test/select?q=title_t:mad&indent=true";
> 
> - this only returns the document with "mb":
> 
> $ curl -X GET "
> http://localhost:8983/solr/test/select?q=title_t:mb&indent=true";
> 
> - this only returns the document with "megabytes"
> 
> $ curl -X GET "
> http://localhost:8983/solr/test/select?q=title_t:megabytes&indent=true";
> 
> 
> Any input on how to make this work would be appreciated.
> 
> Thanks,
> Bjørn



RE: Slow commits

2016-02-22 Thread Adam Neal [Extranet]
Well, I got the numbers wrong: there are actually around 66000 fields on the
index. I have restructured the index and there are now around 1500 fields.
This has brought the commit down to 34 seconds, which is acceptable for my
usage; however, it is still significantly slower than the 4.10.2 commit on the
original 66000 fields, which was around 1 second.

From: Adam Neal [Extranet] [an...@mass.co.uk]
Sent: 19 February 2016 17:43
To: solr-user@lucene.apache.org
Subject: RE: Slow commits

I'm out of the office now so I don't have the numbers to hand but from memory I 
think there are probably around 800-1000 fields or so. I will confirm on Monday.

If I have time over the weekend I will try and recreate the problem at home and 
see if I can post up a sample.

From: Yonik Seeley [ysee...@gmail.com]
Sent: 19 February 2016 16:25
To: solr-user@lucene.apache.org
Subject: Re: Slow commits

On Fri, Feb 19, 2016 at 8:51 AM, Adam Neal [Extranet]  wrote:
> I've recently upgraded from 4.10.2 to 5.3.1 and I've hit an issue with slow 
> commits on one of my cores. The core in question is relatively small (56k 
> docs) and the issue only shows when committing after a number of deletes; 
> committing after additions is fine. As an example, committing after deleting 
> approximately 10% of the documents takes around 25 mins. The same test on the 
> 4.10.2 instance takes around 1 second.
>
> I have done some investigation and the problem appears to be caused by having 
> dynamic fields; the core in question has a large number. Performing the same 
> operation on this core with the dynamic fields removed sees a big improvement 
> in performance, with the commit taking 11 seconds (still not quite on a 
> par with 4.10.2).

Dynamic fields are a Solr schema concept and do not translate to any
differences in Lucene.
You may be hitting something due to a large number of fields (at the
lucene level, each field name is a different field).  How many
different fields (i.e. fieldnames) do you have across the entire
index?
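
(If in doubt, one quick way to count them is the Luke request handler, which
lists every concrete field in the index - something like:

curl 'http://localhost:8983/solr/yourcore/admin/luke?numTerms=0'

with "yourcore" substituted; numTerms=0 skips the per-field top-terms stats.)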

-Yonik


Re: Boost exact search

2016-02-22 Thread elisabeth benoit
Hello,

There was a discussion on this thread about exact match

http://www.mail-archive.com/solr-user%40lucene.apache.org/msg118115.html


they mention an example on this page


https://github.com/cominvent/exactmatch
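
The gist of that approach, as far as I understand it (untested; "title" and
"title_exact" are just example names, where title_exact would be a copyField
target with minimal analysis such as KeywordTokenizer plus lowercasing):

$ curl 'http://localhost:8983/solr/core/select' \
  --data-urlencode 'defType=edismax' --data-urlencode 'q=some query' \
  --data-urlencode 'qf=title' --data-urlencode 'pf=title_exact^10'

Because the exact field is indexed as a single token, the pf boost only fires
when the whole query equals the field value, which is what lifts exact matches.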


Best regards,
Elisabeth

2016-02-19 18:01 GMT+01:00 Loïc Stéphan :

> Hello,
>
>
>
> We are trying to boost exact matches to improve relevance.
>
> We followed this article:
> http://everydaydeveloper.blogspot.fr/2012/02/solr-improve-relevancy-by-boosting.html
> and this
> http://stackoverflow.com/questions/29103155/solr-exact-match-boost-over-text-containing-the-exact-match
>  but it doesn’t work for us.
>
>
>
> What is the best way to do this?
>
>
>
> Thanks in advance
>
>
>
>
> --
> LOIC STEPHAN
> Responsable TMA
> www.w-seils.com
> lstep...@w-seils.com
> Tel +33 (0)2 28 22 75 42
>


AW: AW: OutOfMemory when batchupdating from SolrJ

2016-02-22 Thread Clemens Wyss DEV
Find attached the relevant part of the batch-update:
...
SolrClient solrClient = getSolrClient( coreName, true );
Collection<SolrInputDocument> batch = new ArrayList<>();
while ( elements.hasNext() )
{
  IIndexableElement elem = elements.next();
  SolrInputDocument doc = createSolrDocForElement( elem, provider, locale ); // [1]
  if ( doc != null )
  {
    batch.add( doc );
    if ( batch.size() == 100 )
    {
      solrClient.add( batch ); // [2]
      batch = new ArrayList<>(); // [3]
    }
  }
}
if ( !batch.isEmpty() )
{
  addMultipleDocuments( uniqueProviderName, solrClient, batch );
  batch = null;
}
...

IIndexableElement is part of our index/search framework.

[1] creating a single SolrInputDocument

[2] handing the SIDs to SolrJ/SolrClient

[3] creating a new batch, i.e. releasing the SolrInputDocuments

The above code is submitted to an ExecutorService as a lambda, i.e.:
executorService.submit( () -> {

} );

Thanks for any advice. If needed, I can also provide the OOM heap dump ...

-Ursprüngliche Nachricht-
Von: Shawn Heisey [mailto:apa...@elyograg.org] 
Gesendet: Freitag, 19. Februar 2016 18:59
An: solr-user@lucene.apache.org
Betreff: Re: AW: OutOfMemory when batchupdating from SolrJ

On 2/19/2016 3:08 AM, Clemens Wyss DEV wrote:
> The logic is somewhat this:
>
> SolrClient solrClient = new HttpSolrClient( coreUrl );
> while ( got more elements to index ) {
>   batch = create 100 SolrInputDocuments
>   solrClient.add( batch )
> }

How much data is going into each of those SolrInputDocument objects?

If the amount of data is very small (a few kilobytes), then this sounds like 
your program has a memory leak.  Can you provide more code detail? 
Ideally, you would make the entire code available by placing it on the Internet 
somewhere and providing a URL.  If there's anything sensitive in the code, like 
passwords or public IP addresses, feel free to redact it, but try not to remove 
anything that affects how the code operates.
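
If it does turn out to be a leak, a heap dump taken at the moment of the OOM
usually shows what is piling up. Standard JDK tooling should be enough, e.g.:

java -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/oom.hprof ...
# or against an already-running process:
jmap -dump:live,format=b,file=/tmp/heap.hprof <pid>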

Thanks,
Shawn