Re: Solr Atomic Updates

2015-06-03 Thread Erick Erickson
Basically, I think about using SolrCloud whenever you have to split
your corpus into more than one core (shard in SolrCloud terms). Or
when you require fault tolerance in terms of machines going up and
down.

Despite the name, it does _not_ require AWS or similar, and you can
run SolrCloud on a single machine, that is host multiple shards on a
single physical machine to take advantage of the many CPU cores often
available on modern hardware. Or you can host your SolrCloud in your
own data center. Or, really, anywhere that you have one or more
machines available that can talk to each other.

I _really_ recommend you look at this option before pursuing your
original question, it's vastly easier to let SolrCloud handle your
routing, queries etc. than re-invent all that yourself.

Best,
Erick
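
For reference, a minimal SolrJ sketch of what "letting SolrCloud handle your routing" looks like; the ZooKeeper address, collection name and field values are assumptions for illustration only. You address the collection, not a specific core, and the atomic update is forwarded to whichever shard holds the document:

import java.util.Collections;

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudAtomicUpdate {
    public static void main(String[] args) throws Exception {
        // Assumed ZooKeeper ensemble and collection name.
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("mycollection");

        // Atomic update: only the listed fields change; SolrCloud decides
        // which shard/core currently holds TestDoc1.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "TestDoc1");
        doc.addField("title", Collections.singletonMap("set", "test1"));
        doc.addField("revision", Collections.singletonMap("inc", 3));

        server.add(doc);
        server.commit();
        server.shutdown();
    }
}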

On Wed, Jun 3, 2015 at 11:23 AM, Ксения Баталова batalova...@gmail.com wrote:
 Upayavira,

 I'm using stand-alone Solr instances.

 I've not learnt SolrCloud yet.

 Please, give me some advice on when SolrCloud is better than stand-alone
 Solr instances.

 Or when it is worth choosing SolrCloud.

 _ _ _

 Batalova Kseniya


 If you are using stand-alone Solr instances, then it is your
 responsibility to decide which node a document resides in, and thus to
 which core you will send your update request.

 If, however, you used SolrCloud, it would handle that for you - deciding
 which node should contain a document, and directing the update there, all
 behind the scenes for you.

 Upayavira

 On Wed, Jun 3, 2015, at 08:15 AM, Ксения Баталова wrote:
 Hi!

 Thanks for your quick reply.

 The problem is that my index consists of several parts (several cores)

 and while updating I don't know in advance in which part the updated id
 lies (in which core the document with the specified id lies).

 For example, I have two cores (Core1 and Core2) and I want to
 update the document with id Id1 and I don't know where this document
 is lying.

 So, I have to do two select-queries to my cores to know where it is.

 And then generate update-query to necessary core.

 What am I doing wrong?

 I remind that I'm using SOLR 4.4.0.
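
 For what it's worth, the two-select-queries-then-update workaround described
 above can be sketched in SolrJ roughly as follows; the core URLs and field
 names are assumptions:

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class FindCoreThenUpdate {
    public static void main(String[] args) throws Exception {
        // Assumed core URLs; adjust to the real host, port and core names.
        List<String> coreUrls = Arrays.asList(
                "http://localhost:8983/solr/Core1",
                "http://localhost:8983/solr/Core2");

        String id = "Id1";
        for (String url : coreUrls) {
            HttpSolrServer core = new HttpSolrServer(url);
            // Step 1: ask this core whether it holds the document.
            long hits = core.query(new SolrQuery("id:" + id).setRows(0))
                            .getResults().getNumFound();
            if (hits > 0) {
                // Step 2: send the atomic update to the core that has it.
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", id);
                doc.addField("title", Collections.singletonMap("set", "test1"));
                core.add(doc);
                core.commit();
                break;
            }
        }
    }
}

 As the rest of the thread points out, SolrCloud makes this probing unnecessary.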

 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
 Best regards,
 Batalova Kseniya


 What exactly is the problem? And why do you care about cores, per se -
 other than to send the update to the core/collection you are trying to
 update? You should specify the core/collection name in the URL.

 You should also be using the Solr reference guide rather than the (old)
 wiki:
 https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents


 -- Jack Krupansky

 On Tue, Jun 2, 2015 at 10:15 AM, Ксения Баталова batalova...@gmail.com
 wrote:

  Hi!
 
  I'm using SOLR 4.4.0 for searching in my project.
  Now I am facing a problem of atomic updates in multiple cores.
  From wiki:
 
  curl http://localhost:8983/solr/update -H 'Content-type:application/json' -d '
  [
   {
    "id"        : "TestDoc1",
    "title"     : {"set":"test1"},
    "revision"  : {"inc":3},
    "publisher" : {"add":"TestPublisher"}
   },
   {
    "id"        : "TestDoc2",
    "publisher" : {"add":"TestPublisher"}
   }
  ]'
 
  As far as I understand, this means that the document, for example, with id
  TestDoc1 will be searched for updating only in one core.
  And if there is no document with id TestDoc1, the document will be
  created.
  Can I somehow specify the list of cores to search, and then
  update the necessary document with the specific id?
 
  It's something like the shards parameter in a select query.
  From wiki:

  #now do a distributed search across both servers with your browser or curl
  curl 'http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=ipod+solr'
 
  Or is it planned in the future?
 
  Thanks in advance.
 
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
 
  Best regards,
  Batalova Kseniya
 


Re: retrieving large number of docs

2015-06-03 Thread Robust Links
Hi Erick

they are on the same JVM. I had already tried the core join strategy but
that doesn't solve the faceting problem... i.e. if I have 2 cores, core0 and
core1, and I run this query on core0

/select?q=QUERY&fq={!join from=id1 to=id2 fromIndex=core1}&facet=true&facet.field=tag

it has 2 problems:
1) I need to specify the docIDs with the fq (so back to the same
fq={!terms} problem), and
2) faceting doesn't work


Flattening the data is not possible due to security reasons.

Am I using join correctly?

thank you Erick

Peyman

On Wed, Jun 3, 2015 at 2:12 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Are these indexes on different machines? Because if they're in the
 same JVM, you might be able to use cross-core joins. Be aware, though,
 that joining on high-cardinality fields (which, by definition, docID
 probably is) is where pseudo joins perform worst.

 Have you considered flattening the data and including whatever
 information you have in your from index in your main index? Because
 < 100ms response is probably not going to be tough if you have to have
 two indexes/cores.

 Best,
 Erick

 On Wed, Jun 3, 2015 at 10:58 AM, Joel Bernstein joels...@gmail.com
 wrote:
  You may have to do something custom to meet your needs.
 
  10,000 DocIDs is not huge but your latency requirements are pretty low.
 
  Are your DocID's by any chance integers? This can make custom PostFilters
  run much faster.
 
  You should also be aware of the Streaming API in Solr 5.1 which will give
  you fast Map/Reduce approaches (
 
 http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html
 ).
 
  Joel Bernstein
  http://joelsolr.blogspot.com/
 
  On Wed, Jun 3, 2015 at 1:46 PM, Robust Links pey...@robustlinks.com
 wrote:
 
  Hey Joel
 
  see below
 
  On Wed, Jun 3, 2015 at 1:43 PM, Joel Bernstein joels...@gmail.com
 wrote:
 
   A few questions for you:
  
   How large can the list of filtering ID's be?
  
 
   10k
 
 
  
   What's your expectation on latency?
  
 
   10 < latency < 100
 
 
  
   What version of Solr are you using?
  
 
  5.0.0
 
 
  
   SolrCloud or not?
  
 
  not
 
 
 
  
   Joel Bernstein
   http://joelsolr.blogspot.com/
  
   On Wed, Jun 3, 2015 at 1:23 PM, Robust Links pey...@robustlinks.com
   wrote:
  
Hi
   
I have a set of document IDs from one core and i want to query
 another
   core
using the ids retrieved from the first core...the constraint is that
  the
size of doc ID set can be very large. I want to:
   
1) retrieve these docs from the 2nd index
2) facet on the results
   
I can think of 3 solutions:
   
1) boolean query
2) terms fq
3) use a DB rather than Solr
   
I am trying to keep latencies down so prefer to not use (3). The
  problem
 with (1) is that maxBooleanClauses is hardwired and I am not sure when I
 will hit the exception. Option (2) seems to also hit limits... so if I do

 select?fl=*&q=*:*&facet=true&facet.field=title&fq={!terms f=id}LONG_LIST_OF_IDS
   
solr just goes blank. I have tried adding cost=200 to try to run the
   query
first fq={!terms f=id cost=200} but still no good. Paging on doc IDs
   could
be a solution but the problem then is that the faceting results
   correspond
to the paged IDs and not the global set.
   
My filter cache spec is as follows
   
   <filterCache class="solr.FastLRUCache"
                size="100"
                initialSize="100"
                autowarmCount="10"/>
   
   
What would be the best way for me to solve this problem?
   
thank you
   
  
 



Re: retrieving large number of docs

2015-06-03 Thread Jack Krupansky
Specify the join query parser for the main query. See:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser


-- Jack Krupansky
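
As a rough sketch of that suggestion (core and field names are assumptions based on the thread, not a verified recipe), the join moves into the main query so fq no longer needs the id list, and faceting runs over the joined result:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class JoinAsMainQuery {
    public static void main(String[] args) throws Exception {
        // Query core1; the join pulls in the ids of core0 documents matching the text query.
        HttpSolrServer core1 = new HttpSolrServer("http://localhost:8983/solr/core1");

        SolrQuery q = new SolrQuery();
        q.setQuery("{!join from=id to=id fromIndex=core0}text:QUERY");
        q.setFacet(true);
        q.addFacetField("tag");

        QueryResponse rsp = core1.query(q);
        System.out.println(rsp.getFacetField("tag").getValues());
    }
}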

On Wed, Jun 3, 2015 at 3:32 PM, Robust Links pey...@robustlinks.com wrote:

 Hi Erick

 they are on the same JVM. I had already tried the core join strategy but
 that doesn't solve the faceting problem... i.e. if I have 2 cores, core0 and
 core1, and I run this query on core0

 /select?q=QUERY&fq={!join from=id1 to=id2 fromIndex=core1}&facet=true&facet.field=tag

 it has 2 problems:
 1) I need to specify the docIDs with the fq (so back to the same
 fq={!terms} problem), and
 2) faceting doesn't work


 Flattening the data is not possible due to security reasons.

 Am I using join correctly?

 thank you Erick

 Peyman

 On Wed, Jun 3, 2015 at 2:12 PM, Erick Erickson erickerick...@gmail.com
 wrote:

  Are these indexes on different machines? Because if they're in the
  same JVM, you might be able to use cross-core joins. Be aware, though,
  that joining on high-cardinality fields (which, by definition, docID
  probably is) is where pseudo joins perform worst.
 
  Have you considered flattening the data and including whatever
  information you have in your from index in your main index? Because
   < 100ms response is probably not going to be tough if you have to have
  two indexes/cores.
 
  Best,
  Erick
 
  On Wed, Jun 3, 2015 at 10:58 AM, Joel Bernstein joels...@gmail.com
  wrote:
   You may have to do something custom to meet your needs.
  
    10,000 DocIDs is not huge but your latency requirements are pretty
  low.
  
   Are your DocID's by any chance integers? This can make custom
 PostFilters
   run much faster.
  
   You should also be aware of the Streaming API in Solr 5.1 which will
 give
   you fast Map/Reduce approaches (
  
 
 http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html
  ).
  
   Joel Bernstein
   http://joelsolr.blogspot.com/
  
   On Wed, Jun 3, 2015 at 1:46 PM, Robust Links pey...@robustlinks.com
  wrote:
  
   Hey Joel
  
   see below
  
   On Wed, Jun 3, 2015 at 1:43 PM, Joel Bernstein joels...@gmail.com
  wrote:
  
A few questions for you:
   
How large can the list of filtering ID's be?
   
  
10k
  
  
   
What's your expectation on latency?
   
  
    10 < latency < 100
  
  
   
What version of Solr are you using?
   
  
   5.0.0
  
  
   
SolrCloud or not?
   
  
   not
  
  
  
   
Joel Bernstein
http://joelsolr.blogspot.com/
   
On Wed, Jun 3, 2015 at 1:23 PM, Robust Links 
 pey...@robustlinks.com
wrote:
   
 Hi

 I have a set of document IDs from one core and i want to query
  another
core
 using the ids retrieved from the first core...the constraint is
 that
   the
 size of doc ID set can be very large. I want to:

 1) retrieve these docs from the 2nd index
 2) facet on the results

 I can think of 3 solutions:

 1) boolean query
 2) terms fq
 3) use a DB rather than Solr

 I am trying to keep latencies down so prefer to not use (3). The
   problem
  with (1) is that maxBooleanClauses is hardwired and I am not sure when I
  will hit the exception. Option (2) seems to also hit limits... so if I do

  select?fl=*&q=*:*&facet=true&facet.field=title&fq={!terms f=id}LONG_LIST_OF_IDS

 solr just goes blank. I have tried adding cost=200 to try to run
 the
query
 first fq={!terms f=id cost=200} but still no good. Paging on doc
 IDs
could
 be a solution but the problem then is that the faceting results
correspond
 to the paged IDs and not the global set.

 My filter cache spec is as follows

    <filterCache class="solr.FastLRUCache"
                 size="100"
                 initialSize="100"
                 autowarmCount="10"/>


 What would be the best way for me to solve this problem?

 thank you

   
  
 



Re: Solr Atomic Updates

2015-06-03 Thread Jack Krupansky
BTW, does anybody know how SolrCloud got that name? I mean, SolrCluster
would make a lot more sense since a cloud is typically a very large
collection of machines and more of a place than a specific configuration,
while a Solr deployment is more typically a more modest number of machines,
a cluster. It just seems totally out of sync with the current popular
conception of a cloud, and it helps confuse people as to when and where
people can use it. I think it must have occurred after the end of my tenure
at Lucid (October 2011), because my recollection is that it was then just
known as distributed.

-- Jack Krupansky

On Wed, Jun 3, 2015 at 3:26 PM, Erick Erickson erickerick...@gmail.com
wrote:

 Basically, I think about using SolrCloud whenever you have to split
 your corpus into more than one core (shard in SolrCloud terms). Or
 when you require fault tolerance in terms of machines going up and
 down.

 Despite the name, it does _not_ require AWS or similar, and you can
 run SolrCloud on a single machine, that is host multiple shards on a
 single physical machine to take advantage of the many CPU cores often
 available on modern hardware. Or you can host your SolrCloud in your
 own data center. Or, really, anywhere that you have one or more
 machines available that can talk to each other.

 I _really_ recommend you look at this option before pursuing your
 original question, it's vastly easier to let SolrCloud handle your
 routing, queries etc. than re-invent all that yourself.

 Best,
 Erick

 On Wed, Jun 3, 2015 at 11:23 AM, Ксения Баталова batalova...@gmail.com
 wrote:
  Upayavira,
 
  I'm using stand-alone Solr instances.
 
  I've not learnt SolrCloud yet.
 
   Please, give me some advice on when SolrCloud is better than stand-alone
   Solr instances.

   Or when it is worth choosing SolrCloud.
 
  _ _ _
 
  Batalova Kseniya
 
 
  If you are using stand-alone Solr instances, then it is your
  responsibility to decide which node a document resides in, and thus to
  which core you will send your update request.
 
   If, however, you used SolrCloud, it would handle that for you - deciding
   which node should contain a document, and directing the update there, all
   behind the scenes for you.
 
  Upayavira
 
  On Wed, Jun 3, 2015, at 08:15 AM, Ксения Баталова wrote:
  Hi!
 
  Thanks for your quick reply.
 
   The problem is that my index consists of several parts (several
   cores)

   and while updating I don't know in advance in which part the updated id
   lies (in which core the document with the specified id lies).
 
   For example, I have two cores (Core1 and Core2) and I want to
   update the document with id Id1 and I don't know where this document
   is lying.
 
  So, I have to do two select-queries to my cores to know where it is.
 
  And then generate update-query to necessary core.
 
  What am I doing wrong?
 
  I remind that I'm using SOLR 4.4.0.
 
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
  Best regards,
  Batalova Kseniya
 
 
  What exactly is the problem? And why do you care about cores, per se -
  other than to send the update to the core/collection you are trying to
  update? You should specify the core/collection name in the URL.
 
  You should also be using the Solr reference guide rather than the (old)
  wiki:
 
 https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
 
 
  -- Jack Krupansky
 
  On Tue, Jun 2, 2015 at 10:15 AM, Ксения Баталова batalova...@gmail.com
 
  wrote:
 
   Hi!
  
    I'm using SOLR 4.4.0 for searching in my project.
   Now I am facing a problem of atomic updates in multiple cores.
   From wiki:
  
    curl http://localhost:8983/solr/update -H 'Content-type:application/json' -d '
    [
     {
      "id"        : "TestDoc1",
      "title"     : {"set":"test1"},
      "revision"  : {"inc":3},
      "publisher" : {"add":"TestPublisher"}
     },
     {
      "id"        : "TestDoc2",
      "publisher" : {"add":"TestPublisher"}
     }
    ]'
  
    As far as I understand, this means that the document, for example, with id
    TestDoc1 will be searched for updating only in one core.
    And if there is no document with id TestDoc1, the document will be
    created.
    Can I somehow specify the list of cores to search and then
    update the necessary document with the specific id?
  
    It's something like the shards parameter in a select query.
    From wiki:

    #now do a distributed search across both servers with your browser or curl
    curl 'http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=ipod+solr'
  
   Or is it planned in the future?
  
   Thanks in advance.
  
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
  
   Best regards,
   Batalova Kseniya
  



Re: SolrCloud 5.1 startup looking for standalone config

2015-06-03 Thread tuxedomoon
Yes adding _solr worked, thx.  But I also had to populate the SOLR_HOST param
for each of the 4 hosts, as in
SOLR_HOST=ec2-52-4-232-216.compute-1.amazonaws.com.   I'm in an EC2 VPN
environment which might be the problem.

This command now works (leaving off port)

http://s1/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&collection.configName=mycollection_cloud_conf&createNodeSet=s1_solr,s2_solr,s3_solr

The shard directories do now appear on s1,s2,s3 but the order is different
every time I DELETE the collection and rerun the CREATE, right now it is

s1: mycollection_shard2_replica1
s2: mycollection_shard3_replica1
s3: mycollection_shard1_replica1

I'll look further at your article but any advice appreciated on controlling
what hosts the shards land on.

Also are these considered leaders?  If so I don't understand the replica1
suffix.






Re: Solr Atomic Updates

2015-06-03 Thread Shawn Heisey
On 6/3/2015 2:19 PM, Jack Krupansky wrote:
 BTW, does anybody know how SolrCloud got that name? I mean, SolrCluster
 would make a lot more sense since a cloud is typically a very large
 collection of machines and more of a place than a specific configuration,
 while a Solr deployment is more typically a more modest number of machines,
 a cluster. It just seems totally out of sync with the current popular
 conception of a cloud, and it helps confuse people as to when and where
 people can use it. I think it must have occurred after the end of my tenure
 at Lucid (October 2011), because my recollection is that it was then just
 known as distributed.

This all happened before I was paying attention to any development stuff
on Solr.

The earliest mention I have found so far is this:

https://issues.apache.org/jira/browse/SOLR-1873

Here's the first revision of the SolrCloud wiki page that I can access:

http://wiki.apache.org/solr/SolrCloud?action=recallrev=1

I can't find anything about the origins.  I'd like to search the dev
list for history, but I can't find anyplace where this list is
searchable for the correct (2009-2010) timeframe.

Possible origins that I have thought of:

1) *Very* large clusters were envisioned.  There are real SolrCloud
installs consisting of hundreds of machines and billions of documents. 
That certainly qualifies for the cloud moniker.

2) Somebody was interested in leveraging a hot buzzword, to help
generate excitement and support for a new feature.

Thanks,
Shawn



Re: retrieving large number of docs

2015-06-03 Thread Robust Links
that doesn't work either, and even if it did, joining is not going to be a
solution since I can't query one core and facet on the result of the other. To
sum up, my problem is:

core0

  field: id
  field: text

core1

  field: id
  field: tag


I want to

1) query the text field of core0,
2) use the {id} of the matches (which can be 10K) to retrieve the docs in
core1 with the same id, and
3) facet on tags in core1

Is this possible without denormalizing (which is not an option)?

thank you

On Wed, Jun 3, 2015 at 4:24 PM, Jack Krupansky jack.krupan...@gmail.com
wrote:

 Specify the join query parser for the main query. See:

 https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-JoinQueryParser


 -- Jack Krupansky

 On Wed, Jun 3, 2015 at 3:32 PM, Robust Links pey...@robustlinks.com
 wrote:

  Hi Erick
 
  they are on the same JVM. I had already tried the core join strategy but
  that doesn't solve the faceting problem... i.e. if I have 2 cores, core0 and
  core1, and I run this query on core0

  /select?q=QUERY&fq={!join from=id1 to=id2 fromIndex=core1}&facet=true&facet.field=tag

  it has 2 problems:
  1) I need to specify the docIDs with the fq (so back to the same
  fq={!terms} problem), and
  2) faceting doesn't work
 
 
  Flattening the data is not possible due to security reasons.
 
  Am I using join correctly?
 
  thank you Erick
 
  Peyman
 
  On Wed, Jun 3, 2015 at 2:12 PM, Erick Erickson erickerick...@gmail.com
  wrote:
 
   Are these indexes on different machines? Because if they're in the
   same JVM, you might be able to use cross-core joins. Be aware, though,
   that joining on high-cardinality fields (which, by definition, docID
   probably is) is where pseudo joins perform worst.
  
   Have you considered flattening the data and including whatever
   information you have in your from index in your main index? Because
    < 100ms response is probably not going to be tough if you have to have
   two indexes/cores.
  
   Best,
   Erick
  
   On Wed, Jun 3, 2015 at 10:58 AM, Joel Bernstein joels...@gmail.com
   wrote:
You may have to do something custom to meet your needs.
   
 10,000 DocIDs is not huge but your latency requirements are pretty
 low.
   
Are your DocID's by any chance integers? This can make custom
  PostFilters
run much faster.
   
You should also be aware of the Streaming API in Solr 5.1 which will
  give
you fast Map/Reduce approaches (
   
  
 
 http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html
   ).
   
Joel Bernstein
http://joelsolr.blogspot.com/
   
On Wed, Jun 3, 2015 at 1:46 PM, Robust Links pey...@robustlinks.com
 
   wrote:
   
Hey Joel
   
see below
   
On Wed, Jun 3, 2015 at 1:43 PM, Joel Bernstein joels...@gmail.com
   wrote:
   
 A few questions for you:

 How large can the list of filtering ID's be?

   
 10k
   
   

 What's your expectation on latency?

   
 10 < latency < 100
   
   

 What version of Solr are you using?

   
5.0.0
   
   

 SolrCloud or not?

   
not
   
   
   

 Joel Bernstein
 http://joelsolr.blogspot.com/

 On Wed, Jun 3, 2015 at 1:23 PM, Robust Links 
  pey...@robustlinks.com
 wrote:

  Hi
 
  I have a set of document IDs from one core and i want to query
   another
 core
  using the ids retrieved from the first core...the constraint is
  that
the
  size of doc ID set can be very large. I want to:
 
  1) retrieve these docs from the 2nd index
  2) facet on the results
 
  I can think of 3 solutions:
 
  1) boolean query
  2) terms fq
  3) use a DB rather than Solr
 
  I am trying to keep latencies down so prefer to not use (3). The
problem
  with (1) is that maxBooleanClauses is hardwired and I am not sure when I
  will hit the exception. Option (2) seems to also hit limits... so if I do

  select?fl=*&q=*:*&facet=true&facet.field=title&fq={!terms f=id}LONG_LIST_OF_IDS
 
  solr just goes blank. I have tried adding cost=200 to try to run
  the
 query
  first fq={!terms f=id cost=200} but still no good. Paging on doc
  IDs
 could
  be a solution but the problem then is that the faceting results
 correspond
  to the paged IDs and not the global set.
 
  My filter cache spec is as follows
 
<filterCache class="solr.FastLRUCache"
             size="100"
             initialSize="100"
             autowarmCount="10"/>
 
 
  What would be the best way for me to solve this problem?
 
  thank you
 

   
  
 



Re: BoolField fieldType

2015-06-03 Thread Erick Erickson
I took a quick look at the code and it _looks_ like any string
starting with "t", "T" or "1" is evaluated as true and everything else
as false.

sortMissingLast determines sort order if you're sorting on this field
and the document doesn't have a value. Should they be sorted after or
before docs that have a value for the field?

Hmm, could use some better docs

Erick
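
A small sketch of both points from SolrJ; the core URL and the field name MyBoolField are hypothetical, and the field is assumed to be declared with solr.BoolField in the schema:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BoolFieldExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // Index a document with a boolean value; per the parsing rule above,
        // strings such as "true", "T" or "1" would also be read as true.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc1");
        doc.addField("MyBoolField", true);
        solr.add(doc);
        solr.commit();

        // Search with the canonical terms true/false.
        long hits = solr.query(new SolrQuery("MyBoolField:true"))
                        .getResults().getNumFound();
        System.out.println("matching docs: " + hits);
    }
}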

On Wed, Jun 3, 2015 at 2:38 PM, Steven White swhite4...@gmail.com wrote:
 Hi everyone,

 This is a two part question:

 1) I see the following: <fieldType name="boolean" class="solr.BoolField"
 sortMissingLast="true"/>

 a) what does sortMissingLast do?
 b) what kind of data is considered Boolean?  "TRUE", "True", "true", "1",
 "yes", "Yes", "FALSE", etc.

 2) When searching, what do I search on: q=MyBoolField:what  That is, what
 should "what" be?

 Thanks

 Steve


BoolField fieldType

2015-06-03 Thread Steven White
Hi everyone,

This is a two part question:

1) I see the following: <fieldType name="boolean" class="solr.BoolField"
sortMissingLast="true"/>

a) what does sortMissingLast do?
b) what kind of data is considered Boolean?  "TRUE", "True", "true", "1",
"yes", "Yes", "FALSE", etc.

2) When searching, what do I search on: q=MyBoolField:what  That is, what
should "what" be?

Thanks

Steve


Re: SolrCloud 5.1 startup looking for standalone config

2015-06-03 Thread Shawn Heisey
On 6/3/2015 2:48 PM, tuxedomoon wrote:
 Yes adding _solr worked, thx.  But I also had to populate the SOLR_HOST param
 for each of the 4 hosts, as in
 SOLR_HOST=ec2-52-4-232-216.compute-1.amazonaws.com.   I'm in an EC2 VPN
 environment which might be the problem.

 This command now works (leaving off port)

 http://s1/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&collection.configName=mycollection_cloud_conf&createNodeSet=s1_solr,s2_solr,s3_solr

 The shard directories do now appear on s1,s2,s3 but the order is different
 every time I DELETE the collection and rerun the CREATE, right now it is

 s1: mycollection_shard2_replica1
 s2: mycollection_shard3_replica1
 s3: mycollection_shard1_replica1

 I'll look further at your article but any advice appreciated on controlling
 what hosts the shards land on.

 Also are these considered leaders?  If so I don't understand the replica1
 suffix.

A leader is merely a replica that has won an election and has a
temporary title.  It's still a replica, even if it's the ONLY replica.

I would need to look at the code to figure out how it works, but I would
imagine that the shards are shuffled randomly among the hosts so that
multiple collections will be evenly distributed across the cluster.  It
would take me quite a while to familiarize myself with the code before I
could figure out where to look.

If you want to have absolute control over shard and replica placement,
then you will probably need to follow steps similar to these:

* Create a collection with replicationFactor=1.
* Create foo_shardN_replica2 cores with CoreAdmin or ADDREPLICA where
you want them.
* Let the replication fully catch up.
* Use DELETEREPLICA on all the foo_shardN_replica1 cores.
* Manually create the foo_shardN_replica1 cores where you want them.
* Manually create any additional replicas that you desire.

Thanks,
Shawn
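
If it helps, the ADDREPLICA step above can also be driven from SolrJ as a plain Collections API request. This is only a sketch; the collection, shard and node values are placeholders, and the node parameter takes the node_name as it appears in the cluster state (e.g. host:8983_solr):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class AddReplicaSketch {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");

        // Build an ADDREPLICA call against the Collections API.
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("action", "ADDREPLICA");
        params.set("collection", "mycollection");
        params.set("shard", "shard1");
        params.set("node", "s2host:8983_solr");   // placeholder node_name

        QueryRequest request = new QueryRequest(params);
        request.setPath("/admin/collections");
        System.out.println(server.request(request));

        server.shutdown();
    }
}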



Lost connection to Zookeeper

2015-06-03 Thread Joseph Obernberger
Hi All - I've run into a problem where every once in a while one or more
of the shards (27 shard cluster) will lose connection to ZooKeeper and
report "updates are disabled".  In addition to the CLUSTERSTATUS
timeout errors, which don't seem to cause any issue, this one certainly
does, as that shard no longer takes any (you guessed it!) updates!

We are using ZooKeeper with 7 nodes (7 servers in our quorum).
The stack trace is:

-
282833508 [qtp1221263105-801058] INFO
org.apache.solr.update.processor.LogUpdateProcessor  [UNCLASS shard17
core_node17 UNCLASS] - [UNCLASS] webapp=/solr path=/update
params={wt=javabin&version=2} {add=[COLLECT20001208773720
(1502857505963769856)]} 0 3
282837711 [qtp1221263105-802489] INFO
org.apache.solr.update.processor.LogUpdateProcessor  [UNCLASS shard17
core_node17 UNCLASS] - [UNCLASS] webapp=/solr path=/update
params={wt=javabin&version=2} {add=[COLLECT20001208773796
(1502857510369886208)]} 0 3
282839485 [qtp1221263105-800319] INFO
org.apache.solr.update.processor.LogUpdateProcessor  [UNCLASS shard17
core_node17 UNCLASS] - [UNCLASS] webapp=/solr path=/update
params={wt=javabin&version=2} {add=[COLLECT20001208773821
(1502857512230060032)]} 0 4
282841460 [qtp1221263105-801228] INFO
org.apache.solr.update.processor.LogUpdateProcessor  [UNCLASS shard17
core_node17 UNCLASS] - [UNCLASS] webapp=/solr path=/update
params={wt=javabin&version=2} {} 0 1
282841461 [qtp1221263105-801228] ERROR org.apache.solr.core.SolrCore
[UNCLASS shard17 core_node17 UNCLASS] -
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates
are disabled.
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1474)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:661)
at 
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:104)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at 
org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:94)
at 
org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:96)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:166)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:136)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:225)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:190)
at 
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:116)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:173)
at 
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:106)
at 
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:103)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 

How http connections are handled in Solr?

2015-06-03 Thread Manohar Sripada
Hi,

I wanted to know in detail how HTTP connections are handled in
Solr.

1. From my code, I am using the CloudSolrServer class of the SolrJ client library
to get the connection. From one of my previous discussions in this forum, I
understood that Solr uses Apache's HttpClient for connections, and the
default maxConnections per host is 32 and the default max connections is 128.


CloudSolrServer cloudSolrServer = new CloudSolrServer(zookeeper_quorum);

cloudSolrServer.connect();

My first question here is: what do maxConnectionsPerHost and
maxConnections imply? Are these the connections from the SolrJ client to the
ZooKeeper quorum, or from the SolrJ client to the Solr nodes?

2. CloudSolrServer uses LBHttpSolrServer, which sends requests in round
robin fashion, i.e., first request to node1, 2nd request to node2, etc. If
the answer to the above question is "from the SolrJ client to the Solr nodes",
then will the HTTP connection pool from the SolrJ client to a particular Solr
node be created on the first request to that node during the round robin?

3. Consider that in my SolrCloud I have one collection with 8 shards spread over
4 Solr nodes. My understanding is that the SolrJ client will send a query to
one of the Solr cores (e.g. core1) residing on one of the Solr nodes (e.g.
node1). Core1 is then responsible for sending queries to all 8 Solr cores of
that collection. Once it gets the responses from all the cores, it merges the
data and returns it to the client. In this process, how are the HTTP
connections between one Solr node and the rest of the Solr nodes handled?

Does Solr maintain a connection pool between Solr nodes here? If so, when
is the connection pool between the Solr nodes created?

Thanks,
Manohar
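
On question 1: those limits belong to the HttpClient that SolrJ uses for talking to the Solr nodes; the ZooKeeper connection is separate and is only used for cluster state. A sketch of raising the limits when building the client follows; the HttpClientUtil property names are quoted from memory, so verify them against your SolrJ version, and the ZooKeeper and collection names are placeholders:

import org.apache.http.client.HttpClient;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.impl.HttpClientUtil;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
import org.apache.solr.common.params.ModifiableSolrParams;

public class TunedCloudClient {
    public static void main(String[] args) throws Exception {
        // Raise the connection pool limits used for requests to the Solr nodes.
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 256);
        params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 64);
        HttpClient httpClient = HttpClientUtil.createClient(params);

        // Hand the tuned client to the load balancer that CloudSolrServer uses.
        LBHttpSolrServer lb = new LBHttpSolrServer(httpClient);
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181", lb);
        server.setDefaultCollection("mycollection");
        server.connect();
        // ... issue queries/updates with server ...
        server.shutdown();
    }
}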


Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

2015-06-03 Thread Shawn Heisey
On 6/3/2015 12:20 AM, Clemens Wyss DEV wrote:
 Context: Lucene 5.1, Java 8 on debian. 24G of RAM whereof 16G available for 
 Solr.
 
 I am seeing the following OOMs:
 ERROR - 2015-06-03 05:17:13.317; [   customer-1-de_CH_1] 
 org.apache.solr.common.SolrException; null:java.lang.RuntimeException: 
 java.lang.OutOfMemoryError: Java heap space

snip

 Caused by: java.lang.OutOfMemoryError: Java heap space
 WARN  - 2015-06-03 05:17:13.319; [   customer-1-de_CH_1] 
 org.eclipse.jetty.servlet.ServletHandler; Error for 
 /solr/customer-1-de_CH_1/suggest_phrase
 java.lang.OutOfMemoryError: Java heap space
 
 The full commandline is
 /usr/local/java/bin/java -server -Xss256k -Xms16G
 -Xmx16G -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 
 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC 
 -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark 
 -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly 
 -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 
 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc 
 -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
 -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution 
 -XX:+PrintGCApplicationStoppedTime -Xloggc:/opt/solr/logs/solr_gc.log 
 -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC 
 -Dsolr.solr.home=/opt/solr/data -Dsolr.install.dir=/usr/local/solr 
 -Dlog4j.configuration=file:/opt/solr/log4j.properties
 -jar start.jar -XX:OnOutOfMemoryError=/usr/local/solr/bin/oom_solr.sh 8983 
 /opt/solr/logs OPTIONS=default,rewrite
 
 So I'd expect /usr/local/solr/bin/oom_solr.sh to be triggered. But this does
 not seem to happen. What am I missing? Is it ok to pull a heapdump from Solr
 before killing/rebooting in oom_solr.sh?
 
 Also I would like to know what query parameters were sent to
 /solr/customer-1-de_CH_1/suggest_phrase (which may be the reason for the OOM
 ...

The oom script just kills Solr with the KILL signal (-9) and logs the
kill.  That's it.  It does not attempt to make a heap dump.  If you
*want* to dump the heap on OOM, you can, with some additional options:

http://stackoverflow.com/questions/542979/using-heapdumponoutofmemoryerror-parameter-for-heap-dump-for-jboss/20496376#20496376

I don't know if a heap dump on OOM is compatible with the OOM script.
If Java chooses to run the OOM script before the heap dump is done, the
process will be killed before the heap finishes dumping.

FYI, the stacktrace on the OOM error, especially in a multi-threaded app
like Solr, will frequently be completely useless in tracking down the
problem.  The thread that makes the triggering memory allocation may be
completely unrelated.  This error happened on a suggest handler ... but
the large memory allocations may be happening in a completely different
part of the code.

We have not had any recent indications of a memory leak in Solr.  Memory
leaks in Solr *do* happen, but they are usually caught by the tests,
which run in a minimal memory space.  The project has continuous
integration servers set up that run all the tests many times per day.

If you are running out of heap with 16GB allocated, then either your
Solr installation is enormous or you've got a configuration that's not
tuned properly.  With a very large Solr installation, you may need to
simply allocate more memory to the heap ... which may mean that you'll
need to install more memory in the server.  The alternative would be
figuring out where you can change your configuration to reduce memory
requirements.

Here's some incomplete info on settings and situations that can require
a very large heap:

https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

To provide much help, we'll need lots of details about your system ...
number of documents in all cores, total index size on disk, your config,
possibly your schema, and maybe a few other things I haven't thought of yet.

Thanks,
Shawn



Re: Derive suggestions across multiple fields

2015-06-03 Thread Alessandro Benedetti
Can you share your suggester configuration?
Have you read the guide I linked?
Has the suggestion index/FST been built? (You need to build the
suggester.)

Cheers

2015-06-03 4:07 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

 Thank you for your explanation.

  I don't need to care where the suggestions are coming from. All the
  suggestions from different fields can be consolidated and displayed together.

  I've tried to put those fields into a new Suggestion copy field, but no
  suggestion is shown when I set:
  <str name="field">Suggestion</str>  <!-- the indexed field to derive
  suggestions from -->

 Is there a need to re-index the documents in order for this to work?

 Regards,
 Edwin



 On 2 June 2015 at 17:25, Alessandro Benedetti benedetti.ale...@gmail.com
 wrote:

  Hi Edwin,
  I have worked extensively recently in Suggester and the blog I feel to
  suggest is Erick's one.
  It's really detailed and good for a beginner and expert as well. [1]
 
  Apart that let's see you particular use case :
 
  1) Do you want to be able to get also where the suggestions are coming
 from
  ?
  e.g.
  suggestion1 from field1
  suggestion2 from field2 ?
  In this case I would try with multiple dictionaries but I am not sure
 Solr
  allows you to use them concurrently.
  But can be a really nice extension to develop.
 
  2) If you don't care where the suggestions are coming from, just use a
 copy
  field, where you copy the content of the interesting fields.
  The suggestions will come from the fields you have copied in the copy
  field, without distinction.
 
  Hope this helps you
 
  Cheers
 
 
  [1] http://lucidworks.com/blog/solr-suggester/
 
  2015-06-02 4:22 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:
 
   Hi,
  
    Does anyone know if we can derive suggestions across multiple fields?

    I tried to set something like this for the field in the suggest searchComponent
    in solrconfig.xml, but nothing is returned. It only works when I set a
    single field, and not multiple fields.

    <searchComponent class="solr.SpellCheckComponent" name="suggest">
      <lst name="spellchecker">
        <str name="name">suggest</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
        <str name="field">Content, Summary</str>  <!-- the indexed field to derive suggestions from -->
        <float name="threshold">0.005</float>
        <str name="buildOnCommit">true</str>
      </lst>
    </searchComponent>
  
   I'm using solr 5.1.
  
   Regards,
   Edwin
  
 
 
 
  --
  --
 
  Benedetti Alessandro
  Visiting card : http://about.me/alessandro_benedetti
 
  Tyger, tyger burning bright
  In the forests of the night,
  What immortal hand or eye
  Could frame thy fearful symmetry?
 
  William Blake - Songs of Experience -1794 England
 




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

2015-06-03 Thread Clemens Wyss DEV
Context: Lucene 5.1, Java 8 on Debian. 24G of RAM, of which 16G is available for
Solr.

I am seeing the following OOMs:
ERROR - 2015-06-03 05:17:13.317; [   customer-1-de_CH_1] 
org.apache.solr.common.SolrException; null:java.lang.RuntimeException: 
java.lang.OutOfMemoryError: Java heap space
at 
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:854)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:463)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at 
org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
WARN  - 2015-06-03 05:17:13.319; [   customer-1-de_CH_1] 
org.eclipse.jetty.servlet.ServletHandler; Error for 
/solr/customer-1-de_CH_1/suggest_phrase
java.lang.OutOfMemoryError: Java heap space

The full commandline is
/usr/local/java/bin/java -server -Xss256k -Xms16G
-Xmx16G -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 
-XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC 
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark 
-XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly 
-XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 
-XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc 
-XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
-XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution 
-XX:+PrintGCApplicationStoppedTime -Xloggc:/opt/solr/logs/solr_gc.log 
-Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC 
-Dsolr.solr.home=/opt/solr/data -Dsolr.install.dir=/usr/local/solr 
-Dlog4j.configuration=file:/opt/solr/log4j.properties
-jar start.jar -XX:OnOutOfMemoryError=/usr/local/solr/bin/oom_solr.sh 8983 
/opt/solr/logs OPTIONS=default,rewrite

So I'd expect /usr/local/solr/bin/oom_solr.sh to be triggered. But this does
not seem to happen. What am I missing? Is it ok to pull a heapdump from Solr
before killing/rebooting in oom_solr.sh?

Also I would like to know what query parameters were sent to
/solr/customer-1-de_CH_1/suggest_phrase (which may be the reason for the OOM ...




Re: Number of clustering labels to show

2015-06-03 Thread Zheng Lin Edwin Yeo
Thank you so much for your explanation.

On 2 June 2015 at 17:31, Alessandro Benedetti benedetti.ale...@gmail.com
wrote:

 The scope in there is to try to make clustering lighter and more related to
 the query.
 The summary produced is a fragment that is surrounding the query terms in
 the document content.
 Actually this is arguably a way to improve the quality of clusters, but for
 sure it makes the clustering operation lighter, as the content used to
 produce the clusters is much smaller than the full content.

 We can discuss of course if the window of text surrounding queries match is
 really helpful to cluster the documents in a more precise way.
 That is not an easy research topic, and for sure it depends strictly on the
 use cases.
 For this reason a user should decide if going with the summary ( lighter)
 approach or the more comprehensive , full content approach.

 Cheers

 2015-06-02 3:21 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

  Thank you so much Alessandro.
 
  But I do not find any difference in the quality of the clustering results
  when I change the hl.fragsize value, even though I've set my
  carrot.produceSummary to true.
 
 
  Regards,
  Edwin
 
 
  On 1 June 2015 at 17:31, Alessandro Benedetti 
 benedetti.ale...@gmail.com
  wrote:
 
   Only to clarify the initial mail, The carrot.fragSize has nothing to do
   with the number of clusters produced.
  
   When you select to work with field summary ( you will work only on
  snippets
   from the original content, snippets produced by the highlight of the
  query
   in the content), the fragSize will specify the size of these fragments.
  
   From Carrot documentation :
  
   carrot.produceSummary
  
   When true, the carrot.snippet
   https://wiki.apache.org/solr/ClusteringComponent#carrot.snippet
 field
   (if
   no snippet field, then the carrot.title
   https://wiki.apache.org/solr/ClusteringComponent#carrot.title field)
   will
   be highlighted and the highlighted text will be used for clustering.
   Highlighting is recommended when the snippet field contains a lot of
   content. Highlighting can also increase the quality of clustering
 because
   the clustered content will get an additional query-specific context.
   carrot.fragSize
  
   The frag size to use for highlighting. Meaningful only when
   carrot.produceSummary
   
 https://wiki.apache.org/solr/ClusteringComponent#carrot.produceSummary
   is
   true. If not specified, the default highlighting fragsize (hl.fragsize)
   will be used. If that isn't specified, then 100.
  
  
   Cheers
  
   2015-06-01 2:00 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:
  
Thank you Stanislaw for the links. Will read them up to better
  understand
how the algorithm works.
   
Regards,
Edwin
   
On 29 May 2015 at 17:22, Stanislaw Osinski 
stanislaw.osin...@carrotsearch.com wrote:
   
 Hi,

  The number of clusters primarily depends on the parameters of the specific
  clustering algorithm. If you're using the default Lingo algorithm, the
  number of clusters is governed by
  the LingoClusteringAlgorithm.desiredClusterCountBase parameter. Take a look
  at the documentation (
  https://cwiki.apache.org/confluence/display/solr/Result+Clustering#ResultClustering-TweakingAlgorithmSettings
  )
  for some more details (the "Tweaking at Query-Time" section shows how to
  pass the specific parameters at request time). A complete overview of the
  Lingo clustering algorithm parameters is here:
  http://doc.carrot2.org/#section.component.lingo.

 Stanislaw

 --
 Stanislaw Osinski, stanislaw.osin...@carrotsearch.com
 http://carrotsearch.com
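
 For reference, passing that parameter at query time from SolrJ might look
 roughly like the sketch below; the handler name, core URL and count are
 assumptions, and only the parameter name comes from the documentation cited
 above:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ClusterCountSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery("ipod");
        q.setRequestHandler("/clustering");   // assumed handler with the clustering component
        q.set("clustering", true);
        q.set("clustering.results", true);
        // Ask the Lingo algorithm for a higher base number of clusters.
        q.set("LingoClusteringAlgorithm.desiredClusterCountBase", 25);

        System.out.println(solr.query(q).getResponse().get("clusters"));
    }
}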

 On Fri, May 29, 2015 at 4:29 AM, Zheng Lin Edwin Yeo 
edwinye...@gmail.com
 
 wrote:

  Hi,
 
   I'm trying to increase the number of cluster results to be shown during
   the search. I tried to set carrot.fragSize=20 but only 15 cluster labels
   are shown. Even when I tried to set carrot.fragSize=5, there are also 15
   labels shown.

   Is this the correct way to do this? I understand that setting it to 20
   might not necessarily mean 20 labels will be shown, as the setting is for
   the maximum number. But when I set this to 5, it should reduce the number
   of labels to 5?
 
  I'm using Solr 5.1.
 
 
  Regards,
  Edwin
 

   
  
  
  
   --
   --
  
   Benedetti Alessandro
   Visiting card : http://about.me/alessandro_benedetti
  
   Tyger, tyger burning bright
   In the forests of the night,
   What immortal hand or eye
   Could frame thy fearful symmetry?
  
   William Blake - Songs of Experience -1794 England
  
 



 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 

AW: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

2015-06-03 Thread Clemens Wyss DEV
Ciao Shawn,
thanks for your reply. 
 The oom script just kills Solr with the KILL signal (-9) and logs the kill.  
I know. But my feeling is that not even this happens, i.e. the script is not
being executed. At least I see no solr_oom_killer-$SOLR_PORT-$NOW.log file ...

Btw:
Who restarts Solr after it's been killed?

 FYI, the stacktrace on the OOM error, especially in a multi-threaded app like 
 Solr, 
will frequently be completely useless in tracking down the problem.
I agree

 I don't know if a heap dump on OOM is compatible with the OOM script.
If Java chooses to run the OOM script before the heap dump is done, the 
process 
will be killed before the heap finishes dumping.
What if I did a jmap-call in the oom-script before killing the process?

-Clemens

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Wednesday, June 3, 2015 09:16
To: solr-user@lucene.apache.org
Subject: Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not
triggered

On 6/3/2015 12:20 AM, Clemens Wyss DEV wrote:
 Context: Lucene 5.1, Java 8 on debian. 24G of RAM whereof 16G available for 
 Solr.
 
 I am seeing the following OOMs:
 ERROR - 2015-06-03 05:17:13.317; [   customer-1-de_CH_1] 
 org.apache.solr.common.SolrException; null:java.lang.RuntimeException: 
 java.lang.OutOfMemoryError: Java heap space

snip

 Caused by: java.lang.OutOfMemoryError: Java heap space
 WARN  - 2015-06-03 05:17:13.319; [   customer-1-de_CH_1] 
 org.eclipse.jetty.servlet.ServletHandler; Error for 
 /solr/customer-1-de_CH_1/suggest_phrase
 java.lang.OutOfMemoryError: Java heap space
 
 The full commandline is
 /usr/local/java/bin/java -server -Xss256k -Xms16G -Xmx16G 
 -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 
 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC 
 -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 
 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m 
 -XX:+UseCMSInitiatingOccupancyOnly 
 -XX:CMSInitiatingOccupancyFraction=50 
 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled 
 -XX:+ParallelRefProcEnabled -verbose:gc -XX:+PrintHeapAtGC 
 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps 
 -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime 
 -Xloggc:/opt/solr/logs/solr_gc.log -Djetty.port=8983 -DSTOP.PORT=7983 
 -DSTOP.KEY=solrrocks -Duser.timezone=UTC 
 -Dsolr.solr.home=/opt/solr/data -Dsolr.install.dir=/usr/local/solr 
 -Dlog4j.configuration=file:/opt/solr/log4j.properties
 -jar start.jar -XX:OnOutOfMemoryError=/usr/local/solr/bin/oom_solr.sh 
 8983 /opt/solr/logs OPTIONS=default,rewrite
 
 So I'd expect /usr/local/solr/bin/oom_solr.sh to be triggered. But this does
 not seem to happen. What am I missing? Is it ok to pull a heapdump from Solr
 before killing/rebooting in oom_solr.sh?
 
 Also I would like to know what query parameters were sent to
 /solr/customer-1-de_CH_1/suggest_phrase (which may be the reason for the OOM
 ...

The oom script just kills Solr with the KILL signal (-9) and logs the kill.  
That's it.  It does not attempt to make a heap dump.  If you
*want* to dump the heap on OOM, you can, with some additional options:

http://stackoverflow.com/questions/542979/using-heapdumponoutofmemoryerror-parameter-for-heap-dump-for-jboss/20496376#20496376

I don't know if a heap dump on OOM is compatible with the OOM script.
If Java chooses to run the OOM script before the heap dump is done, the process 
will be killed before the heap finishes dumping.

FYI, the stacktrace on the OOM error, especially in a multi-threaded app like 
Solr, will frequently be completely useless in tracking down the problem.  The 
thread that makes the triggering memory allocation may be completely unrelated. 
 This error happened on a suggest handler ... but the large memory allocations 
may be happening in a completely different part of the code.

We have not had any recent indications of a memory leak in Solr.  Memory leaks 
in Solr *do* happen, but they are usually caught by the tests.
which run in a minimal memory space.  The project has continuous integration 
servers set up that run all the tests many times per day.

If you are running out of heap with 16GB allocated, then either your Solr 
installation is enormous or you've got a configuration that's not tuned 
properly.  With a very large Solr installation, you may need to simply allocate 
more memory to the heap ... which may mean that you'll need to install more 
memory in the server.  The alternative would be figuring out where you can 
change your configuration to reduce memory requirements.

Here's some incomplete info on settings and situations that can require a very 
large heap:

https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

To provide much help, we'll need lots of details about your system ...
number of documents in all cores, total index size on disk, your config, 
possibly your schema, and maybe a few other things I 

Re: Derive suggestions across multiple fields

2015-06-03 Thread Zheng Lin Edwin Yeo
This is my suggester configuration:

  <searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
      <str name="field">text</str>  <!-- the indexed field to derive suggestions from -->
      <float name="threshold">0.005</float>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>
  <requestHandler class="org.apache.solr.handler.component.SearchHandler"
                  name="/suggest">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="defType">edismax</str>
      <int name="rows">10</int>
      <str name="wt">json</str>
      <str name="indent">true</str>
      <str name="df">text</str>

      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">suggest</str>
      <str name="spellcheck.onlyMorePopular">true</str>
      <str name="spellcheck.count">5</str>
      <str name="spellcheck.collate">true</str>
    </lst>
    <arr name="components">
      <str>suggest</str>
    </arr>
  </requestHandler>


Yes, I've read the guide. I've found out that there is a need to do
re-indexing if I'm creating a new copyField. It works when I use a
copyField that was created before the indexing was done.

As I'm using the spellcheck dictionary as my suggester, does that mean I
just need to build the spellcheck dictionary?


Regards,
Edwin
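
On the build question: with buildOnCommit=true the dictionary is rebuilt on commits, and you can also trigger a build explicitly with spellcheck.build. A sketch of doing that and querying the /suggest handler above from SolrJ; the core URL and the partial input are assumptions:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SuggestSketch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery("edw");       // partial input to complete
        q.setRequestHandler("/suggest");           // the handler defined above
        q.set("spellcheck", true);
        q.set("spellcheck.dictionary", "suggest");
        q.set("spellcheck.build", true);           // build once; drop on routine requests

        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getSpellCheckResponse().getSuggestions());
    }
}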


On 3 June 2015 at 17:36, Alessandro Benedetti benedetti.ale...@gmail.com
wrote:

 Can you share your suggester configuration?
 Have you read the guide I linked?
 Has the suggestion index/FST been built? (You need to build the
 suggester.)

 Cheers

 2015-06-03 4:07 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

  Thank you for your explanation.
 
  I don't need to care where the suggestions are coming from. All the
  suggestions from different fields can be consolidated and displayed together.

  I've tried to put those fields into a new Suggestion copy field, but no
  suggestion is shown when I set:
  <str name="field">Suggestion</str>  <!-- the indexed field to derive
  suggestions from -->
 
  Is there a need to re-index the documents in order for this to work?
 
  Regards,
  Edwin
 
 
 
  On 2 June 2015 at 17:25, Alessandro Benedetti 
 benedetti.ale...@gmail.com
  wrote:
 
   Hi Edwin,
   I have worked extensively recently in Suggester and the blog I feel to
   suggest is Erick's one.
   It's really detailed and good for a beginner and expert as well. [1]
  
   Apart that let's see you particular use case :
  
   1) Do you want to be able to get also where the suggestions are coming
  from
   ?
   e.g.
   suggestion1 from field1
   suggestion2 from field2 ?
   In this case I would try with multiple dictionaries but I am not sure
  Solr
   allows you to use them concurrently.
   But can be a really nice extension to develop.
  
   2) If you don't care where the suggestions are coming from, just use a
  copy
   field, where you copy the content of the interesting fields.
   The suggestions will come from the fields you have copied in the copy
   field, without distinction.
  
   Hope this helps you
  
   Cheers
  
  
   [1] http://lucidworks.com/blog/solr-suggester/
  
   2015-06-02 4:22 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:
  
Hi,
   
Does anyone knows if we can derive suggestions across multiple
 fields?
   
I tried to set something like this in my field in suggest
   searchComponents
in solrconfig.xml, but nothing is returned. It only works when I set
 a
single field, and not multiple field.
   
  searchComponent class=solr.SpellCheckComponent name=suggest
lst name=spellchecker
  str name=namesuggest/str
  str
name=classnameorg.apache.solr.spelling.suggest.Suggester/str
  str
   
   
  
 
 name=lookupImplorg.apache.solr.spelling.suggest.tst.TSTLookupFactory/str
  str name=fieldContent, Summary/str  !-- the indexed field to
derive suggestions from --
  float name=threshold0.005/float
  str name=buildOnCommittrue/str
/lst
  /searchComponent
   
I'm using solr 5.1.
   
Regards,
Edwin
   
  
  
  
   --
   --
  
   Benedetti Alessandro
   Visiting card : http://about.me/alessandro_benedetti
  
   Tyger, tyger burning bright
   In the forests of the night,
   What immortal hand or eye
   Could frame thy fearful symmetry?
  
   William Blake - Songs of Experience -1794 England
  
 



 --
 --

 Benedetti Alessandro
 Visiting card : http://about.me/alessandro_benedetti

 Tyger, tyger burning bright
 In the forests of the night,
 What immortal hand or eye
 Could frame thy fearful symmetry?

 William Blake - Songs of Experience -1794 England



Re: Solr Atomic Updates

2015-06-03 Thread Jack Krupansky
Explain a little about why you have separate cores, and how you decide
which core a new document should reside in. Your scenario still seems a bit
odd, so help us understand.


-- Jack Krupansky

On Wed, Jun 3, 2015 at 3:15 AM, Ксения Баталова batalova...@gmail.com
wrote:

 Hi!

 Thanks for your quick reply.

 The problem that all my index is consists of several parts (several cores)

 and while updating I don't know in advance in which part updated id is
 lying (in which core the document with specified id is lying).

 For example, I have two cores (*Core1 *and *Core2*) and I want to
 update the document with id *Id1 *and I don't know where this document
 is lying.

 So, I have to do two select-queries to my cores to know where it is.

 And then generate update-query to necessary core.

 What am I doing wrong?

 I remind that I'm using SOLR 4.4.0.

 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
 Best regards,
 Batalova Kseniya


 What exactly is the problem? And why do you care about cores, per se -
 other than to send the update to the core/collection you are trying to
 update? You should specify the core/collection name in the URL.

 You should also be using the Solr reference guide rather than the (old)
 wiki:

 https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents


 -- Jack Krupansky

 On Tue, Jun 2, 2015 at 10:15 AM, Ксения Баталова batalova...@gmail.com
 wrote:

  Hi!
 
  I'm using *SOLR 4.4.0* for searching in my project.
  Now I am facing a problem of atomic updates in multiple cores.
  From wiki:
 
  curl *http://localhost:8983/solr/update
  http://localhost:8983/solr/update *-H
  'Content-type:application/json' -d '
  [
   {
*id*: *TestDoc1*,
title : {set:test1},
revision  : {inc:3},
publisher : {add:TestPublisher}
   },
   {
id: TestDoc2,
publisher : {add:TestPublisher}
   }
  ]'
 
  As well as I understand, this means that the document, for example, with
 id
  *TestDoc1*, will be searched for updating *only in one core*.
  And if there is no any document with id *TestDoc1*, the document will be
  created.
  Can I somehow to specify the* list of cores* for searching and then
  updating necessary document with specific id?
 
  It's something like *shards *parameter in *select* query.
  From wiki:
 
  #now do a distributed search across both servers with your browser or
 curl
  curl '
 
 http://localhost:8983/solr/*select*?*shards*=localhost:8983/solr,localhost:7574/solrindent=trueq=ipod+solr
  '
 
  Or is it planned in the future?
 
  Thanks in advance.
 
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
 
  Best regards,
  Batalova Kseniya
 



Re: Solr Atomic Updates

2015-06-03 Thread Upayavira
If you are using stand-alone Solr instances, then it is your
responsibility to decide which node a document resides in, and thus to
which core you will send your update request.

If, however, you used SolrCloud, it would handle that for you - deciding
which node should contain a document, and directing the update there, all
behind the scenes for you.

Upayavira

On Wed, Jun 3, 2015, at 08:15 AM, Ксения Баталова wrote:
 Hi!
 
 Thanks for your quick reply.
 
 The problem that all my index is consists of several parts (several
 cores)
 
 and while updating I don't know in advance in which part updated id is
 lying (in which core the document with specified id is lying).
 
 For example, I have two cores (*Core1 *and *Core2*) and I want to
 update the document with id *Id1 *and I don't know where this document
 is lying.
 
 So, I have to do two select-queries to my cores to know where it is.
 
 And then generate update-query to necessary core.
 
 What am I doing wrong?
 
 I remind that I'm using SOLR 4.4.0.
 
 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
 Best regards,
 Batalova Kseniya
 
 
 What exactly is the problem? And why do you care about cores, per se -
 other than to send the update to the core/collection you are trying to
 update? You should specify the core/collection name in the URL.
 
 You should also be using the Solr reference guide rather than the (old)
 wiki:
 https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
 
 
 -- Jack Krupansky
 
 On Tue, Jun 2, 2015 at 10:15 AM, Ксения Баталова batalova...@gmail.com
 wrote:
 
  Hi!
 
  I'm using *SOLR 4.4.0* for searching in my project.
  Now I am facing a problem of atomic updates in multiple cores.
  From wiki:
 
  curl *http://localhost:8983/solr/update
  http://localhost:8983/solr/update *-H
  'Content-type:application/json' -d '
  [
   {
*id*: *TestDoc1*,
title : {set:test1},
revision  : {inc:3},
publisher : {add:TestPublisher}
   },
   {
id: TestDoc2,
publisher : {add:TestPublisher}
   }
  ]'
 
  As well as I understand, this means that the document, for example, with id
  *TestDoc1*, will be searched for updating *only in one core*.
  And if there is no any document with id *TestDoc1*, the document will be
  created.
  Can I somehow to specify the* list of cores* for searching and then
  updating necessary document with specific id?
 
  It's something like *shards *parameter in *select* query.
  From wiki:
 
  #now do a distributed search across both servers with your browser or curl
  curl '
  http://localhost:8983/solr/*select*?*shards*=localhost:8983/solr,localhost:7574/solrindent=trueq=ipod+solr
  '
 
  Or is it planned in the future?
 
  Thanks in advance.
 
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
 
  Best regards,
  Batalova Kseniya
 


Re: How to tell when Collector finishes collect loop?

2015-06-03 Thread Joel Bernstein
I think there are easier ways to do what you are trying to do.

Take a look at the Function query parser.

It will allow you to control the score for each document from within a
function query. The basic use case is this:

q={!func}myFunc()&fq=my+query

In this scenario the func qparser plugin controls the score and the fq
provides the query.
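
For illustration only (the collection and field names below are placeholders,
and div() is just a stock function standing in for your custom one), a request
following that pattern looks like:

  # The function query supplies the score, the fq restricts which docs match.
  # curl -g turns off brace globbing so the {!func} local params pass through.
  curl -g 'http://localhost:8983/solr/collection1/select?q={!func}div(popularity,price)&fq=category:book&fl=id,score&wt=json'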






Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jun 3, 2015 at 9:50 AM, adfel70 adfe...@gmail.com wrote:

 Hi guys, need your help (again):
 I have a search handler which need to override solr's scoring. I chose to
 implement it with RankQuery API, so when getTopDocsCollector() gets called
 it instantiates my TopDocsCollector instance, and every dicId gets its own
 score:

 public class MyScorerrankQuet extends RankQuery {
 ...

 @Override
 public TopDocsCollector getTopDocsCollector(int i,
 SolrIndexerSearcher.QueryCommand cmd, IndexSearcher searcher) {
 ...
 return new MyCollector(...)
 }
 }

 public class MyCollector  extends TopDocsCollector{
 //Initialized in constrctor
 MyScorer scorer;

 public MyCollector(){
 scorer = new MyScorer();
 scorer.start(); //the scorer's API needs to call
 start() before every
 query and close() at the end of the query
 }

 @Override
 public void collect(int id){
 //1. get specific field from the doc using DocValues and
 calculate score
 using my scorer
 //2. add docId and score (ScoreDoc object) into
 PriorityQueue.
 }
 }

 My problem is that I cant find a place to call scorer.close(), which need
 to
 be executed when the query ends (after we calculated score for each docID).
 I saw the DeligatingCollector has finish() method which is called after
 collector is done, but I cannot extend both TopDocsCollector and
 DeligatingCollector...





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-tell-when-Collector-finishes-collect-loop-tp4209447.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to tell when Collector finishes collect loop?

2015-06-03 Thread Joel Bernstein
The finish method would still be a problem using the func qparser.

Out of curiosity, why do you need to call close on the scorer?

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jun 3, 2015 at 10:53 AM, Joel Bernstein joels...@gmail.com wrote:

 I think there are easier ways to do what you are trying to do.

 Take a look at the Function query parser.

 It will allow you to control the score for each document from within a
 function query. The basic use case is this:

 q={!func}myFunc()&fq=my+query

 In this scenario the func qparser plugin controls the score and the fq
 provides the query.






 Joel Bernstein
 http://joelsolr.blogspot.com/

 On Wed, Jun 3, 2015 at 9:50 AM, adfel70 adfe...@gmail.com wrote:

 Hi guys, need your help (again):
 I have a search handler which need to override solr's scoring. I chose to
 implement it with RankQuery API, so when getTopDocsCollector() gets called
 it instantiates my TopDocsCollector instance, and every dicId gets its own
 score:

 public class MyScorerrankQuet extends RankQuery {
 ...

 @Override
 public TopDocsCollector getTopDocsCollector(int i,
 SolrIndexerSearcher.QueryCommand cmd, IndexSearcher searcher) {
 ...
 return new MyCollector(...)
 }
 }

 public class MyCollector  extends TopDocsCollector{
 //Initialized in constrctor
 MyScorer scorer;

 public MyCollector(){
 scorer = new MyScorer();
 scorer.start(); //the scorer's API needs to call
 start() before every
 query and close() at the end of the query
 }

 @Override
 public void collect(int id){
 //1. get specific field from the doc using DocValues and
 calculate score
 using my scorer
 //2. add docId and score (ScoreDoc object) into
 PriorityQueue.
 }
 }

 My problem is that I cant find a place to call scorer.close(), which need
 to
 be executed when the query ends (after we calculated score for each
 docID).
 I saw the DeligatingCollector has finish() method which is called after
 collector is done, but I cannot extend both TopDocsCollector and
 DeligatingCollector...





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-tell-when-Collector-finishes-collect-loop-tp4209447.html
 Sent from the Solr - User mailing list archive at Nabble.com.





Re: How http connections are handled in Solr?

2015-06-03 Thread Shawn Heisey
On 6/3/2015 4:12 AM, Manohar Sripada wrote:
 1. From my code, I am using CloudSolrServer of solrj client library to get
 the connection. From one of my previous discussion in this forum, I
 understood that Solr uses Apache's HttpClient for connections and the
 default maxConnections per host is 32 and default max connections is 128.
 
 *CloudSolrServer cloudSolrServer = new CloudSolrServer(zookeeper_quorum);*
 
 *cloudSolrServer.connect();*
 My first question here is what does this maxConnectionsperHost and
 maxConnections imply? Are these the connections from solrj client to the
 Zookeeper quorum OR from solrj client to the solr nodes?

By default, CloudSolrServer sets up an HttpClient object that is given
to the LBHttpSolrServer instance inside it.  The LBHttpSolrServer object
shares that HttpClient between all of the HttpSolrServer objects that it
maintains.

You can configure your own HttpClient in your code and then use that to
create CloudSolrServer:

http://lucene.apache.org/solr/5_1_0/solr-solrj/org/apache/solr/client/solrj/impl/CloudSolrClient.html#CloudSolrClient%28java.util.Collection,%20java.lang.String,%20org.apache.http.client.HttpClient%29

Zookeeper is a separate jar entirely, and handles its own network
connectivity.  That connectivity is NOT http.

 3. Consider in my solr cloud I have one collection with 8 shards spread on
 4 solr nodes. My understanding is that solrj client will send a query to
 one the solr core ( eg:solr core1) residing in one of the solr node (eg:
 node1). The solr core1 is responsible for sending queries to all the 8 Solr
 cores of that collection. Once it gets the response from all the solr
 cores, it merges the data and returns to the client. In this process, how
 the http connections between one solr node and rest of solr nodes are
 handled.

For distributed searching, Solr uses the SolrJ client internally to
collect responses from the shards.  The HttpClient for THAT
communication is configured with the shardHandler in solrconfig.xml.

https://wiki.apache.org/solr/SolrConfigXml?highlight=%28shardhandler%29#Configuration_of_Shard_Handlers_for_Distributed_searches

Thanks,
Shawn



AW: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

2015-06-03 Thread Clemens Wyss DEV
Hi Mark,
what exactly should I file? What needs to be added/appended to the issue?

Regards
Clemens

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Wednesday, June 3, 2015 14:23
To: solr-user@lucene.apache.org
Subject: Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not 
triggered

We will have to find a way to deal with this long term. Browsing the code I 
can see a variety of places where problematic exception handling has been 
introduced since this all was fixed.

- Mark

On Wed, Jun 3, 2015 at 8:19 AM Mark Miller markrmil...@gmail.com wrote:

 File a JIRA issue please. That OOM Exception is getting wrapped in a 
 RuntimeException it looks. Bug.

 - Mark


 On Wed, Jun 3, 2015 at 2:20 AM Clemens Wyss DEV clemens...@mysign.ch
 wrote:

 Context: Lucene 5.1, Java 8 on debian. 24G of RAM whereof 16G 
 available for Solr.

 I am seeing the following OOMs:
 ERROR - 2015-06-03 05:17:13.317; [   customer-1-de_CH_1]
 org.apache.solr.common.SolrException; null:java.lang.RuntimeException:
 java.lang.OutOfMemoryError: Java heap space
 at
 org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:854)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:463)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
 at
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
 at
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:368)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
 at
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
 at
 org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
 at
 org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
 at
 org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
 at
 org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
 at
 org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.OutOfMemoryError: Java heap space
 WARN  - 2015-06-03 05:17:13.319; [   customer-1-de_CH_1]
 org.eclipse.jetty.servlet.ServletHandler; Error for 
 /solr/customer-1-de_CH_1/suggest_phrase
 java.lang.OutOfMemoryError: Java heap space

 The full commandline is
 /usr/local/java/bin/java -server -Xss256k -Xms16G -Xmx16G 
 -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90
 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
 -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 
 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m 
 -XX:+UseCMSInitiatingOccupancyOnly
 -XX:CMSInitiatingOccupancyFraction=50 
 -XX:CMSMaxAbortablePrecleanTime=6000
 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc 
 -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
 -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution 
 -XX:+PrintGCApplicationStoppedTime -Xloggc:/opt/solr/logs/solr_gc.log
 -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks 
 -Duser.timezone=UTC -Dsolr.solr.home=/opt/solr/data 
 -Dsolr.install.dir=/usr/local/solr
 -Dlog4j.configuration=file:/opt/solr/log4j.properties
 -jar start.jar -XX:OnOutOfMemoryError=/usr/local/solr/bin/oom_solr.sh
 8983 /opt/solr/logs OPTIONS=default,rewrite

 So I'd 

Re: Derive suggestions across multiple fields

2015-06-03 Thread Alessandro Benedetti
I can see a lot of confusion in the configuration!

A few suggestions:
- read the documentation carefully and try to apply its guidance on suggesters
- there is currently no need to use the spellcheck component for suggestions;
they are now separate things
- I see "text" used to derive suggestions; I would prefer to see a copy field
there, specifically created to contain the interesting fields
- yes, you need to build the suggester the first time to see suggestions (a
minimal build request is sketched below)
- yes, if you add a copy field you need to re-index to see it filled!
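
To make the last two points concrete, a build can also be issued by hand.
Something like this (sketch only: the core name is a placeholder, and it
assumes the /suggest handler from your configuration, whose defaults already
point spellcheck.dictionary at "suggest"):

  # One-off build of the spellcheck-based suggester dictionary.
  curl 'http://localhost:8983/solr/collection1/suggest?q=test&spellcheck=true&spellcheck.build=true'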

Cheers

2015-06-03 11:07 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

 This is my suggester configuration:

   searchComponent class=solr.SpellCheckComponent name=suggest
 lst name=spellchecker
   str name=namesuggest/str
   str
 name=classnameorg.apache.solr.spelling.suggest.Suggester/str
   str

 name=lookupImplorg.apache.solr.spelling.suggest.tst.TSTLookupFactory/str
   str name=fieldtext/str  !-- the indexed field to derive
 suggestions from --
   float name=threshold0.005/float
   str name=buildOnCommittrue/str
 /lst
   /searchComponent
   requestHandler class=org.apache.solr.handler.component.SearchHandler
 name=/suggest
 lst name=defaults
str name=echoParamsexplicit/str
   str name=defTypeedismax/str
int name=rows10/int
str name=wtjson/str
str name=indenttrue/str
   str name=dftext/str

   str name=spellchecktrue/str
   str name=spellcheck.dictionarysuggest/str
   str name=spellcheck.onlyMorePopulartrue/str
   str name=spellcheck.count5/str
   str name=spellcheck.collatetrue/str
 /lst
 arr name=components
   strsuggest/str
 /arr
   /requestHandler


 Yes, I've read the guide. I've found out that there is a need to do
 re-indexing if I'm creating a new copyField. It works when I used the
 copyField that's created before the indexing is done.

 As I'm using the spellcheck dictionary as my suggester, so does that mean I
 just need to build the spellcheck dictionary?


 Regards,
 Edwin


 On 3 June 2015 at 17:36, Alessandro Benedetti benedetti.ale...@gmail.com
 wrote:

  Can you share you suggester configurations ?
  Have you read the guide I linked ?
  Has the suggestion index/fst has been built ? ( you need to build the
  suggester)
 
  Cheers
 
  2015-06-03 4:07 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:
 
   Thank you for your explanation.
  
   I'll not need to care where the suggestions are coming from. All the
   suggestions from different fields can be consolidate and display
  together.
  
   I've tried to put those field into a new Suggestion copy field, but no
   suggestion is shown when I set:
   str name=fieldSuggestion/str  !-- the indexed field to derive
   suggestions from --
  
   Is there a need to re-index the documents in order for this to work?
  
   Regards,
   Edwin
  
  
  
   On 2 June 2015 at 17:25, Alessandro Benedetti 
  benedetti.ale...@gmail.com
   wrote:
  
Hi Edwin,
I have worked extensively recently in Suggester and the blog I feel
 to
suggest is Erick's one.
It's really detailed and good for a beginner and expert as well. [1]
   
Apart that let's see you particular use case :
   
1) Do you want to be able to get also where the suggestions are
 coming
   from
?
e.g.
suggestion1 from field1
suggestion2 from field2 ?
In this case I would try with multiple dictionaries but I am not sure
   Solr
allows you to use them concurrently.
But can be a really nice extension to develop.
   
2) If you don't care where the suggestions are coming from, just use
 a
   copy
field, where you copy the content of the interesting fields.
The suggestions will come from the fields you have copied in the copy
field, without distinction.
   
Hope this helps you
   
Cheers
   
   
[1] http://lucidworks.com/blog/solr-suggester/
   
2015-06-02 4:22 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com
 :
   
 Hi,

 Does anyone knows if we can derive suggestions across multiple
  fields?

 I tried to set something like this in my field in suggest
searchComponents
 in solrconfig.xml, but nothing is returned. It only works when I
 set
  a
 single field, and not multiple field.

   searchComponent class=solr.SpellCheckComponent name=suggest
 lst name=spellchecker
   str name=namesuggest/str
   str
 name=classnameorg.apache.solr.spelling.suggest.Suggester/str
   str


   
  
 
 name=lookupImplorg.apache.solr.spelling.suggest.tst.TSTLookupFactory/str
   str name=fieldContent, Summary/str  !-- the indexed field
 to
 derive suggestions from --
   float name=threshold0.005/float
   str name=buildOnCommittrue/str
 /lst
   /searchComponent

 I'm using solr 5.1.

 Regards,
 Edwin

   
   
   
--
--
   
Benedetti Alessandro

Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

2015-06-03 Thread Mark Miller
File a JIRA issue please. That OOM Exception is getting wrapped in a
RuntimeException it looks. Bug.

- Mark

On Wed, Jun 3, 2015 at 2:20 AM Clemens Wyss DEV clemens...@mysign.ch
wrote:

 Context: Lucene 5.1, Java 8 on debian. 24G of RAM whereof 16G available
 for Solr.

 I am seeing the following OOMs:
 ERROR - 2015-06-03 05:17:13.317; [   customer-1-de_CH_1]
 org.apache.solr.common.SolrException; null:java.lang.RuntimeException:
 java.lang.OutOfMemoryError: Java heap space
 at
 org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:854)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:463)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
 at
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
 at
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:368)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
 at
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
 at
 org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
 at
 org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
 at
 org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
 at
 org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.OutOfMemoryError: Java heap space
 WARN  - 2015-06-03 05:17:13.319; [   customer-1-de_CH_1]
 org.eclipse.jetty.servlet.ServletHandler; Error for
 /solr/customer-1-de_CH_1/suggest_phrase
 java.lang.OutOfMemoryError: Java heap space

 The full commandline is
 /usr/local/java/bin/java -server -Xss256k -Xms16G
 -Xmx16G -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90
 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
 -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark
 -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly
 -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc
 -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
 -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
 -XX:+PrintGCApplicationStoppedTime -Xloggc:/opt/solr/logs/solr_gc.log
 -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC
 -Dsolr.solr.home=/opt/solr/data -Dsolr.install.dir=/usr/local/solr
 -Dlog4j.configuration=file:/opt/solr/log4j.properties
 -jar start.jar -XX:OnOutOfMemoryError=/usr/local/solr/bin/oom_solr.sh 8983
 /opt/solr/logs OPTIONS=default,rewrite

 So I'd expect /usr/local/solr/bin/oom_solr.sh to be triggered. But this
 does not seem to happen. What am I missing? Is it ok to pull a heapdump
 from Solr before killing/rebooting in oom_solr.sh?

 Also I would like to know what query parameters were sent to
 /solr/customer-1-de_CH_1/suggest_phrase (which may be the reason for the
 OOM ...


 --
- Mark
about.me/markrmiller


How to tell when Collector finishes collect loop?

2015-06-03 Thread adfel70
Hi guys, need your help (again):
I have a search handler which needs to override Solr's scoring. I chose to
implement it with the RankQuery API, so when getTopDocsCollector() gets called
it instantiates my TopDocsCollector instance, and every docId gets its own
score:

public class MyScorerRankQuery extends RankQuery {
    ...

    @Override
    public TopDocsCollector getTopDocsCollector(int i,
            SolrIndexSearcher.QueryCommand cmd, IndexSearcher searcher) {
        ...
        return new MyCollector(...);
    }
}

public class MyCollector extends TopDocsCollector {
    // initialized in the constructor
    MyScorer scorer;

    public MyCollector() {
        scorer = new MyScorer();
        // the scorer's API needs to call start() before every query
        // and close() at the end of the query
        scorer.start();
    }

    @Override
    public void collect(int id) {
        // 1. get a specific field from the doc using DocValues and
        //    calculate the score using my scorer
        // 2. add docId and score (a ScoreDoc object) into a PriorityQueue
    }
}

My problem is that I can't find a place to call scorer.close(), which needs to
be executed when the query ends (after we have calculated the score for each
docID). I saw that DelegatingCollector has a finish() method which is called
after the collector is done, but I cannot extend both TopDocsCollector and
DelegatingCollector...





--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-tell-when-Collector-finishes-collect-loop-tp4209447.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

2015-06-03 Thread Mark Miller
We will have to find a way to deal with this long term. Browsing the code
I can see a variety of places where problematic exception handling has been
introduced since this all was fixed.

- Mark

On Wed, Jun 3, 2015 at 8:19 AM Mark Miller markrmil...@gmail.com wrote:

 File a JIRA issue please. That OOM Exception is getting wrapped in a
 RuntimeException it looks. Bug.

 - Mark


 On Wed, Jun 3, 2015 at 2:20 AM Clemens Wyss DEV clemens...@mysign.ch
 wrote:

 Context: Lucene 5.1, Java 8 on debian. 24G of RAM whereof 16G available
 for Solr.

 I am seeing the following OOMs:
 ERROR - 2015-06-03 05:17:13.317; [   customer-1-de_CH_1]
 org.apache.solr.common.SolrException; null:java.lang.RuntimeException:
 java.lang.OutOfMemoryError: Java heap space
 at
 org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:854)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:463)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
 at
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
 at
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:368)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
 at
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
 at
 org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
 at
 org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
 at
 org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
 at
 org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
 at
 org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.OutOfMemoryError: Java heap space
 WARN  - 2015-06-03 05:17:13.319; [   customer-1-de_CH_1]
 org.eclipse.jetty.servlet.ServletHandler; Error for
 /solr/customer-1-de_CH_1/suggest_phrase
 java.lang.OutOfMemoryError: Java heap space

 The full commandline is
 /usr/local/java/bin/java -server -Xss256k -Xms16G
 -Xmx16G -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90
 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
 -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark
 -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly
 -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000
 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc
 -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
 -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
 -XX:+PrintGCApplicationStoppedTime -Xloggc:/opt/solr/logs/solr_gc.log
 -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC
 -Dsolr.solr.home=/opt/solr/data -Dsolr.install.dir=/usr/local/solr
 -Dlog4j.configuration=file:/opt/solr/log4j.properties
 -jar start.jar -XX:OnOutOfMemoryError=/usr/local/solr/bin/oom_solr.sh
 8983 /opt/solr/logs OPTIONS=default,rewrite

 So I'd expect /usr/local/solr/bin/oom_solr.sh to be triggered. But this
 does not seem to happen. What am I missing? Is it ok to pull a heapdump
 from Solr before killing/rebooting in oom_solr.sh?

 Also I would like to know what query parameters were sent to
 /solr/customer-1-de_CH_1/suggest_phrase (which may be the reason for the
 OOM ...


 --
 - Mark
 

Re: Solr Atomic Updates

2015-06-03 Thread Ксения Баталова
Hi!

Thanks for your quick reply.

The problem that all my index is consists of several parts (several cores)

and while updating I don't know in advance in which part updated id is
lying (in which core the document with specified id is lying).

For example, I have two cores (*Core1 *and *Core2*) and I want to
update the document with id *Id1 *and I don't know where this document
is lying.

So, I have to do two select-queries to my cores to know where it is.

And then generate update-query to necessary core.

What am I doing wrong?

I remind that I'm using SOLR 4.4.0.

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Best regards,
Batalova Kseniya


What exactly is the problem? And why do you care about cores, per se -
other than to send the update to the core/collection you are trying to
update? You should specify the core/collection name in the URL.

You should also be using the Solr reference guide rather than the (old)
wiki:
https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents


-- Jack Krupansky

On Tue, Jun 2, 2015 at 10:15 AM, Ксения Баталова batalova...@gmail.com
wrote:

 Hi!

 I'm using *SOLR 4.4.0* for searching in my project.
 Now I am facing a problem of atomic updates in multiple cores.
 From wiki:

 curl *http://localhost:8983/solr/update
 http://localhost:8983/solr/update *-H
 'Content-type:application/json' -d '
 [
  {
   *id*: *TestDoc1*,
   title : {set:test1},
   revision  : {inc:3},
   publisher : {add:TestPublisher}
  },
  {
   id: TestDoc2,
   publisher : {add:TestPublisher}
  }
 ]'

 As well as I understand, this means that the document, for example, with id
 *TestDoc1*, will be searched for updating *only in one core*.
 And if there is no any document with id *TestDoc1*, the document will be
 created.
 Can I somehow to specify the* list of cores* for searching and then
 updating necessary document with specific id?

 It's something like *shards *parameter in *select* query.
 From wiki:

 #now do a distributed search across both servers with your browser or curl
 curl '
 http://localhost:8983/solr/*select*?*shards*=localhost:8983/solr,localhost:7574/solrindent=trueq=ipod+solr
 '

 Or is it planned in the future?

 Thanks in advance.

 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

 Best regards,
 Batalova Kseniya



Could not find configName for collection client_active found:nul

2015-06-03 Thread David McReynolds
I’m helping someone with this but my zookeeper experience is limited (as in
none). They have purportedly followed the instruction from the wiki.



https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble





Jun 02, 2015 2:40:37 PM org.apache.solr.common.cloud.ZkStateReader
updateClusterState

INFO: Updating cloud state from ZooKeeper...

Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.ZkController
createCollectionZkNode

INFO: Check for collection zkNode:client_active

Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.Overseer$ClusterStateUpdater
updateState

INFO: Update state numShards=null message={

  operation:state,

  state:down,

  base_url:http://10.10.1.178:8983/solr;,

  core:client_active,

  roles:null,

  node_name:10.10.1.178:8983_solr,

  shard:null,

  collection:client_active,

  numShards:null,

  core_node_name:10.10.1.178:8983_solr_client_active}

Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.ZkController
createCollectionZkNode

INFO: Creating collection in ZooKeeper:client_active

Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.Overseer$ClusterStateUpdater
updateState

INFO: shard=shard1 is already registered

Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.ZkController getConfName

INFO: Looking for collection configName

Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.ZkController getConfName

INFO: Could not find collection configName - pausing for 3 seconds and
trying again - try: 1

Jun 02, 2015 2:40:37 PM
org.apache.solr.cloud.DistributedQueue$LatchChildWatcher process

INFO: LatchChildWatcher fired on path: /overseer/queue state: SyncConnected
type NodeChildrenChanged

Jun 02, 2015 2:40:37 PM org.apache.solr.common.cloud.ZkStateReader$2 process

INFO: A cluster state change: WatchedEvent state:SyncConnected
type:NodeDataChanged path:/clusterstate.json, has occurred - updating...
(live nodes size: 1)

Jun 02, 2015 2:40:40 PM org.apache.solr.cloud.ZkController getConfName

INFO: Could not find collection configName - pausing for 3 seconds and
trying again - try: 2

Jun 02, 2015 2:40:43 PM org.apache.solr.cloud.ZkController getConfName

INFO: Could not find collection configName - pausing for 3 seconds and
trying again - try: 3

Jun 02, 2015 2:40:46 PM org.apache.solr.cloud.ZkController getConfName

INFO: Could not find collection configName - pausing for 3 seconds and
trying again - try: 4

Jun 02, 2015 2:40:49 PM org.apache.solr.cloud.ZkController getConfName

INFO: Could not find collection configName - pausing for 3 seconds and
trying again - try: 5

Jun 02, 2015 2:40:52 PM org.apache.solr.cloud.ZkController getConfName

SEVERE: Could not find configName for collection client_active

Jun 02, 2015 2:40:52 PM org.apache.solr.core.CoreContainer recordAndThrow

SEVERE: Unable to create core: client_active

org.apache.solr.common.cloud.ZooKeeperException: Could not find configName
for collection client_active found:null

-- 
--
*Mi aerodeslizador está lleno de anguilas.*


Re: AW: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

2015-06-03 Thread Shawn Heisey
On 6/3/2015 1:41 AM, Clemens Wyss DEV wrote:
 The oom script just kills Solr with the KILL signal (-9) and logs the kill.  
 I know. But my feeling is, that not even this happens, i.e. the script is 
 not being executed. At least I see no solr_oom_killer-$SOLR_PORT-$NOW.log 
 file ...
 
 Btw:
 Who re-starts solr after it's been killed?

I'm not sure what to think here.  I wonder if the Java you are using is
broken?

Restarting would most likely need to be handled by you.  You could put
Solr under the management of something that keeps it running, like
heartbeat, pacemaker, or the supervisor program that comes with qmail,
whose name I can never remember.

You could add something to send you an email every time the OOM script
is run, so you know it has been killed.  If you have sized the memory
appropriately, the OOM killer will never run.
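
As a sketch only (it assumes a working "mail" command on the box, and the
variable names simply mirror the ones the stock oom_solr.sh already uses),
the notification could be a couple of lines added to that script:

  # Hypothetical addition to oom_solr.sh: notify an admin before Solr is killed.
  echo "Solr on $(hostname):$SOLR_PORT hit OutOfMemoryError at $NOW" \
    | mail -s "Solr OOM killer fired" admin@example.com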

 I don't know if a heap dump on OOM is compatible with the OOM script.
 If Java chooses to run the OOM script before the heap dump is done, the 
 process 
 will be killed before the heap finishes dumping.
 What if I did a jmap-call in the oom-script before killing the process?

You very likely could add that, but be aware of how much time anything
you add will take.  If your cloud is sufficiently redundant, then it
probably won't matter if one node is down for several minutes.
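
If you do try it, something like the following is the general shape (sketch
only: $SOLR_PID stands for however the script identifies the Solr process,
the other variables mirror the script's own, and dumping a 16GB heap can take
several minutes and needs that much free disk space):

  # Capture a heap dump with the JDK's jmap before the kill -9.
  jmap -dump:format=b,file=/opt/solr/logs/heapdump-$SOLR_PORT-$NOW.hprof $SOLR_PID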

Thanks,
Shawn



Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not triggered

2015-06-03 Thread Erick Erickson
bq: what exactly should I file? What needs to be added/appended to the issue?

Just what Mark said, title it something like
OOM exception wrapped in runtime exception

Include your original post and that you were asked to open the JIRA
after discussion on the user's list. Don't worry too much, the title 
etc. can be changed after as things become clearer.

Best,
Erick

On Wed, Jun 3, 2015 at 5:58 AM, Clemens Wyss DEV clemens...@mysign.ch wrote:
 Hi Mark,
 what exactly should I file? What needs to be added/appended to the issue?

 Regards
 Clemens

 -----Original Message-----
 From: Mark Miller [mailto:markrmil...@gmail.com]
 Sent: Wednesday, June 3, 2015 14:23
 To: solr-user@lucene.apache.org
 Subject: Re: Solr OutOfMemory but no heap and dump and oo_solr.sh is not 
 triggered

 We will have to find a way to deal with this long term. Browsing the code I 
 can see a variety of places where problematic exception handling has been 
 introduced since this all was fixed.

 - Mark

 On Wed, Jun 3, 2015 at 8:19 AM Mark Miller markrmil...@gmail.com wrote:

 File a JIRA issue please. That OOM Exception is getting wrapped in a
 RuntimeException it looks. Bug.

 - Mark


 On Wed, Jun 3, 2015 at 2:20 AM Clemens Wyss DEV clemens...@mysign.ch
 wrote:

 Context: Lucene 5.1, Java 8 on debian. 24G of RAM whereof 16G
 available for Solr.

 I am seeing the following OOMs:
 ERROR - 2015-06-03 05:17:13.317; [   customer-1-de_CH_1]
 org.apache.solr.common.SolrException; null:java.lang.RuntimeException:
 java.lang.OutOfMemoryError: Java heap space
 at
 org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:854)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:463)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
 at
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
 at
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:368)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at
 org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
 at
 org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
 at
 org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
 at
 org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
 at
 org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
 at
 org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
 at
 org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 at java.lang.Thread.run(Thread.java:745)
 Caused by: java.lang.OutOfMemoryError: Java heap space
 WARN  - 2015-06-03 05:17:13.319; [   customer-1-de_CH_1]
 org.eclipse.jetty.servlet.ServletHandler; Error for
 /solr/customer-1-de_CH_1/suggest_phrase
 java.lang.OutOfMemoryError: Java heap space

 The full commandline is
 /usr/local/java/bin/java -server -Xss256k -Xms16G -Xmx16G
 -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90
 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
 -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4
 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m
 -XX:+UseCMSInitiatingOccupancyOnly
 -XX:CMSInitiatingOccupancyFraction=50
 -XX:CMSMaxAbortablePrecleanTime=6000
 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc
 -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
 

Re: Could not find configName for collection client_active found:nul

2015-06-03 Thread Erick Erickson
It's not entirely clear what you're trying to do when this is pushed
out, but I'm guessing it's creating a collection. If that's so, then
this is your problem:

Could not find configName for collection client_active

You've set up Zookeeper correctly. But _before_ you create a
collection, you have to upload a configset to Zookeeper. This is
actually just a Solr conf directory, where things like schema.xml,
solrconfig.xml and all that live.

If you use the startup scripts with '-c -z zkaddress -e cloud'
options, you'll be guided through this process. Otherwise, you'll need
to push a configuration up to Zookeeper with the command line options,
see: https://cwiki.apache.org/confluence/display/solr/Command+Line+Utilities
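
For example (sketch only: the zkcli.sh location differs between 4.x and 5.x
installs, and the ZooKeeper address and paths are placeholders):

  # Upload a conf directory to ZooKeeper as a configset named client_active.
  server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 \
    -cmd upconfig -confdir /path/to/client_active/conf -confname client_active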

Note that if you _don't_ specify collection.configName when you create a
collection, Solr will assume that there is a configset with the same name
as your collection.
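
For instance, a create call that names the configset explicitly instead of
relying on that default might look like this (the host, shard and replica
counts are placeholders):

  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=client_active&numShards=1&replicationFactor=1&collection.configName=client_active'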

But just to check that your ZK is set up, take a look at the Cloud page in
the Solr admin UI. If you see things like live_nodes showing your Solr nodes
(expand the triangle), then Zookeeper is running just fine and Solr
can talk to it.

Best,
Erick

On Wed, Jun 3, 2015 at 5:36 AM, David McReynolds
david.mcreyno...@gmail.com wrote:
 I’m helping someone with this but my zookeeper experience is limited (as in
 none). They have purportedly followed the instruction from the wiki.



 https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble





 Jun 02, 2015 2:40:37 PM org.apache.solr.common.cloud.ZkStateReader
 updateClusterState

 INFO: Updating cloud state from ZooKeeper...

 Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.ZkController
 createCollectionZkNode

 INFO: Check for collection zkNode:client_active

 Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.Overseer$ClusterStateUpdater
 updateState

 INFO: Update state numShards=null message={

   operation:state,

   state:down,

   base_url:http://10.10.1.178:8983/solr;,

   core:client_active,

   roles:null,

   node_name:10.10.1.178:8983_solr,

   shard:null,

   collection:client_active,

   numShards:null,

   core_node_name:10.10.1.178:8983_solr_client_active}

 Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.ZkController
 createCollectionZkNode

 INFO: Creating collection in ZooKeeper:client_active

 Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.Overseer$ClusterStateUpdater
 updateState

 INFO: shard=shard1 is already registered

 Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.ZkController getConfName

 INFO: Looking for collection configName

 Jun 02, 2015 2:40:37 PM org.apache.solr.cloud.ZkController getConfName

 INFO: Could not find collection configName - pausing for 3 seconds and
 trying again - try: 1

 Jun 02, 2015 2:40:37 PM
 org.apache.solr.cloud.DistributedQueue$LatchChildWatcher process

 INFO: LatchChildWatcher fired on path: /overseer/queue state: SyncConnected
 type NodeChildrenChanged

 Jun 02, 2015 2:40:37 PM org.apache.solr.common.cloud.ZkStateReader$2 process

 INFO: A cluster state change: WatchedEvent state:SyncConnected
 type:NodeDataChanged path:/clusterstate.json, has occurred - updating...
 (live nodes size: 1)

 Jun 02, 2015 2:40:40 PM org.apache.solr.cloud.ZkController getConfName

 INFO: Could not find collection configName - pausing for 3 seconds and
 trying again - try: 2

 Jun 02, 2015 2:40:43 PM org.apache.solr.cloud.ZkController getConfName

 INFO: Could not find collection configName - pausing for 3 seconds and
 trying again - try: 3

 Jun 02, 2015 2:40:46 PM org.apache.solr.cloud.ZkController getConfName

 INFO: Could not find collection configName - pausing for 3 seconds and
 trying again - try: 4

 Jun 02, 2015 2:40:49 PM org.apache.solr.cloud.ZkController getConfName

 INFO: Could not find collection configName - pausing for 3 seconds and
 trying again - try: 5

 Jun 02, 2015 2:40:52 PM org.apache.solr.cloud.ZkController getConfName

 SEVERE: Could not find configName for collection client_active

 Jun 02, 2015 2:40:52 PM org.apache.solr.core.CoreContainer recordAndThrow

 SEVERE: Unable to create core: client_active

 org.apache.solr.common.cloud.ZooKeeperException: Could not find configName
 for collection client_active found:null

 --
 --
 *Mi aerodeslizador está lleno de anguilas.*


Re: Derive suggestions across multiple fields

2015-06-03 Thread Zheng Lin Edwin Yeo
Thank you for your suggestions.
Will try that out and update on the results again.

Regards,
Edwin


On 3 June 2015 at 21:13, Alessandro Benedetti benedetti.ale...@gmail.com
wrote:

 I can see a lot of confusion in the configuration!

 Few suggestions :
 - read carefully the document and try to apply the suggesting guidance
 - currently there is no need to use spellcheck for suggestions, now they
 are separated things
 - i see text used to derive suggestions, I would prefer there to see the
 copy field specifically used to contain the interesting fields
 - Yes you need to build the suggester the first time to see suggestions
 - Yes , if you add a copy field yo need to re-index to see it filled !

 Cheers

 2015-06-03 11:07 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

  This is my suggester configuration:
 
searchComponent class=solr.SpellCheckComponent name=suggest
  lst name=spellchecker
str name=namesuggest/str
str
  name=classnameorg.apache.solr.spelling.suggest.Suggester/str
str
 
 
 name=lookupImplorg.apache.solr.spelling.suggest.tst.TSTLookupFactory/str
str name=fieldtext/str  !-- the indexed field to derive
  suggestions from --
float name=threshold0.005/float
str name=buildOnCommittrue/str
  /lst
/searchComponent
requestHandler class=org.apache.solr.handler.component.SearchHandler
  name=/suggest
  lst name=defaults
 str name=echoParamsexplicit/str
str name=defTypeedismax/str
 int name=rows10/int
 str name=wtjson/str
 str name=indenttrue/str
str name=dftext/str
 
str name=spellchecktrue/str
str name=spellcheck.dictionarysuggest/str
str name=spellcheck.onlyMorePopulartrue/str
str name=spellcheck.count5/str
str name=spellcheck.collatetrue/str
  /lst
  arr name=components
strsuggest/str
  /arr
/requestHandler
 
 
  Yes, I've read the guide. I've found out that there is a need to do
  re-indexing if I'm creating a new copyField. It works when I used the
  copyField that's created before the indexing is done.
 
  As I'm using the spellcheck dictionary as my suggester, so does that
 mean I
  just need to build the spellcheck dictionary?
 
 
  Regards,
  Edwin
 
 
  On 3 June 2015 at 17:36, Alessandro Benedetti 
 benedetti.ale...@gmail.com
  wrote:
 
   Can you share you suggester configurations ?
   Have you read the guide I linked ?
   Has the suggestion index/fst has been built ? ( you need to build the
   suggester)
  
   Cheers
  
   2015-06-03 4:07 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:
  
Thank you for your explanation.
   
I'll not need to care where the suggestions are coming from. All the
suggestions from different fields can be consolidate and display
   together.
   
I've tried to put those field into a new Suggestion copy field, but
 no
suggestion is shown when I set:
str name=fieldSuggestion/str  !-- the indexed field to derive
suggestions from --
   
Is there a need to re-index the documents in order for this to work?
   
Regards,
Edwin
   
   
   
On 2 June 2015 at 17:25, Alessandro Benedetti 
   benedetti.ale...@gmail.com
wrote:
   
 Hi Edwin,
 I have worked extensively recently in Suggester and the blog I feel
  to
 suggest is Erick's one.
 It's really detailed and good for a beginner and expert as well.
 [1]

 Apart that let's see you particular use case :

 1) Do you want to be able to get also where the suggestions are
  coming
from
 ?
 e.g.
 suggestion1 from field1
 suggestion2 from field2 ?
 In this case I would try with multiple dictionaries but I am not
 sure
Solr
 allows you to use them concurrently.
 But can be a really nice extension to develop.

 2) If you don't care where the suggestions are coming from, just
 use
  a
copy
 field, where you copy the content of the interesting fields.
 The suggestions will come from the fields you have copied in the
 copy
 field, without distinction.

 Hope this helps you

 Cheers


 [1] http://lucidworks.com/blog/solr-suggester/

 2015-06-02 4:22 GMT+01:00 Zheng Lin Edwin Yeo 
 edwinye...@gmail.com
  :

  Hi,
 
  Does anyone knows if we can derive suggestions across multiple
   fields?
 
  I tried to set something like this in my field in suggest
 searchComponents
  in solrconfig.xml, but nothing is returned. It only works when I
  set
   a
  single field, and not multiple field.
 
searchComponent class=solr.SpellCheckComponent
 name=suggest
  lst name=spellchecker
str name=namesuggest/str
str
  name=classnameorg.apache.solr.spelling.suggest.Suggester/str
str
 
 

   
  
 
 name=lookupImplorg.apache.solr.spelling.suggest.tst.TSTLookupFactory/str
str 

Re: Sorting in Solr

2015-06-03 Thread Chris Hostetter
: 
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-ThesortParameter
: 
: I think we may have an omission from the docs -- docValues can also be
: used for sorting, and may also offer a performance advantage.

I added a note about that.

-Hoss
http://www.lucidworks.com/


retrieving large number of docs

2015-06-03 Thread Robust Links
Hi

I have a set of document IDs from one core and i want to query another core
using the ids retrieved from the first core...the constraint is that the
size of doc ID set can be very large. I want to:

1) retrieve these docs from the 2nd index
2) facet on the results

I can think of 3 solutions:

1) boolean query
2) terms fq
3) use a DB rather than Solr

I am trying to keep latencies down so I prefer not to use (3). The problem
with (1) is that maxBooleanClauses is hardwired and I am not sure when I will
hit the exception. Option (2) seems to also hit limits... so if I do

select?fl=*&q=*:*&facet=true&facet.field=title&fq={!terms
f=id}LONG_LIST_OF_IDS

solr just goes blank. I have tried adding cost=200 to try to run the query
first fq={!terms f=id cost=200} but still no good. Paging on doc IDs could
be a solution but the problem then is that the faceting results correspond
to the paged IDs and not the global set.

My filter cache spec is as follows

  <filterCache class="solr.FastLRUCache"
               size="100"
               initialSize="100"
               autowarmCount="10"/>


What would be the best way for me to solve this problem?

thank you


Re: How to identify field names from the suggested values in multiple fields

2015-06-03 Thread Walter Underwood
Configure two suggesters, one based on each field. Use both of them and you’ll 
get separate suggestions from each.
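
A rough sketch of that setup, using the newer solr.SuggestComponent; the suggester names, lookup and dictionary implementations, and field types below are illustrative only, not taken from this thread:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">categorySuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">category</str>
      <str name="suggestAnalyzerFieldType">string</str>
      <str name="buildOnStartup">false</str>
    </lst>
    <lst name="suggester">
      <str name="name">subcategorySuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">subcategory</str>
      <str name="suggestAnalyzerFieldType">string</str>
      <str name="buildOnStartup">false</str>
    </lst>
  </searchComponent>

Each suggester's results come back under its own name in the response, which is what identifies the source field.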

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


On Jun 3, 2015, at 10:03 PM, Dhanesh Radhakrishnan dhan...@hifx.co.in wrote:

 Hi
 Anyone help me to  build a suggester auto complete based on multiple fields?
 There are two fields in my schema. Category and Subcategory and I'm trying
 to build  suggester based on these 2 fields. When the suggestions result,
 how can I distinguish from which filed it come from?
 
 I used a copyfields to combine multiple fields into single field and use
 that field in suggester
 But this will  return the combined result of category and subcategory. I
 can't differentiate the results that fetch from which field
 
 These are the copyfields for autocomplete
 copyField source=category dest=businessAutoComplete/
 copyField source=subcategory dest=businessAutoComplete/
 
 Suggestions should know from which field its from.
 For Eg my suggester returns 5 results for the keyword schools. In that
 result  2 from the category field and 3 from the subcategory field.
 
 Schools (Category)
 Primary Schools (Subcategory)
 Driving Schools (Subcategory)
 Day care and play school (Subcategory)
 Day Care/Play School (Category)
 
 
 Is there any way to build like this ??
 
 
 -- 
 [image: hifx_logo] http://hifx.in/
 *dhanesh s.R *
 Team Lead
 t: (+91) 484 4011750 (ext. 712) | m: ​(+91) 99 4  703
 e: dhan...@hifx.in | w: www.hifx.in
 https://www.facebook.com/HiFXIT https://twitter.com/HiFXTweets
 https://www.linkedin.com/company/2889649
 https://plus.google.com/104259935226993895226/about
 



Re: Solr Atomic Updates

2015-06-03 Thread Ксения Баталова
Upayavira,

I'm using stand-alone Solr instances.

I've not learnt SolrCloud yet.

Please, give me some advice when SolrCloud is better then stand-alone
Solr instances.

Or when it is worth to choose SolrCloud.

_ _ _

Batalova Kseniya


If you are using stand-alone Solr instances, then it is your
responsibility to decide which node a document resides in, and thus to
which core you will send your update request.

If, however, you used SolrCloud, it would handle that for you - deciding
which node should contain a document, and directing the update their all
behind the scenes for you.

Upayavira

On Wed, Jun 3, 2015, at 08:15 AM, Ксения Баталова wrote:
 Hi!

 Thanks for your quick reply.

 The problem that all my index is consists of several parts (several
 cores)

 and while updating I don't know in advance in which part updated id is
 lying (in which core the document with specified id is lying).

 For example, I have two cores (*Core1 *and *Core2*) and I want to
 update the document with id *Id1 *and I don't know where this document
 is lying.

 So, I have to do two select-queries to my cores to know where it is.

 And then generate update-query to necessary core.

 What am I doing wrong?

 I remind that I'm using SOLR 4.4.0.

 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
 Best regards,
 Batalova Kseniya


 What exactly is the problem? And why do you care about cores, per se -
 other than to send the update to the core/collection you are trying to
 update? You should specify the core/collection name in the URL.

 You should also be using the Solr reference guide rather than the (old)
 wiki:
 https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents


 -- Jack Krupansky

 On Tue, Jun 2, 2015 at 10:15 AM, Ксения Баталова batalova...@gmail.com
 wrote:

  Hi!
 
  I'm using *SOLR 4.4.0* for searching in my project.
  Now I am facing a problem of atomic updates in multiple cores.
  From wiki:
 
  curl *http://localhost:8983/solr/update
  http://localhost:8983/solr/update *-H
  'Content-type:application/json' -d '
  [
   {
*id*: *TestDoc1*,
title : {set:test1},
revision  : {inc:3},
publisher : {add:TestPublisher}
   },
   {
id: TestDoc2,
publisher : {add:TestPublisher}
   }
  ]'
 
  As well as I understand, this means that the document, for example, with id
  *TestDoc1*, will be searched for updating *only in one core*.
  And if there is no any document with id *TestDoc1*, the document will be
  created.
  Can I somehow to specify the* list of cores* for searching and then
  updating necessary document with specific id?
 
  It's something like *shards *parameter in *select* query.
  From wiki:
 
  #now do a distributed search across both servers with your browser or curl
  curl '
  http://localhost:8983/solr/*select*?*shards*=localhost:8983/solr,localhost:7574/solrindent=trueq=ipod+solr
  '
 
  Or is it planned in the future?
 
  Thanks in advance.
 
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
 
  Best regards,
  Batalova Kseniya
 


Re: retrieving large number of docs

2015-06-03 Thread Robust Links
what would be a custom solution?


On Wed, Jun 3, 2015 at 1:58 PM, Joel Bernstein joels...@gmail.com wrote:

 You may have to do something custom to meet your needs.

 10,000 DocID's is not huge but you're latency requirement are pretty low.

 Are your DocID's by any chance integers? This can make custom PostFilters
 run much faster.

 You should also be aware of the Streaming API in Solr 5.1 which will give
 you fast Map/Reduce approaches (
 http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html
 ).

 Joel Bernstein
 http://joelsolr.blogspot.com/

 On Wed, Jun 3, 2015 at 1:46 PM, Robust Links pey...@robustlinks.com
 wrote:

  Hey Joel
 
  see below
 
  On Wed, Jun 3, 2015 at 1:43 PM, Joel Bernstein joels...@gmail.com
 wrote:
 
   A few questions for you:
  
   How large can the list of filtering ID's be?
  
 
   10k
 
 
  
   What's your expectation on latency?
  
 
   10 < latency < 100
 
 
  
   What version of Solr are you using?
  
 
  5.0.0
 
 
  
   SolrCloud or not?
  
 
  not
 
 
 
  
   Joel Bernstein
   http://joelsolr.blogspot.com/
  
   On Wed, Jun 3, 2015 at 1:23 PM, Robust Links pey...@robustlinks.com
   wrote:
  
Hi
   
I have a set of document IDs from one core and i want to query
 another
   core
using the ids retrieved from the first core...the constraint is that
  the
size of doc ID set can be very large. I want to:
   
1) retrieve these docs from the 2nd index
2) facet on the results
   
I can think of 3 solutions:
   
1) boolean query
2) terms fq
3) use a DB rather than Solr
   
I am trying to keep latencies down so prefer to not use (3). The
  problem
with (1) is maxBooleanclauses is hardwired and I am not sure when I
  will
hit the exception. Option (2) seems to also hit limits.. so if I do
   
select?fl=*q=*:*facet=truefacet.field=titlefq={!terms
f=id}LONG_LIST_OF_IDS
   
solr just goes blank. I have tried adding cost=200 to try to run the
   query
first fq={!terms f=id cost=200} but still no good. Paging on doc IDs
   could
be a solution but the problem then is that the faceting results
   correspond
to the paged IDs and not the global set.
   
My filter cache spec is as follows
   
  filterCache class=solr.FastLRUCache
 size=100
 initialSize=100
 autowarmCount=10/
   
   
What would be the best way for me to solve this problem?
   
thank you
   
  
 



Re: retrieving large number of docs

2015-06-03 Thread Joel Bernstein
Erick makes a great point, if they are in the same VM try the cross-core
join first. It might be fast enough for you.

A custom solution would be to build a custom query or post filter that
works with your specific scenario. For example if the docID's are integers
you could build a fast PostFilter using data structures best suited for
integer filters.

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jun 3, 2015 at 2:23 PM, Robust Links pey...@robustlinks.com wrote:

 what would be a custom solution?


 On Wed, Jun 3, 2015 at 1:58 PM, Joel Bernstein joels...@gmail.com wrote:

  You may have to do something custom to meet your needs.
 
  10,000 DocID's is not huge but you're latency requirement are pretty low.
 
  Are your DocID's by any chance integers? This can make custom PostFilters
  run much faster.
 
  You should also be aware of the Streaming API in Solr 5.1 which will give
  you fast Map/Reduce approaches (
 
 http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html
  ).
 
  Joel Bernstein
  http://joelsolr.blogspot.com/
 
  On Wed, Jun 3, 2015 at 1:46 PM, Robust Links pey...@robustlinks.com
  wrote:
 
   Hey Joel
  
   see below
  
   On Wed, Jun 3, 2015 at 1:43 PM, Joel Bernstein joels...@gmail.com
  wrote:
  
A few questions for you:
   
How large can the list of filtering ID's be?
   
  
10k
  
  
   
What's your expectation on latency?
   
  
    10 < latency < 100
  
  
   
What version of Solr are you using?
   
  
   5.0.0
  
  
   
SolrCloud or not?
   
  
   not
  
  
  
   
Joel Bernstein
http://joelsolr.blogspot.com/
   
On Wed, Jun 3, 2015 at 1:23 PM, Robust Links pey...@robustlinks.com
 
wrote:
   
 Hi

 I have a set of document IDs from one core and i want to query
  another
core
 using the ids retrieved from the first core...the constraint is
 that
   the
 size of doc ID set can be very large. I want to:

 1) retrieve these docs from the 2nd index
 2) facet on the results

 I can think of 3 solutions:

 1) boolean query
 2) terms fq
 3) use a DB rather than Solr

 I am trying to keep latencies down so prefer to not use (3). The
   problem
 with (1) is maxBooleanclauses is hardwired and I am not sure when I
   will
 hit the exception. Option (2) seems to also hit limits.. so if I do

 select?fl=*q=*:*facet=truefacet.field=titlefq={!terms
 f=id}LONG_LIST_OF_IDS

 solr just goes blank. I have tried adding cost=200 to try to run
 the
query
 first fq={!terms f=id cost=200} but still no good. Paging on doc
 IDs
could
 be a solution but the problem then is that the faceting results
correspond
 to the paged IDs and not the global set.

 My filter cache spec is as follows

   filterCache class=solr.FastLRUCache
  size=100
  initialSize=100
  autowarmCount=10/


 What would be the best way for me to solve this problem?

 thank you

   
  
 



Re: Derive suggestions across multiple fields

2015-06-03 Thread Zheng Lin Edwin Yeo
My previous suggester configuration is derived from this page:
https://wiki.apache.org/solr/Suggester

Does it mean that what is written there is outdated?

Regards,
Edwin



On 3 June 2015 at 23:44, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:

 Thank you for your suggestions.
 Will try that out and update on the results again.

 Regards,
 Edwin


 On 3 June 2015 at 21:13, Alessandro Benedetti benedetti.ale...@gmail.com
 wrote:

 I can see a lot of confusion in the configuration!

 Few suggestions :
 - read carefully the document and try to apply the suggesting guidance
 - currently there is no need to use spellcheck for suggestions, now they
 are separated things
 - i see text used to derive suggestions, I would prefer there to see the
 copy field specifically used to contain the interesting fields
 - Yes you need to build the suggester the first time to see suggestions
 - Yes , if you add a copy field yo need to re-index to see it filled !
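 
  (For illustration, the copy-field route might look roughly like this in schema.xml; the field name, type and source fields here are placeholders, and a full re-index is needed afterwards:

    <field name="Suggestion" type="text_general" indexed="true" stored="false" multiValued="true"/>
    <copyField source="title" dest="Suggestion"/>
    <copyField source="content" dest="Suggestion"/>

  The suggester's "field" parameter would then point at Suggestion.)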

 Cheers

 2015-06-03 11:07 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

  This is my suggester configuration:
 
     <searchComponent class="solr.SpellCheckComponent" name="suggest">
       <lst name="spellchecker">
         <str name="name">suggest</str>
         <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
         <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
         <str name="field">text</str>  <!-- the indexed field to derive suggestions from -->
         <float name="threshold">0.005</float>
         <str name="buildOnCommit">true</str>
       </lst>
     </searchComponent>

     <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
       <lst name="defaults">
         <str name="echoParams">explicit</str>
         <str name="defType">edismax</str>
         <int name="rows">10</int>
         <str name="wt">json</str>
         <str name="indent">true</str>
         <str name="df">text</str>

         <str name="spellcheck">true</str>
         <str name="spellcheck.dictionary">suggest</str>
         <str name="spellcheck.onlyMorePopular">true</str>
         <str name="spellcheck.count">5</str>
         <str name="spellcheck.collate">true</str>
       </lst>
       <arr name="components">
         <str>suggest</str>
       </arr>
     </requestHandler>
 
 
  Yes, I've read the guide. I've found out that there is a need to do
  re-indexing if I'm creating a new copyField. It works when I used the
  copyField that's created before the indexing is done.
 
  As I'm using the spellcheck dictionary as my suggester, so does that
 mean I
  just need to build the spellcheck dictionary?
 
 
  Regards,
  Edwin
 
 
  On 3 June 2015 at 17:36, Alessandro Benedetti 
 benedetti.ale...@gmail.com
  wrote:
 
   Can you share you suggester configurations ?
   Have you read the guide I linked ?
   Has the suggestion index/fst has been built ? ( you need to build the
   suggester)
  
   Cheers
  
   2015-06-03 4:07 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:
  
Thank you for your explanation.
   
I'll not need to care where the suggestions are coming from. All the
suggestions from different fields can be consolidate and display
   together.
   
I've tried to put those field into a new Suggestion copy field, but
 no
suggestion is shown when I set:
str name=fieldSuggestion/str  !-- the indexed field to derive
suggestions from --
   
Is there a need to re-index the documents in order for this to work?
   
Regards,
Edwin
   
   
   
On 2 June 2015 at 17:25, Alessandro Benedetti 
   benedetti.ale...@gmail.com
wrote:
   
 Hi Edwin,
 I have worked extensively recently in Suggester and the blog I
 feel
  to
 suggest is Erick's one.
 It's really detailed and good for a beginner and expert as well.
 [1]

 Apart that let's see you particular use case :

 1) Do you want to be able to get also where the suggestions are
  coming
from
 ?
 e.g.
 suggestion1 from field1
 suggestion2 from field2 ?
 In this case I would try with multiple dictionaries but I am not
 sure
Solr
 allows you to use them concurrently.
 But can be a really nice extension to develop.

 2) If you don't care where the suggestions are coming from, just
 use
  a
copy
 field, where you copy the content of the interesting fields.
 The suggestions will come from the fields you have copied in the
 copy
 field, without distinction.

 Hope this helps you

 Cheers


 [1] http://lucidworks.com/blog/solr-suggester/

 2015-06-02 4:22 GMT+01:00 Zheng Lin Edwin Yeo 
 edwinye...@gmail.com
  :

  Hi,
 
  Does anyone knows if we can derive suggestions across multiple
   fields?
 
  I tried to set something like this in my field in suggest
 searchComponents
  in solrconfig.xml, but nothing is returned. It only works when I
  set
   a
  single field, and not multiple field.
 
searchComponent class=solr.SpellCheckComponent
 name=suggest
  lst name=spellchecker

How to identify field names from the suggested values in multiple fields

2015-06-03 Thread Dhanesh Radhakrishnan
Hi
Can anyone help me build an autocomplete suggester based on multiple fields?
There are two fields in my schema, Category and Subcategory, and I'm trying
to build a suggester based on these 2 fields. When the suggestions come back,
how can I tell which field each one came from?

I used copyFields to combine multiple fields into a single field and use
that field in the suggester.
But this returns the combined results of category and subcategory, and I
can't tell which field each result was fetched from.

These are the copyfields for autocomplete
<copyField source="category" dest="businessAutoComplete"/>
<copyField source="subcategory" dest="businessAutoComplete"/>

Suggestions should indicate which field they come from.
For example, my suggester returns 5 results for the keyword schools. In that
result, 2 are from the category field and 3 from the subcategory field.

Schools (Category)
Primary Schools (Subcategory)
Driving Schools (Subcategory)
Day care and play school (Subcategory)
Day Care/Play School (Category)


Is there any way to build it like this?


-- 
 [image: hifx_logo] http://hifx.in/
*dhanesh s.R *
Team Lead
t: (+91) 484 4011750 (ext. 712) | m: ​(+91) 99 4  703
e: dhan...@hifx.in | w: www.hifx.in
https://www.facebook.com/HiFXIT https://twitter.com/HiFXTweets
https://www.linkedin.com/company/2889649
https://plus.google.com/104259935226993895226/about



Re: Derive suggestions across multiple fields

2015-06-03 Thread Erick Erickson
This may be helpful: http://lucidworks.com/blog/solr-suggester/

Note that there are a series of fixes in various versions of Solr,
particularly buildOnStartup=false and working on multivalued fields.

Best,
Erick

On Wed, Jun 3, 2015 at 8:04 PM, Zheng Lin Edwin Yeo
edwinye...@gmail.com wrote:
 My previous suggester configuration is derived from this page:
 https://wiki.apache.org/solr/Suggester

 Does it mean that what is written there is outdated?

 Regards,
 Edwin



 On 3 June 2015 at 23:44, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:

 Thank you for your suggestions.
 Will try that out and update on the results again.

 Regards,
 Edwin


 On 3 June 2015 at 21:13, Alessandro Benedetti benedetti.ale...@gmail.com
 wrote:

 I can see a lot of confusion in the configuration!

 Few suggestions :
 - read carefully the document and try to apply the suggesting guidance
 - currently there is no need to use spellcheck for suggestions, now they
 are separated things
 - i see text used to derive suggestions, I would prefer there to see the
 copy field specifically used to contain the interesting fields
 - Yes you need to build the suggester the first time to see suggestions
 - Yes , if you add a copy field yo need to re-index to see it filled !

 Cheers

 2015-06-03 11:07 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:

  This is my suggester configuration:
 
searchComponent class=solr.SpellCheckComponent name=suggest
  lst name=spellchecker
str name=namesuggest/str
str
  name=classnameorg.apache.solr.spelling.suggest.Suggester/str
str
 
 
 name=lookupImplorg.apache.solr.spelling.suggest.tst.TSTLookupFactory/str
str name=fieldtext/str  !-- the indexed field to derive
  suggestions from --
float name=threshold0.005/float
str name=buildOnCommittrue/str
  /lst
/searchComponent
requestHandler
 class=org.apache.solr.handler.component.SearchHandler
  name=/suggest
  lst name=defaults
 str name=echoParamsexplicit/str
str name=defTypeedismax/str
 int name=rows10/int
 str name=wtjson/str
 str name=indenttrue/str
str name=dftext/str
 
str name=spellchecktrue/str
str name=spellcheck.dictionarysuggest/str
str name=spellcheck.onlyMorePopulartrue/str
str name=spellcheck.count5/str
str name=spellcheck.collatetrue/str
  /lst
  arr name=components
strsuggest/str
  /arr
/requestHandler
 
 
  Yes, I've read the guide. I've found out that there is a need to do
  re-indexing if I'm creating a new copyField. It works when I used the
  copyField that's created before the indexing is done.
 
  As I'm using the spellcheck dictionary as my suggester, so does that
 mean I
  just need to build the spellcheck dictionary?
 
 
  Regards,
  Edwin
 
 
  On 3 June 2015 at 17:36, Alessandro Benedetti 
 benedetti.ale...@gmail.com
  wrote:
 
   Can you share you suggester configurations ?
   Have you read the guide I linked ?
   Has the suggestion index/fst has been built ? ( you need to build the
   suggester)
  
   Cheers
  
   2015-06-03 4:07 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com:
  
Thank you for your explanation.
   
I'll not need to care where the suggestions are coming from. All the
suggestions from different fields can be consolidate and display
   together.
   
I've tried to put those field into a new Suggestion copy field, but
 no
suggestion is shown when I set:
str name=fieldSuggestion/str  !-- the indexed field to derive
suggestions from --
   
Is there a need to re-index the documents in order for this to work?
   
Regards,
Edwin
   
   
   
On 2 June 2015 at 17:25, Alessandro Benedetti 
   benedetti.ale...@gmail.com
wrote:
   
 Hi Edwin,
 I have worked extensively recently in Suggester and the blog I
 feel
  to
 suggest is Erick's one.
 It's really detailed and good for a beginner and expert as well.
 [1]

 Apart that let's see you particular use case :

 1) Do you want to be able to get also where the suggestions are
  coming
from
 ?
 e.g.
 suggestion1 from field1
 suggestion2 from field2 ?
 In this case I would try with multiple dictionaries but I am not
 sure
Solr
 allows you to use them concurrently.
 But can be a really nice extension to develop.

 2) If you don't care where the suggestions are coming from, just
 use
  a
copy
 field, where you copy the content of the interesting fields.
 The suggestions will come from the fields you have copied in the
 copy
 field, without distinction.

 Hope this helps you

 Cheers


 [1] http://lucidworks.com/blog/solr-suggester/

 2015-06-02 4:22 GMT+01:00 Zheng Lin Edwin Yeo 
 edwinye...@gmail.com
  :

  Hi,
 
  Does anyone knows if we can derive suggestions across multiple
   fields?
 
  I tried to set 

Re: How to identify field names from the suggested values in multiple fields

2015-06-03 Thread Dhanesh Radhakrishnan
Thank you for the quick response.
If I use 2 suggesters, can I get the results in a single request?
http://192.17.80.99:8983/solr/core1/suggest?suggest=true&suggest.dictionary=mySuggester&wt=xml&suggest.q=school
Is there any documentation on building multiple suggesters?
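
(Assuming two suggesters are configured, e.g. named categorySuggester and subcategorySuggester, which are hypothetical names, the suggest.dictionary parameter can be repeated to query both in one request; host and core below are placeholders:

http://localhost:8983/solr/core1/suggest?suggest=true&suggest.dictionary=categorySuggester&suggest.dictionary=subcategorySuggester&suggest.q=school&wt=xml

Each block of suggestions is returned under its own dictionary name.)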


On Thu, Jun 4, 2015 at 10:40 AM, Walter Underwood wun...@wunderwood.org
wrote:

 Configure two suggesters, one based on each field. Use both of them and
 you’ll get separate suggestions from each.

 wunder
 Walter Underwood
 wun...@wunderwood.org
 http://observer.wunderwood.org/  (my blog)


 On Jun 3, 2015, at 10:03 PM, Dhanesh Radhakrishnan dhan...@hifx.co.in
 wrote:

  Hi
  Anyone help me to  build a suggester auto complete based on multiple
 fields?
  There are two fields in my schema. Category and Subcategory and I'm
 trying
  to build  suggester based on these 2 fields. When the suggestions result,
  how can I distinguish from which filed it come from?
 
  I used a copyfields to combine multiple fields into single field and use
  that field in suggester
  But this will  return the combined result of category and subcategory. I
  can't differentiate the results that fetch from which field
 
  These are the copyfields for autocomplete
  copyField source=category dest=businessAutoComplete/
  copyField source=subcategory dest=businessAutoComplete/
 
  Suggestions should know from which field its from.
  For Eg my suggester returns 5 results for the keyword schools. In that
  result  2 from the category field and 3 from the subcategory field.
 
  Schools (Category)
  Primary Schools (Subcategory)
  Driving Schools (Subcategory)
  Day care and play school (Subcategory)
  Day Care/Play School (Category)
 
 
  Is there any way to build like this ??
 
 
  --
  [image: hifx_logo] http://hifx.in/
  *dhanesh s.R *
  Team Lead
  t: (+91) 484 4011750 (ext. 712) | m: ​(+91) 99 4  703
  e: dhan...@hifx.in | w: www.hifx.in
  https://www.facebook.com/HiFXIT https://twitter.com/HiFXTweets
  https://www.linkedin.com/company/2889649
  https://plus.google.com/104259935226993895226/about
 




-- 
 [image: hifx_logo] http://hifx.in/
*dhanesh s.R *
Team Lead
t: (+91) 484 4011750 (ext. 712) | m: ​(+91) 99 4  703
e: dhan...@hifx.in | w: www.hifx.in
https://www.facebook.com/HiFXIT https://twitter.com/HiFXTweets
https://www.linkedin.com/company/2889649
https://plus.google.com/104259935226993895226/about



Re: retrieving large number of docs

2015-06-03 Thread Erick Erickson
Are these indexes on different machines? Because if they're in the
same JVM, you might be able to use cross-core joins. Be aware, though,
that joining on high-cardinality fields (which, by definition, docID
probably is) is where pseudo joins perform worst.
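
For reference, a cross-core join of that sort (both cores living in the same CoreContainer) would look roughly like this; core names, the join field and the inner query are placeholders:

curl 'http://localhost:8983/solr/core2/select' \
     --data-urlencode 'q={!join fromIndex=core1 from=id to=id}your_core1_query' \
     --data-urlencode 'facet=true' \
     --data-urlencode 'facet.field=title'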

Have you considered flattening the data and including whatever
information you have in your from index in your main index? Because
< 100ms response is probably not going to be tough if you have to have
two indexes/cores.

Best,
Erick

On Wed, Jun 3, 2015 at 10:58 AM, Joel Bernstein joels...@gmail.com wrote:
 You may have to do something custom to meet your needs.

 10,000 DocID's is not huge but you're latency requirement are pretty low.

 Are your DocID's by any chance integers? This can make custom PostFilters
 run much faster.

 You should also be aware of the Streaming API in Solr 5.1 which will give
 you fast Map/Reduce approaches (
 http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html).

 Joel Bernstein
 http://joelsolr.blogspot.com/

 On Wed, Jun 3, 2015 at 1:46 PM, Robust Links pey...@robustlinks.com wrote:

 Hey Joel

 see below

 On Wed, Jun 3, 2015 at 1:43 PM, Joel Bernstein joels...@gmail.com wrote:

  A few questions for you:
 
  How large can the list of filtering ID's be?
 

  10k


 
  What's your expectation on latency?
 

 10 < latency < 100


 
  What version of Solr are you using?
 

 5.0.0


 
  SolrCloud or not?
 

 not



 
  Joel Bernstein
  http://joelsolr.blogspot.com/
 
  On Wed, Jun 3, 2015 at 1:23 PM, Robust Links pey...@robustlinks.com
  wrote:
 
   Hi
  
   I have a set of document IDs from one core and i want to query another
  core
   using the ids retrieved from the first core...the constraint is that
 the
   size of doc ID set can be very large. I want to:
  
   1) retrieve these docs from the 2nd index
   2) facet on the results
  
   I can think of 3 solutions:
  
   1) boolean query
   2) terms fq
   3) use a DB rather than Solr
  
   I am trying to keep latencies down so prefer to not use (3). The
 problem
   with (1) is maxBooleanclauses is hardwired and I am not sure when I
 will
   hit the exception. Option (2) seems to also hit limits.. so if I do
  
   select?fl=*q=*:*facet=truefacet.field=titlefq={!terms
   f=id}LONG_LIST_OF_IDS
  
   solr just goes blank. I have tried adding cost=200 to try to run the
  query
   first fq={!terms f=id cost=200} but still no good. Paging on doc IDs
  could
   be a solution but the problem then is that the faceting results
  correspond
   to the paged IDs and not the global set.
  
   My filter cache spec is as follows
  
 filterCache class=solr.FastLRUCache
size=100
initialSize=100
autowarmCount=10/
  
  
   What would be the best way for me to solve this problem?
  
   thank you
  
 



Re: Solr Atomic Updates

2015-06-03 Thread Erick Erickson
I have to ask then why you're not using SolrCloud with multiple shards? It
seems to me that that gives you the indexing throughput you need (be sure to
use CloudSolrServer from your client). At 300M complex documents, you
pretty much certainly will need to shard anyway so in some sense you're
re-inventing the wheel here.

You can host multiple shards on the same machine, and these _are_ separate
Solr cores under the covers, so your problem with atomic updates disappears.

Although I would consider upgrading to Solr 4.10.3 or even 5.2 (which is being
voted on even now and should be out in a week or so barring problems).

Best,
Erick

On Wed, Jun 3, 2015 at 11:04 AM, Ксения Баталова batalova...@gmail.com wrote:
 Jack,

 Decision of using several cores was made to increase indexing and
 searching performance (experimentally).

 In my project index is about 300-500 millions documents (each document
 has rather difficult structure) and it may be larger.

 So, while indexing the documents are being added in different cores by
 some amount of threads.

 In other words, each thread collect nessesary information for list of
 documents and generate create-documents query to specific core.

 At this moment it doesn't matter (and it can't be found out) which
 document in which core will be.

 And now there is necessary to update (atomic update) this index.

 Something like this..

 _ _

 Batalova Kseniya


 Explain a little about why you have separate cores, and how you decide
 which core a new document should reside in. Your scenario still seems a bit
 odd, so help us understand.


 -- Jack Krupansky

 On Wed, Jun 3, 2015 at 3:15 AM, Ксения Баталова batalova...@gmail.com
 wrote:

 Hi!

 Thanks for your quick reply.

 The problem that all my index is consists of several parts (several cores)

 and while updating I don't know in advance in which part updated id is
 lying (in which core the document with specified id is lying).

 For example, I have two cores (*Core1 *and *Core2*) and I want to
 update the document with id *Id1 *and I don't know where this document
 is lying.

 So, I have to do two select-queries to my cores to know where it is.

 And then generate update-query to necessary core.

 What am I doing wrong?

 I remind that I'm using SOLR 4.4.0.

 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
 Best regards,
 Batalova Kseniya


 What exactly is the problem? And why do you care about cores, per se -
 other than to send the update to the core/collection you are trying to
 update? You should specify the core/collection name in the URL.

 You should also be using the Solr reference guide rather than the (old)
 wiki:

 https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents


 -- Jack Krupansky

 On Tue, Jun 2, 2015 at 10:15 AM, Ксения Баталова batalova...@gmail.com
 wrote:

  Hi!
 
  I'm using *SOLR 4.4.0* for searching in my project.
  Now I am facing a problem of atomic updates in multiple cores.
  From wiki:
 
  curl *http://localhost:8983/solr/update
  http://localhost:8983/solr/update *-H
  'Content-type:application/json' -d '
  [
   {
*id*: *TestDoc1*,
title : {set:test1},
revision  : {inc:3},
publisher : {add:TestPublisher}
   },
   {
id: TestDoc2,
publisher : {add:TestPublisher}
   }
  ]'
 
  As well as I understand, this means that the document, for example, with
 id
  *TestDoc1*, will be searched for updating *only in one core*.
  And if there is no any document with id *TestDoc1*, the document will be
  created.
  Can I somehow to specify the* list of cores* for searching and then
  updating necessary document with specific id?
 
  It's something like *shards *parameter in *select* query.
  From wiki:
 
  #now do a distributed search across both servers with your browser or
 curl
  curl '
 
 http://localhost:8983/solr/*select*?*shards*=localhost:8983/solr,localhost:7574/solrindent=trueq=ipod+solr
  '
 
  Or is it planned in the future?
 
  Thanks in advance.
 
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
 
  Best regards,
  Batalova Kseniya
 



Re: retrieving large number of docs

2015-06-03 Thread Joel Bernstein
A few questions for you:

How large can the list of filtering ID's be?

What's your expectation on latency?

What version of Solr are you using?

SolrCloud or not?

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jun 3, 2015 at 1:23 PM, Robust Links pey...@robustlinks.com wrote:

 Hi

 I have a set of document IDs from one core and i want to query another core
 using the ids retrieved from the first core...the constraint is that the
 size of doc ID set can be very large. I want to:

 1) retrieve these docs from the 2nd index
 2) facet on the results

 I can think of 3 solutions:

 1) boolean query
 2) terms fq
 3) use a DB rather than Solr

 I am trying to keep latencies down so prefer to not use (3). The problem
 with (1) is maxBooleanclauses is hardwired and I am not sure when I will
 hit the exception. Option (2) seems to also hit limits.. so if I do

 select?fl=*q=*:*facet=truefacet.field=titlefq={!terms
 f=id}LONG_LIST_OF_IDS

 solr just goes blank. I have tried adding cost=200 to try to run the query
 first fq={!terms f=id cost=200} but still no good. Paging on doc IDs could
 be a solution but the problem then is that the faceting results correspond
 to the paged IDs and not the global set.

 My filter cache spec is as follows

   filterCache class=solr.FastLRUCache
  size=100
  initialSize=100
  autowarmCount=10/


 What would be the best way for me to solve this problem?

 thank you



Re: retrieving large number of docs

2015-06-03 Thread Robust Links
Hey Joel

see below

On Wed, Jun 3, 2015 at 1:43 PM, Joel Bernstein joels...@gmail.com wrote:

 A few questions for you:

 How large can the list of filtering ID's be?


 10k



 What's your expectation on latency?


10 < latency < 100



 What version of Solr are you using?


5.0.0



 SolrCloud or not?


not




 Joel Bernstein
 http://joelsolr.blogspot.com/

 On Wed, Jun 3, 2015 at 1:23 PM, Robust Links pey...@robustlinks.com
 wrote:

  Hi
 
  I have a set of document IDs from one core and i want to query another
 core
  using the ids retrieved from the first core...the constraint is that the
  size of doc ID set can be very large. I want to:
 
  1) retrieve these docs from the 2nd index
  2) facet on the results
 
  I can think of 3 solutions:
 
  1) boolean query
  2) terms fq
  3) use a DB rather than Solr
 
  I am trying to keep latencies down so prefer to not use (3). The problem
  with (1) is maxBooleanclauses is hardwired and I am not sure when I will
  hit the exception. Option (2) seems to also hit limits.. so if I do
 
  select?fl=*q=*:*facet=truefacet.field=titlefq={!terms
  f=id}LONG_LIST_OF_IDS
 
  solr just goes blank. I have tried adding cost=200 to try to run the
 query
  first fq={!terms f=id cost=200} but still no good. Paging on doc IDs
 could
  be a solution but the problem then is that the faceting results
 correspond
  to the paged IDs and not the global set.
 
  My filter cache spec is as follows
 
filterCache class=solr.FastLRUCache
   size=100
   initialSize=100
   autowarmCount=10/
 
 
  What would be the best way for me to solve this problem?
 
  thank you
 



Re: retrieving large number of docs

2015-06-03 Thread Joel Bernstein
You may have to do something custom to meet your needs.

10,000 DocIDs is not huge, but your latency requirements are pretty low.

Are your DocID's by any chance integers? This can make custom PostFilters
run much faster.

You should also be aware of the Streaming API in Solr 5.1 which will give
you fast Map/Reduce approaches (
http://joelsolr.blogspot.com/2015/04/the-streaming-api-solrjio-basics.html).

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jun 3, 2015 at 1:46 PM, Robust Links pey...@robustlinks.com wrote:

 Hey Joel

 see below

 On Wed, Jun 3, 2015 at 1:43 PM, Joel Bernstein joels...@gmail.com wrote:

  A few questions for you:
 
  How large can the list of filtering ID's be?
 

  10k


 
  What's your expectation on latency?
 

  10 < latency < 100


 
  What version of Solr are you using?
 

 5.0.0


 
  SolrCloud or not?
 

 not



 
  Joel Bernstein
  http://joelsolr.blogspot.com/
 
  On Wed, Jun 3, 2015 at 1:23 PM, Robust Links pey...@robustlinks.com
  wrote:
 
   Hi
  
   I have a set of document IDs from one core and i want to query another
  core
   using the ids retrieved from the first core...the constraint is that
 the
   size of doc ID set can be very large. I want to:
  
   1) retrieve these docs from the 2nd index
   2) facet on the results
  
   I can think of 3 solutions:
  
   1) boolean query
   2) terms fq
   3) use a DB rather than Solr
  
   I am trying to keep latencies down so prefer to not use (3). The
 problem
   with (1) is maxBooleanclauses is hardwired and I am not sure when I
 will
   hit the exception. Option (2) seems to also hit limits.. so if I do
  
   select?fl=*q=*:*facet=truefacet.field=titlefq={!terms
   f=id}LONG_LIST_OF_IDS
  
   solr just goes blank. I have tried adding cost=200 to try to run the
  query
   first fq={!terms f=id cost=200} but still no good. Paging on doc IDs
  could
   be a solution but the problem then is that the faceting results
  correspond
   to the paged IDs and not the global set.
  
   My filter cache spec is as follows
  
 filterCache class=solr.FastLRUCache
size=100
initialSize=100
autowarmCount=10/
  
  
   What would be the best way for me to solve this problem?
  
   thank you
  
 



Re: Solr Atomic Updates

2015-06-03 Thread Ксения Баталова
Jack,

The decision to use several cores was made (based on experiments) to increase
indexing and searching performance.

In my project the index is about 300-500 million documents (each document
has a rather complex structure) and it may grow larger.

So, while indexing, the documents are added to different cores by
a number of threads.

In other words, each thread collects the necessary information for a list of
documents and generates a create-documents query to a specific core.

At this moment it doesn't matter (and it can't be determined) which
core a given document will end up in.

And now it is necessary to update (atomic update) this index.

Something like this..

_ _

Batalova Kseniya


Explain a little about why you have separate cores, and how you decide
which core a new document should reside in. Your scenario still seems a bit
odd, so help us understand.


-- Jack Krupansky

On Wed, Jun 3, 2015 at 3:15 AM, Ксения Баталова batalova...@gmail.com
wrote:

 Hi!

 Thanks for your quick reply.

 The problem that all my index is consists of several parts (several cores)

 and while updating I don't know in advance in which part updated id is
 lying (in which core the document with specified id is lying).

 For example, I have two cores (*Core1 *and *Core2*) and I want to
 update the document with id *Id1 *and I don't know where this document
 is lying.

 So, I have to do two select-queries to my cores to know where it is.

 And then generate update-query to necessary core.

 What am I doing wrong?

 I remind that I'm using SOLR 4.4.0.

 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
 Best regards,
 Batalova Kseniya


 What exactly is the problem? And why do you care about cores, per se -
 other than to send the update to the core/collection you are trying to
 update? You should specify the core/collection name in the URL.

 You should also be using the Solr reference guide rather than the (old)
 wiki:

 https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents


 -- Jack Krupansky

 On Tue, Jun 2, 2015 at 10:15 AM, Ксения Баталова batalova...@gmail.com
 wrote:

  Hi!
 
  I'm using *SOLR 4.4.0* for searching in my project.
  Now I am facing a problem of atomic updates in multiple cores.
  From wiki:
 
  curl *http://localhost:8983/solr/update
  http://localhost:8983/solr/update *-H
  'Content-type:application/json' -d '
  [
   {
*id*: *TestDoc1*,
title : {set:test1},
revision  : {inc:3},
publisher : {add:TestPublisher}
   },
   {
id: TestDoc2,
publisher : {add:TestPublisher}
   }
  ]'
 
  As well as I understand, this means that the document, for example, with
 id
  *TestDoc1*, will be searched for updating *only in one core*.
  And if there is no any document with id *TestDoc1*, the document will be
  created.
  Can I somehow to specify the* list of cores* for searching and then
  updating necessary document with specific id?
 
  It's something like *shards *parameter in *select* query.
  From wiki:
 
  #now do a distributed search across both servers with your browser or
 curl
  curl '
 
 http://localhost:8983/solr/*select*?*shards*=localhost:8983/solr,localhost:7574/solrindent=trueq=ipod+solr
  '
 
  Or is it planned in the future?
 
  Thanks in advance.
 
  _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
 
  Best regards,
  Batalova Kseniya