Re: Replication and soft commits for NRT searches

2015-10-13 Thread MOIS Martin (MORPHO)
Hello,

thank you for the detailed answer.

If a timeout between shard leader and replica can lead to a smaller rf value 
(because replication has timed out), is it possible to increase this timeout in 
the configuration?

Best Regards,
Martin Mois

Comments inline:

On Mon, Oct 12, 2015 at 1:31 PM, MOIS Martin (MORPHO)
 wrote:
> Hello,
>
> I am running Solr 5.2.1 in a cluster with 6 nodes. My collections have been 
> created with
replicationFactor=2, i.e. I have one replica for each shard. Beyond that I am 
using autoCommit/maxDocs=1
and autoSoftCommits/maxDocs=1 in order to achieve near realtime search behavior.
>
> As far as I understand from section "Write Side Fault Tolerance" in the 
> documentation
(https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance),
 I
cannot enforce that an update gets replicated to all replicas, but I can only 
get the achieved
replication factor by requesting the return value rf.
>
> My question is now, what exactly does rf=2 mean? Does it only mean that the 
> replica has
written the update to its transaction log? Or has the replica also performed 
the soft commit
as configured with autoSoftCommits/maxDocs=1? The answer is important for me: 
if the update were only written to the transaction log, I could not search for 
it reliably, as the replica may not have added it to the searchable index.

rf=2 means that the update was successfully replicated to and
acknowledged by two replicas (including the leader). The rf only deals
with the durability of the update and has no relation to visibility of
the update to searchers. The auto(soft)commit settings are applied
asynchronously and do not block an update request.
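For illustration, the achieved replication factor can be requested on the
update itself via the min_rf parameter (documented on the "Write Side Fault
Tolerance" page cited above). A sketch only, with hypothetical host and
collection names; note again that rf speaks to durability, not to search
visibility:

```
# Ask the leader to compute and return the achieved replication factor:
curl "http://localhost:8983/solr/mycollection/update?wt=json&min_rf=2" \
     -H 'Content-Type: application/json' -d '[{"id":"doc1"}]'
# A response shaped like {"responseHeader":{...},"rf":1,"min_rf":2}
# indicates the update reached fewer replicas than requested.
```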

>
> My second question is, does rf=1 mean that the update was definitely not 
> successful on
the replica or could it also represent a timeout of the replication request 
from the shard
leader? If it could also represent a timeout, then there would be a small 
chance that the replication was successful despite the timeout.

Well, rf=1 implies that the update was only applied on the leader's
index + tlog and either replicas weren't available or returned an
error or the request timed out. So yes, you are right that it can
represent a timeout, and as such there is a chance that the replication
was indeed successful despite the timeout.

>
> Is there a way to retrieve the replication factor for a specific document 
> after the update
in order to check if replication was successful in the meantime?
>

No, there is no way to do that.

> Thanks in advance.
>
> Best Regards,
> Martin Mois
> #
> " This e-mail and any attached documents may contain confidential or 
> proprietary information.
If you are not the intended recipient, you are notified that any dissemination, 
copying of
this e-mail and any attachments thereto or use of their contents by any means 
whatsoever is
strictly prohibited. If you have received this e-mail in error, please advise 
the sender immediately
and delete this e-mail and all attached documents from your computer system."
> #



--
Regards,
Shalin Shekhar Mangar.



Re: Request for Wiki edit right

2015-10-13 Thread Arcadius Ahouansou
Thank you very much Erick.

Arcadius.

On 13 October 2015 at 22:04, Erick Erickson  wrote:

> Just added you to the Solr Wiki contributors group, if you need to
> access the Lucene Wiki let us know.
>
> Best,
> Erick
>
> On Tue, Oct 13, 2015 at 1:57 PM, Arcadius Ahouansou
>  wrote:
> > Hello Erick.
> > Thank you for the detailed info.
> > My username is arcadius.
> >
> > Thanks.
> >
> >
> > On 13 October 2015 at 16:58, Erick Erickson 
> wrote:
> >
> >> Create a user on the Wiki (anyone can), then tell us the user name
> >> you've created and we'll add you to the auth lists. There are separate
> >> lists for Solr and Lucene. We had to lock these down because we were
> >> getting a lot of spam pages created.
> >>
> >> The reference guide (CWiki) is restricted to committers though.
> >>
> >> Best,
> >> Erick
> >>
> >> On Tue, Oct 13, 2015 at 6:30 AM, Arcadius Ahouansou
> >>  wrote:
> >> > Hello.
> >> >
> >> > Please, can I have the right to edit the Wiki?
> >> >
> >> > Thanks.
> >> >
> >> > Arcadius.
> >>
> >
> >
> >
> > --
> > Arcadius Ahouansou
> > Menelic Ltd | Information is Power
> > M: 07908761999
> > W: www.menelic.com
> > ---
>



-- 
Arcadius Ahouansou
Menelic Ltd | Information is Power
M: 07908761999
W: www.menelic.com
---


Re: AutoComplete Feature in Solr

2015-10-13 Thread William Bell
We want to use suggester but also want to show those results closest to my
lat,long... Kinda combine suggester and bq=geodist()

On Mon, Oct 12, 2015 at 2:24 PM, Salman Ansari 
wrote:

> Hi,
>
> I have been trying to get the autocomplete feature in Solr working with no
> luck up to now. First I read that "suggest component" is the recommended
> way as in the below article (and this is the exact functionality I am
> looking for, which is to autocomplete multiple words)
>
> http://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/
>
> Then I tried implementing suggest as described in the following articles in
> this order
> 1) https://wiki.apache.org/solr/Suggester#SearchHandler_configuration
> 2) http://solr.pl/en/2010/11/15/solr-and-autocomplete-part-2/  (I
> implemented suggesting phrases)
> 3)
>
> http://stackoverflow.com/questions/18132819/how-to-have-solr-autocomplete-on-whole-phrase-when-query-contains-multiple-terms
>
> With no luck, after implementing each article when I run my query as
> http://[MySolr]:8983/solr/entityStore114/suggest?spellcheck.q=Barack
>
>
>
> I get
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">0</int>
>   </lst>
> </response>
>
>  Although I have an entry for Barack Obama in my index. I am posting my
> Solr configuration as well
>
> <searchComponent name="suggest" class="solr.SpellCheckComponent">
>   <lst name="spellchecker">
>     <str name="name">suggest</str>
>     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>     <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
>     <str name="field">entity_autocomplete</str>
>     <str name="buildOnCommit">true</str>
>   </lst>
> </searchComponent>
>
> <requestHandler name="/suggest"
>   class="org.apache.solr.handler.component.SearchHandler">
>   <lst name="defaults">
>     <str name="spellcheck">true</str>
>     <str name="spellcheck.dictionary">suggest</str>
>     <str name="spellcheck.count">10</str>
>     <str name="spellcheck.collate">true</str>
>     <str name="spellcheck.onlyMorePopular">false</str>
>   </lst>
>   <arr name="components">
>     <str>suggest</str>
>   </arr>
> </requestHandler>
>
> It looks like a very simple job, but even after following so many articles,
> I could not get it right. Any comment will be appreciated!
>
> Regards,
> Salman
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076
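One thing worth noting from the Suggester wiki page cited in this thread: the
FST-based suggester must be built before it can return suggestions. A hedged
sketch of the query shape (hypothetical host; the core name and query term are
taken from the thread):

```
# Build the suggester once (e.g. after indexing), then query it:
curl "http://localhost:8983/solr/entityStore114/suggest?spellcheck=true&spellcheck.build=true&spellcheck.q=Barack"
```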


Re: Indexing Solr in production

2015-10-13 Thread Zheng Lin Edwin Yeo
Thank you Alessandro and Erick.

Will try out the SolrJ method.

Regards,
Edwin


On 14 October 2015 at 00:00, Erick Erickson  wrote:

> Here's a sample:
> https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
>
> On Tue, Oct 13, 2015 at 4:18 AM, Alessandro Benedetti
>  wrote:
> > The most robust and simple way to go is building your own Indexer.
> > You can decide the platform you want, Solr has plenty of client API
> > libraries.
> >
> > For example if you want to write your Indexer app in Java, you can use
> > SolrJ..
> > Each client library will give you all the flexibility you need to index
> > solr in a robust way.
> >
> > [1] https://cwiki.apache.org/confluence/display/solr/Client+APIs
> > Cheers
> >
> > On 13 October 2015 at 09:35, Zheng Lin Edwin Yeo 
> > wrote:
> >
> >> Hi,
> >>
> >> What is the best practice for indexing in Solr for a production
> >> system? I'm using Solr 5.3.0.
> >>
> >> I understand that post.jar does not have things like robustness checks
> >> and retries, which are important in production, as sometimes certain
> >> records might fail during indexing, and we need to retry the indexing
> >> for those records that fail.
> >>
> >> Normally, do we need to write a new custom handler in order to achieve
> all
> >> these?
> >> Want to find out what most people did before I decide on a method and
> >> proceed on to the next step.
> >>
> >> Thank you.
> >>
> >> Regards,
> >> Edwin
> >>
> >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card - http://about.me/alessandro_benedetti
> > Blog - http://alexbenedetti.blogspot.co.uk
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
>
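A minimal sketch of the retry behavior discussed in this thread. This is not
SolrJ itself: the Solr call is replaced by a stand-in operation that fails
twice before succeeding, so the sketch runs without a server; in a real
indexer the Callable would wrap client.add(...) or similar.

```java
import java.util.concurrent.Callable;

public class RetryingIndexer {
    // Retry an operation up to maxAttempts times with linear backoff.
    static <T> T withRetries(Callable<T> op, int maxAttempts, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                Thread.sleep(backoffMs * attempt); // back off before retrying
            }
        }
        throw last; // all attempts exhausted
    }

    // Stand-in "index" call that fails twice, then succeeds.
    static String demo() throws Exception {
        int[] calls = {0};
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new RuntimeException("simulated timeout");
            return "indexed";
        }, 5, 1);
        return result + " after " + calls[0] + " attempts";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // indexed after 3 attempts
    }
}
```

The same wrapper shape applies regardless of which client library is used.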


Re: solr cloud recovery and search

2015-10-13 Thread Rallavagu

Great. Thanks Erick.

On 10/13/15 5:39 PM, Erick Erickson wrote:

More than expected, guaranteed. As long as at least one replica in a
shard is active, all queries should succeed. Maybe more slowly, but
they should succeed.

Best,
Erick



On Tue, Oct 13, 2015 at 4:25 PM, Rallavagu  wrote:

It appears that when a node that is in "recovery" mode is queried, it
defers the query to the leader instead of serving it locally. Is this the
expected behavior? Thanks.


Re: solr cloud recovery and search

2015-10-13 Thread Erick Erickson
More than expected, guaranteed. As long as at least one replica in a
shard is active, all queries should succeed. Maybe more slowly, but
they should succeed.

Best,
Erick



On Tue, Oct 13, 2015 at 4:25 PM, Rallavagu  wrote:
> It appears that when a node that is in "recovery" mode is queried, it
> defers the query to the leader instead of serving it locally. Is this the
> expected behavior? Thanks.


solr cloud recovery and search

2015-10-13 Thread Rallavagu
It appears that when a node that is in "recovery" mode is queried, it 
defers the query to the leader instead of serving it locally. Is this the 
expected behavior? Thanks.


Re: Solr cross core join special condition

2015-10-13 Thread Yonik Seeley
On Wed, Oct 7, 2015 at 9:42 AM, Ryan Josal  wrote:
> I developed a join transformer plugin that did that (although it didn't
> flatten the results like that).  The one thing that was painful about it is
> that the TextResponseWriter has references to both the IndexSchema and
> SolrReturnFields objects for the primary core.  So when you add a
> SolrDocument from another core it returned the wrong fields.

We've made some progress on this front in trunk:

* SOLR-7957: internal/expert - ResultContext was significantly changed
and expanded
  to allow for multiple full query results (DocLists) per Solr request.
  TransformContext was rendered redundant and was removed. (yonik)

So ResultContext now has its own searcher, ReturnFields, etc.

-Yonik


Re: Request for Wiki edit right

2015-10-13 Thread Erick Erickson
Just added you to the Solr Wiki contributors group, if you need to
access the Lucene Wiki let us know.

Best,
Erick

On Tue, Oct 13, 2015 at 1:57 PM, Arcadius Ahouansou
 wrote:
> Hello Erick.
> Thank you for the detailed info.
> My username is arcadius.
>
> Thanks.
>
>
> On 13 October 2015 at 16:58, Erick Erickson  wrote:
>
>> Create a user on the Wiki (anyone can), then tell us the user name
>> you've created and we'll add you to the auth lists. There are separate
>> lists for Solr and Lucene. We had to lock these down because we were
>> getting a lot of spam pages created.
>>
>> The reference guide (CWiki) is restricted to committers though.
>>
>> Best,
>> Erick
>>
>> On Tue, Oct 13, 2015 at 6:30 AM, Arcadius Ahouansou
>>  wrote:
>> > Hello.
>> >
>> > Please, can I have the right to edit the Wiki?
>> >
>> > Thanks.
>> >
>> > Arcadius.
>>
>
>
>
> --
> Arcadius Ahouansou
> Menelic Ltd | Information is Power
> M: 07908761999
> W: www.menelic.com
> ---


Re: Request for Wiki edit right

2015-10-13 Thread Arcadius Ahouansou
Hello Erick.
Thank you for the detailed info.
My username is arcadius.

Thanks.


On 13 October 2015 at 16:58, Erick Erickson  wrote:

> Create a user on the Wiki (anyone can), then tell us the user name
> you've created and we'll add you to the auth lists. There are separate
> lists for Solr and Lucene. We had to lock these down because we were
> getting a lot of spam pages created.
>
> The reference guide (CWiki) is restricted to committers though.
>
> Best,
> Erick
>
> On Tue, Oct 13, 2015 at 6:30 AM, Arcadius Ahouansou
>  wrote:
> > Hello.
> >
> > Please, can I have the right to edit the Wiki?
> >
> > Thanks.
> >
> > Arcadius.
>



-- 
Arcadius Ahouansou
Menelic Ltd | Information is Power
M: 07908761999
W: www.menelic.com
---


Re: Grouping facets: Possible to get facet results for each Group?

2015-10-13 Thread Peter Sturge
Hi,
Thanks for your response.
I did have a look at pivots, and they could work in a way. We're still on
Solr 4.3, so I'll have to wait for sub-facets - but they sure look pretty
cool!
Peter


On Tue, Oct 13, 2015 at 12:30 PM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> Can you model your business domain with Solr nested docs? In that case you
> can use Yonik's article about nested facets.
>
> Cheers
>
> On 13 October 2015 at 05:05, Alexandre Rafalovitch 
> wrote:
>
> > Could you use the new nested facets syntax?
> > http://yonik.com/solr-subfacets/
> >
> > Regards,
> >Alex.
> > 
> > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > http://www.solr-start.com/
> >
> > On 11 October 2015 at 09:51, Peter Sturge 
> wrote:
> > > Been trying to coerce Group faceting to give some faceting back for
> each
> > > group, but maybe this use case isn't catered for in Grouping? :
> > >
> > > So the Use Case is this:
> > > Let's say I do a grouped search that returns say, 9 distinct groups,
> and
> > in
> > > these groups are various numbers of unique field values that need
> > faceting
> > > - but the faceting needs to be within each group:
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>
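For reference, the nested facet syntax from the yonik.com article linked above
(the JSON Facet API in Solr 5.x) lets each bucket of an outer terms facet
carry its own sub-facet, which matches the per-group faceting use case in this
thread; the field names here are hypothetical:

```
json.facet = {
  groups : {
    type  : terms,
    field : group_field,           // outer "group" field (hypothetical)
    facet : {
      within_group : { type : terms, field : inner_field }  // facet per group
    }
  }
}
```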


Re: Help me read Thread

2015-10-13 Thread Rallavagu
The main reason is that the updates are coming from some client 
applications and it is not a controlled indexing process. The controlled 
indexing process works fine (after spending some time to tune it). Will 
definitely look into throttling incoming update requests and reducing the 
number of connections per host. Thanks for the insight.


On 10/13/15 9:17 AM, Erick Erickson wrote:

How heavy is heavy? The proverbial smoking gun here will be messages in any
logs referring to "leader initiated recovery". (note, that's the
message I remember seeing,
it may not be exact).

There's no particular work-around here except to back off the indexing
load. Certainly increasing the
thread pool size allowed this to surface. Also 5.2 has some
significant improvements in this area, see:
https://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/

And a lot depends on how you're indexing, batching up updates is a
good thing. If you go to a
multi-shard setup, using SolrJ and CloudSolrServer (CloudSolrClient in
5.x) would help. More
shards would help as well,  but I'd first take a look at the indexing
process and be sure you're
batching up updates.

It's also possible if indexing is a once-a-day process and it fits
with your SLAs to shut off the replicas,
index to the leader, then turn the replicas back on. That's not all
that satisfactory, but I've seen it used.

But with a single shard setup, I really have to ask why indexing at
such a furious rate is
required that you're hitting this. Are you unable to reduce the indexing rate?

Best,
Erick

On Tue, Oct 13, 2015 at 9:08 AM, Rallavagu  wrote:

Also, we have increased number of connections per host from default (20) to
100 for http thread pool to communicate with other nodes. Could this have
caused the issues as it can now spin many threads to send updates?


On 10/13/15 8:56 AM, Erick Erickson wrote:


Is this under a very heavy indexing load? There were some
inefficiencies that caused followers to work a lot harder than the
leader, but the leader had to spin off a bunch of threads to send
updates to followers. That's fixed in the 5.2 release.

Best,
Erick

On Tue, Oct 13, 2015 at 8:40 AM, Rallavagu  wrote:


Please help me understand what is going on with this thread.

Solr 4.6.1, single shard, 4 node cluster, 3 node zk. Running on tomcat
with
500 threads.


There are 47 threads overall and designated leader becomes unresponsive
though shows "green" from cloud perspective. This is causing issues.

particularly,

"   at

org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
  ^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
  ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
  ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]"



"http-bio-8080-exec-2878" id=5899 idx=0x30c tid=17132 prio=5 alive,
native_blocked, daemon
  at __lll_lock_wait+34(:0)@0x382ba0e262
  at safepointSyncOnPollAccess+167(safepoint.c:83)@0x7f83ae266138
  at trapiNormalHandler+484(traps_posix.c:220)@0x7f83ae29a745
  at _L_unlock_16+44(:0)@0x382ba0f710
  at java/util/LinkedList.peek(LinkedList.java:447)[optimized]
  at

org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.blockUntilFinished(ConcurrentUpdateSolrServer.java:384)[inlined]
  at

org/apache/solr/update/StreamingSolrServers.blockUntilFinished(StreamingSolrServers.java:98)[inlined]
  at

org/apache/solr/update/SolrCmdDistributor.finish(SolrCmdDistributor.java:61)[inlined]
  at

org/apache/solr/update/processor/DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:501)[inlined]
  at

org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
  ^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
  ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
  ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]
  at

org/apache/solr/handler/ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)[optimized]
  at

org/apache/solr/handler/RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)[optimized]
  at
org/apache/solr/core/SolrCore.execute(SolrCore.java:1859)[optimized]
  at

org/apache/solr/servlet/SolrDispatchFilter.execute(SolrDispatchFilter.java:721)[inlined]
  at

org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)[inlined]
  at

org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)[optimized]
  at

org/apache/catalina/core/ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)[inlined]
  at

org/apache/catalina/core/ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)[optimized]
  at

org/apache/catalina

Re: Help me read Thread

2015-10-13 Thread Erick Erickson
How heavy is heavy? The proverbial smoking gun here will be messages in any
logs referring to "leader initiated recovery". (note, that's the
message I remember seeing,
it may not be exact).

There's no particular work-around here except to back off the indexing
load. Certainly increasing the
thread pool size allowed this to surface. Also 5.2 has some
significant improvements in this area, see:
https://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/

And a lot depends on how you're indexing, batching up updates is a
good thing. If you go to a
multi-shard setup, using SolrJ and CloudSolrServer (CloudSolrClient in
5.x) would help. More
shards would help as well,  but I'd first take a look at the indexing
process and be sure you're
batching up updates.

It's also possible if indexing is a once-a-day process and it fits
with your SLAs to shut off the replicas,
index to the leader, then turn the replicas back on. That's not all
that satisfactory, but I've seen it used.

But with a single shard setup, I really have to ask why indexing at
such a furious rate is
required that you're hitting this. Are you unable to reduce the indexing rate?

Best,
Erick

On Tue, Oct 13, 2015 at 9:08 AM, Rallavagu  wrote:
> Also, we have increased number of connections per host from default (20) to
> 100 for http thread pool to communicate with other nodes. Could this have
> caused the issues as it can now spin many threads to send updates?
>
>
> On 10/13/15 8:56 AM, Erick Erickson wrote:
>>
>> Is this under a very heavy indexing load? There were some
>> inefficiencies that caused followers to work a lot harder than the
>> leader, but the leader had to spin off a bunch of threads to send
>> updates to followers. That's fixed in the 5.2 release.
>>
>> Best,
>> Erick
>>
>> On Tue, Oct 13, 2015 at 8:40 AM, Rallavagu  wrote:
>>>
>>> Please help me understand what is going on with this thread.
>>>
>>> Solr 4.6.1, single shard, 4 node cluster, 3 node zk. Running on tomcat
>>> with
>>> 500 threads.
>>>
>>>
>>> There are 47 threads overall and designated leader becomes unresponsive
>>> though shows "green" from cloud perspective. This is causing issues.
>>>
>>> particularly,
>>>
>>> "   at
>>>
>>> org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
>>>  ^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
>>>  ^-- Holding lock:
>>> org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
>>>  ^-- Holding lock:
>>> org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]"
>>>
>>>
>>>
>>> "http-bio-8080-exec-2878" id=5899 idx=0x30c tid=17132 prio=5 alive,
>>> native_blocked, daemon
>>>  at __lll_lock_wait+34(:0)@0x382ba0e262
>>>  at safepointSyncOnPollAccess+167(safepoint.c:83)@0x7f83ae266138
>>>  at trapiNormalHandler+484(traps_posix.c:220)@0x7f83ae29a745
>>>  at _L_unlock_16+44(:0)@0x382ba0f710
>>>  at java/util/LinkedList.peek(LinkedList.java:447)[optimized]
>>>  at
>>>
>>> org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.blockUntilFinished(ConcurrentUpdateSolrServer.java:384)[inlined]
>>>  at
>>>
>>> org/apache/solr/update/StreamingSolrServers.blockUntilFinished(StreamingSolrServers.java:98)[inlined]
>>>  at
>>>
>>> org/apache/solr/update/SolrCmdDistributor.finish(SolrCmdDistributor.java:61)[inlined]
>>>  at
>>>
>>> org/apache/solr/update/processor/DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:501)[inlined]
>>>  at
>>>
>>> org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
>>>  ^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
>>>  ^-- Holding lock:
>>> org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
>>>  ^-- Holding lock:
>>> org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]
>>>  at
>>>
>>> org/apache/solr/handler/ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)[optimized]
>>>  at
>>>
>>> org/apache/solr/handler/RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)[optimized]
>>>  at
>>> org/apache/solr/core/SolrCore.execute(SolrCore.java:1859)[optimized]
>>>  at
>>>
>>> org/apache/solr/servlet/SolrDispatchFilter.execute(SolrDispatchFilter.java:721)[inlined]
>>>  at
>>>
>>> org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)[inlined]
>>>  at
>>>
>>> org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)[optimized]
>>>  at
>>>
>>> org/apache/catalina/core/ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)[inlined]
>>>  at
>>>
>>> org/apache/catalina/core/ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)[optimized]
>>>  at
>>>
>>> org/apache/catalina/core/StandardWrapperValve.invoke(StandardWrapperValve.java:222)[optimized]
>>>  at
>>>
>>
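A minimal sketch of the batching advice above: accumulate documents and flush
fixed-size batches instead of sending one update request per document. The
Solr client is replaced by a stand-in consumer so the sketch runs standalone;
in a real indexer the flusher would call client.add(batch).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchingIndexer {
    private final List<String> buffer = new ArrayList<>();
    private final int batchSize;
    private final Consumer<List<String>> flusher; // stand-in for client.add(docs)

    BatchingIndexer(int batchSize, Consumer<List<String>> flusher) {
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) flush(); // send a full batch
    }

    void flush() {
        if (!buffer.isEmpty()) {
            flusher.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    // Feed 250 documents through a batch size of 100 and record batch sizes.
    static List<Integer> demo() {
        List<Integer> batchSizes = new ArrayList<>();
        BatchingIndexer idx = new BatchingIndexer(100, b -> batchSizes.add(b.size()));
        for (int i = 0; i < 250; i++) idx.add("doc" + i);
        idx.flush(); // send the final partial batch
        return batchSizes;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [100, 100, 50]
    }
}
```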

Re: Help me read Thread

2015-10-13 Thread Rallavagu
Also, we have increased number of connections per host from default (20) 
to 100 for http thread pool to communicate with other nodes. Could this 
have caused the issues as it can now spin many threads to send updates?


On 10/13/15 8:56 AM, Erick Erickson wrote:

Is this under a very heavy indexing load? There were some
inefficiencies that caused followers to work a lot harder than the
leader, but the leader had to spin off a bunch of threads to send
updates to followers. That's fixed in the 5.2 release.

Best,
Erick

On Tue, Oct 13, 2015 at 8:40 AM, Rallavagu  wrote:

Please help me understand what is going on with this thread.

Solr 4.6.1, single shard, 4 node cluster, 3 node zk. Running on tomcat with
500 threads.


There are 47 threads overall and designated leader becomes unresponsive
though shows "green" from cloud perspective. This is causing issues.

particularly,

"   at
org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
 ^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
 ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
 ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]"



"http-bio-8080-exec-2878" id=5899 idx=0x30c tid=17132 prio=5 alive,
native_blocked, daemon
 at __lll_lock_wait+34(:0)@0x382ba0e262
 at safepointSyncOnPollAccess+167(safepoint.c:83)@0x7f83ae266138
 at trapiNormalHandler+484(traps_posix.c:220)@0x7f83ae29a745
 at _L_unlock_16+44(:0)@0x382ba0f710
 at java/util/LinkedList.peek(LinkedList.java:447)[optimized]
 at
org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.blockUntilFinished(ConcurrentUpdateSolrServer.java:384)[inlined]
 at
org/apache/solr/update/StreamingSolrServers.blockUntilFinished(StreamingSolrServers.java:98)[inlined]
 at
org/apache/solr/update/SolrCmdDistributor.finish(SolrCmdDistributor.java:61)[inlined]
 at
org/apache/solr/update/processor/DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:501)[inlined]
 at
org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
 ^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
 ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
 ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]
 at
org/apache/solr/handler/ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)[optimized]
 at
org/apache/solr/handler/RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)[optimized]
 at org/apache/solr/core/SolrCore.execute(SolrCore.java:1859)[optimized]
 at
org/apache/solr/servlet/SolrDispatchFilter.execute(SolrDispatchFilter.java:721)[inlined]
 at
org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)[inlined]
 at
org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)[optimized]
 at
org/apache/catalina/core/ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)[inlined]
 at
org/apache/catalina/core/ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)[optimized]
 at
org/apache/catalina/core/StandardWrapperValve.invoke(StandardWrapperValve.java:222)[optimized]
 at
org/apache/catalina/core/StandardContextValve.invoke(StandardContextValve.java:123)[optimized]
 at
org/apache/catalina/core/StandardHostValve.invoke(StandardHostValve.java:171)[optimized]
 at
org/apache/catalina/valves/ErrorReportValve.invoke(ErrorReportValve.java:99)[optimized]
 at
org/apache/catalina/valves/AccessLogValve.invoke(AccessLogValve.java:953)[optimized]
 at
org/apache/catalina/core/StandardEngineValve.invoke(StandardEngineValve.java:118)[optimized]
 at
org/apache/catalina/connector/CoyoteAdapter.service(CoyoteAdapter.java:408)[optimized]
 at
org/apache/coyote/http11/AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)[optimized]
 at
org/apache/coyote/AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)[optimized]
 at
org/apache/tomcat/util/net/JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)[optimized]
 ^-- Holding lock:
org/apache/tomcat/util/net/SocketWrapper@0x2ee6e4aa8[thin lock]
 at
java/util/concurrent/ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)[inlined]
 at
java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)[optimized]
 at java/lang/Thread.run(Thread.java:682)[optimized]
 at jrockit/vm/RNI.c2java(J)V(Native Method)


Re: Help me read Thread

2015-10-13 Thread Rallavagu
The heavy load of indexing is true. During this time, all other nodes 
are in "recovery" mode, search queries are deferred to the leader, and 
they time out. Is there a temporary workaround for this? Thanks.


On 10/13/15 8:56 AM, Erick Erickson wrote:

Is this under a very heavy indexing load? There were some
inefficiencies that caused followers to work a lot harder than the
leader, but the leader had to spin off a bunch of threads to send
updates to followers. That's fixed in the 5.2 release.

Best,
Erick

On Tue, Oct 13, 2015 at 8:40 AM, Rallavagu  wrote:

Please help me understand what is going on with this thread.

Solr 4.6.1, single shard, 4 node cluster, 3 node zk. Running on tomcat with
500 threads.


There are 47 threads overall and designated leader becomes unresponsive
though shows "green" from cloud perspective. This is causing issues.

particularly,

"   at
org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
 ^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
 ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
 ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]"



"http-bio-8080-exec-2878" id=5899 idx=0x30c tid=17132 prio=5 alive,
native_blocked, daemon
 at __lll_lock_wait+34(:0)@0x382ba0e262
 at safepointSyncOnPollAccess+167(safepoint.c:83)@0x7f83ae266138
 at trapiNormalHandler+484(traps_posix.c:220)@0x7f83ae29a745
 at _L_unlock_16+44(:0)@0x382ba0f710
 at java/util/LinkedList.peek(LinkedList.java:447)[optimized]
 at
org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.blockUntilFinished(ConcurrentUpdateSolrServer.java:384)[inlined]
 at
org/apache/solr/update/StreamingSolrServers.blockUntilFinished(StreamingSolrServers.java:98)[inlined]
 at
org/apache/solr/update/SolrCmdDistributor.finish(SolrCmdDistributor.java:61)[inlined]
 at
org/apache/solr/update/processor/DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:501)[inlined]
 at
org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
 ^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
 ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
 ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]
 at
org/apache/solr/handler/ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)[optimized]
 at
org/apache/solr/handler/RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)[optimized]
 at org/apache/solr/core/SolrCore.execute(SolrCore.java:1859)[optimized]
 at
org/apache/solr/servlet/SolrDispatchFilter.execute(SolrDispatchFilter.java:721)[inlined]
 at
org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)[inlined]
 at
org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)[optimized]
 at
org/apache/catalina/core/ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)[inlined]
 at
org/apache/catalina/core/ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)[optimized]
 at
org/apache/catalina/core/StandardWrapperValve.invoke(StandardWrapperValve.java:222)[optimized]
 at
org/apache/catalina/core/StandardContextValve.invoke(StandardContextValve.java:123)[optimized]
 at
org/apache/catalina/core/StandardHostValve.invoke(StandardHostValve.java:171)[optimized]
 at
org/apache/catalina/valves/ErrorReportValve.invoke(ErrorReportValve.java:99)[optimized]
 at
org/apache/catalina/valves/AccessLogValve.invoke(AccessLogValve.java:953)[optimized]
 at
org/apache/catalina/core/StandardEngineValve.invoke(StandardEngineValve.java:118)[optimized]
 at
org/apache/catalina/connector/CoyoteAdapter.service(CoyoteAdapter.java:408)[optimized]
 at
org/apache/coyote/http11/AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)[optimized]
 at
org/apache/coyote/AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)[optimized]
 at
org/apache/tomcat/util/net/JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)[optimized]
 ^-- Holding lock:
org/apache/tomcat/util/net/SocketWrapper@0x2ee6e4aa8[thin lock]
 at
java/util/concurrent/ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)[inlined]
 at
java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)[optimized]
 at java/lang/Thread.run(Thread.java:682)[optimized]
 at jrockit/vm/RNI.c2java(J)V(Native Method)


Re: Help me read Thread

2015-10-13 Thread Erick Erickson
Is this under a very heavy indexing load? There were some
inefficiencies that caused followers to work a lot harder than the
leader, and the leader had to spin off a bunch of threads to send
updates to followers. That's fixed in the 5.2 release.

Best,
Erick

On Tue, Oct 13, 2015 at 8:40 AM, Rallavagu  wrote:
> Please help me understand what is going on with this thread.
>
> Solr 4.6.1, single shard, 4 node cluster, 3 node zk. Running on tomcat with
> 500 threads.
>
>
> There are 47 threads overall and designated leader becomes unresponsive
> though shows "green" from cloud perspective. This is causing issues.
>
> particularly,
>
> "   at
> org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
> ^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
> ^-- Holding lock:
> org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
> ^-- Holding lock:
> org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]"
>
>
>
> "http-bio-8080-exec-2878" id=5899 idx=0x30c tid=17132 prio=5 alive,
> native_blocked, daemon
> [rest of thread dump snipped; it is identical to the dump in the
> original message]


Re: Indexing Solr in production

2015-10-13 Thread Erick Erickson
Here's a sample:
https://lucidworks.com/blog/2012/02/14/indexing-with-solrj/
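The robustness piece Edwin asked about is mostly a retry loop around each batch. Below is a minimal sketch of that loop; the `send` callback is a stand-in for a real client call such as SolrJ's `SolrClient.add(docs)`, and the class and method names are illustrative, not from the linked post:

```java
import java.util.concurrent.Callable;

public class RetryingIndexer {

    // Retry one batch send with exponential backoff. The send callback is a
    // stand-in for a real client call such as SolrJ's SolrClient.add(docs);
    // it returns true when the batch was accepted.
    public static boolean sendWithRetry(Callable<Boolean> send,
                                        int maxRetries, long baseDelayMs) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                if (send.call()) {
                    return true;                 // batch accepted
                }
            } catch (Exception e) {
                // transient failure: fall through to backoff and retry
            }
            if (attempt < maxRetries) {
                try {
                    Thread.sleep(baseDelayMs << attempt);  // 10ms, 20ms, 40ms...
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;                            // caller re-queues this batch
    }

    public static void main(String[] args) {
        // Simulated flaky endpoint: fails twice, then succeeds on attempt 3.
        final int[] calls = {0};
        boolean ok = sendWithRetry(() -> ++calls[0] >= 3, 5, 10);
        System.out.println("indexed=" + ok + " attempts=" + calls[0]);
    }
}
```

Batches that still fail after the retries are exhausted can be written to a dead-letter file and re-submitted later.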

On Tue, Oct 13, 2015 at 4:18 AM, Alessandro Benedetti
 wrote:
> The most robust and simple way to go is building your own Indexer.
> You can decide the platform you want, Solr has plenty of client API
> libraries.
>
> For example if you want to write your Indexer app in Java, you can use
> SolrJ.
> Each client library will give you all the flexibility you need to index
> Solr in a robust way.
>
> [1] https://cwiki.apache.org/confluence/display/solr/Client+APIs
> Cheers
>
> On 13 October 2015 at 09:35, Zheng Lin Edwin Yeo 
> wrote:
>
>> Hi,
>>
>> What is the best practice for indexing in Solr in a production system? I'm
>> using Solr 5.3.0.
>>
>> I understand that post.jar does not have things like robustness checks and
>> retries, which are important in production, as sometimes certain records
>> might fail during the indexing, and we need to re-try the indexing for
>> those records that fail.
>>
>> Normally, do we need to write a new custom handler in order to achieve all
>> these?
>> Want to find out what most people did before I decide on a method and
>> proceed on to the next step.
>>
>> Thank you.
>>
>> Regards,
>> Edwin
>>
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England


Re: Request for Wiki edit right

2015-10-13 Thread Erick Erickson
Create a user on the Wiki (anyone can), then tell us the user name
you've created and we'll add you to the auth lists. There are separate
lists for Solr and Lucene. We had to lock these down because we were
getting a lot of spam pages created.

The reference guide (CWiki) is restricted to committers though.

Best,
Erick

On Tue, Oct 13, 2015 at 6:30 AM, Arcadius Ahouansou
 wrote:
> Hello.
>
> Please, can I have the right to edit the Wiki?
>
> Thanks.
>
> Arcadius.


Help me read Thread

2015-10-13 Thread Rallavagu

Please help me understand what is going on with this thread.

Solr 4.6.1, single shard, 4 node cluster, 3 node zk. Running on tomcat 
with 500 threads.



There are 47 threads overall and the designated leader becomes unresponsive 
though it shows "green" from the cloud perspective. This is causing issues.


particularly,

"   at 
org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]

^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
^-- Holding lock: 
org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
^-- Holding lock: 
org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]"




"http-bio-8080-exec-2878" id=5899 idx=0x30c tid=17132 prio=5 alive, 
native_blocked, daemon

at __lll_lock_wait+34(:0)@0x382ba0e262
at safepointSyncOnPollAccess+167(safepoint.c:83)@0x7f83ae266138
at trapiNormalHandler+484(traps_posix.c:220)@0x7f83ae29a745
at _L_unlock_16+44(:0)@0x382ba0f710
at java/util/LinkedList.peek(LinkedList.java:447)[optimized]
at 
org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.blockUntilFinished(ConcurrentUpdateSolrServer.java:384)[inlined]
at 
org/apache/solr/update/StreamingSolrServers.blockUntilFinished(StreamingSolrServers.java:98)[inlined]
at 
org/apache/solr/update/SolrCmdDistributor.finish(SolrCmdDistributor.java:61)[inlined]
at 
org/apache/solr/update/processor/DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:501)[inlined]
at 
org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]

^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
^-- Holding lock: 
org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
^-- Holding lock: 
org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]
at 
org/apache/solr/handler/ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)[optimized]
at 
org/apache/solr/handler/RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)[optimized]

at org/apache/solr/core/SolrCore.execute(SolrCore.java:1859)[optimized]
at 
org/apache/solr/servlet/SolrDispatchFilter.execute(SolrDispatchFilter.java:721)[inlined]
at 
org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)[inlined]
at 
org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)[optimized]
at 
org/apache/catalina/core/ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)[inlined]
at 
org/apache/catalina/core/ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)[optimized]
at 
org/apache/catalina/core/StandardWrapperValve.invoke(StandardWrapperValve.java:222)[optimized]
at 
org/apache/catalina/core/StandardContextValve.invoke(StandardContextValve.java:123)[optimized]
at 
org/apache/catalina/core/StandardHostValve.invoke(StandardHostValve.java:171)[optimized]
at 
org/apache/catalina/valves/ErrorReportValve.invoke(ErrorReportValve.java:99)[optimized]
at 
org/apache/catalina/valves/AccessLogValve.invoke(AccessLogValve.java:953)[optimized]
at 
org/apache/catalina/core/StandardEngineValve.invoke(StandardEngineValve.java:118)[optimized]
at 
org/apache/catalina/connector/CoyoteAdapter.service(CoyoteAdapter.java:408)[optimized]
at 
org/apache/coyote/http11/AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)[optimized]
at 
org/apache/coyote/AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)[optimized]
at 
org/apache/tomcat/util/net/JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)[optimized]
^-- Holding lock: 
org/apache/tomcat/util/net/SocketWrapper@0x2ee6e4aa8[thin lock]
at 
java/util/concurrent/ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)[inlined]
at 
java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)[optimized]

at java/lang/Thread.run(Thread.java:682)[optimized]
at jrockit/vm/RNI.c2java(J)V(Native Method)


Re: catchall fields or multiple fields

2015-10-13 Thread Jack Krupansky
Performing a sequence of queries can help too. For example, if users
commonly search for a product name, you could do an initial query on just
the product name field which should be much faster than searching the text
of all product descriptions, and highlighting would be less problematic. If
that initial query comes up empty, then you could move on to the next
highest most likely field, maybe product title (short one line
description), and query voluminous fields like detailed product
descriptions, specifications, and user comments/reviews only as a last
resort.
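Sketched in code, the fallback sequence described above is just a prioritized cascade: query one field at a time and stop at the first field that yields hits. The `queryField` function below stands in for a real Solr query restricted to one field (e.g. `q=product_name:(user input)`); all names are illustrative:

```java
import java.util.List;
import java.util.function.Function;

public class CascadeSearch {

    // Query fields in priority order and return the hits from the first
    // field that matches anything. queryField stands in for a real Solr
    // query restricted to one field.
    public static List<String> search(List<String> fieldsByPriority,
                                      Function<String, List<String>> queryField) {
        for (String field : fieldsByPriority) {
            List<String> hits = queryField.apply(field);
            if (!hits.isEmpty()) {
                return hits;                 // stop at the first matching field
            }
        }
        return List.of();                    // nothing matched in any field
    }

    public static void main(String[] args) {
        // product_name misses, product_title hits, description never queried.
        List<String> hits = search(
            List.of("product_name", "product_title", "description"),
            field -> field.equals("product_title") ? List.of("doc42") : List.of());
        System.out.println(hits);
    }
}
```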

-- Jack Krupansky

On Tue, Oct 13, 2015 at 6:17 AM, elisabeth benoit  wrote:

> Thanks to you all for that informed advice.
>
> Thanks Trey for your very detailed point of view. This is now very clear to
> me how a search on multiple fields can grow slower than a search on a
> catchall field.
>
> Our actual search model is problematic: we search on a catchall field, but
> need to know which fields match, so we do highlighting on multiple fields
> (not indexed, but stored). To improve performance, we want to get rid of
> highlighting and use the Solr explain output. To get the explain output on
> those fields, we need to do a search on those fields.
>
> So I guess we have to test whether removing highlighting and adding
> multi-field search will improve performance or not.
>
> Best regards,
> Elisabeth
>
>
>
> 2015-10-12 17:55 GMT+02:00 Jack Krupansky :
>
> > I think it may all depend on the nature of your application and how much
> > commonality there is between fields.
> >
> > One interesting area is auto-suggest: while you can certainly suggest
> > from the union of all fields, you may want to give priority to
> > suggestions from preferred fields — for example, actual product names or
> > important keywords rather than random words from the English language
> > that happen to occur in descriptions, all of which would occur in a
> > catchall.
> >
> > -- Jack Krupansky
> >
> > On Mon, Oct 12, 2015 at 8:39 AM, elisabeth benoit <
> > elisaelisael...@gmail.com
> > > wrote:
> >
> > > Hello,
> > >
> > > We're using solr 4.10 and storing all data in a catchall field. It
> seems
> > to
> > > me that one good reason for using a catchall field is when using
> scoring
> > > with idf (with idf, a word might not have same score in all fields). We
> > got
> > > rid of idf and are now considering using multiple fields. I remember
> > > reading somewhere that using a catchall field might speed up searching
> > > time. I was wondering if some of you have any opinion (or experience)
> > > related to this subject.
> > >
> > > Best regards,
> > > Elisabeth
> > >
> >
>


Re: Selective field query

2015-10-13 Thread Colin Hunter
Thanks Alessandro,
Certainly the use of the Analysis tool, along with debugQuery, supplies a
lot of useful information.
I've found that a combination of using the ngram field (as detailed
previously) along with the qf param of the edismax parser seems to be
working well.
From there I can dynamically create the relevant search queries as required.
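For archive readers, a request along the lines described here would look roughly like the following. The field names follow this thread; the `_ngram` copy-field name and the boost are illustrative guesses, not taken from Colin's actual schema:

```text
q=Search Service
&defType=edismax
&qf=ServiceName ServiceName_ngram
&pf=ServiceName^10
&mm=100%
```

With `qf` covering both the plain and the ngram field, partial-word matches come from the ngram side, while `pf` boosts documents where the whole phrase matches the plain field.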

Certainly there's a lot more to get on board here to properly maximise the
value of using Solr.
So further suggestions, advice, etc. remain very welcome.

Appreciated
Colin

On Tue, Oct 13, 2015 at 12:00 PM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> The first thing I would suggest is the use of the Analysis tool, to
> explore your analysis at query and index time.
> This will be the first step to understand if you are actually tokenising
> and token filtering as expected.
>
> Then you should play with different fields (in case the original field
> is single-valued, you are not going to lose the relation).
> Then you can provide the search you expect, for example:
>
> Service Name: Ngram token filtered (or whatever you need)
> Service id: keywordTokenizer (to keep only one token).
>
> Can you give additional details?
>
> Cheers
>
> On 13 October 2015 at 10:36, Colin Hunter  wrote:
>
> > Thanks Scott.
> > That is definitely moving things in the right direction.
> >
> > I have another question that relates to this. It is also requested to
> > implement a partial word search on the service name field.
> > However, each service also has a unique identifier (string). This field
> > requires exact string matching.
> > I have attempted making a copy field for Service Name using the
> > NGramTokenizerFactory, as below.
> >
> > <fieldType name="..." class="solr.TextField"
> >     positionIncrementGap="100">
> >   <analyzer>
> >     <tokenizer class="solr.NGramTokenizerFactory"
> >         minGramSize="3" maxGramSize="7"/>
> >   </analyzer>
> > </fieldType>
> >
> > While the debugQuery info showed the _ngram results, I was having issues
> > building the query that would return these results along with the regular
> > search. (Your previous response may well clarify this.)
> > When I set this to return on all fields, then the full string match
> > required for the service ID no longer works.
> >
> > I certainly have to explore further re the eDisMax parser.
> > However, any advice that can be offered, regarding meeting these
> different
> > requirements in a single query would be very helpful.
> >
> > Many Thanks
> > Colin
> >
> > On Tue, Oct 13, 2015 at 5:49 AM, Scott Stults <
> > sstu...@opensourceconnections.com> wrote:
> >
> > > Colin,
> > >
> > > The other thing you'll want to keep in mind (and you'll find this out
> > with
> > > debugQuery) is that the query parser is going to take your
> > > ServiceName:(Search Service) and turn it into two queries --
> > > ServiceName:(Search) ServiceName:(Service). That's because the query
> > parser
> > > breaks on whitespace. My bet is you have a lot of entries with a name
> of
> > "X
> > > Service" and the second part of your query is hitting them. Phrase
> Field
> > > might be your friend here:
> > >
> > > https://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29
> > >
> > >
> > > -Scott
> > >
> > > On Mon, Oct 12, 2015 at 4:15 AM, Colin Hunter 
> > > wrote:
> > >
> > > > Thanks Erick, I'm sure this will be valuable in implementing ngram
> > filter
> > > > factory
> > > >
> > > > On Fri, Oct 9, 2015 at 4:38 PM, Erick Erickson <
> > erickerick...@gmail.com>
> > > > wrote:
> > > >
> > > > > Colin:
> > > > >
> > > > > Adding &debug=all to your query is your friend here, the
> > > > > parsed_query.toString will show you exactly what
> > > > > is searched against.
> > > > >
> > > > > Best,
> > > > > Erick
> > > > >
> > > > > On Fri, Oct 9, 2015 at 2:09 AM, Colin Hunter  >
> > > > wrote:
> > > > > > Ah ha...   the copy field...  makes sense.
> > > > > > Thank You.
> > > > > >
> > > > > > On Fri, Oct 9, 2015 at 10:04 AM, Upayavira 
> wrote:
> > > > > >
> > > > > >>
> > > > > >>
> > > > > >> On Fri, Oct 9, 2015, at 09:54 AM, Colin Hunter wrote:
> > > > > >> > Hi
> > > > > >> >
> > > > > >> > I am working on a complex search utility with an index created
> > via
> > > > > data
> > > > > >> > import from an extensive MySQL database.
> > > > > >> > There are many ways in which the index is searched. One of the
> > > > utility
> > > > > >> > input fields searches only on a Service Name. However, if I
> > target
> > > > the
> > > > > >> > query as q=ServiceName:"Searched service", this only returns
> an
> > > > exact
> > > > > >> > string match. If q=Searched Service, the query still returns
> > > results
> > > > > from
> > > > > >> > all indexed data.
> > > > > >> >
> > > > > >> > Is there a way to construct a query to only return results
> from
> > > one
> > > > > field
> > > > > >> > of a doc ?
> > > > > >> > I have tried setting index=false, stored=true on unwanted
> > fields,
> > > > but
> > > > > >> > these
> > > > > >> > appear to have still

Request for Wiki edit right

2015-10-13 Thread Arcadius Ahouansou
Hello.

Please, can I have the right to edit the Wiki?

Thanks.

Arcadius.


RE: File-based Spelling

2015-10-13 Thread Dyer, James
Mark,

The older spellcheck implementations create an n-gram sidecar index, which is 
why you're seeing your name split into 2-grams like this.  See the IR Book by 
Manning et al, section 3.3.4 for more information.  Based on the results you're 
getting, I think it is loading your file correctly.  You should now try a query 
against this spelling index, using words *not* in the file you loaded that are 
within 1 or 2 edits from something that is in the dictionary.  If it doesn't 
yield suggestions, then post the relevant sections of the solrconfig.xml, 
schema.xml and also the query string you are trying.
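For reference, a file-based spellchecker definition along the lines Mark describes would look roughly like this. The paths and settings are taken from his message; the component and spellchecker names are illustrative:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">file</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="sourceLocation">/usr/share/dict/linux.words</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">./data/spFile</str>
    <str name="buildOnStartup">true</str>
  </lst>
</searchComponent>
```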

James Dyer
Ingram Content Group


-Original Message-
From: Mark Fenbers [mailto:mark.fenb...@noaa.gov] 
Sent: Monday, October 12, 2015 2:38 PM
To: Solr User Group
Subject: File-based Spelling

Greetings!

I'm attempting to use a file-based spell checker.  My sourceLocation is 
/usr/share/dict/linux.words, and my spellcheckIndexDir is set to 
./data/spFile.  BuildOnStartup is set to true, and I see nothing to 
suggest any sort of problem/error in solr.log.  However, in my 
./data/spFile/ directory, there are only two files: segments_2 with only 
71 bytes in it, and a zero-byte write.lock file.  For a source 
dictionary having 480,000 words in it, I was expecting a bit more 
substance in the ./data/spFile directory.  Something doesn't seem right 
with this.

Moreover, I ran a query on the word Fenbers, which isn't listed in the 
linux.words file, but there are several similar words.  The results I 
got back were odd, and suggestions included the following:
fenber
f en be r
f e nb er
f en b er
f e n be r
f en b e r
f e nb e r
f e n b er
f e n b e r

But I expected suggestions like fenders, embers, and fenberry, etc. I 
also ran a query on Mark (which IS listed in linux.words) and got back 
two suggestions in a similar format.  I played with configurables like 
changing the fieldType from text_en to string and the characterEncoding 
from UTF-8 to ASCII, etc., but nothing seemed to yield any different 
results.

Can anyone offer suggestions as to what I'm doing wrong?  I've been 
struggling with this for more than 40 hours now!  I'm surprised my 
persistence has lasted this long!

Thanks,
Mark


Re: are there any SolrCloud supervisors?

2015-10-13 Thread Jean-Sebastien Vachon
I would be interested in seeing it in action. Do you have any documentation 
available on what it does and how?

Thanks


From: r b 
Sent: Friday, October 2, 2015 3:09 PM
To: solr-user@lucene.apache.org
Subject: are there any SolrCloud supervisors?

I've been working on something that just monitors ZooKeeper to add and
remove nodes from collections. the use case being I put SolrCloud in
an autoscaling group on EC2 and as instances go up and down, I need
them added to the collection. It's something I've built for work and
could clean up to share on GitHub if there is much interest.

I asked in the IRC about a SolrCloud supervisor utility but wanted to
extend that question to this list. are there any more "full featured"
supervisors out there?
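The core reconciliation step of such a supervisor is a set difference between the nodes ZooKeeper reports live and the nodes currently serving the collection. A sketch with illustrative names follows; a real implementation would watch `/live_nodes` and apply the computed changes via the Collections API (ADDREPLICA / DELETEREPLICA):

```java
import java.util.HashSet;
import java.util.Set;

public class MembershipDiff {

    // Nodes live in ZooKeeper but not yet in the collection:
    // candidates for ADDREPLICA.
    public static Set<String> toAdd(Set<String> liveNodes, Set<String> inCollection) {
        Set<String> add = new HashSet<>(liveNodes);
        add.removeAll(inCollection);
        return add;
    }

    // Nodes in the collection that ZooKeeper no longer reports live:
    // candidates for DELETEREPLICA.
    public static Set<String> toRemove(Set<String> liveNodes, Set<String> inCollection) {
        Set<String> remove = new HashSet<>(inCollection);
        remove.removeAll(liveNodes);
        return remove;
    }

    public static void main(String[] args) {
        Set<String> live = Set.of("node1:8983_solr", "node2:8983_solr");
        Set<String> current = Set.of("node2:8983_solr", "node3:8983_solr");
        System.out.println("add=" + toAdd(live, current)
                + " remove=" + toRemove(live, current));
    }
}
```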


-renning


Re: are there any SolrCloud supervisors?

2015-10-13 Thread Susheel Kumar
Sounds interesting...

On Tue, Oct 13, 2015 at 12:58 AM, Trey Grainger  wrote:

> I'd be very interested in taking a look if you post the code.
>
> Trey Grainger
> Co-Author, Solr in Action
> Director of Engineering, Search & Recommendations @ CareerBuilder
>
> On Fri, Oct 2, 2015 at 3:09 PM, r b  wrote:
>
> > I've been working on something that just monitors ZooKeeper to add and
> > remove nodes from collections. the use case being I put SolrCloud in
> > an autoscaling group on EC2 and as instances go up and down, I need
> > them added to the collection. It's something I've built for work and
> > could clean up to share on GitHub if there is much interest.
> >
> > I asked in the IRC about a SolrCloud supervisor utility but wanted to
> > extend that question to this list. are there any more "full featured"
> > supervisors out there?
> >
> >
> > -renning
> >
>


Re: Replication and soft commits for NRT searches

2015-10-13 Thread Shalin Shekhar Mangar
Comments inline:

On Mon, Oct 12, 2015 at 1:31 PM, MOIS Martin (MORPHO)
 wrote:
> Hello,
>
> I am running Solr 5.2.1 in a cluster with 6 nodes. My collections have been 
> created with replicationFactor=2, i.e. I have one replica for each shard. 
> Beyond that I am using autoCommit/maxDocs=1 and autoSoftCommits/maxDocs=1 
> in order to achieve near realtime search behavior.
>
> As far as I understand from section "Write Side Fault Tolerance" in the 
> documentation 
> (https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance),
>  I cannot enforce that an update gets replicated to all replicas, but I can 
> only get the achieved replication factor by requesting the return value rf.
>
> My question is now, what exactly does rf=2 mean? Does it only mean that the 
> replica has written the update to its transaction log? Or has the replica 
> also performed the soft commit as configured with autoSoftCommits/maxDocs=1? 
> The answer is important for me, as if the update would only get written to 
> the transaction log, I could not search for it reliable, as the replica may 
> not have added it to the searchable index.

rf=2 means that the update was successfully replicated to and
acknowledged by two replicas (including the leader). The rf only deals
with the durability of the update and has no relation to visibility of
the update to searchers. The auto(soft)commit settings are applied
asynchronously and do not block an update request.
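Concretely, rf is requested by adding the min_rf parameter to an update; Solr then reports the achieved replication factor in the response. Host, collection and document below are illustrative:

```text
curl 'http://localhost:8983/solr/mycollection/update?min_rf=2&wt=json' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"doc1"}]'

(the JSON response then includes the achieved factor, e.g. "rf":2)
```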

>
> My second question is, does rf=1 mean that the update was definitely not 
> successful on the replica or could it also represent a timeout of the 
> replication request from the shard leader? If it could also represent a 
> timeout, then there would be a small chance that the replication was 
> successful despite the timeout.

Well, rf=1 implies that the update was only applied on the leader's
index + tlog, and either the replicas weren't available, returned an
error, or the request timed out. So yes, you are right that it can
represent a timeout, and as such there is a chance that the replication
was indeed successful despite the timeout.

>
> Is there a way to retrieve the replication factor for a specific document 
> after the update in order to check if replication was successful in the 
> meantime?
>

No, there is no way to do that.

> Thanks in advance.
>
> Best Regards,
> Martin Mois
> #
> " This e-mail and any attached documents may contain confidential or 
> proprietary information. If you are not the intended recipient, you are 
> notified that any dissemination, copying of this e-mail and any attachments 
> thereto or use of their contents by any means whatsoever is strictly 
> prohibited. If you have received this e-mail in error, please advise the 
> sender immediately and delete this e-mail and all attached documents from 
> your computer system."
> #



-- 
Regards,
Shalin Shekhar Mangar.


Re: AutoComplete Feature in Solr

2015-10-13 Thread Salman Ansari
Thanks guys, I was able to make it work using your articles. The key point,
mentioned in one of the articles, was that the suggest component is
preconfigured in the techproducts sample. I started my work from there and
tweaked it to suit my needs. Thanks a lot!

One thing still remains: I don't find support for "suggest" in Solr.NET.
What I found is that we should use the Spell Check component, but that is
not the recommended option as per the articles. The Spell Check component
in Solr.NET will use the /spell handler while I have configured suggestions
using the /suggest handler. It is easy to handle it myself as well, but I
was just wondering if Solr.NET supports the suggest component somehow.

Regards,
Salman

On Tue, Oct 13, 2015 at 2:39 PM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> As Erick suggested, you are reading about a really old way to provide the
> autocomplete feature!
> Please take a look at the docs Erick linked and at my blog as well.
> They will definitely give you more insight into the Autocomplete world!
>
> Cheers
>
> [1] http://alexbenedetti.blogspot.co.uk/2015/07/solr-you-complete-me.html
>
> On 12 October 2015 at 21:24, Salman Ansari 
> wrote:
>
> > Hi,
> >
> > I have been trying to get the autocomplete feature in Solr working with
> no
> > luck up to now. First I read that "suggest component" is the recommended
> > way as in the below article (and this is the exact functionality I am
> > looking for, which is to autocomplete multiple words)
> >
> >
> http://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/
> >
> > Then I tried implementing suggest as described in the following articles
> in
> > this order
> > 1) https://wiki.apache.org/solr/Suggester#SearchHandler_configuration
> > 2) http://solr.pl/en/2010/11/15/solr-and-autocomplete-part-2/  (I
> > implemented suggesting phrases)
> > 3)
> >
> >
> http://stackoverflow.com/questions/18132819/how-to-have-solr-autocomplete-on-whole-phrase-when-query-contains-multiple-terms
> >
> > With no luck, after implementing each article when I run my query as
> > http://[MySolr]:8983/solr/entityStore114/suggest?spellcheck.q=Barack
> >
> >
> >
> > I get
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">0</int>
> >     <int name="QTime">0</int>
> >   </lst>
> > </response>
> >
> >  Although I have an entry for Barack Obama in my index. I am posting my
> > Solr configuration as well
> >
> > <searchComponent name="suggest" class="solr.SpellCheckComponent">
> >   <lst name="spellchecker">
> >     <str name="name">suggest</str>
> >     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
> >     <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
> >     <str name="field">entity_autocomplete</str>
> >     <str name="buildOnCommit">true</str>
> >   </lst>
> > </searchComponent>
> >
> > <requestHandler name="/suggest"
> >     class="org.apache.solr.handler.component.SearchHandler">
> >   <lst name="defaults">
> >     <str name="spellcheck">true</str>
> >     <str name="spellcheck.dictionary">suggest</str>
> >     <str name="spellcheck.count">10</str>
> >     <str name="spellcheck.onlyMorePopular">true</str>
> >     <str name="spellcheck.collate">false</str>
> >   </lst>
> >   <arr name="components">
> >     <str>suggest</str>
> >   </arr>
> > </requestHandler>
> >
> > It looks like a very simple job, but even after following so many
> articles,
> > I could not get it right. Any comment will be appreciated!
> >
> > Regards,
> > Salman
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Re: Spell Check and Privacy

2015-10-13 Thread Alessandro Benedetti
We had the very same issue and we solved it as James suggested :)

To answer Susheel: the requirement is to show each user only the
suggestions he or she should see.
It can seem a paranoid request, but it can happen that we don't want to
show any of the indexed data to certain users.
In enterprise search you are able to see only the documents you are
entitled to see, and the same should hold for autocompletion and
spellchecking.
Some time ago I was thinking about a filter-query approach for
spellchecking and autocomplete; maybe I will return to that idea later.
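For the record, the collate-based approach James describes in the quoted message below boils down to request parameters like these (the query value is made up; the parameter names are the ones he lists):

```text
q=revenu report
&spellcheck=true
&spellcheck.collate=true
&spellcheck.maxCollationTries=5
&spellcheck.collateParam.mm=100%
&spellcheck.collateExtendedResults=true
```

The collations are then re-checked against the user's own filters, so only suggestions that actually return hits for that user are surfaced; the raw suggestion section is discarded client-side.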

Cheers

On 12 October 2015 at 15:36, Susheel Kumar  wrote:

> Hi Arnon,
>
> I couldn't fully understand your use case regarding privacy. Are you
> concerned that spellcheck may reveal, as part of its suggestions, user
> names belonging to different organizations/ACLs, or that after seeing
> the suggestions a user may be able to click and view other
> organizations' users?
>
> Please provide some details on your concern for Privacy with Spell Checker.
>
> Thanks,
> Susheel
>
> On Mon, Oct 12, 2015 at 9:45 AM, Dyer, James  >
> wrote:
>
> > Arnon,
> >
> > Use "spellcheck.collate=true" with "spellcheck.maxCollationTries" set to
> a
> > non-zero value.  This will give you re-written queries that are
> guaranteed
> > to return hits, given the original query and filters.  If you are using
> an
> > "mm" value other than 100%, you also will want specify "
> > spellcheck.collateParam.mm=100%". (or if using "q.op=OR", then use
> > "spellcheck.collateParam.q.op=AND")
> >
> > Of course, the first section of the spellcheck result will still show
> > every possible suggestion, so your client needs to discard these and not
> > divulge them to the user.  If you need to know word-by-word how the
> > collations were constructed, then specify
> > "spellcheck.collateExtendedResults=true".  Use the extended collation
> > results for this information and not the first section of the spellcheck
> > results.
> >
> > This is all fairly well-documented on the old solr wiki:
> > https://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate
> >
> > James Dyer
> > Ingram Content Group
> >
> > -Original Message-
> > From: Arnon Yogev [mailto:arn...@il.ibm.com]
> > Sent: Monday, October 12, 2015 2:33 AM
> > To: solr-user@lucene.apache.org
> > Subject: Spell Check and Privacy
> >
> > Hi,
> >
> > Our system supports many users from different organizations and with
> > different ACLs.
> > We consider adding a spell check ("did you mean") functionality using
> > DirectSolrSpellChecker. However, a privacy concern was raised, as this
> > might lead to private information being revealed between users via the
> > suggested terms. Using the FileBasedSpellChecker is another option, but
> > naturally a static list of terms is not optimal.
> >
> > Is there a best practice or a suggested method for these kind of cases?
> >
> > Thanks,
> > Arnon
> >
> >
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: AutoComplete Feature in Solr

2015-10-13 Thread Alessandro Benedetti
As Erick suggested, you are reading about a really old way to provide the
autocomplete feature!
Please take a look at the docs Erick linked, and at my blog as well.
It will definitely give you more insight into the autocomplete world!

Cheers

[1] http://alexbenedetti.blogspot.co.uk/2015/07/solr-you-complete-me.html
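For reference, the newer approach in Solr 5 that the linked docs describe is the SuggestComponent rather than the spellcheck-based Suggester quoted below. A minimal solrconfig.xml sketch (the component and handler names, the field, and its analyzer fieldType are assumptions; check the reference guide for the full option list):

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">entity_autocomplete</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">mySuggester</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

With this component the query parameter is `suggest.q`, not `spellcheck.q`.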

On 12 October 2015 at 21:24, Salman Ansari  wrote:

> Hi,
>
> I have been trying to get the autocomplete feature in Solr working with no
> luck up to now. First I read that "suggest component" is the recommended
> way as in the below article (and this is the exact functionality I am
> looking for, which is to autocomplete multiple words)
>
> http://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/
>
> Then I tried implementing suggest as described in the following articles in
> this order
> 1) https://wiki.apache.org/solr/Suggester#SearchHandler_configuration
> 2) http://solr.pl/en/2010/11/15/solr-and-autocomplete-part-2/  (I
> implemented suggesting phrases)
> 3)
>
> http://stackoverflow.com/questions/18132819/how-to-have-solr-autocomplete-on-whole-phrase-when-query-contains-multiple-terms
>
> With no luck. After implementing each article, when I run my query as
> http://[MySolr]:8983/solr/entityStore114/suggest?spellcheck.q=Barack
>
>
>
> I get
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">0</int>
>   </lst>
> </response>
>
>  Although I have an entry for Barack Obama in my index. I am posting my
> Solr configuration as well
>
> <searchComponent name="suggest" class="solr.SpellCheckComponent">
>   <lst name="spellchecker">
>     <str name="name">suggest</str>
>     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>     <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
>     <str name="field">entity_autocomplete</str>
>     <str name="buildOnCommit">true</str>
>   </lst>
> </searchComponent>
>
>   class="org.apache.solr.handler.component.SearchHandler">
>  
>   true
>   suggest
>   10
> true
> false
>  
>  
>   suggest
>  
> 
>
> It looks like a very simple job, but even after following so many articles,
> I could not get it right. Any comment will be appreciated!
>
> Regards,
> Salman
>





Re: Grouping facets: Possible to get facet results for each Group?

2015-10-13 Thread Alessandro Benedetti
Can you model your business domain with Solr nested docs? If so, you can
use Yonik's article about nested facets.

Cheers

On 13 October 2015 at 05:05, Alexandre Rafalovitch 
wrote:

> Could you use the new nested facets syntax?
> http://yonik.com/solr-subfacets/
>
> Regards,
>Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
> On 11 October 2015 at 09:51, Peter Sturge  wrote:
> > Been trying to coerce Group faceting to give some faceting back for each
> > group, but maybe this use case isn't catered for in Grouping? :
> >
> > So the Use Case is this:
> > Let's say I do a grouped search that returns say, 9 distinct groups, and
> in
> > these groups are various numbers of unique field values that need
> faceting
> > - but the faceting needs to be within each group:
>





Re: Indexing Solr in production

2015-10-13 Thread Alessandro Benedetti
The most robust and simple way to go is building your own indexer.
You can pick the platform you want; Solr has plenty of client API
libraries.

For example, if you want to write your indexer app in Java, you can use
SolrJ.
Each client library will give you all the flexibility you need to index
into Solr in a robust way.

[1] https://cwiki.apache.org/confluence/display/solr/Client+APIs
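As a sketch of the robustness Edwin asks about: wrap each batch update in a small retry loop with backoff, and re-queue the batch only after the retries are exhausted. A minimal Python illustration (the endpoint URL, batch shape, and error handling are assumptions, not a recommended client; a SolrJ indexer would follow the same pattern):

```python
import json
import time
import urllib.request

def with_retries(fn, attempts=3, backoff=1.0):
    """Call fn(), retrying on any exception up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise                      # caller re-queues this batch
            time.sleep(backoff * attempt)  # simple linear backoff

def post_batch(solr_update_url, docs):
    """POST one batch of docs to Solr's JSON update handler, with retries."""
    body = json.dumps(docs).encode("utf-8")
    req = urllib.request.Request(
        solr_update_url, data=body,
        headers={"Content-Type": "application/json"})
    return with_retries(lambda: urllib.request.urlopen(req, timeout=30).status)
```

Failed batches can then be logged and replayed later, which is the re-try behaviour post.jar lacks.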
Cheers

On 13 October 2015 at 09:35, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> What is the best practice for indexing in Solr in a production system? I'm
> using Solr 5.3.0.
>
> I understand that post.jar does not have things like robustness checks and
> retries, which are important in production, as sometimes certain records
> might fail during the indexing, and we need to re-try the indexing for
> those records that fail.
>
> Normally, do we need to write a new custom handler in order to achieve all
> these?
> Want to find out what most people did before I decide on a method and
> proceed on to the next step.
>
> Thank you.
>
> Regards,
> Edwin
>





Re: NullPointerException

2015-10-13 Thread Alessandro Benedetti
Generally it is highly discouraged to build the spellcheck on startup.
With a big suggestions file, you are going to spend a long time on
startup building the suggester data structures (basically an FST, in
memory and then on disk).
You should build your spellchecker only when you change the file source of
the suggestions.
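If it helps, the relevant knobs look roughly like this in the spellchecker config (a hedged sketch; the spellchecker name is an assumption, and `buildOnCommit` applies to index-based dictionaries):

```xml
<lst name="spellchecker">
  <str name="name">default</str>
  <!-- avoid expensive FST builds on every startup -->
  <str name="buildOnStartup">false</str>
  <!-- file-based dictionaries are instead rebuilt explicitly with
       spellcheck.build=true after the source file changes -->
  <str name="buildOnCommit">false</str>
</lst>
```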

Checking the snippet, the first thing I see is that you add a field to the
FileBased spellchecker config, which is useless there.
Anyway, that should not be the problem.
Can you give us the source of the suggestions?
A snippet of the file?

Cheers

On 13 October 2015 at 10:02, Duck Geraint (ext) GBJH <
geraint.d...@syngenta.com> wrote:

> How odd, though I'm afraid this is reaching the limit of my knowledge at
> this point (and I still can't find where that box is within the Admin UI!).
>
> The only thing I'd say is to check that "logtext" is a defined named field
> within your schema, and to double-check how its field type is defined.
>
> Also, try without the <str name="queryAnalyzerFieldType">text_en</str>
> definition (I believe this should be implicit as the field type of
> "logtext" above).
>
> Geraint
>
> Geraint Duck
> Data Scientist
> Toxicology and Health Sciences
> Syngenta UK
> Email: geraint.d...@syngenta.com
>
>
> -Original Message-
> From: Mark Fenbers [mailto:mark.fenb...@noaa.gov]
> Sent: 12 October 2015 12:14
> To: solr-user@lucene.apache.org
> Subject: Re: NullPointerException
>
> On 10/12/2015 5:38 AM, Duck Geraint (ext) GBJH wrote:
> > "When I use the Admin UI (v5.3.0), and check the spellcheck.build box"
> > Out of interest, where is this option within the Admin UI? I can't find
> anything like it in mine...
> This is in the expanded options that open up once I put a checkmark in the
> "spellcheck" box.
> > Do you get the same issue by submitting the build command directly with
> something like this instead:
> > http://localhost:8983/solr//ELspell?spellcheck.build=true
> > ?
> Yes, I do.
> > It'll be reasonably obvious if the dictionary has actually built or not
> by the file size of your speller store:
> > /localapps/dev/EventLog/solr/EventLog2/data/spFile
> >
> >
> > Otherwise, (temporarily) try adding...
> > <str name="buildOnStartup">true</str>
> > ...to your spellchecker search component config; you might find it'll
> > log a more useful error message that way.
> Interesting!  The index builds successfully using this method and I get no
> stacktrace error.  Hurray!  But why??
>
> So now, I tried running a query, so I typed Fenbers into the spellcheck.q
> box, and I get the following 9 suggestions:
> fenber
> f en be r
> f e nb er
> f en b er
> f e n be r
> f en b e r
> f e nb e r
> f e n b er
> f e n b e r
>
> I find this very odd because I commented out all references to the
> wordbreak checker in solrconfig.xml.  What do I configure so that Solr will
> give me sensible suggestions like:
>fenders
>embers
>fenberry
> and so on?
>
> Mark
>
> 
>
>
> Syngenta Limited, Registered in England No 2710846;Registered Office :
> Syngenta Limited, European Regional Centre, Priestley Road, Surrey Research
> Park, Guildford, Surrey, GU2 7YH, United Kingdom
> 
>  This message may contain confidential information. If you are not the
> designated recipient, please notify the sender immediately, and delete the
> original and any copies. Any use of the message by you is prohibited.
>





Re: Selective field query

2015-10-13 Thread Alessandro Benedetti
The first thing I would suggest is to use the Analysis tool to
explore your analysis at query and index time.
This is the first step to understand whether you are actually tokenising
and token-filtering as expected.

Then you should play with different fields (if the original field
is single-valued, you are not going to lose the relation).
Then you can define the behaviour you expect per field, for example:

Service name: n-gram token filtered (or whatever you need)
Service id: KeywordTokenizer (to keep only one token)
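Sketched as schema definitions, that two-field approach could look like this (all field and type names are assumptions; the n-gram sizes follow the snippet quoted below):

```xml
<!-- partial-match field: n-grams at index time, plain tokens at query time -->
<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="7"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- exact-match field: the whole identifier as a single token -->
<fieldType name="string_exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="service_name_ngram" type="text_ngram" indexed="true" stored="false"/>
<field name="service_id" type="string_exact" indexed="true" stored="true"/>
<copyField source="ServiceName" dest="service_name_ngram"/>
```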

Can you give additional details ?

Cheers

On 13 October 2015 at 10:36, Colin Hunter  wrote:

> Thanks Scott.
> That is definitely moving things in the right direction
>
> I have another question that relates to this. It is also requested to
> implement a partial word search on the service name field.
> However, each service also has a unique identifier (string). This field
> requires exact string matching.
> I have attempted making a copy field for Service Name using the
> NGramTokenizerFactory, as below.
>
> <fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="7"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>   </analyzer>
> </fieldType>
>
> While the debugQuery info showed the _ngram results, I was having issues
> building the query that would return these results along with regular
> search. (Your previous response may well clarify this.)
> When I set this to return on all fields, then the full string match
> required for the service id no longer works.
>
> I certainly have to explore the eDisMax parser further.
> However, any advice that can be offered, regarding meeting these different
> requirements in a single query would be very helpful.
>
> Many Thanks
> Colin
>
> On Tue, Oct 13, 2015 at 5:49 AM, Scott Stults <
> sstu...@opensourceconnections.com> wrote:
>
> > Colin,
> >
> > The other thing you'll want to keep in mind (and you'll find this out
> with
> > debugQuery) is that the query parser is going to take your
> > ServiceName:(Search Service) and turn it into two queries --
> > ServiceName:(Search) ServiceName:(Service). That's because the query
> parser
> > breaks on whitespace. My bet is you have a lot of entries with a name of
> "X
> > Service" and the second part of your query is hitting them. Phrase Field
> > might be your friend here:
> >
> > https://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29
> >
> >
> > -Scott
> >
> > On Mon, Oct 12, 2015 at 4:15 AM, Colin Hunter 
> > wrote:
> >
> > > Thanks Erick, I'm sure this will be valuable in implementing ngram
> filter
> > > factory
> > >
> > > On Fri, Oct 9, 2015 at 4:38 PM, Erick Erickson <
> erickerick...@gmail.com>
> > > wrote:
> > >
> > > > Colin:
> > > >
> > > > Adding &debug=all to your query is your friend here, the
> > > > parsed_query.toString will show you exactly what
> > > > is searched against.
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > > On Fri, Oct 9, 2015 at 2:09 AM, Colin Hunter 
> > > wrote:
> > > > > Ah ha...   the copy field...  makes sense.
> > > > > Thank You.
> > > > >
> > > > > On Fri, Oct 9, 2015 at 10:04 AM, Upayavira  wrote:
> > > > >
> > > > >>
> > > > >>
> > > > >> On Fri, Oct 9, 2015, at 09:54 AM, Colin Hunter wrote:
> > > > >> > Hi
> > > > >> >
> > > > >> > I am working on a complex search utility with an index created
> via
> > > > data
> > > > >> > import from an extensive MySQL database.
> > > > >> > There are many ways in which the index is searched. One of the
> > > utility
> > > > >> > input fields searches only on a Service Name. However, if I
> target
> > > the
> > > > >> > query as q=ServiceName:"Searched service", this only returns an
> > > exact
> > > > >> > string match. If q=Searched Service, the query still returns
> > results
> > > > from
> > > > >> > all indexed data.
> > > > >> >
> > > > >> > Is there a way to construct a query to only return results from
> > one
> > > > field
> > > > >> > of a doc ?
> > > > >> > I have tried setting index=false, stored=true on unwanted
> fields,
> > > but
> > > > >> > these
> > > > >> > appear to have still been returned in results.
> > > > >>
> > > > >> q=ServiceName:(Searched Service)
> > > > >>
> > > > >> That'll look in just one field.
> > > > >>
> > > > >> Remember changing indexed to false doesn't impact the stuff
> already
> > in
> > > > >> your index. And the reason you are likely getting all that stuff
> is
> > > > >> because you have a copyField that copies it over into the 'text'
> > > field.
> > > > >> If you'll never want to search on some fields, switch them to
> > > > >> index=false, make sure you aren't doing a copyField on them, and
> > then
> > > > >> reindex.
> > > > >>
> > > > >> Upayavira
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > www.gfc.uk.net
> > > >
> > >
> > >
> > >
> > > --
> > > www.gfc.uk.net
> > >
> >
> >
> >
> > --
> > Scott Stults | Founder & Solutions Architect | OpenSource Connections,
> LLC
> > | 434.409.2780
> > http://www.opens

Re: catchall fields or multiple fields

2015-10-13 Thread elisabeth benoit
Thanks to you all for that informed advice.

Thanks Trey for your very detailed point of view. It is now very clear to
me how a search on multiple fields can grow slower than a search on a
catchall field.

Our actual search model is problematic: we search on a catchall field, but
need to know which fields match, so we do highlighting on multiple fields
(not indexed, but stored). To improve performance, we want to get rid of
highlighting and use the Solr explain output. To get the explain output on
those fields, we need to search on those fields.
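A hedged sketch of what such a request could look like, with explicit per-field clauses so the explain section attributes the score to each field (the handler, field names, and the structured-explain flag are assumptions worth verifying against your Solr version):

```python
from urllib.parse import urlencode

def explain_url(base, per_field_terms):
    """Query each field explicitly and ask for a structured explain,
    so the response shows which field clauses actually matched."""
    q = " OR ".join(f"{field}:({terms})"
                    for field, terms in per_field_terms.items())
    params = {
        "q": q,
        "debugQuery": "true",                # include the explain section
        "debug.explain.structured": "true",  # machine-readable explain
        "fl": "id,score",
    }
    return base + "/select?" + urlencode(params)

# hypothetical core and fields, for illustration only
url = explain_url("http://localhost:8983/solr/places",
                  {"name": "bakery", "city": "paris"})
```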

So I guess we have to test whether removing highlighting and adding
multi-field search improves performance or not.

Best regards,
Elisabeth



2015-10-12 17:55 GMT+02:00 Jack Krupansky :

> I think it may all depend on the nature of your application and how much
> commonality there is between fields.
>
> One interesting area is auto-suggest, where you can certainly suggest from
> the union of all fields, you may want to give priority to suggestions from
> preferred fields. For example, for actual product names or important
> keywords rather than random words from the English language that happen to
> occur in descriptions, all of which would occur in a catchall.
>
> -- Jack Krupansky
>
> On Mon, Oct 12, 2015 at 8:39 AM, elisabeth benoit <
> elisaelisael...@gmail.com
> > wrote:
>
> > Hello,
> >
> > We're using solr 4.10 and storing all data in a catchall field. It seems
> to
> > me that one good reason for using a catchall field is when using scoring
> > with idf (with idf, a word might not have same score in all fields). We
> got
> > rid of idf and are now considering using multiple fields. I remember
> > reading somewhere that using a catchall field might speed up searching
> > time. I was wondering if some of you have any opinion (or experience)
> > related to this subject.
> >
> > Best regards,
> > Elisabeth
> >
>


Re: Selective field query

2015-10-13 Thread Colin Hunter
Thanks Scott.
That is definitely moving things in the right direction

I have another question that relates to this. It is also requested to
implement a partial word search on the service name field.
However, each service also has a unique identifier (string). This field
requires exact string matching.
I have attempted making a copy field for Service Name using the
NGramTokenizerFactory, as below.

<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="7"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

While the debugQuery info showed the _ngram results, I was having issues
building the query that would return these results along with regular
search. (Your previous response may well clarify this.)
When I set this to return on all fields, then the full string match
required for the service id no longer works.

I certainly have to explore the eDisMax parser further.
However, any advice that can be offered, regarding meeting these different
requirements in a single query would be very helpful.
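One way to combine both requirements in a single request is the eDisMax parser, searching an n-gram copy field for partial matches while keeping the identifier field exact (the field names below are assumptions for illustration):

```python
from urllib.parse import urlencode

def service_search_url(base, user_input):
    """eDisMax query over a partial-match n-gram field plus an exact id field."""
    params = {
        "defType": "edismax",
        "q": user_input,
        # partial matches on the n-gram copy field; exact id hits boosted higher
        "qf": "service_name_ngram service_id^10",
        "pf": "ServiceName^5",  # reward phrase matches on the original field
    }
    return base + "/select?" + urlencode(params)

# hypothetical core name, for illustration only
url = service_search_url("http://localhost:8983/solr/services", "Search Serv")
```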

Many Thanks
Colin

On Tue, Oct 13, 2015 at 5:49 AM, Scott Stults <
sstu...@opensourceconnections.com> wrote:

> Colin,
>
> The other thing you'll want to keep in mind (and you'll find this out with
> debugQuery) is that the query parser is going to take your
> ServiceName:(Search Service) and turn it into two queries --
> ServiceName:(Search) ServiceName:(Service). That's because the query parser
> breaks on whitespace. My bet is you have a lot of entries with a name of "X
> Service" and the second part of your query is hitting them. Phrase Field
> might be your friend here:
>
> https://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29
>
>
> -Scott
>
> On Mon, Oct 12, 2015 at 4:15 AM, Colin Hunter 
> wrote:
>
> > Thanks Erick, I'm sure this will be valuable in implementing ngram filter
> > factory
> >
> > On Fri, Oct 9, 2015 at 4:38 PM, Erick Erickson 
> > wrote:
> >
> > > Colin:
> > >
> > > Adding &debug=all to your query is your friend here, the
> > > parsed_query.toString will show you exactly what
> > > is searched against.
> > >
> > > Best,
> > > Erick
> > >
> > > On Fri, Oct 9, 2015 at 2:09 AM, Colin Hunter 
> > wrote:
> > > > Ah ha...   the copy field...  makes sense.
> > > > Thank You.
> > > >
> > > > On Fri, Oct 9, 2015 at 10:04 AM, Upayavira  wrote:
> > > >
> > > >>
> > > >>
> > > >> On Fri, Oct 9, 2015, at 09:54 AM, Colin Hunter wrote:
> > > >> > Hi
> > > >> >
> > > >> > I am working on a complex search utility with an index created via
> > > data
> > > >> > import from an extensive MySQL database.
> > > >> > There are many ways in which the index is searched. One of the
> > utility
> > > >> > input fields searches only on a Service Name. However, if I target
> > the
> > > >> > query as q=ServiceName:"Searched service", this only returns an
> > exact
> > > >> > string match. If q=Searched Service, the query still returns
> results
> > > from
> > > >> > all indexed data.
> > > >> >
> > > >> > Is there a way to construct a query to only return results from
> one
> > > field
> > > >> > of a doc ?
> > > >> > I have tried setting index=false, stored=true on unwanted fields,
> > but
> > > >> > these
> > > >> > appear to have still been returned in results.
> > > >>
> > > >> q=ServiceName:(Searched Service)
> > > >>
> > > >> That'll look in just one field.
> > > >>
> > > >> Remember changing indexed to false doesn't impact the stuff already
> in
> > > >> your index. And the reason you are likely getting all that stuff is
> > > >> because you have a copyField that copies it over into the 'text'
> > field.
> > > >> If you'll never want to search on some fields, switch them to
> > > >> index=false, make sure you aren't doing a copyField on them, and
> then
> > > >> reindex.
> > > >>
> > > >> Upayavira
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > www.gfc.uk.net
> > >
> >
> >
> >
> > --
> > www.gfc.uk.net
> >
>
>
>
> --
> Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
> | 434.409.2780
> http://www.opensourceconnections.com
>



-- 
www.gfc.uk.net


Highlighting content field problem when using JiebaTokenizerFactory

2015-10-13 Thread Zheng Lin Edwin Yeo
Hi,

I'm trying to use the JiebaTokenizerFactory to index Chinese characters in
Solr. It works fine with the segmentation when I'm using
the Analysis function on the Solr Admin UI.

However, when I tried to do the highlighting in Solr, it is not
highlighting in the correct place. For example, when I search for 自然环境与企业本身,
it highlights 认为自然环境与企业本身的.

Even when I search for an English word like responsibility, it highlights
*responsibilit*y.

Basically, the highlighting is off by 1 character/space consistently.

This problem only happens in the content field, and not in any other fields.
Does anyone know what could be causing the issue?

I'm using jieba-analysis-1.0.0, Solr 5.3.0 and Lucene 5.3.0.


Regards,
Edwin


RE: NullPointerException

2015-10-13 Thread Duck Geraint (ext) GBJH
How odd, though I'm afraid this is reaching the limit of my knowledge at this 
point (and I still can't find where that box is within the Admin UI!).

The only thing I'd say is to check that "logtext" is a defined named field
within your schema, and to double-check how its field type is defined.

Also, try without the <str name="queryAnalyzerFieldType">text_en</str>
definition (I believe this should be implicit as the field type of "logtext"
above).

Geraint

Geraint Duck
Data Scientist
Toxicology and Health Sciences
Syngenta UK
Email: geraint.d...@syngenta.com


-Original Message-
From: Mark Fenbers [mailto:mark.fenb...@noaa.gov]
Sent: 12 October 2015 12:14
To: solr-user@lucene.apache.org
Subject: Re: NullPointerException

On 10/12/2015 5:38 AM, Duck Geraint (ext) GBJH wrote:
> "When I use the Admin UI (v5.3.0), and check the spellcheck.build box"
> Out of interest, where is this option within the Admin UI? I can't find 
> anything like it in mine...
This is in the expanded options that open up once I put a checkmark in the 
"spellcheck" box.
> Do you get the same issue by submitting the build command directly with 
> something like this instead:
> http://localhost:8983/solr//ELspell?spellcheck.build=true
> ?
Yes, I do.
> It'll be reasonably obvious if the dictionary has actually built or not by 
> the file size of your speller store:
> /localapps/dev/EventLog/solr/EventLog2/data/spFile
>
>
> Otherwise, (temporarily) try adding...
> <str name="buildOnStartup">true</str>
> ...to your spellchecker search component config; you might find it'll log a
> more useful error message that way.
Interesting!  The index builds successfully using this method and I get no 
stacktrace error.  Hurray!  But why??

So now, I tried running a query, so I typed Fenbers into the spellcheck.q box, 
and I get the following 9 suggestions:
fenber
f en be r
f e nb er
f en b er
f e n be r
f en b e r
f e nb e r
f e n b er
f e n b e r

I find this very odd because I commented out all references to the wordbreak 
checker in solrconfig.xml.  What do I configure so that Solr will give me 
sensible suggestions like:
   fenders
   embers
   fenberry
and so on?

Mark






Indexing Solr in production

2015-10-13 Thread Zheng Lin Edwin Yeo
Hi,

What is the best practice for indexing in Solr in a production system? I'm
using Solr 5.3.0.

I understand that post.jar does not have things like robustness checks and
retries, which are important in production, as sometimes certain records
might fail during the indexing, and we need to re-try the indexing for
those records that fail.

Normally, do we need to write a new custom handler in order to achieve all
these?
Want to find out what most people did before I decide on a method and
proceed on to the next step.

Thank you.

Regards,
Edwin