RE: including a minus sign "-" in the token

2017-06-09 Thread Phil Scadden
So, the field I am using for search has type of:
  [fieldType XML definition stripped by the list archive]
You are saying "wainui-8" will be indexed as one token? But I should add a 
WordDelimiterFilter to the analyser to prevent it being split? Or I guess the 
WordDelimiterGraphFilter.

Ideally I want "inter-montane", say, to be treated as hyphenated, but a hyphen 
followed by a number NOT to be treated as hyphenated. That would mean 
catenateWords:1 but catenateNumbers:0???
What would it do with Wainui-10A?
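
For illustration, a minimal sketch of a fieldType along those lines, using the WordDelimiterGraphFilter mentioned above (the type name and exact option values are assumptions, not from this thread; the query-side analyzer, omitted here, would be the same minus the FlattenGraphFilter):

  <fieldType name="text_hyphen" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- preserveOriginal="1" keeps "Wainui-8" intact as a token alongside the
           split parts; catenateWords="1" also indexes "intermontane" for
           "inter-montane"; catenateNumbers="0" leaves number runs alone; and
           splitOnNumerics="0" keeps "10A" whole, so "Wainui-10A" should yield
           "Wainui-10A", "Wainui" and "10A". -->
      <filter class="solr.WordDelimiterGraphFilterFactory"
              generateWordParts="1" generateNumberParts="1"
              catenateWords="1" catenateNumbers="0" catenateAll="0"
              splitOnNumerics="0" preserveOriginal="1"/>
      <filter class="solr.FlattenGraphFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>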

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Saturday, 10 June 2017 12:43 a.m.
To: solr-user@lucene.apache.org
Subject: Re: including a minus sign "-" in the token

On 6/8/2017 8:39 PM, Phil Scadden wrote:
> We have important entities referenced in indexed documents which have
> convention naming of geographicname-number. E.g. Wainui-8. I want the tokenizer 
> to treat it as Wainui-8 when indexing, and when I search I want a q of 
> Wainui-8 (must it be specified as Wainui\-8 ??) to return docs with Wainui-8 
> but not with Wainui-9 or plain Wainui.
>
> Docs are pdfs, and I have been using tika to extract text.
>
> How do I set up solr for queries like this?

At indexing time, Solr does not treat the hyphen as a special character like it 
does at query time.  Many analysis components do, though.  If your analysis 
chain includes certain components (the standard tokenizer, the ICU tokenizer, 
and WordDelimiterFilter are on that list), then the hyphen may be treated as a 
word break character and the analysis could remove it.

At query time, a hyphen in the middle of a word is not treated as a special 
character.  It would need to be at the beginning of the query text or after a 
space for the query parser to treat it as a negation.
So Wainui-8 would not be a problem, but -7 would, and you'd need to specify it 
as \-7 for it to work like you want.

Thanks,
Shawn



RE: Highlighter not working on some documents

2017-06-09 Thread Phil Scadden
Tried hard to find a difference between pdfs returning no highlights and ones 
that do for the same search term.  They include pdfs that have been OCRed and 
ones that were text to begin with. Head-scratching to me.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, 10 June 2017 6:22 a.m.
To: solr-user 
Subject: Re: Highlighter not working on some documents

Need lots more information. I.e. schema definitions, query you use, handler 
configuration and the like. Note that highlighted fields must have 
stored="true" set and likely the _text_ field doesn't. At least in the default 
schemas stored is set to false for the catch-all field.
And you don't want to store that information anyway since it's usually the 
destination of copyField directives and you'd highlight _those_ fields.

Best,
Erick

On Thu, Jun 8, 2017 at 8:37 PM, Phil Scadden  wrote:
> Do a search with:
> fl=id,title,datasource=true=unified=50=1=pressure+AND+testing=50=0=json
>
> and I get back a good list of documents. However, some documents are 
> returning empty fields in the highlighter. E.g., in the highlight array I have:
> "W:\\Reports\\OCR\\4272.pdf":{"_text_":[]}
>
> This entry sits well up the list of results, with good highlighted matches 
> above and below it. Why would the highlighter be failing?
>


RE: Highlighter not working on some documents

2017-06-09 Thread Phil Scadden
Managed-schema attached (not a default) and the solrconfig.xml. _text_ is 
stored (not sure how else highlighting could work??).  The indexer puts the 
body text of the pdf into the _text_ field. What would the value be in putting 
it into a different field and then using copyField??
 I.e.
     // body text extracted from the PDF by Tika's content handler
     SolrInputDocument up = new SolrInputDocument();
     String content = textHandler.toString();
     up.addField("_text_", content);

     solr.add(up);

The puzzling thing for me is why some documents produce highlights and 
others do not. The highlighters in the documents that work are pulling body text 
fragments, not things stored in some other field.
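
For comparison, a rough sketch of the copyField route Erick suggests below (field names here are illustrative; the copyField line belongs in the schema, not in the indexer):

     // index the Tika-extracted body into a stored, highlightable field...
     SolrInputDocument up = new SolrInputDocument();
     up.addField("content", textHandler.toString());
     solr.add(up);
     // ...and let the schema copy it into the unstored catch-all:
     // <copyField source="content" dest="_text_"/>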

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Saturday, 10 June 2017 6:22 a.m.
To: solr-user 
Subject: Re: Highlighter not working on some documents

Need lots more information. I.e. schema definitions, query you use, handler 
configuration and the like. Note that highlighted fields must have 
stored="true" set and likely the _text_ field doesn't. At least in the default 
schemas stored is set to false for the catch-all field.
And you don't want to store that information anyway since it's usually the 
destination of copyField directives and you'd highlight _those_ fields.

Best,
Erick

On Thu, Jun 8, 2017 at 8:37 PM, Phil Scadden  wrote:
> Do a search with:
> fl=id,title,datasource=true=unified=50=1=pressure+AND+testing=50=0=json
>
> and I get back a good list of documents. However, some documents are 
> returning empty fields in the highlighter. E.g., in the highlight array I have:
> "W:\\Reports\\OCR\\4272.pdf":{"_text_":[]}
>
> This entry sits well up the list of results, with good highlighted matches 
> above and below it. Why would the highlighter be failing?
>


[Attachment: solrconfig.xml]


Re: Failure to load shards

2017-06-09 Thread John Bickerstaff
Thanks Erick!

It's very likely that the auto scale groups spinning up new Solrs hit
zookeeper harder than our initial deploy did, just due to the way things get
staggered during the deploy.

Unfortunately, I don't think there's a way to stagger the auto scale
group's work of bringing up Solr boxes (although I need to check)

I appreciate the hint to check the Overseer queue - I'll be doing that for
sure...



On Fri, Jun 9, 2017 at 12:19 PM, Erick Erickson 
wrote:

> John:
>
> First place I'd look is the ZooKeeper Overseer queue. Prior to 6.6
> there were some inefficiencies in how those messages were processed
> and that queue would get very, very large when lots of replicas came
> up all at once, and that would gum up the works. See: SOLR-10524.
>
> The quick check would be to bring up your nodes a few at a time and
> monitor the Overseer work queue(s) in ZK. Bring up, say, 5 nodes, wait
> for the Overseer queue to settle down, bring up 5 more. Rinse, repeat.
> If you can bring everything up and index and the like, that's probably
> the issue.
>
> Purely keying off of your statements "The code used is exactly the
> same code that successfully spun up the first 3 or 4 solr boxes"
> and "When we scale up, 40 to 80 solr nodes will spin up at one time",
> so I may be way off base.
>
> If I'm guessing correctly, then Solr 6.6 or the patch above (and
> perhaps associated) or bringing up boxes more slowly are indicated. I
> do know of installations with over 100K replicas, so Solr works at
> your scale.
>
> Best,
> Erick
>
> On Fri, Jun 9, 2017 at 11:03 AM, John Bickerstaff
>  wrote:
> > Hi all,
> >
> > Here's my situation...
> >
> > In AWS with zookeeper / solr.
> >
> > When trying to spin up additional Solr boxes from an "auto scaling group" I
> > get this failure.
> >
> > The code used is exactly the same code that successfully spun up the first
> > 3 or 4 solr boxes in each "auto scaling group"
> >
> > Below is a copy of my email to some of my compatriots within the company
> > who also use solr/zookeeper
> >
> > I'm looking for any advice on what _might_ be the cause of this failure...
> > Overload on Zookeeper in some way is our best guess.
> >
> > I know this isn't a zookeeper forum - - just hoping someone out there has
> > some experience troubleshooting similar issues.
> >
> > Many thanks in advance...
> >
> > =
> >
> > We have 6 zookeepers. (3 of them are observers).
> >
> > They are not under a load balancer
> >
> > How do I check if zookeeper nodes are under heavy load?
> >
> >
> > The problem arises when we try to scale up with more solr nodes. Current
> > setup we have 160 nodes connected to zookeeper. Each node with 40 cores, so
> > around 6400 cores. When we scale up, 40 to 80 solr nodes will spin up at
> > one time.
> >
> > And we are getting errors like these that stops the index distribution
> > process:
> >
> > 2017-06-05 20:06:34.357 ERROR [pool-3-thread-2] o.a.s.c.CoreContainer -
> > Error creating core [p44_b1_s37]: Could not get shard id for core:
> > p44_b1_s37
> >
> >
> > org.apache.solr.common.SolrException: Could not get shard id for core:
> > p44_b1_s37
> >
> > at org.apache.solr.cloud.ZkController.waitForShardId(ZkController.java:1496)
> > at org.apache.solr.cloud.ZkController.doGetShardIdAndNodeNameProcess(ZkController.java:1438)
> > at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1548)
> > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:815)
> > at org.apache.solr.core.CoreContainer.create(CoreContainer.java:757)
> > at com.ancestry.solr.servlet.AcomServlet.indexTransfer(AcomServlet.java:319)
> > at com.ancestry.solr.servlet.AcomServlet.lambda$indexTransferStart$1(AcomServlet.java:303)
> > at com.ancestry.solr.service.IndexTransferWorker.run(IndexTransferWorker.java:78)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > at java.lang.Thread.run(Thread.java:745)
> >
> >
> > Which we predict has to do with zookeeper not responding fast enough.
>


Re: Highlighter not working on some documents

2017-06-09 Thread Erick Erickson
Need lots more information. I.e. schema definitions, query you use,
handler configuration and the like. Note that highlighted fields must
have stored="true" set and likely the _text_ field doesn't. At least
in the default schemas stored is set to false for the catch-all field.
And you don't want to store that information anyway since it's usually
the destination of copyField directives and you'd highlight _those_
fields.
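
As a rough sketch of that arrangement (field and type names are illustrative, not from a specific schema):

   <field name="_text_"  type="text_general" indexed="true" stored="false" multiValued="true"/>
   <field name="content" type="text_general" indexed="true" stored="true"/>
   <copyField source="content" dest="_text_"/>
   <!-- query with hl.fl=content so highlighting reads the stored field -->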

Best,
Erick

On Thu, Jun 8, 2017 at 8:37 PM, Phil Scadden  wrote:
> Do a search with:
> fl=id,title,datasource=true=unified=50=1=pressure+AND+testing=50=0=json
>
> and I get back a good list of documents. However, some documents are 
> returning empty fields in the highlighter. E.g., in the highlight array I have:
> "W:\\Reports\\OCR\\4272.pdf":{"_text_":[]}
>
> This entry sits well up the list of results, with good highlighted matches 
> above and below it. Why would the highlighter be failing?
>


Re: Failure to load shards

2017-06-09 Thread Erick Erickson
John:

First place I'd look is the ZooKeeper Overseer queue. Prior to 6.6
there were some inefficiencies in how those messages were processed
and that queue would get very, very large when lots of replicas came
up all at once, and that would gum up the works. See: SOLR-10524.

The quick check would be to bring up your nodes a few at a time and
monitor the Overseer work queue(s) in ZK. Bring up, say, 5 nodes, wait
for the Overseer queue to settle down, bring up 5 more. Rinse, repeat.
If you can bring everything up and index and the like, that's probably
the issue.
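
One way to eyeball that queue is the zkCli.sh that ships with ZooKeeper (the path assumes the standard SolrCloud layout under your ZK root):

  ls /overseer/queue
  stat /overseer/queue     # numChildren is the size of the backlog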

Purely keying off of your statements "The code used is exactly the
same code that successfully spun up the first 3 or 4 solr boxes"
and "When we scale up, 40 to 80 solr nodes will spin up at one time",
so I may be way off base.

If I'm guessing correctly, then Solr 6.6 or the patch above (and
perhaps associated) or bringing up boxes more slowly are indicated. I
do know of installations with over 100K replicas, so Solr works at
your scale.

Best,
Erick

On Fri, Jun 9, 2017 at 11:03 AM, John Bickerstaff
 wrote:
> Hi all,
>
> Here's my situation...
>
> In AWS with zookeeper / solr.
>
> When trying to spin up additional Solr boxes from an "auto scaling group" I
> get this failure.
>
> The code used is exactly the same code that successfully spun up the first
> 3 or 4 solr boxes in each "auto scaling group"
>
> Below is a copy of my email to some of my compatriots within the company
> who also use solr/zookeeper
>
> I'm looking for any advice on what _might_ be the cause of this failure...
> Overload on Zookeeper in some way is our best guess.
>
> I know this isn't a zookeeper forum - - just hoping someone out there has
> some experience troubleshooting similar issues.
>
> Many thanks in advance...
>
> =
>
> We have 6 zookeepers. (3 of them are observers).
>
> They are not under a load balancer
>
> How do I check if zookeeper nodes are under heavy load?
>
>
> The problem arises when we try to scale up with more solr nodes. Current
> setup we have 160 nodes connected to zookeeper. Each node with 40 cores, so
> around 6400 cores. When we scale up, 40 to 80 solr nodes will spin up at
> one time.
>
> And we are getting errors like these that stops the index distribution
> process:
>
> 2017-06-05 20:06:34.357 ERROR [pool-3-thread-2] o.a.s.c.CoreContainer -
> Error creating core [p44_b1_s37]: Could not get shard id for core:
> p44_b1_s37
>
>
> org.apache.solr.common.SolrException: Could not get shard id for core:
> p44_b1_s37
>
> at org.apache.solr.cloud.ZkController.waitForShardId(ZkController.java:1496)
>
> at org.apache.solr.cloud.ZkController.doGetShardIdAndNodeNameProcess(ZkController.java:1438)
> at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1548)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:815)
> at org.apache.solr.core.CoreContainer.create(CoreContainer.java:757)
> at com.ancestry.solr.servlet.AcomServlet.indexTransfer(AcomServlet.java:319)
> at com.ancestry.solr.servlet.AcomServlet.lambda$indexTransferStart$1(AcomServlet.java:303)
> at com.ancestry.solr.service.IndexTransferWorker.run(IndexTransferWorker.java:78)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
>
> Which we predict has to do with zookeeper not responding fast enough.


Re: IgnoreCommitOptimizeUpdateProcessorFactory

2017-06-09 Thread Erick Erickson
Not that I see. I think you can specify ignoreOptimizeOnly when you
configure it, but that would still allow hard commits through.
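
For reference, a sketch of how the factory is typically wired into solrconfig.xml (chain name illustrative). With ignoreOptimizeOnly=true only optimizes are swallowed; there is no setting that blocks hard commits while letting soft commits through:

  <updateRequestProcessorChain name="ignore-commit-from-client" default="true">
    <processor class="solr.IgnoreCommitOptimizeUpdateProcessorFactory">
      <int name="statusCode">200</int>
      <!-- <bool name="ignoreOptimizeOnly">true</bool> -->
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.DistributedUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>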

On Fri, Jun 9, 2017 at 3:25 AM, Neeraj Bhatt  wrote:
> Hi All
>
> We want to use IgnoreCommitOptimizeUpdateProcessorFactory so that we can
> ignore commits from clients.
>
> But we want to give clients the freedom to do a soft commit instead.
>
> So is there a way to make IgnoreCommitOptimizeUpdateProcessorFactory act
> only on hard commits?
>
> Thanks
> Neeraj


Failure to load shards

2017-06-09 Thread John Bickerstaff
Hi all,

Here's my situation...

In AWS with zookeeper / solr.

When trying to spin up additional Solr boxes from an "auto scaling group" I
get this failure.

The code used is exactly the same code that successfully spun up the first
3 or 4 solr boxes in each "auto scaling group"

Below is a copy of my email to some of my compatriots within the company
who also use solr/zookeeper

I'm looking for any advice on what _might_ be the cause of this failure...
Overload on Zookeeper in some way is our best guess.

I know this isn't a zookeeper forum - - just hoping someone out there has
some experience troubleshooting similar issues.

Many thanks in advance...

=

We have 6 zookeepers. (3 of them are observers).

They are not under a load balancer

How do I check if zookeeper nodes are under heavy load?
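
One quick, low-tech check, assuming the four-letter-word commands are enabled on the ensemble:

  echo mntr | nc zk-host 2181   # avg/max latency, outstanding requests, znode count
  echo stat | nc zk-host 2181   # per-connection stats and a latency summary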


The problem arises when we try to scale up with more solr nodes. Current
setup we have 160 nodes connected to zookeeper. Each node with 40 cores, so
around 6400 cores. When we scale up, 40 to 80 solr nodes will spin up at
one time.

And we are getting errors like these that stops the index distribution
process:

2017-06-05 20:06:34.357 ERROR [pool-3-thread-2] o.a.s.c.CoreContainer -
Error creating core [p44_b1_s37]: Could not get shard id for core:
p44_b1_s37


org.apache.solr.common.SolrException: Could not get shard id for core:
p44_b1_s37

at org.apache.solr.cloud.ZkController.waitForShardId(ZkController.java:1496)

at org.apache.solr.cloud.ZkController.doGetShardIdAndNodeNameProcess(ZkController.java:1438)
at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1548)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:815)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:757)
at com.ancestry.solr.servlet.AcomServlet.indexTransfer(AcomServlet.java:319)
at com.ancestry.solr.servlet.AcomServlet.lambda$indexTransferStart$1(AcomServlet.java:303)
at com.ancestry.solr.service.IndexTransferWorker.run(IndexTransferWorker.java:78)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


Which we predict has to do with zookeeper not responding fast enough.


Re: Solr 6.6 UNLOAD core broken?

2017-06-09 Thread Erick Erickson
Digging. Likely a result of SOLR-10007 and associated.

On Fri, Jun 9, 2017 at 7:30 AM, Mikhail Khludnev  wrote:
> Hello,
>
> Reproduced and raised https://issues.apache.org/jira/browse/SOLR-10857
> No workaround so far, besides requesting deletion of the index dir as part of UNLOAD.
> Thanks for the heads-up.
>
> On Fri, Jun 9, 2017 at 4:05 PM, simon  wrote:
>
>> I'm seeing the same behavior. The CoreAdminAPI call (as generated by the
>> Admin UI) looks correct, and the core.properties file is removed.
>>
>> I don't see anything in the CHANGES.txt for this release which would imply
>> such a change in behavior, nor anything in the 6.6 reference guide, so it
>> looks like a bug.
>>
>> -Simon
>>
>> On Fri, Jun 9, 2017 at 5:14 AM, Andreas Hubold <
>> andreas.hub...@coremedia.com
>> > wrote:
>>
>> > Hi,
>> >
>> > I just tried to update from Solr 6.5.1 to Solr 6.6.0 and observed a
>> > changed behaviour with regard to unloading cores in Solr standalone mode.
>> >
>> > After unloading a core using the CoreAdmin API (e.g. via Admin UI), I
>> > still get search results for that core. It seems, the search request
>> > triggers automatic recreation of the core now. With Solr 6.5.1 search
>> > requests to unloaded cores were answered with 404 as expected.
>> >
>> > Can you confirm this being a bug in 6.6.0 or is this an intended change?
>> >
>> > Cheers,
>> > Andreas
>> >
>> >
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev


Re: international characters in facet.prefix

2017-06-09 Thread arik
Thanks for the guidance.  I have a reasonable "middle ground" blend of
client-side and server-side tweaks working now.  In solr I copied my field
into a duplicate field sans folding filters, so that I essentially have
"myfield_raw" and "myfield_analyzed".  On the client side I then include both
of these fields in my facet query.  Finally, I prefer results from
myfield_analyzed when they exist, and fall back to the myfield_raw results
when the analyzed one turns up empty, which is what happens when foreign
characters are in the facet.prefix.

So I still get all my results in a single solr call.  Capitalization is
lost, all results are lowercased (I kept the lowercase analyzer in the raw)
but that's ok for my needs.
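
Roughly, the fallback looks like this in SolrJ (a sketch only; the two field names are from the post, everything else, including the client setup and imports, is assumed):

  SolrQuery q = new SolrQuery("*:*");
  q.setFacet(true);
  q.addFacetField("myfield_analyzed", "myfield_raw");
  q.set("facet.prefix", userPrefix);   // one prefix, applied to both facet fields
  QueryResponse rsp = client.query(q);
  List<FacetField.Count> counts = rsp.getFacetField("myfield_analyzed").getValues();
  if (counts == null || counts.isEmpty()) {
      // analyzed field came up empty (e.g. foreign characters) -- fall back to raw
      counts = rsp.getFacetField("myfield_raw").getValues();
  }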





Re: Bringing down ZK without Solr

2017-06-09 Thread Vivek Pathak
Why do you need to bring it down?  How about bringing down network access 
instead, e.g. by adding a temporary firewall rule.

Or just send a stop signal to the zookeeper process; once the test is done, 
send a continue.
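
A sketch of the signal approach (the process match is illustrative; adjust for how ZooKeeper runs on your box):

  kill -STOP $(pgrep -f QuorumPeerMain)   # freeze ZK; Solr notices once the session times out
  # ...verify the alert script fires...
  kill -CONT $(pgrep -f QuorumPeerMain)   # resume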

Sent from my iPhone

> On Jun 9, 2017, at 9:33 AM, Venkateswarlu Bommineni  wrote:
> 
> Thanks for your reply Erick.
> 
> The use case is: we have a script that will send a mail when Solr and ZK
> don't talk to each other.
> 
> So we want to replicate the issue and test that script.
> 
> But actually, we don't want to bring down the Solr and ZK nodes; we just
> want to try disconnecting them from each other.
> 
> Thanks,
> Venkat.
> 
> 
> 
> 
> On Thu, Jun 8, 2017 at 10:16 PM, Erick Erickson 
> wrote:
> 
>> Well, it depends on what you mean by "impacting".
>> 
>> When ZK drops below quorum you will no longer be able to send indexing
>> requests to Solr, they'll all fail. At least they better ;).
>> 
>> _Queries_ should continue to work, but you're in somewhat uncharted
>> territory, nobody I know runs that way very long ;).
>> 
>> The other thing I'd be sure to test is how robust reconnection is from
>> Solr to ZK when you bring the nodes back up.
>> 
>> bq:  Solr totally depends on ZK for all I/O
>> 
>> This is a common misunderstanding. Solr depends on ZK for all changes
>> in cluster state, i.e. nodes going up/down/changing state (down,
>> recovering, active etc). Those changes generate traffic between ZK and
>> Solr.
>> 
>> For a normal I/O request, each Solr node has been notified by ZK of
>> the current state of the collection already and that information
>> cached locally. So each node knows everything it needs to know to
>> service the index or query request without talking to ZooKeeper at
>> all.
>> 
>> I know of installations indexing 100s of K documents each second.
>> Actually the record I know of is over 1M docs/second. If each of those
>> requests had to touch ZK to complete, ZK could never keep up
>> 
>> Best,
>> Erick
>> 
>> On Thu, Jun 8, 2017 at 8:49 AM, Venkateswarlu Bommineni
>>  wrote:
>>> Hi Team,
>>> 
>>> Is there any way we can bring down ZK without impacting Solr ?
>>> 
>>> I know it might be a silly question as Solr totally depends on ZK for all
>>> I/O
>>> operations and configuration changes.
>>> 
>>> Thanks,
>>> Venkat.
>> 


Re: Solr 6.6 UNLOAD core broken?

2017-06-09 Thread Mikhail Khludnev
Hello,

Reproduced and raised https://issues.apache.org/jira/browse/SOLR-10857
No workaround so far, besides requesting deletion of the index dir as part of UNLOAD.
Thanks for the heads-up.

On Fri, Jun 9, 2017 at 4:05 PM, simon  wrote:

> I'm seeing the same behavior. The CoreAdminAPI call (as generated by the
> Admin UI) looks correct, and the core.properties file is removed.
>
> I don't see anything in the CHANGES.txt for this release which would imply
> such a change in behavior, nor anything in the 6.6 reference guide, so it
> looks like a bug.
>
> -Simon
>
> On Fri, Jun 9, 2017 at 5:14 AM, Andreas Hubold <
> andreas.hub...@coremedia.com
> > wrote:
>
> > Hi,
> >
> > I just tried to update from Solr 6.5.1 to Solr 6.6.0 and observed a
> > changed behaviour with regard to unloading cores in Solr standalone mode.
> >
> > After unloading a core using the CoreAdmin API (e.g. via Admin UI), I
> > still get search results for that core. It seems, the search request
> > triggers automatic recreation of the core now. With Solr 6.5.1 search
> > requests to unloaded cores were answered with 404 as expected.
> >
> > Can you confirm this being a bug in 6.6.0 or is this an intended change?
> >
> > Cheers,
> > Andreas
> >
> >
>



-- 
Sincerely yours
Mikhail Khludnev


Re: Bringing down ZK without Solr

2017-06-09 Thread Venkateswarlu Bommineni
Thanks for your reply Erick.

The use case is: we have a script that will send a mail when Solr and ZK
don't talk to each other.

So we want to replicate the issue and test that script.

But actually, we don't want to bring down the Solr and ZK nodes; we just
want to try disconnecting them from each other.

Thanks,
Venkat.




On Thu, Jun 8, 2017 at 10:16 PM, Erick Erickson 
wrote:

> Well, it depends on what you mean by "impacting".
>
> When ZK drops below quorum you will no longer be able to send indexing
> requests to Solr, they'll all fail. At least they better ;).
>
> _Queries_ should continue to work, but you're in somewhat uncharted
> territory, nobody I know runs that way very long ;).
>
> The other thing I'd be sure to test is how robust reconnection is from
> Solr to ZK when you bring the nodes back up.
>
> bq:  Solr totally depends on ZK for all I/O
>
> This is a common misunderstanding. Solr depends on ZK for all changes
> in cluster state, i.e. nodes going up/down/changing state (down,
> recovering, active etc). Those changes generate traffic between ZK and
> Solr.
>
> For a normal I/O request, each Solr node has been notified by ZK of
> the current state of the collection already and that information
> cached locally. So each node knows everything it needs to know to
> service the index or query request without talking to ZooKeeper at
> all.
>
> I know of installations indexing 100s of K documents each second.
> Actually the record I know of is over 1M docs/second. If each of those
> requests had to touch ZK to complete, ZK could never keep up
>
> Best,
> Erick
>
> On Thu, Jun 8, 2017 at 8:49 AM, Venkateswarlu Bommineni
>  wrote:
> > Hi Team,
> >
> > Is there any way we can bring down ZK without impacting Solr ?
> >
> > I know it might be a silly question as Solr totally depends on ZK for all
> > I/O
> > operations and configuration changes.
> >
> > Thanks,
> > Venkat.
>


Re: Solr 6.6 UNLOAD core broken?

2017-06-09 Thread simon
I'm seeing the same behavior. The CoreAdminAPI call (as generated by the
Admin UI) looks correct, and the core.properties file is removed.

I don't see anything in the CHANGES.txt for this release which would imply
such a change in behavior, nor anything in the 6.6 reference guide, so it
looks like a bug.

-Simon

On Fri, Jun 9, 2017 at 5:14 AM, Andreas Hubold  wrote:

> Hi,
>
> I just tried to update from Solr 6.5.1 to Solr 6.6.0 and observed a
> changed behaviour with regard to unloading cores in Solr standalone mode.
>
> After unloading a core using the CoreAdmin API (e.g. via Admin UI), I
> still get search results for that core. It seems, the search request
> triggers automatic recreation of the core now. With Solr 6.5.1 search
> requests to unloaded cores were answered with 404 as expected.
>
> Can you confirm this being a bug in 6.6.0 or is this an intended change?
>
> Cheers,
> Andreas
>
>


Re: including a minus sign "-" in the token

2017-06-09 Thread Shawn Heisey
On 6/8/2017 8:39 PM, Phil Scadden wrote:
> We have important entities referenced in indexed documents which have 
> convention naming of geographicname-number. E.g. Wainui-8.
> I want the tokenizer to treat it as Wainui-8 when indexing, and when I search 
> I want a q of Wainui-8 (must it be specified as Wainui\-8 ??) to return 
> docs with Wainui-8 but not with Wainui-9 or plain Wainui.
>
> Docs are pdfs, and I have been using tika to extract text.
>
> How do I set up solr for queries like this?

At indexing time, Solr does not treat the hyphen as a special character
like it does at query time.  Many analysis components do, though.  If
your analysis chain includes certain components (the standard tokenizer,
the ICU tokenizer, and WordDelimiterFilter are on that list), then the
hyphen may be treated as a word break character and the analysis could
remove it.

At query time, a hyphen in the middle of a word is not treated as a
special character.  It would need to be at the beginning of the query
text or after a space for the query parser to treat it as a negation. 
So Wainui-8 would not be a problem, but -7 would, and you'd need to
specify it as \-7 for it to work like you want.

Thanks,
Shawn
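
A couple of illustrative queries (the field name is assumed):

  q=name:Wainui-8    mid-word hyphen, parsed as a single term; no escaping needed
  q=name:\-7         leading hyphen escaped so it is not read as negation
  q=name:"-7"        quoting the term works as well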



Re: including a minus sign "-" in the token

2017-06-09 Thread Susheel Kumar
Hi Phil,

The WordDelimiterFilterFactory (
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory)
can be used to avoid splitting at hyphens etc. along with
WhitespaceTokenizerFactory.  Use generateWordParts="0"...

Thnx

On Thu, Jun 8, 2017 at 10:39 PM, Phil Scadden  wrote:

> We have important entities referenced in indexed documents which have
> convention naming of geographicname-number. E.g. Wainui-8.
> I want the tokenizer to treat it as Wainui-8 when indexing, and when I
> search I want a q of Wainui-8 (must it be specified as Wainui\-8 ??) to
> return docs with Wainui-8 but not with Wainui-9 or plain Wainui.
>
> Docs are pdfs, and I have been using tika to extract text.
>
> How do I set up solr for queries like this?
>
>


Re: Will Solr support google like organic search ?

2017-06-09 Thread Toke Eskildsen
On Fri, 2017-06-09 at 03:51 -0700, Geepalem wrote:
> We have been asked by a client to implement Organic Search like Google.
[...]
> Could you please answer the questions below ASAP?
> • How is Organic Search different from free text search?
> • Provide an example of the organic search you mean.

I have never heard that designation, but I can type "Organic Search" in
Google and get https://en.wikipedia.org/wiki/Organic_search
which describes "Organic Search" as relevance based search, as opposed
to advertisement-affected search. The pure search result, one could
say.

Out-of-the-box Solr is pure relevance ranked. By the definition in the
Wikipedia-article, it is already Organic Search. I think you need to go
back to your client and ask what the client thinks "Organic Search" is.
-- 
Toke Eskildsen, Royal Danish Library


Will Solr support google like organic search ?

2017-06-09 Thread Geepalem
Hi Guys,

We have been asked by a client to implement Organic Search like Google. As we
are using Solr for the search implementation in the project, I am trying to find
out whether it's possible to implement Organic Search with Solr or not. Could you
please answer the questions below ASAP?

•   Will Solr support Google-like organic search?
•   How is Organic Search different from free text search?
•   Provide an example of the organic search you mean.
•   What projects do you know of that have an organic search implementation?


Thanks,
G. Naresh Kumar





IgnoreCommitOptimizeUpdateProcessorFactory

2017-06-09 Thread Neeraj Bhatt
Hi All

We want to use IgnoreCommitOptimizeUpdateProcessorFactory so that we can
ignore commits from clients.

But we want to give clients the freedom to do a soft commit instead.

So is there a way to make IgnoreCommitOptimizeUpdateProcessorFactory act
only on hard commits?

Thanks
Neeraj


Solr 6.6 UNLOAD core broken?

2017-06-09 Thread Andreas Hubold

Hi,

I just tried to update from Solr 6.5.1 to Solr 6.6.0 and observed a 
changed behaviour with regard to unloading cores in Solr standalone mode.


After unloading a core using the CoreAdmin API (e.g. via Admin UI), I 
still get search results for that core. It seems, the search request 
triggers automatic recreation of the core now. With Solr 6.5.1 search 
requests to unloaded cores were answered with 404 as expected.
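
For reference, a sketch of the call in question (core name illustrative); the deleteIndex=true variant is the workaround Mikhail mentions in his reply:

  http://localhost:8983/solr/admin/cores?action=UNLOAD&core=mycore
  http://localhost:8983/solr/admin/cores?action=UNLOAD&core=mycore&deleteIndex=true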


Can you confirm this being a bug in 6.6.0 or is this an intended change?

Cheers,
Andreas



Re: JMX property keys

2017-06-09 Thread Emir Arnautovic

Hi Ari,

It is common that the way an app reports metrics is not monitoring-friendly. 
It is not just how they are named; some metrics also require you to build a 
stateful monitoring agent in order to be able to display them on a time axis.


I am not aware that this can be overridden for Solr, but you can try 
some of the existing Solr monitoring tools. One such tool is Sematext Cloud 
(http://sematext.com/spm/), with an OOTB agent and charts for Solr Cloud. 
You can check whether it meets your needs, or use it to see what can be done 
with the data from Solr JMX.


HTH,
Emir


On 09.06.2017 05:50, Aristedes Maniatis wrote:

I want to monitor my Solr instances using JMX and graph performance. Using 
Zabbix notation, I end up with a key that looks like this:

jmx["solr/suburbs-1547_shard1_replica1:type=standard,id=org.apache.solr.handler.component.SearchHandler","5minRateReqsPerSecond"]


My problem here is that the key contains the replica id "_replica1". But this 
of course changes across all the hosts in the Solr Cloud, so monitoring is a real pain as 
I roll out nodes. I need to know which replica is running on which host.

Why is this so? Is there a way to override how the Solr cores expose themselves 
to JMX?

Please cc me since I'm not subscribed here.

Cheers
Ari





--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Configuration of parallel indexing threads

2017-06-09 Thread gigo314
Thanks a lot!


