'solr start' causes 'unsupported TCP/TPI info selection' error when executing lsof command

2016-06-15 Thread scott.chu

The lsof command on our CentOS 5.4 64-bit server is v4.78. It doesn't support 
the `lsof -PniTCP:$SOLR_PORT -sTCP:LISTEN` format used in the solr script, so 
when we run 'solr start' it shows a lot of messages, mainly 'unsupported 
TCP/TPI info selection'. I googled and found this link:

 [SOLR-7998] Solr start/stop script is currently incompatible with SUSE 
11 - ASF JIRA
https://issues.apache.org/jira/browse/SOLR-7998

So SUSE is not the only distro with this problem. Running 'yum install lsof' 
says it is already the latest version. I know lsof 4.86 under CentOS 6 is OK. 
Is there any way to upgrade lsof under CentOS 5 other than using yum? Or any 
other way to work around this problem?
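
One workaround I am considering (untested; the port variable is just what the 
script already uses) is to edit the lsof line in bin/solr and do the state 
filtering with grep instead, since older lsof still supports plain -i selection:

  # what the script runs, which lsof 4.78 rejects:
  #   lsof -PniTCP:$SOLR_PORT -sTCP:LISTEN
  # rough equivalent for older lsof:
  lsof -Pni "TCP:$SOLR_PORT" | grep LISTEN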

scott.chu,scott@udngroup.com
2016/6/16 (Thu)


Regarding threadPoolSize for cross center data replication

2016-06-15 Thread Bharath Kumar
Hi,

I was trying to find the best thread pool size to configure on the source
site in solrconfig.xml for cross-datacenter replication. We have one target
replica and one shard; is it recommended to have more than one thread?

If we have more than 1 thread, will the updates arrive out of order on the
target site? Can you please let me know?

  <replicator>
    <threadPoolSize>8</threadPoolSize>
    <schedule>1000</schedule>
    <batchSize>128</batchSize>
  </replicator>

-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"


Re: Limit Solr to search for only 500 records based on the search criteria

2016-06-15 Thread Erick Erickson
This is pretty much logically impossible. I'd also suggest that your response
times are tunable; even a very common word such as "AND" shouldn't be taking
18 seconds for 10M docs.

Say you're returning the top 100 docs. You can't know whether the last document
scored should be in the top 100 until you score it. So telling Solr to
"stop searching after 100 docs" would not return the best 100.
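
That said, if all you want back is the top 500, the rows parameter already
does that (Solr still scores everything; it just returns the top N). And if
what you really need to bound is time, look at the timeAllowed parameter,
with the caveat that results may be incomplete when it kicks in. Roughly
(field name is illustrative):

  q=body:"AND"&rows=500
  q=body:"AND"&rows=500&timeAllowed=2000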

FWIW,
Erick

On Wed, Jun 15, 2016 at 5:14 PM, Thrinadh  Kuppili
 wrote:
> Hi,
>
> When I am trying to search in Solr based on the given search criteria, it
> searches all the records, which takes a massive amount of time to complete
> the query.
>
> I want a solution where I can restrict the search to find only 500 records,
> and then Solr should stop querying.
>
> There are more than 10 million records, and when searching for a very common
> word like "AND" it takes 18000 milliseconds, sometimes even more, but I need
> only the top 500 records which contain "AND".
>
> Appreciate the help in advance.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Limit-Solr-to-search-for-only-500-records-based-on-the-search-criteria-tp4282519.html
> Sent from the Solr - User mailing list archive at Nabble.com.


ConcurrentMergeScheduler options not exposed

2016-06-15 Thread Shawn Heisey
On the IRC channel, I ran into somebody who was having problems with
optimizes on their Solr indexes taking a really long time.  When
investigating, they found that during the optimize, *reads* were
happening on their SSD disk at over 800MB/s, but *writes* were
proceeding at only 20 MB/s.

Looking into ConcurrentMergeScheduler, I discovered that it does indeed
have a default write throttle of only 20 MB/s.  I saw code that would
sometimes set the speed to unlimited, but had a hard time figuring out
what circumstances will result in the different settings, so based on
the user experience, I assume that the 20MB/s throttle must be applied
for Solr optimizes.

From what I can see in the code, there's currently no way in
solrconfig.xml to configure scheduler options like the maximum write
speed.  Before I open an issue to add additional configuration
options for the merge scheduler, I thought it might be a good idea to
just double-check with everyone here to see whether there's something I
missed.

This is likely even affecting people who are not using SSD storage. 
Most modern magnetic disks can easily exceed 20MB/s on both reads and
writes.  Some RAID arrays can write REALLY fast.
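
For reference, here is a rough sketch of what this looks like at the Lucene
level if you build your own IndexWriterConfig. The method names are from my
reading of the Lucene 6.x code, so treat this as an unverified sketch:

  import org.apache.lucene.index.ConcurrentMergeScheduler;
  import org.apache.lucene.index.IndexWriterConfig;

  ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
  // turn off the adaptive IO throttle, which starts near 20 MB/s
  cms.disableAutoIOThrottle();
  // let forced merges (optimize) write as fast as the disk allows
  cms.setForceMergeMBPerSec(Double.POSITIVE_INFINITY);

  IndexWriterConfig iwc = new IndexWriterConfig(analyzer); // 'analyzer' assumed defined
  iwc.setMergeScheduler(cms);

The point stands that none of this is reachable from solrconfig.xml today.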

Thanks,
Shawn



Limit Solr to search for only 500 records based on the search criteria

2016-06-15 Thread Thrinadh Kuppili
Hi,

When I am trying to search in Solr based on the given search criteria, it
searches all the records, which takes a massive amount of time to complete the
query.

I want a solution where I can restrict the search to find only 500 records,
and then Solr should stop querying.

There are more than 10 million records, and when searching for a very common
word like "AND" it takes 18000 milliseconds, sometimes even more, but I need
only the top 500 records which contain "AND".

Appreciate the help in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Limit-Solr-to-search-for-only-500-records-based-on-the-search-criteria-tp4282519.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Long STW GCs with Solr Cloud

2016-06-15 Thread Cas Rusnov
Hey Shawn! Thanks for replying.

Yes I meant HugePages not HugeTable, brain fart. I will give the
transparent off option a go.

I have attempted to use your CMS configs as is and also the default
settings and the cluster dies under our load (basically a node will get a
35-60s GC STW and then the others in the shard will take the load, and they
will in turn get long STWs until the shard dies), which is why basically in
a fit of desperation I tried out ParallelGC and found it to be half-way
acceptable. I will run a test using your configs (and the defaults) again
just to be sure (since I'm certain the machine config has changed since we
used your unaltered settings).

Thanks!
Cas


On Wed, Jun 15, 2016 at 3:41 PM, Shawn Heisey  wrote:

> On 6/15/2016 3:05 PM, Cas Rusnov wrote:
> > After trying many of the off the shelf configurations (including CMS
> > configurations but excluding G1GC, which we're still taking the
> > warnings about seriously), numerous tweaks, rumors, various instance
> > sizes, and all the rest, most of which regardless of heap size and
> > newspace size resulted in frequent 30+ second STW GCs, we settled on
> > the following configuration which leads to occasional high GCs but
> > mostly stays between 10-20 second STWs every few minutes (which is
> > almost acceptable): -XX:+AggressiveOpts -XX:+UnlockDiagnosticVMOptions
> > -XX:+UseAdaptiveSizePolicy -XX:+UseLargePages -XX:+UseParallelGC
> > -XX:+UseParallelOldGC -XX:MaxGCPauseMillis=15000 -XX:MaxNewSize=12000m
> > -XX:ParGCCardsPerStrideChunk=4096 -XX:ParallelGCThreads=16 -Xms31000m
> > -Xmx31000m
>
> You mentioned something called "HugeTable" ... I assume you're talking
> about huge pages.  If that's what you're talking about, have you also
> turned off transparent huge pages?  If you haven't, you might want to
> completely disable huge pages in your OS.  There's evidence that the
> transparent option can affect performance.
>
> I assume you've probably looked at my GC info at the following URL:
>
> http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr
>
> The parallel collector is most definitely not a good choice.  It does
> not optimize for latency.  It's my understanding that it actually
> prefers full GCs, because it is optimized for throughput.  Solr thrives
> on good latency, throughput doesn't matter very much.
>
> If you want to continue avoiding G1, you should definitely be using
> CMS.  My recommendation right now would be to try the G1 settings on my
> wiki page under the heading "Current experiments" or the CMS settings
> just below that.
>
> The out-of-the-box GC tuning included with Solr 6 is probably a better
> option than the parallel collector you've got configured now.
>
> Thanks,
> Shawn
>
>


-- 

Cas Rusnov,

Engineer


Re: Long STW GCs with Solr Cloud

2016-06-15 Thread Shawn Heisey
On 6/15/2016 3:05 PM, Cas Rusnov wrote:
> After trying many of the off the shelf configurations (including CMS
> configurations but excluding G1GC, which we're still taking the
> warnings about seriously), numerous tweaks, rumors, various instance
> sizes, and all the rest, most of which regardless of heap size and
> newspace size resulted in frequent 30+ second STW GCs, we settled on
> the following configuration which leads to occasional high GCs but
> mostly stays between 10-20 second STWs every few minutes (which is
> almost acceptable): -XX:+AggressiveOpts -XX:+UnlockDiagnosticVMOptions
> -XX:+UseAdaptiveSizePolicy -XX:+UseLargePages -XX:+UseParallelGC
> -XX:+UseParallelOldGC -XX:MaxGCPauseMillis=15000 -XX:MaxNewSize=12000m
> -XX:ParGCCardsPerStrideChunk=4096 -XX:ParallelGCThreads=16 -Xms31000m
> -Xmx31000m

You mentioned something called "HugeTable" ... I assume you're talking
about huge pages.  If that's what you're talking about, have you also
turned off transparent huge pages?  If you haven't, you might want to
completely disable huge pages in your OS.  There's evidence that the
transparent option can affect performance.
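
For the archive, checking and disabling THP at runtime usually looks like the
following (the exact sysfs path can vary by distro, and the setting won't
survive a reboot without an init script):

  cat /sys/kernel/mm/transparent_hugepage/enabled
  echo never > /sys/kernel/mm/transparent_hugepage/enabled
  echo never > /sys/kernel/mm/transparent_hugepage/defrag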

I assume you've probably looked at my GC info at the following URL:

http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning_for_Solr

The parallel collector is most definitely not a good choice.  It does
not optimize for latency.  It's my understanding that it actually
prefers full GCs, because it is optimized for throughput.  Solr thrives
on good latency, throughput doesn't matter very much.

If you want to continue avoiding G1, you should definitely be using
CMS.  My recommendation right now would be to try the G1 settings on my
wiki page under the heading "Current experiments" or the CMS settings
just below that.

The out-of-the-box GC tuning included with Solr 6 is probably a better
option than the parallel collector you've got configured now.

Thanks,
Shawn



RE: [E] Re: Question(s) about Highlighting

2016-06-15 Thread Jamal, Sarfaraz
Update on this:

I feel I have a good grasp of synonyms now, in that I am doing it only at
query time and not at indexing time.

It looks like this in Synonyms.txt
sarfaraz jamal,sasjamal, sas,sarfaraz,wiggidy

Each one of those brings back the exact same records.

However, it only highlights "Jamal" (with a space in front of it).

Is there a way I can get the highlight snippets for each of the synonyms as
well?
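
For context, my field type applies synonyms only at query time, along these
lines (the names are from memory, not our exact schema):

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>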

Thank you !

Sas


-Original Message-
From: Jamal, Sarfaraz [mailto:sarfaraz.ja...@verizonwireless.com.INVALID] 
Sent: Friday, June 3, 2016 9:52 AM
To: solr-user@lucene.apache.org
Subject: RE: [E] Re: Question(s) about Highlighting

Good Morning Alessandro,

I verified it through the analysis tool (thanks for pointing it out), and it 
appears to be working correctly - As I see all of them as being synonyms of 
each other for this entry:

sasjamal, sarfaraz, sas

- When I do it only at indexing time, and disable it during query time (editing 
the synonyms.txt file SOLR6) - It does not treat them equally

When I do it at indexing and query time, it seems to work - but the highlight 
snippets stop working.

I believe it is working, MINUS the highlighting/snippets if that makes sense?

Thanks

Sarfaraz Jamal (Sas)
Revenue Assurance Tech Ops
614-560-8556
sarfaraz.ja...@verizonwireless.com

-Original Message-
From: Alessandro Benedetti [mailto:abenede...@apache.org]
Sent: Thursday, June 2, 2016 5:41 PM
To: solr-user@lucene.apache.org
Subject: [E] Re: Question(s) about Highlighting

Hi Jamal,
I assume you are using the Synonym token filter.
From the observation I can assume you are using it only at indexing time.
This means that when you index:

1) given a plain row of terms in synonyms.txt, you index all the terms in the
row in place of any one of them, or

2) given a row with an explicit mapping (=>), for any of the terms on the left
side of the expression you index the term on the right side.

You can verify this easily with the analysis tool in the Solr UI .



On Thu, Jun 2, 2016 at 7:50 PM, Jamal, Sarfaraz < 
sarfaraz.ja...@verizonwireless.com.invalid> wrote:

> I am having some difficulty understanding how to do something and if 
> it is even possible
>
> I have tried the following sets of Synonyms:
>
> 1.  sarfaraz, sas, sasjamal
> 2.  sasjamal,sas => Sarfaraz
>
> In the second instance, any searches with the world 'sasjamal' do not 
> appear in the results, as it has been converted to Sarfaraz (I
> believe) -
>

This means you don't use the same synonyms.txt at query time. Indeed, 'sasjamal'
is not in the index at all.


> In the first instance it works better - I believe all instances of any 
> of those words  appear in the results. However the highlighted 
> snippets also stop working when any of those words are Matched. Is 
> there any documentation, insights or help about this issue?
>

I should verify that; it could be related to the term offsets.
Please take a look at the analysis tool as well to understand better how the
offsets are assigned.
I remember a long time ago there was a discussion about this and a bug (or
similar) was raised.

Cheers

>
> Thanks in advance,
>
> Sas
>
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Thursday, June 2, 2016 2:43 PM
> To: solr-user@lucene.apache.org
> Subject: [E] Re: MongoDB and Solr - Massive re-indexing
>
> On 6/2/2016 11:56 AM, Robert Brown wrote:
> > My question is whether sending batches of 1,000 documents to Solr is 
> > still beneficial (thinking about docs that may not change), or if I 
> > should look at the MongoDB connector for Solr, based on the volume 
> > of incoming data we see.
> >
> > Would the connector still see all docs updating if I re-insert them 
> > blindly, and thus still send all 50m documents back to Solr everyday 
> > anyway?
> >
> > Is my setup quite typical for the MongoDB connector?
>
> Sending update requests to Solr containing batches of 1000 docs is a 
> good idea.  Depending on how large they are, you may be able to send 
> even more than 1000.  If you can avoid sending documents that haven't 
> changed, Solr will likely perform better and relevance scoring will be 
> better, because you won't have as many deleted docs.
>
> The mongo connector is not software from the Solr project, or even 
> from Apache.  We don't know anything about it.  If you have questions 
> about that software, please contact the people who maintain it.  If 
> their answers lead to questions about Solr itself, then you can bring those 
> back here.
>
> Thanks,
> Shawn
>
>


--
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Long STW GCs with Solr Cloud

2016-06-15 Thread Cas Rusnov
I know this has been discussed in the past (although not too recently), but
the advice in those threads has failed us, so here we are.

Some basics:

We're running Solr 6 (6.0.0 48c80f91b8e5cd9b3a9b48e6184bd53e7619e7e3 -
nknize - 2016-04-01 14:41:49), on Java 8 (OpenJDK Runtime Environment
(build 1.8.0_72-internal-b15)), on at this point some rather large cloud
instances (8 cpu / 40gb ram).

Our general cluster layout is 3 nodes per shard and 6 shards. We have three
collections, but one primary collection is heavily used and is causing the GC
situation we're seeing. There are roughly 55m documents in this collection.

Our test load is multiple large, complicated queries which facet across
multiple fields.

After trying many of the off the shelf configurations (including CMS
configurations but excluding G1GC, which we're still taking the warnings
about seriously), numerous tweaks, rumors, various instance sizes, and all
the rest, most of which regardless of heap size and newspace size resulted
in frequent 30+ second STW GCs, we settled on the following configuration
which leads to occasional
high GCs but mostly stays between 10-20 second STWs every few minutes
(which is almost acceptable):

-XX:+AggressiveOpts
-XX:+UnlockDiagnosticVMOptions
-XX:+UseAdaptiveSizePolicy
-XX:+UseLargePages
-XX:+UseParallelGC
-XX:+UseParallelOldGC
-XX:MaxGCPauseMillis=15000
-XX:MaxNewSize=12000m
-XX:ParGCCardsPerStrideChunk=4096
-XX:ParallelGCThreads=16
-Xms31000m
-Xmx31000m

Note that HugeTable is working on the instances, and allocates
approximately the size of the java instance, and Java doesn't produce the
error that indicates that the HugeTable didn't work - getting this working
did provide a marginal improvement in performance.

Mostly we're wondering if there's something we missed in the
configuration, and if anyone has experienced something similar! Thanks for
any help!

-- 

Cas Rusnov,

Engineer


Re: Multiple calls across the distributed nodes for a query

2016-06-15 Thread Raveendra Yerraguntla
Thank you, Jeff. Let me try out how much improvement I get out of the
single-pass param.

Sent from my iPhone

> On Jun 15, 2016, at 1:59 PM, Jeff Wartes  wrote:
> 
> Any distributed query falls into the two-phase process. Actually, I think 
> some components may require a third phase. (faceting?)
> 
> However, there are also cases where only a single pass is required. A 
> fl=id,score will only be a single pass, for example, since it doesn’t need to 
> get the field values.
> 
> https://issues.apache.org/jira/browse/SOLR-5768 would be a good place to read 
> about some of this, and provides a way to help force a one-pass even if you 
> need other fields.
> 
> 
>> On 6/15/16, 7:31 AM, "Raveendra Yerraguntla" 
>>  wrote:
>> 
>> I need help in understanding a query in solr cloud.
>> When a user issues a query, there are two phases of query - one with the
>> purpose (from debug info) of GET_TOP_FIELDS and another with GET_FIELDS.
>> 
>> This is having an effect on end to end performance of the application.
>> 
>> - what triggers (any components like facet, highlight, spellchecker ??? )
>> the two calls
>> - How can I make a query to be executed only with GET_FIELDS only .
> 


Re: indexing data to solrcloud with "implicit" is not distributing across cluster.

2016-06-15 Thread Erick Erickson
Did you try setting the "magic" field _route_ in your docs to the
shard? Something like
doc.addField("_route_", "shard1")?

Best,
Erick

On Wed, Jun 15, 2016 at 10:31 AM, nikosmarinos  wrote:
> Is it possible to give an example? I want doc1 to be explicitly routed to
> "shard1" of my "implicit" collection and doc2 to "shard4". How can I do
> that?
>
> Creating an implicit collection with one of the example configurations of
> the solr package, defining the "id" field as the router.field (not sure if
> necessary) and indexing id:shard1 id:shard2 id:shard3 takes all documents to
> the same (random) shard.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/indexing-data-to-solrcloud-with-implicit-is-not-distributing-across-cluster-tp4232956p4282428.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multiple calls across the distributed nodes for a query

2016-06-15 Thread Jeff Wartes
Any distributed query falls into the two-phase process. Actually, I think some 
components may require a third phase. (faceting?)

However, there are also cases where only a single pass is required. A 
fl=id,score will only be a single pass, for example, since it doesn’t need to 
get the field values.

https://issues.apache.org/jira/browse/SOLR-5768 would be a good place to read 
about some of this, and provides a way to help force a one-pass even if you 
need other fields.
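
For example (field names illustrative):

  fl=id,score                          -> one pass
  fl=id,title                          -> two passes (ids first, then field values)
  fl=id,title&distrib.singlePass=true  -> forced single pass (the SOLR-5768 param)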


On 6/15/16, 7:31 AM, "Raveendra Yerraguntla"  
wrote:

>I need help in understanding a query in solr cloud.
>When a user issues a query, there are two phases of query - one with the
>purpose (from debug info) of GET_TOP_FIELDS and another with GET_FIELDS.
>
>This is having an effect on end to end performance of the application.
>
>- what triggers (any components like facet, highlight, spellchecker ??? )
>the two calls
>- How can I make a query to be executed only with GET_FIELDS only .



Re: indexing data to solrcloud with "implicit" is not distributing across cluster.

2016-06-15 Thread nikosmarinos
Is it possible to give an example? I want doc1 to be explicitly routed to
"shard1" of my "implicit" collection and doc2 to "shard4". How can I do
that? 

Creating an implicit collection with one of the example configurations of
the solr package, defining the "id" field as the router.field (not sure if
necessary) and indexing id:shard1 id:shard2 id:shard3 takes all documents to
the same (random) shard.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/indexing-data-to-solrcloud-with-implicit-is-not-distributing-across-cluster-tp4232956p4282428.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Field exclusion from fl and hl.fl

2016-06-15 Thread Erick Erickson
I'll be happy to commit it if someone fixes the current problems with it.

Best,
Erick

On Sun, Mar 6, 2016 at 6:44 PM, Zheng Lin Edwin Yeo
 wrote:
> Thank you.
>
> Looking forward for this to be solved.
>
> Regards,
> Edwin
>
>
> On 7 March 2016 at 07:41, William Bell  wrote:
>
>> Can we get this over the goal line?
>>
>> https://issues.apache.org/jira/browse/SOLR-3191
>>
>> On Sun, Mar 6, 2016 at 3:16 AM, Zheng Lin Edwin Yeo 
>> wrote:
>>
>> > Hi,
>> >
>> > No, I tried that and I got the following error.
>> >
>> > {
>> >   "responseHeader":{
>> > "status":500,
>> > "QTime":0},
>> >   "error":{
>> > "msg":"For input string: \"-\"",
>> > "trace":"java.lang.NumberFormatException: For input string:
>> > \"-\"\r\n\tat
>> >
>> java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)\r\n\tat
>> > java.lang.Long.parseLong(Long.java:581)\r\n\tat
>> > java.lang.Long.parseLong(Long.java:631)\r\n\tat
>> > org.apache.solr.search.StrParser.getNumber(StrParser.java:124)\r\n\tat
>> >
>> >
>> org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:298)\r\n\tat
>> >
>> >
>> org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:80)\r\n\tat
>> > org.apache.solr.search.QParser.getQuery(QParser.java:141)\r\n\tat
>> >
>> >
>> org.apache.solr.search.SolrReturnFields.add(SolrReturnFields.java:297)\r\n\tat
>> >
>> >
>> org.apache.solr.search.SolrReturnFields.parseFieldList(SolrReturnFields.java:113)\r\n\tat
>> >
>> >
>> org.apache.solr.search.SolrReturnFields.<init>(SolrReturnFields.java:99)\r\n\tat
>> >
>> >
>> org.apache.solr.search.SolrReturnFields.<init>(SolrReturnFields.java:75)\r\n\tat
>> >
>> >
>> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:139)\r\n\tat
>> >
>> >
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:247)\r\n\tat
>> >
>> >
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)\r\n\tat
>> > org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)\r\n\tat
>> >
>> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)\r\n\tat
>> > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:457)\r\n\tat
>> >
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:222)\r\n\tat
>> >
>> >
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)\r\n\tat
>> >
>> >
>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\r\n\tat
>> >
>> >
>> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\r\n\tat
>> >
>> >
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\r\n\tat
>> >
>> >
>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\r\n\tat
>> >
>> >
>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\r\n\tat
>> >
>> >
>> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\r\n\tat
>> >
>> >
>> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\r\n\tat
>> >
>> >
>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\r\n\tat
>> >
>> >
>> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\r\n\tat
>> >
>> >
>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat
>> >
>> >
>> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)\r\n\tat
>> >
>> >
>> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)\r\n\tat
>> >
>> >
>> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\r\n\tat
>> > org.eclipse.jetty.server.Server.handle(Server.java:499)\r\n\tat
>> > org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)\r\n\tat
>> >
>> >
>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)\r\n\tat
>> >
>> >
>> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)\r\n\tat
>> >
>> >
>> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)\r\n\tat
>> >
>> >
>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)\r\n\tat
>> > java.lang.Thread.run(Thread.java:745)\r\n",
>> > "code":500}}
>> >
>> >
>> > Regards,
>> > Edwin
>> >
>> >
>> > On 6 March 2016 at 11:19, William Bell  wrote:
>> >
>> > > it used to support
>> > >
>> > > fl=*,-field
>> > >
>> > > Does that not work now?
>> > >
>> > > On Sat, Mar 5, 2016 at 7:37 PM, Zheng Lin Edwin Yeo <
>> > edwinye...@gmail.com>
>> > > wrote:
>> > >
>> > > > I have yet to find any workaround so far.Still have to list out all
>> the
>> > > > remaining fields one by one.
>> > > >
>> > > > Does anyone else has any suggestions?
>> > > >
>> > > > Regards,
>> > > > Edwin
>> > > >
>> > > >
>> > 

Re: [SolrCloud] shard hash ranges changed after restoring backup

2016-06-15 Thread Erick Erickson
Simplest, though a bit risky, is to manually edit the znode and
correct the entry. There are various tools out there, including
one that ships with Zookeeper (see the ZK documentation).

Or you can use the zkcli scripts (the Zookeeper ones) to get the znode
down to your local machine, edit it there and then push it back up to ZK.

I'd do all this with my Solr nodes shut down, then ensure that my ZK
ensemble was consistent after the update, etc.
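
A rough sketch with the stock ZK CLI (paths depend on your setup; on 5.x the
per-collection state normally lives at /collections/<name>/state.json):

  # inspect the current state
  zkCli.sh -server localhost:2181 get /collections/mycollection/state.json
  # after fixing up a local copy, push it back
  zkCli.sh -server localhost:2181 set /collections/mycollection/state.json \
      "$(cat fixed-state.json)"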

Best,
Erick

On Wed, Jun 15, 2016 at 8:36 AM, Gary Yao  wrote:
> Hi all,
>
> My team at work maintains a SolrCloud 5.3.2 cluster with multiple
> collections configured with sharding and replication.
>
> We recently backed up our Solr indexes using the built-in backup
> functionality. After the cluster was restored from the backup, we
> noticed that atomic updates of documents are failing occasionally with
> the error message 'missing required field [...]'. The exceptions are
> thrown on a host on which the document to be updated is not stored. From
> this we are deducing that there is a problem with finding the right host
> by the hash of the uniqueKey. Indeed, our investigations so far showed
> that for at least one collection in the new cluster, the shards have
> different hash ranges assigned now. We checked the hash ranges by
> querying /admin/collections?action=CLUSTERSTATUS. Find below the shard
> hash ranges of one collection that we debugged.
>
>   Old cluster:
> shard1_0 8000 - aaa9
> shard1_1  - d554
> shard2_0 d555 - fffe
> shard2_1  - 2aa9
> shard3_0 2aaa - 5554
> shard3_1  - 7fff
>
>   New cluster:
> shard1 8000 - aaa9
> shard2  - d554
> shard3 d555 - 
> shard4 0 - 2aa9
> shard5 2aaa - 5554
> shard6  - 7fff
>
>   Note that the shard names differ because the old cluster's shards were
>   split.
>
> As you can see, the ranges of shard3 and shard4 differ from the old
> cluster. This change of hash ranges matches with the symptoms we are
> currently experiencing.
>
> We found this JIRA ticket https://issues.apache.org/jira/browse/SOLR-5750
> in which David Smiley comments:
>
>   shard hash ranges aren't restored; this error could be disasterous
>
> It seems that this is what happened to us. We would like to hear some
> suggestions on how we could recover from this problem.
>
> Best,
> Gary


Re: result grouping in sharded index

2016-06-15 Thread Jay Potharaju
Collapse would also not work since it requires all the data to be on the
same shard.
"In order to use these features with SolrCloud, the documents must be
located on the same shard. To ensure document co-location, you can define
the router.name parameter as compositeId when creating the collection. "

On Wed, Jun 15, 2016 at 3:03 AM, Tom Evans  wrote:

> Do you have to group, or can you collapse instead?
>
>
> https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
>
> Cheers
>
> Tom
>
> On Tue, Jun 14, 2016 at 4:57 PM, Jay Potharaju 
> wrote:
> > Any suggestions on how to handle result grouping in sharded index?
> >
> >
> > On Mon, Jun 13, 2016 at 1:15 PM, Jay Potharaju 
> > wrote:
> >
> >> Hi,
> >> I am working on a functionality that would require me to group documents
> >> by a id field. I read that the ngroups feature would not work in a
> sharded
> >> index.
> >> Can someone recommend how to handle this in a sharded index?
> >>
> >>
> >> Solr Version: 5.5
> >>
> >>
> >>
> https://cwiki.apache.org/confluence/display/solr/Result+Grouping#ResultGrouping-DistributedResultGroupingCaveats
> >>
> >> --
> >> Thanks
> >> Jay
> >>
> >>
> >
> >
> >
> > --
> > Thanks
> > Jay Potharaju
>



-- 
Thanks
Jay Potharaju


[SolrCloud] shard hash ranges changed after restoring backup

2016-06-15 Thread Gary Yao
Hi all,

My team at work maintains a SolrCloud 5.3.2 cluster with multiple
collections configured with sharding and replication.

We recently backed up our Solr indexes using the built-in backup
functionality. After the cluster was restored from the backup, we
noticed that atomic updates of documents are failing occasionally with
the error message 'missing required field [...]'. The exceptions are
thrown on a host on which the document to be updated is not stored. From
this we are deducing that there is a problem with finding the right host
by the hash of the uniqueKey. Indeed, our investigations so far showed
that for at least one collection in the new cluster, the shards have
different hash ranges assigned now. We checked the hash ranges by
querying /admin/collections?action=CLUSTERSTATUS. Find below the shard
hash ranges of one collection that we debugged.

  Old cluster:
shard1_0 8000 - aaa9
shard1_1  - d554
shard2_0 d555 - fffe
shard2_1  - 2aa9
shard3_0 2aaa - 5554
shard3_1  - 7fff

  New cluster:
shard1 8000 - aaa9
shard2  - d554
shard3 d555 - 
shard4 0 - 2aa9
shard5 2aaa - 5554
shard6  - 7fff

  Note that the shard names differ because the old cluster's shards were
  split.

As you can see, the ranges of shard3 and shard4 differ from the old
cluster. This change of hash ranges matches with the symptoms we are
currently experiencing.

We found this JIRA ticket https://issues.apache.org/jira/browse/SOLR-5750
in which David Smiley comments:

  shard hash ranges aren't restored; this error could be disasterous

It seems that this is what happened to us. We would like to hear some
suggestions on how we could recover from this problem.

Best,
Gary


Re: Update jar file in Solr 4.4.0

2016-06-15 Thread Erick Erickson
It sounds like you were somehow getting the old jar file
rather than the new one. It _should_ have worked to just
drop the new jar file in the directory, assuming you'd
removed all traces of the old jar file...

But if you have it working now, that's what counts.
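
For reference, the wiring being discussed below looks roughly like this (path
is illustrative): either a sharedLib attribute in solr.xml, or a per-core
directive in solrconfig.xml such as

  <lib dir="/opt/solr/shared-lib/" regex=".*\.jar" />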

FWIW,
Erick

On Tue, Jun 14, 2016 at 9:31 PM, thakkar.aayush
 wrote:
> Actually my changes in updateProcessor.0.1.jar were not taking effect
> (functionality-wise). I was getting no errors.
>
> I dropped only updateProcessor.0.1.jar in the shared folder. The entry added
> in the solrconfig file was:
>
>   <processor class="org.apache.solr.update.processor.MyUpdateProcessorFactory"/>
>
> In updateProcessor.0.1.jar I had a class file with the path
> org.apache.solr.update.processor.MyUpdateProcessorFactory
>
> However I have made some changes and it is working as expected.
> *Solution that worked for me:* I changed the entry in solrconfig to:
>
>   <processor class="org.apache.solr.update.processor.MyUpdateProcessorFactory2"/>
>
> Then created a new jar file updateProcessor.0.2.jar with the following class:
> org.apache.solr.update.processor.MyUpdateProcessorFactory2
>
> Thanks for your help. I will check with team about zookeepers though :)
>
> Regards,
> Aayush
>
>
>
> First, having 5 Zookeeper nodes to manage 4 Solr nodes
> is serious overkill. Three should be more than sufficient.
>
> what did you put in your configuration? Does your <lib>
> directive in solrconfig.xml mention updateProcessor.0.1?
>
> And what error are you seeing exactly?
>
> When Solr starts up, part of the voluminous messages
> are where exactly it looks for jar files. So you should
> be able to see exactly what Solr is aware of.
>
> If you didn't specify a <lib> directive, one assumes you
> dropped the jar somewhere in the Tomcat hive. Is
> it in the right place? Did you restart Tomcat? (not sure
> this last is necessary, but just in case...)
>
> Best,
> Erick
>
> On Mon, Jun 13, 2016 at 7:22 PM, thakkar.aayush
> thakkar.aayush@ wrote:
>> I have a Solr cloud configuration which we run on 4 servers. We use Tomcat
>> as the web server for Solr. I have 5 zookeepers to maintain the
>> data-replication. I have added a jar file with a custom update processor.
>> This is in a shared folder which is mentioned in solr.xml. While creating
>> the first version of this jar file I gave it the name
>> updateProcessor.0.1.jar. Even though it was shared, jar files were added on
>> all 4 servers. But now I have to update the updateProcessor. For this I
>> created updateProcessor.0.2.jar. I deleted updateProcessor.0.1.jar from
>> each server and added the new one. But the changes were not seen. Any ideas
>> what I am doing wrong? Should this be checked using zkcli?
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Update-jar-file-in-Solr-4-4-0-tp4282164.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Update-jar-file-in-Solr-4-4-0-tp4282164p4282328.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Boosting exact match fields.

2016-06-15 Thread Alessandro Benedetti
In addition to what Erick correctly proposed,
are you storing norms for your field of interest (to boost documents with
shorter field values)?
If you are, I find it suspicious that "Sony Ear Phones" wins over "Ear Phones"
for your "Ear Phones" query.
What other factors are currently involved in your relevancy score
calculation?
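
To make Erick's copyField suggestion quoted below concrete, here is a minimal
sketch (field and type names are invented for the example):

  <copyField source="keywords" dest="keywords_exact"/>

  <field name="keywords_exact" type="string_lowercase" indexed="true" stored="false"/>

  <fieldType name="string_lowercase" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

and then boost it high at query time, e.g. with edismax:
qf=keywords keywords_exact^100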

Cheers

On Tue, Jun 14, 2016 at 4:48 PM, Erick Erickson 
wrote:

> If these are the complete field, i.e. your document
> contains exactly "ear phones" and not "ear phones
> are great", use a copyField to put it into an "exact_match"
> field that uses a much simpler analysis chain based
> on KeywordTokenizer (plus, perhaps, things like
> LowerCaseFilter, and maybe strip punctuation and the like).
> Then you add a clause on exact_match boosted
> really high.
>
> Best,
> Erick
>
> On Tue, Jun 14, 2016 at 1:01 AM, Naveen Pajjuri
>  wrote:
> > Hi,
> >
> > I have documents with a field (data type definition for that field is
> > below) values as ear phones, sony ear phones, philips ear phones. when i
> > query for earphones sony ear phones is the top result where as i want ear
> > phones as top result. please suggest how to boost exact matches. PS: I
> have
> > earphones => ear phones in my synonyms.txt and the datatype definition
> for
> > that field keywords is  > positionIncrementGap="100">   > "solr.WhitespaceTokenizerFactory"/>  class="solr.StopFilterFactory"
> > ignoreCase="true" words="stopwords.txt"/>  > "solr.LowerCaseFilterFactory"/>  > synonyms="synonyms.txt" ignoreCase="true" expand="true"/>  > "solr.RemoveDuplicatesTokenFilterFactory"/>   > "query">   class=
> > "solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>  > class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true"
> > expand="true"/>   class=
> > "solr.RemoveDuplicatesTokenFilterFactory"/>  
> REGARDS,
> > Naveen
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Multiple calls across the distributed nodes for a query

2016-06-15 Thread Raveendra Yerraguntla
I need help in understanding a query in solr cloud.
When a user issues a query, there are two phases of query - one with the
purpose (from debug info) of GET_TOP_FIELDS and another with GET_FIELDS.

This is having an effect on end to end performance of the application.

- what triggers (any components like facet, highlight, spellchecker ??? )
the two calls
- How can I make a query to be executed only with GET_FIELDS only .


RE: wildcard search for string having spaces

2016-06-15 Thread Roshan Kamble
Great.
The first option worked for me. I was trying q=abc\sp* ... it should be
q=abc\ p*

Thanks

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Wednesday, June 15, 2016 6:25 PM
To: solr-user@lucene.apache.org; Roshan Kamble
Subject: Re: wildcard search for string having spaces

Hi Roshan,

I think there are two options:

1) escape the space q=abc\ p*
2) use prefix query parser q={!prefix f=my_string}abc p

Ahmet


On Wednesday, June 15, 2016 3:48 PM, Roshan Kamble 
 wrote:
Hello,

I have the below custom field type defined for Solr 6.0.0:

  <fieldType name="..." class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>


I am using the above field to ensure that the entire string is considered as a
single token and search is case insensitive.

It works for most of the scenarios with wildcard search.
e.g. if my data is "abc.pqr" and "abc_pqr" and "abc pqr", then searching with
abc* gives these three results.

But I am not able to search with, say, abc p*

Searching with the query q="abc pqr" gives an exact match and the desired
result.

I want to do a wildcard search where the criteria can include spaces, like the
above example.

i.e. if a space is present then I am not able to do a wildcard search.

Is there any way by which a wildcard search can be achieved even if a space is
present in the token?

Regards,
Roshan

The information in this email is confidential and may be legally privileged. It 
is intended solely for the addressee. Access to this email by anyone else is 
unauthorised. If you are not the intended recipient, any disclosure, copying, 
distribution or any action taken or omitted to be taken in reliance on it, is 
prohibited and may be unlawful.



Re: wildcard search for string having spaces

2016-06-15 Thread Ahmet Arslan
Hi Roshan,

I think there are two options:

1) escape the space q=abc\ p*
2) use prefix query parser q={!prefix f=my_string}abc p
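
For example, on your field:

  q=my_string:abc\ p*
  q={!prefix f=my_string}abc p

One caveat, as far as I know: the prefix query parser does not analyze its
input, so since your field lowercases at index time, lowercase the prefix
yourself.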

Ahmet


On Wednesday, June 15, 2016 3:48 PM, Roshan Kamble 
 wrote:
Hello,

I have the below custom field type defined for Solr 6.0.0:

  <fieldType name="..." class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>


I am using the above field to ensure that the entire string is considered as a
single token and search is case insensitive.

It works for most of the scenarios with wildcard search.
e.g. if my data is "abc.pqr" and "abc_pqr" and "abc pqr", then searching with
abc* gives these three results.

But I am not able to search with, say, abc p*

Searching with the query q="abc pqr" gives an exact match and the desired
result.

I want to do a wildcard search where the criteria can include spaces, like the
above example.

i.e. if a space is present then I am not able to do a wildcard search.

Is there any way by which a wildcard search can be achieved even if a space is
present in the token?

Regards,
Roshan



wildcard search for string having spaces

2016-06-15 Thread Roshan Kamble
Hello,

I have the below custom field type defined for Solr 6.0.0:

  <fieldType name="..." class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>


I am using the above field to ensure that the entire string is considered as a
single token and search is case insensitive.

It works for most of the scenarios with wildcard search.
e.g. if my data is "abc.pqr" and "abc_pqr" and "abc pqr", then searching with
abc* gives these three results.

But I am not able to search with, say, abc p*

Searching with the query q="abc pqr" gives an exact match and the desired
result.

I want to do a wildcard search where the criteria can include spaces, like the
above example.

i.e. if a space is present then I am not able to do a wildcard search.

Is there any way by which a wildcard search can be achieved even if a space is
present in the token?

Regards,
Roshan



Re: result grouping in sharded index

2016-06-15 Thread Tom Evans
Do you have to group, or can you collapse instead?

https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results
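
Roughly, collapsing on the field you would otherwise group by looks like this
(field name illustrative):

  fq={!collapse field=group_id}&expand=true&expand.rows=5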

Cheers

Tom

On Tue, Jun 14, 2016 at 4:57 PM, Jay Potharaju  wrote:
> Any suggestions on how to handle result grouping in sharded index?
>
>
> On Mon, Jun 13, 2016 at 1:15 PM, Jay Potharaju 
> wrote:
>
>> Hi,
>> I am working on a functionality that would require me to group documents
>> by a id field. I read that the ngroups feature would not work in a sharded
>> index.
>> Can someone recommend how to handle this in a sharded index?
>>
>>
>> Solr Version: 5.5
>>
>>
>> https://cwiki.apache.org/confluence/display/solr/Result+Grouping#ResultGrouping-DistributedResultGroupingCaveats
>>
>> --
>> Thanks
>> Jay
>>
>>
>
>
>
> --
> Thanks
> Jay Potharaju