Re: High CPU utilization on Upgrading to Solr Version 6.3

2017-08-02 Thread Atita Arora
Hi All,

Just thought of giving a quick update on this.
We were able to *track this issue down using jvisualvm*, which ships with
Java. We enabled monitoring through JMX, and the CPU profiling showed (as
attached in one of my previous emails) *highlighting taking the most
processing time.*
Mysteriously, this was happening in highlighting -> merge, which was
invoked when we enabled *hl.mergeContiguous=true*. I'm still surprised that
setting just this one property to false resolved the issue; we happily went
live last week.

Later, tracing the code, I found that this particular property can cause
endless recursion.
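
For anyone who wants to check the same thing, this is the highlighting
parameter we flipped (field name illustrative; hl.mergeContiguous is the
standard highlighter's parameter):

  /select?q=...&hl=true&hl.fl=description&hl.mergeContiguous=false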

Please guide / share if you may have any other thoughts.

Thanks,
Atita



On Fri, Jul 28, 2017 at 7:18 PM, Shawn Heisey  wrote:

> On 7/27/2017 1:30 AM, Atita Arora wrote:
> > What OS is Solr running on?  I'm only asking because some additional
> > information I'm after has different gathering methods depending on OS.
> > Other questions:
> >
> > /*OpenJDK 64-Bit Server VM (25.141-b16) for linux-amd64 JRE
> > (1.8.0_141-b16), built on Jul 20 2017 21:47:59 by "mockbuild" with gcc
> > 4.4.7 20120313 (Red Hat 4.4.7-18)*/
> > /*Memory: 4k page, physical 264477520k(92198808k free), swap 0k(0k
> free)*/
>
> Linux is the easiest to get good information from.  Run the "top"
> program in a commandline session.  Press shift-M to sort by memory size,
> and grab a screenshot.  Share that screenshot with a file sharing site
> and give us the URL.
>
> > Is there only one Solr process per machine, or more than one?
> > /*On an average yes , one solr process per machine , however , we do
> > have a machine (where this log is taken) has two solr processes
> > (master and slave)*/
>
> Running a master and a slave on one machine does nothing for
> redundancy.  They need to be on separate machines for that to really
> help.  As for multiple processes per machine, you can have many indexes
> in one Solr instance -- you don't need more than one in most cases.
>
> > How many total documents are managed by one machine?
> > */About 220945 per machine ( and double for this machine as it has
> > instance of master as well as other slave)/*
> >
> > How big is all the index data managed by one machine?
> > */The index is about 4G./*
>
> If less than a quarter of a million documents results in a 4GB index,
> those documents must be ENORMOUS, or else there is something strange
> going on.
>
> > What is the max heap on each Solr process?
> > */Max heap is 25G for each Solr Process. (Xms 25g Xmx 25g)/*
> > */
> > /*
> > The reason of choosing RAMDirectory was that it was used in the
> > similar manner while the production Solr was on Version 4.3.2, so no
> > particular reason but just replicated how it was working , never
> > thought this may give troubles.
>
> Set up the slaves just like the masters, with
> NRTCachingDirectoryFactory.  For a couple hundred thousand docs, you
> probably only need a 2GB heap, possibly even less.
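>
> For reference, that's the directoryFactory setting in solrconfig.xml plus
> the heap setting in solr.in.sh -- a sketch with suggested values, adapt as
> needed:
>
>   <directoryFactory name="DirectoryFactory"
>                     class="solr.NRTCachingDirectoryFactory"/>
>
>   SOLR_HEAP="2g"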
>
> > I had included a pastebin of GC snapshot (the complete log was too big
> > to be included in the pastebin , so pasted a sampler)
>
> I asked for the full log because that's what I need to look deeper.  A
> sampler won't be enough.  There are file sharing websites for sharing
> larger content, and if you compress the file before uploading it, you
> should be able to achieve a fairly impressive compression ratio.
> Dropbox is generally a good choice for sharing fairly large content.
> Dropbox also works for image data, like the "top" screenshot I asked for
> above.
>
> > Another thing: as we observed the CPU cycles yesterday under high load,
> > we saw that the Highlighter component was taking the longest. Is there
> > anything in particular we forgot to configure, so that highlighting
> > doesn't give a performance hit?
> > Attached is the snapshot taken from jvisualvm.
>
> Attachments rarely make it through the mailing list.  Yours didn't, so I
> cannot see that snapshot.
>
> I do not know anything about highlighting, so I cannot comment on how
> much CPU it takes.  I've never used the feature.
>
> My best idea about why your CPU is so high is problems with garbage
> collection.  To look into that, I need to have the full GC log.  The
> rest of the information I've asked for will help focus my efforts.
>
> Thanks,
> Shawn
>
>


Re: Limiting the number of queries/updates to Solr

2017-08-02 Thread Hrishikesh Gadre
At one point I was working on SOLR-7344 (but it fell off the radar for
various reasons). Specifically, I built a servlet request filter which
implements a customizable queuing mechanism using the asynchronous servlet
API (Servlet 3 spec). This way you can define how many concurrent requests
of a specific type (e.g. query, indexing, etc.) you want to process. This
can also be extended to the core (or collection) level.

https://github.com/hgadre/servletrequest-scheduler
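
The gist, stripped of the async machinery, is a cap on concurrent requests
with everything over the cap shed early. A simplified synchronous sketch of
that idea (the actual filter queues requests via the async API instead of
rejecting them outright):

import java.io.IOException;
import java.util.concurrent.Semaphore;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;

public class ConcurrencyCapFilter implements Filter {
  // Cap on concurrently processed requests; everything over the cap is shed.
  private final Semaphore permits = new Semaphore(50);

  @Override
  public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
      throws IOException, ServletException {
    if (permits.tryAcquire()) {
      try {
        chain.doFilter(req, resp); // pass the request through to Solr
      } finally {
        permits.release();
      }
    } else {
      // Over the limit: fail fast rather than queue indefinitely.
      ((HttpServletResponse) resp).sendError(
          HttpServletResponse.SC_SERVICE_UNAVAILABLE, "Too many concurrent requests");
    }
  }

  @Override public void init(FilterConfig config) throws ServletException {}
  @Override public void destroy() {}
}

One design note: tryAcquire() never blocks, so a saturated Solr sheds load
immediately instead of building the congested queue Walter describes below.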


If this is something interesting and useful for the community, I would be
more than happy to help move this forward. Either way, I would like to get
any feedback on possible improvements (or drawbacks) etc.

Thanks
Hrishikesh




On Wed, Aug 2, 2017 at 9:45 PM, Walter Underwood 
wrote:

>
> > On Aug 2, 2017, at 8:33 PM, Shawn Heisey  wrote:
> >
> > IMHO, intentionally causing connections to fail when a limit is exceeded
> > would not be a very good idea.  When the rate gets too high, the first
> > thing that happens is all the requests slow down.  The slowdown could be
> > dramatic.  As the rate continues to increase, some of the requests
> > probably would begin to fail.
>
> No, this is a very good idea. It is called “load shedding” or “fail fast”.
> Gracefully dealing with overload is an essential part of system design.
>
> At Netflix, with a pre-Jetty Solr (war file running under Tomcat), we took
> down 40 front end servers with slow response times from the Solr server
> farm. We tied up all the front end threads waiting on responses from the
> Solr servers. That left no front end threads available to respond to
> incoming HTTP requests. It was not a fun evening.
>
> To fix this, we configured the Citrix load balancer to overflow to a
> different server when the outstanding back-end requests hit a limit. The
> overflow server was a virtual server that immediately returned a 503. That
> would free up front end connections and threads in an overload condition.
> The users would get a “search unavailable” page, but the rest of the site
> would continue to work.
>
> Unfortunately, the AWS load balancers don’t offer anything like this, ten
> years later.
>
> The worst case version of this is a stable congested state. It is pretty
> easy to put requests into a queue (connection/server) that are guaranteed
> to time out before they are serviced. If you have 35 requests in the queue,
> a 1 second service time, and a 30 second timeout, those requests are
> already dead when you put them on the queue.
>
> I learned about this when I worked with John Nagle at Ford Aerospace. I
> recommend his note “On Packet Switches with Infinite Storage” (1985) for
> the full story. It is only eight pages long, but packed with goodness.
>
> https://tools.ietf.org/html/rfc970
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>


Re: Limiting the number of queries/updates to Solr

2017-08-02 Thread Walter Underwood

> On Aug 2, 2017, at 8:33 PM, Shawn Heisey  wrote:
> 
> IMHO, intentionally causing connections to fail when a limit is exceeded
> would not be a very good idea.  When the rate gets too high, the first
> thing that happens is all the requests slow down.  The slowdown could be
> dramatic.  As the rate continues to increase, some of the requests
> probably would begin to fail.

No, this is a very good idea. It is called “load shedding” or “fail fast”. 
Gracefully dealing with overload is an essential part of system design.

At Netflix, with a pre-Jetty Solr (war file running under Tomcat), we took down 
40 front end servers with slow response times from the Solr server farm. We 
tied up all the front end threads waiting on responses from the Solr servers. 
That left no front end threads available to respond to incoming HTTP requests. 
It was not a fun evening.

To fix this, we configured the Citrix load balancer to overflow to a different 
server when the outstanding back-end requests hit a limit. The overflow server 
was a virtual server that immediately returned a 503. That would free up front 
end connections and threads in an overload condition. The users would get a 
“search unavailable” page, but the rest of the site would continue to work.

Unfortunately, the AWS load balancers don’t offer anything like this, ten years 
later.

The worst case version of this is a stable congested state. It is pretty easy 
to put requests into a queue (connection/server) that are guaranteed to time 
out before they are serviced. If you have 35 requests in the queue, a 1 second 
service time, and a 30 second timeout, those requests are already dead when you 
put them on the queue.
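
(The arithmetic: a request added behind 35 others waits 35 x 1 second = 35
seconds, beyond the 30-second timeout, so it expires before it is ever
serviced.)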

I learned about this when I worked with John Nagle at Ford Aerospace. I 
recommend his note “On Packet Switches with Infinite Storage” (1985) for the 
full story. It is only eight pages long, but packed with goodness.

https://tools.ietf.org/html/rfc970

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)




Re: Limiting the number of queries/updates to Solr

2017-08-02 Thread Shawn Heisey
On 8/2/2017 8:41 PM, S G wrote:
> Problem is that peak load estimates are just estimates.
> It would be nice to enforce them from Solr side such that if a rate higher 
> than that is seen at any core, the core will automatically begin to reject 
> the requests.
> Such a feature would contribute to cluster stability while making sure the 
> customer gets an exception to remind them of a slower rate.

Solr doesn't have anything like this.  This is primarily because there
is no network server code in Solr.  The networking is provided by the
servlet container.  The container in modern Solr versions is nearly
guaranteed to be Jetty.  As long as I have been using Solr, it has
shipped with a Jetty container.

https://wiki.apache.org/solr/WhyNoWar

I have no idea whether Jetty is capable of the kind of rate limiting
you're after.  If it is, it would be up to you to figure out the
configuration.

You could always put a proxy server like haproxy in front of Solr.  I'm
pretty sure that haproxy is capable of rejecting connections when the
request rate gets too high.  Other proxy servers (nginx, apache, F5
BigIP, solutions from Microsoft, Cisco, etc) are probably also capable
of this.

IMHO, intentionally causing connections to fail when a limit is exceeded
would not be a very good idea.  When the rate gets too high, the first
thing that happens is all the requests slow down.  The slowdown could be
dramatic.  As the rate continues to increase, some of the requests
probably would begin to fail.

What you're proposing would be guaranteed to cause requests to fail. 
Failing requests are even more likely than slow requests to result in
users finding a new source for whatever service they are getting from
your organization.

Your customer teams might not be able to control the request rate, as it
would probably be related to the number of users who connect to their
services.  It seems like a better option to inform a team that they have
exceeded their request estimates and that they will need to come up with
additional budget so more hardware can be deployed.  If that doesn't
happen, then their service may suffer, and it will not be your fault.

The RateLimiter class in Lucene that you mentioned is designed to limit
the I/O rate of disk or network data transfers, not a request rate.  One
of the most visible uses of this capability in Solr is the ability to
limit the transfer rate of the old-style index replication.  It is also
used in Lucene to slow down the disk I/O usage of segment merging.

A custom Solr component could be built that can be added to a request
handler that does what you're proposing.  If you wanted to write such a
component, you could donate it to the project and try to get it included
in Solr.  Even though I believe such a feature is a bad idea, I'm sure
it would be loved by some users.

Thanks,
Shawn



Limiting the number of queries/updates to Solr

2017-08-02 Thread S G
Hi,

My team provides Solr clusters to several other teams in my company.
We get peak-requirements for query-rate and update-rate from our customers
and load-test the cluster based on the same.
This helps us arrive at a cluster suitable for a given peak load.

Problem is that peak load estimates are just estimates.
It would be nice to enforce them from Solr side such that if a rate higher
than that is seen at any core, the core will automatically begin to reject
the requests.
Such a feature would contribute to cluster stability while making sure the
customer gets an exception to remind them of a slower rate.

A configuration like the following in managed-schema or solrconfig.xml
would be great [markup stripped in archiving; shape reconstructed from the
error message below]:

  <coreRateLimiter>
    <query>...</query>
    <update>500</update>
  </coreRateLimiter>

If the rate exceeds the above limits, an exception like the following
should be thrown: "Cannot process more than 500 updates/second. Please slow
down or raise the coreRateLimiter.update limits in solrconfig.xml"

Is
https://lucene.apache.org/core/6_5_0/core/org/apache/lucene/store/RateLimiter.SimpleRateLimiter.html
a step in that direction?

Thanks
SG


Re: Arabic words search in solr

2017-08-02 Thread Tim Casey
There should be a way to use a phrasal query for the specific names.

On Wed, Aug 2, 2017 at 2:15 PM, Phil Scadden  wrote:

> Hopefully changing to default AND solves your problem. If so, I would be
> quite interested in what your index config looks like in the end. I also
> have an upcoming need to index Arabic words.
>
> -Original Message-
> From: mohanmca01 [mailto:mohanmc...@gmail.com]
> Sent: Thursday, 3 August 2017 12:58 a.m.
> To: solr-user@lucene.apache.org
> Subject: RE: Arabic words search in solr
>
> Hi Phil Scadden,
>
>  Thank you for your reply,
>
> We tried your suggested solution of removing the hyphen while indexing,
> but it was returning wrong results. I was searching for "شرطة ازكي" and it
> showed the result I am looking for, plus irrelevant results which have
> either the first or the second word that I typed while searching.
>
> First word: شرطة
> Second Word: ازكي
>
> results that we are getting:
>
>
> {
>   "responseHeader": {
> "status": 0,
> "QTime": 3,
> "params": {
>   "indent": "true",
>   "q": "bizNameAr:(شرطة ازكي)",
>   "_": "1501678260335",
>   "wt": "json"
> }
>   },
>   "response": {
> "numFound": 444,
> "start": 0,
> "docs": [
>   {
> "id": "28107",
> "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  -
> - مركز شرطة إزكي",
> "_version_": 1574621132849414100
>   },
>   {
> "id": "13937",
> "bizNameAr": "مؤسسةا الازكي للتجارة والمقاولات",
> "_version_": 157462113219720
>   },
>   {
> "id": "15914",
> "bizNameAr": "العلوي والازكي المتحدة ش.م.م",
> "_version_": 1574621132344000500
>   },
>   {
> "id": "20639",
> "bizNameAr": "سحائب ازكي للتجارة",
> "_version_": 1574621132574687200
>   },
>   {
> "id": "25108",
> "bizNameAr": "المستشفيات -  - مستشفى إزكي",
> "_version_": 1574621132737216500
>   },
>   {
> "id": "27629",
> "bizNameAr": "وزارة الداخلية -  -  - والي إزكي -",
> "_version_": 1574621132833685500
>   },
>   {
> "id": "36351",
> "bizNameAr": "طوارئ الكهرباء - إزكي",
> "_version_": 157462113318391
>   },
>   {
> "id": "61235",
> "bizNameAr": "اضواء ازكي للتجارة",
> "_version_": 1574621133785792500
>   },
>   {
> "id": "66821",
> "bizNameAr": "أطلال إزكي للتجارة",
> "_version_": 1574621133915816000
>   },
>   {
> "id": "67011",
> "bizNameAr": "بنك ظفار - فرع ازكي",
> "_version_": 1574621133920010200
>   }
> ]
>   }
> }
>
> Actually we expected only the result below, since it has both of the
> words that we typed while searching:
>
>   {
> "id": "28107",
> "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  -
> - مركز شرطة إزكي",
> "_version_": 1574621132849414100
>   },
>
>
> Configuration:
>
> In schema.xml we configured as below:
>
> <field name="bizNameAr" type="..." indexed="true" stored="true"/>
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="..."/>
>     <filter class="solr.StopFilterFactory" words="lang/stopwords_ar.txt"/>
>     <!-- further filters lost in archiving -->
>     <filter class="solr.PatternReplaceFilterFactory" pattern="ى" replacement="ئ"/>
>     <filter class="solr.PatternReplaceFilterFactory" pattern="ء" replacement=""/>
>   </analyzer>
> </fieldType>
>
>
> Thanks,
>
>
>
>
>
>


RE: Arabic words search in solr

2017-08-02 Thread Phil Scadden
Hopefully changing to default AND solves your problem. If so, I would be quite 
interested in what your index config looks like in the end. I also have 
an upcoming need to index Arabic words.

-Original Message-
From: mohanmca01 [mailto:mohanmc...@gmail.com]
Sent: Thursday, 3 August 2017 12:58 a.m.
To: solr-user@lucene.apache.org
Subject: RE: Arabic words search in solr

Hi Phil Scadden,

 Thank you for your reply,

We tried your suggested solution of removing the hyphen while indexing, but
it was returning wrong results. I was searching for "شرطة ازكي" and it
showed the result I am looking for, plus irrelevant results which have
either the first or the second word that I typed while searching.

First word: شرطة
Second Word: ازكي

results that we are getting:


{
  "responseHeader": {
"status": 0,
"QTime": 3,
"params": {
  "indent": "true",
  "q": "bizNameAr:(شرطة ازكي)",
  "_": "1501678260335",
  "wt": "json"
}
  },
  "response": {
"numFound": 444,
"start": 0,
"docs": [
  {
"id": "28107",
"bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  -  - 
مركز شرطة إزكي",
"_version_": 1574621132849414100
  },
  {
"id": "13937",
"bizNameAr": "مؤسسةا الازكي للتجارة والمقاولات",
"_version_": 157462113219720
  },
  {
"id": "15914",
"bizNameAr": "العلوي والازكي المتحدة ش.م.م",
"_version_": 1574621132344000500
  },
  {
"id": "20639",
"bizNameAr": "سحائب ازكي للتجارة",
"_version_": 1574621132574687200
  },
  {
"id": "25108",
"bizNameAr": "المستشفيات -  - مستشفى إزكي",
"_version_": 1574621132737216500
  },
  {
"id": "27629",
"bizNameAr": "وزارة الداخلية -  -  - والي إزكي -",
"_version_": 1574621132833685500
  },
  {
"id": "36351",
"bizNameAr": "طوارئ الكهرباء - إزكي",
"_version_": 157462113318391
  },
  {
"id": "61235",
"bizNameAr": "اضواء ازكي للتجارة",
"_version_": 1574621133785792500
  },
  {
"id": "66821",
"bizNameAr": "أطلال إزكي للتجارة",
"_version_": 1574621133915816000
  },
  {
"id": "67011",
"bizNameAr": "بنك ظفار - فرع ازكي",
"_version_": 1574621133920010200
  }
]
  }
}

Actually we expected only the result below, since it has both of the words
that we typed while searching:

  {
"id": "28107",
"bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  -  - 
مركز شرطة إزكي",
"_version_": 1574621132849414100
  },


Configuration:

In schema.xml we configured as below:

<field name="bizNameAr" type="..." indexed="true" stored="true"/>

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="..."/>
    <filter class="solr.StopFilterFactory" words="lang/stopwords_ar.txt"/>
    <!-- further filters lost in archiving -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="ى" replacement="ئ"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="ء" replacement=""/>
  </analyzer>
</fieldType>

Thanks,







Re: generate field name in query

2017-08-02 Thread Rick Leir
Peter
The common setup is to use copyField from all your fields into a 'grab bag'
containing everything, and then to search on it alone -- a minimal sketch
in schema.xml terms is below (names illustrative). Cheers -- Rick
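
<field name="grab_bag" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="*" dest="grab_bag"/>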

On August 2, 2017 7:31:10 AM EDT, Peter Kirk  wrote:
>Hi - is it possible to create a query (or fq) which generates the field
>to search on, based on whether or not the document has that field?
>
>Eg. Search for documents with prices in the range 100 - 200, using
>either the field "price_owner_float" or "price_customer_float" (if a
>document has a field "price_owner_float" then use that, otherwise use
>the field "price_customer_float").
>
>This gives a syntax error:
>fq=if(exists(price_owner_float),price_owner_float,price_customer_float):[100
>TO 200]
>
>Thanks,
>Peter

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

default values in multiValue field

2017-08-02 Thread Steve Pruitt
Are default values supported for fields defined as multivalued text?  I can't 
get it to work.
Scouring the documentation, I found nothing indicating the two attributes are 
mutually exclusive.
I found a couple of online examples indicating the two attributes can be used 
together.  

I have this field type defined in my schema:

  [field definition stripped in archiving -- a multiValued text field with
  a default attribute]

and this input:

  [sample documents stripped in archiving]

The default value is not set.

I changed the schema to a string type, but it still doesn't work.

  [string field definition stripped in archiving]


Thanks.

-Steve


RE: Arabic words search in solr

2017-08-02 Thread Allison, Timothy B.
+1

I was hoping to use this as a case for arguing for turning off an overly 
aggressive stemmer, but I checked on your 10 docs and query, and David is 
right, of course -- if you change the default operator to AND, you only get the 
one document back that you had intended to.

I can still use this as a case for getting on my Unicode normalization soapbox 
and +1'ing your use of the ICUFoldingFilter.  With no token filters, you get 4 
results; when you add the ICUFoldingFilter, you get 8 results; and when you add 
in the Arabic stemmer, you get all 10.  Not that you need this, but see slide 
33 of [1], where we show 78 Unicode variants for "America" in ~800k docs in an 
Arabic script language.  Without Unicode normalization, users might get 1/2 the 
documents back or far, far fewer...and they wouldn't even know what they were 
missing!
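
(In schema.xml terms, that's adding <filter class="solr.ICUFoldingFilterFactory"/>
to the analyzer chain, with the ICU analysis jars on the classpath.)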

[1] 
https://github.com/tballison/share/blob/master/slides/TextProcessingAndAdvancedSearch_tallison_MITRE_201510_final_abbrev.pdf

-Original Message-
From: David Hastings [mailto:hastings.recurs...@gmail.com] 
Sent: Wednesday, August 2, 2017 9:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Arabic words search in solr

perhaps change your default operator to AND instead of OR if that's what
you are expecting for a result

On Wed, Aug 2, 2017 at 8:57 AM, mohanmca01  wrote:

> Hi Phil Scadden,
>
>  Thank you for your reply,
>
> We tried your suggested solution of removing the hyphen while indexing,
> but it was returning wrong results. I was searching for "شرطة ازكي" and
> it showed the result I am looking for, plus irrelevant results which have
> either the first or the second word that I typed while searching.
>
> First word: شرطة
> Second Word: ازكي
>
> results that we are getting:
>
>
> {
>   "responseHeader": {
> "status": 0,
> "QTime": 3,
> "params": {
>   "indent": "true",
>   "q": "bizNameAr:(شرطة ازكي)",
>   "_": "1501678260335",
>   "wt": "json"
> }
>   },
>   "response": {
> "numFound": 444,
> "start": 0,
> "docs": [
>   {
> "id": "28107",
> "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  
> -
> -
> مركز شرطة إزكي",
> "_version_": 1574621132849414100
>   },
>   {
> "id": "13937",
> "bizNameAr": "مؤسسةا الازكي للتجارة والمقاولات",
> "_version_": 157462113219720
>   },
>   {
> "id": "15914",
> "bizNameAr": "العلوي والازكي المتحدة ش.م.م",
> "_version_": 1574621132344000500
>   },
>   {
> "id": "20639",
> "bizNameAr": "سحائب ازكي للتجارة",
> "_version_": 1574621132574687200
>   },
>   {
> "id": "25108",
> "bizNameAr": "المستشفيات -  - مستشفى إزكي",
> "_version_": 1574621132737216500
>   },
>   {
> "id": "27629",
> "bizNameAr": "وزارة الداخلية -  -  - والي إزكي -",
> "_version_": 1574621132833685500
>   },
>   {
> "id": "36351",
> "bizNameAr": "طوارئ الكهرباء - إزكي",
> "_version_": 157462113318391
>   },
>   {
> "id": "61235",
> "bizNameAr": "اضواء ازكي للتجارة",
> "_version_": 1574621133785792500
>   },
>   {
> "id": "66821",
> "bizNameAr": "أطلال إزكي للتجارة",
> "_version_": 1574621133915816000
>   },
>   {
> "id": "67011",
> "bizNameAr": "بنك ظفار - فرع ازكي",
> "_version_": 1574621133920010200
>   }
> ]
>   }
> }
>
> Actually we expected only the result below, since it has both of the
> words that we typed while searching:
>
>   {
> "id": "28107",
> "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  
> -
> -
> مركز شرطة إزكي",
> "_version_": 1574621132849414100
>   },
>
>
> Configuration:
>
> In schema.xml we configured as below:
>
> <field name="bizNameAr" type="..." indexed="true" stored="true"/>
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="..."/>
>     <filter class="solr.StopFilterFactory" words="lang/stopwords_ar.txt"/>
>     <!-- further filters lost in archiving -->
>     <filter class="solr.PatternReplaceFilterFactory" pattern="ى" replacement="ئ"/>
>     <filter class="solr.PatternReplaceFilterFactory" pattern="ء" replacement=""/>
>   </analyzer>
> </fieldType>
>
>
> Thanks,
>
>
>
>
>
>


Re: Replication Question

2017-08-02 Thread Michael B. Klein
And the one that isn't getting the updates is the one marked in the cloud
diagram as the leader.

/me bangs head on desk

On Wed, Aug 2, 2017 at 10:31 AM, Michael B. Klein  wrote:

> Another observation: After bringing the cluster back up just now, the
> "1-in-3 nodes don't get the updates" issue persists, even with the cloud
> diagram showing 3 nodes, all green.
>
> On Wed, Aug 2, 2017 at 9:56 AM, Michael B. Klein 
> wrote:
>
>> Thanks for your responses, Shawn and Erick.
>>
>> Some clarification questions, but first a description of my
>> (non-standard) use case:
>>
>> My Zookeeper/SolrCloud cluster is running on Amazon AWS. Things are
>> working well so far on the production cluster (knock wood); it's the staging
>> cluster that's giving me fits. Here's why: In order to save money, I have
>> the AWS auto-scaler scale the cluster down to zero nodes when it's not in
>> use. Here's the (automated) procedure:
>>
>> SCALE DOWN
>> 1) Call admin/collections?action=BACKUP for each collection to a shared
>> NFS volume
>> 2) Shut down all the nodes
>>
>> SCALE UP
>> 1) Spin up 2 Zookeeper nodes and wait for them to stabilize
>> 2) Spin up 3 Solr nodes and wait for them to show up under Zookeeper's
>> live_nodes
>> 3) Call admin/collections?action=RESTORE to put all the collections back
>>
>> This has been working very well, for the most part, with the following
>> complications/observations:
>>
>> 1) If I don't optimize each collection right before BACKUP, the backup
>> fails (see the attached solr_backup_error.json).
>> 2) If I don't specify a replicationFactor during RESTORE, the admin
>> interface's Cloud diagram only shows one active node per collection. Is
>> this expected? Am I required to specify the replicationFactor unless I'm
>> using a shared HDFS volume for solr data?
>> 3) If I don't specify maxShardsPerNode=1 during RESTORE, I get a warning
>> message in the response, even though the restore seems to succeed.
>> 4) Aside from the replicationFactor parameter on the CREATE/RESTORE, I do
>> not currently have any replication stuff configured (as it seems I should
>> not).
>> 5) At the time my "1-in-3 requests are failing" issue occurred, the Cloud
>> diagram looked like the attached solr_admin_cloud_diagram.png. It seemed to
>> think all replicas were live and synced and happy, and because I was
>> accessing solr through a round-robin load balancer, I was never able to
>> tell which node was out of sync.
>>
>> If it happens again, I'll make node-by-node requests and try to figure
>> out what's different about the failing one. But the fact that this happened
>> (and the way it happened) is making me wonder if/how I can automate this
>> automated staging environment scaling reliably and with confidence that it
>> will Just Work™.
>>
>> Comments and suggestions would be GREATLY appreciated.
>>
>> Michael
>>
>>
>>
>> On Tue, Aug 1, 2017 at 8:14 PM, Erick Erickson 
>> wrote:
>>
>>> And please do not use optimize unless your index is
>>> totally static. I only recommend it when the pattern is
>>> to update the index periodically, like every day or
>>> something and not update any docs in between times.
>>>
>>> Implied in Shawn's e-mail was that you should undo
>>> anything you've done in terms of configuring replication,
>>> just go with the defaults.
>>>
>>> Finally, my bet is that your problematic Solr node is misconfigured.
>>>
>>> Best,
>>> Erick
>>>
>>> On Tue, Aug 1, 2017 at 2:36 PM, Shawn Heisey 
>>> wrote:
>>> > On 8/1/2017 12:09 PM, Michael B. Klein wrote:
>>> >> I have a 3-node solrcloud cluster orchestrated by zookeeper. Most
>>> stuff
>>> >> seems to be working OK, except that one of the nodes never seems to
>>> get its
>>> >> replica updated.
>>> >>
>>> >> Queries take place through a non-caching, round-robin load balancer.
>>> The
>>> >> collection looks fine, with one shard and a replicationFactor of 3.
>>> >> Everything in the cloud diagram is green.
>>> >>
>>> >> But if I (for example) select?q=id:hd76s004z, the results come up
>>> empty 1
>>> >> out of every 3 times.
>>> >>
>>> >> Even several minutes after a commit and optimize, one replica still
>>> isn’t
>>> >> returning the right info.
>>> >>
>>> >> Do I need to configure my `solrconfig.xml` with `replicateAfter`
>>> options on
>>> >> the `/replication` requestHandler, or is that a non-solrcloud,
>>> >> standalone-replication thing?
>>> >
>>> > This is one of the more confusing aspects of SolrCloud.
>>> >
>>> > When everything is working perfectly in a SolrCloud install, the
>>> feature
>>> > in Solr called "replication" is *never* used.  SolrCloud does require
>>> > the replication feature, though ... which is what makes this whole
>>> thing
>>> > very confusing.
>>> >
>>> > Replication is used to replicate an entire Lucene index (consisting of
>>> a
>>> > bunch of files on the disk) from a core on a master server to a core on
>>> > a slave server.  This is how replication was done before SolrCloud was
>>> > created.
>>> >
>>> > The way

Re: Solr 4.10.4 export handler NPE

2017-08-02 Thread Erick Erickson
That the JIRA says 5.5 just means that's the version the submitter was
using when the NPE was encountered. No workarounds that I know of, but
that code has changed drastically since the 4.10 days so I really have
no clue. Any chance of trying it on a 6.6 release?

Best,
Erick

On Tue, Aug 1, 2017 at 8:43 PM, Lasitha Wattaladeniya  wrote:
> Hi devs,
>
> I was exploring the /export handler in solr and got an exception. When I
> researched online, I found this open JIRA case: SOLR-8860
>
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/SOLR-8806
>
> is this a valid jira case? Any workarounds?
>
> Jira says affect version is 5.5 but I'm getting this in 4.10.4 also
>
>
> Regards,
> Lasitha


Re: Replication Question

2017-08-02 Thread Michael B. Klein
Another observation: After bringing the cluster back up just now, the
"1-in-3 nodes don't get the updates" issue persists, even with the cloud
diagram showing 3 nodes, all green.

On Wed, Aug 2, 2017 at 9:56 AM, Michael B. Klein  wrote:

> Thanks for your responses, Shawn and Erick.
>
> Some clarification questions, but first a description of my (non-standard)
> use case:
>
> My Zookeeper/SolrCloud cluster is running on Amazon AWS. Things are
> working well so far on the production cluster (knock wood); it's the staging
> cluster that's giving me fits. Here's why: In order to save money, I have
> the AWS auto-scaler scale the cluster down to zero nodes when it's not in
> use. Here's the (automated) procedure:
>
> SCALE DOWN
> 1) Call admin/collections?action=BACKUP for each collection to a shared
> NFS volume
> 2) Shut down all the nodes
>
> SCALE UP
> 1) Spin up 2 Zookeeper nodes and wait for them to stabilize
> 2) Spin up 3 Solr nodes and wait for them to show up under Zookeeper's
> live_nodes
> 3) Call admin/collections?action=RESTORE to put all the collections back
>
> This has been working very well, for the most part, with the following
> complications/observations:
>
> 1) If I don't optimize each collection right before BACKUP, the backup
> fails (see the attached solr_backup_error.json).
> 2) If I don't specify a replicationFactor during RESTORE, the admin
> interface's Cloud diagram only shows one active node per collection. Is
> this expected? Am I required to specify the replicationFactor unless I'm
> using a shared HDFS volume for solr data?
> 3) If I don't specify maxShardsPerNode=1 during RESTORE, I get a warning
> message in the response, even though the restore seems to succeed.
> 4) Aside from the replicationFactor parameter on the CREATE/RESTORE, I do
> not currently have any replication stuff configured (as it seems I should
> not).
> 5) At the time my "1-in-3 requests are failing" issue occurred, the Cloud
> diagram looked like the attached solr_admin_cloud_diagram.png. It seemed to
> think all replicas were live and synced and happy, and because I was
> accessing solr through a round-robin load balancer, I was never able to
> tell which node was out of sync.
>
> If it happens again, I'll make node-by-node requests and try to figure out
> what's different about the failing one. But the fact that this happened
> (and the way it happened) is making me wonder if/how I can automate this
> automated staging environment scaling reliably and with confidence that it
> will Just Work™.
>
> Comments and suggestions would be GREATLY appreciated.
>
> Michael
>
>
>
> On Tue, Aug 1, 2017 at 8:14 PM, Erick Erickson 
> wrote:
>
>> And please do not use optimize unless your index is
>> totally static. I only recommend it when the pattern is
>> to update the index periodically, like every day or
>> something and not update any docs in between times.
>>
>> Implied in Shawn's e-mail was that you should undo
>> anything you've done in terms of configuring replication,
>> just go with the defaults.
>>
>> Finally, my bet is that your problematic Solr node is misconfigured.
>>
>> Best,
>> Erick
>>
>> On Tue, Aug 1, 2017 at 2:36 PM, Shawn Heisey  wrote:
>> > On 8/1/2017 12:09 PM, Michael B. Klein wrote:
>> >> I have a 3-node solrcloud cluster orchestrated by zookeeper. Most stuff
>> >> seems to be working OK, except that one of the nodes never seems to
>> get its
>> >> replica updated.
>> >>
>> >> Queries take place through a non-caching, round-robin load balancer.
>> The
>> >> collection looks fine, with one shard and a replicationFactor of 3.
>> >> Everything in the cloud diagram is green.
>> >>
>> >> But if I (for example) select?q=id:hd76s004z, the results come up
>> empty 1
>> >> out of every 3 times.
>> >>
>> >> Even several minutes after a commit and optimize, one replica still
>> isn’t
>> >> returning the right info.
>> >>
>> >> Do I need to configure my `solrconfig.xml` with `replicateAfter`
>> options on
>> >> the `/replication` requestHandler, or is that a non-solrcloud,
>> >> standalone-replication thing?
>> >
>> > This is one of the more confusing aspects of SolrCloud.
>> >
>> > When everything is working perfectly in a SolrCloud install, the feature
>> > in Solr called "replication" is *never* used.  SolrCloud does require
>> > the replication feature, though ... which is what makes this whole thing
>> > very confusing.
>> >
>> > Replication is used to replicate an entire Lucene index (consisting of a
>> > bunch of files on the disk) from a core on a master server to a core on
>> > a slave server.  This is how replication was done before SolrCloud was
>> > created.
>> >
>> > The way that SolrCloud keeps replicas in sync is *entirely* different.
>> > SolrCloud has no masters and no slaves.  When you index or delete a
>> > document in a SolrCloud collection, the request is forwarded to the
>> > leader of the correct shard for that document.  The leader then sends a
>> > copy of that request to all th

Re: Custom Sort option to apply at SOLR index

2017-08-02 Thread Erick Erickson
I guess I don't see the problem; just store it as a string and sort on
the field.

# sorts before numbers, which sort before letters. Or I'm reading
the ASCII chart wrong.
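
E.g., a sketch (field and type names made up):

  <field name="name_sort" type="string" indexed="true" stored="false"/>
  <copyField source="name" dest="name_sort"/>

and then sort=name_sort asc (or desc) on the query.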

Best,
Erick

On Wed, Aug 2, 2017 at 6:55 AM, padmanabhan
 wrote:
> Hello Solr Geeks,
>
> Am newbie to SOLR. I have a requirement as given below, Could any one please
> provide some insights on how to go about on this.
>
> "Ascending by name" (#, 0 - 9, A - Z)
>
> "Descending by name" (Z - A, 9 - 0, #)
>
> Sample name value can be
>
> ABCD5678
> 1234ABCD
> #2345ABCD
> #1234ABCD
> 5678ABCD
> #2345ACBD
> 5678EFGH
> #2345DBCA
> ABCD1234
> 1234#ABCD
>
> *Expected Ascending order*
>
> #2345ABCD
> #2345ACBD
> #2345DBCA
> 1234#ABCD
> 1234ABCD
> 5678ABCD
> ABCD1234
> ABCD5678
>
> *Expected Descending order*
>
> ABCD5678
> ABCD1234
> 5678ABCD
> 1234ABCD
> 1234#ABCD
> #2345DBCA
> #2345ACBD
> #2345ABCD
>
> Thanks & Regards,
> Paddy
>
>
>


Re: Move index directory to another partition

2017-08-02 Thread Erick Erickson
Shawn:

Not entirely sure about AWS intricacies, but getting a new replica to
use a particular index directory in the general case is just
specifying dataDir=some_directory on the ADDREPLICA command. The index
just needs an HTTP connection (uses the old replication process) so
nothing huge there. Then DELETEREPLICA for the old one. There's
nothing that ZK has to know about to make this work, it's all local to
the Solr instance.
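
E.g. (names made up):

  /admin/collections?action=ADDREPLICA&collection=coll1&shard=shard1&node=newhost:8983_solr&dataDir=/mnt/new-ebs/coll1-data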

Or I'm completely out in the weeds.

Best,
Erick

On Tue, Aug 1, 2017 at 7:52 PM, Dave  wrote:
> To add to this -- not sure if SolrCloud uses it, but you're going to want
> to remove the write.lock file as well
>
>> On Aug 1, 2017, at 9:31 PM, Shawn Heisey  wrote:
>>
>>> On 8/1/2017 7:09 PM, Erick Erickson wrote:
>>> WARNING: what I currently understand about the limitations of AWS
>>> could fill volumes so I might be completely out to lunch.
>>>
>>> If you ADDREPLICA with the new replica's  data residing on the new EBS
>>> volume, then wait for it to sync (which it'll do all by itself) then
>>> DELETEREPLICA on the original you'll be all set.
>>>
>>> In recent Solr's, theres also the MOVENODE collections API call.
>>
>> I did consider mentioning that as a possible way forward, but I hate to
>> rely on special configurations with core.properties, particularly if the
>> newly built replica core instanceDirs aren't in the solr home (or
>> coreRootDirectory) at all.  I didn't want to try and explain the precise
>> steps required to get that plan to work.  I would expect to need some
>> arcane Collections API work or manual ZK modification to reach a correct
>> state -- steps that would be prone to error.
>>
>> The idea I mentioned seemed to me to be the way forward that would
>> require the least specialized knowledge.  Here's a simplified stating of
>> the steps:
>>
>> * Mount the new volume somewhere.
>> * Use multiple rsync passes to get the data copied.
>> * Stop Solr.
>> * Do a final rsync pass.
>> * Unmount the original volume.
>> * Remount the new volume in the original location.
>> * Start Solr.
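>>
>> The copy passes are plain rsync, e.g. (paths illustrative):
>>
>>   rsync -avH /var/solr/data/ /mnt/newvolume/data/
>>
>> repeated until the delta is small, with a final pass after Solr stops.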
>>
>> Thanks,
>> Shawn
>>


migrating XML queryparser from solr 4.7 to 6.6

2017-08-02 Thread Kempelen , Ákos
Hello everybody,
We are using the XML query parser in our application with Solr 4.7, but now
we need to migrate to version 6.6. Unfortunately, things have changed a bit.
The Filter class is gone, and we use the FilteredQuery class heavily to
create complex filters. So we have to find a way to reimplement (or at
least mimic) the Filter functionality if we don't want to rewrite our
client codebase.
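
For the plain FilteredQuery cases, the replacement the Lucene migration
notes point to is a BooleanQuery with a FILTER clause -- a minimal sketch:

import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;

public final class FilterMigration {
  // Lucene 6 equivalent of the removed FilteredQuery(query, filter):
  // the FILTER clause must match but contributes nothing to the score.
  public static Query filtered(Query query, Query filter) {
    return new BooleanQuery.Builder()
        .add(query, Occur.MUST)      // scored part
        .add(filter, Occur.FILTER)   // match-only, unscored part
        .build();
  }
}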
The second problem is that we have some custom filters where we manipulate
DocIdSets in the getDocIdSet(AtomicReaderContext arc, Bits bits) method to
filter out specific documents. Currently I do not know how to implement
this feature without Filter.
Any help would be great! :)
Thanks,
Akos





Re: Replication Question

2017-08-02 Thread Michael B. Klein
Thanks for your responses, Shawn and Erick.

Some clarification questions, but first a description of my (non-standard)
use case:

My Zookeeper/SolrCloud cluster is running on Amazon AWS. Things are working
well so far on the production cluster (knock wood); it's the staging cluster
that's giving me fits. Here's why: In order to save money, I have the AWS
auto-scaler scale the cluster down to zero nodes when it's not in use.
Here's the (automated) procedure:

SCALE DOWN
1) Call admin/collections?action=BACKUP for each collection to a shared NFS
volume
2) Shut down all the nodes

SCALE UP
1) Spin up 2 Zookeeper nodes and wait for them to stabilize
2) Spin up 3 Solr nodes and wait for them to show up under Zookeeper's
live_nodes
3) Call admin/collections?action=RESTORE to put all the collections back
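
For reference, the calls look roughly like this (host, collection name, and
the NFS path are placeholders):

curl 'http://host:8983/solr/admin/collections?action=BACKUP&name=coll-bak&collection=coll&location=/mnt/nfs/backups'
curl 'http://host:8983/solr/admin/collections?action=RESTORE&name=coll-bak&collection=coll&location=/mnt/nfs/backups&replicationFactor=3&maxShardsPerNode=1'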

This has been working very well, for the most part, with the following
complications/observations:

1) If I don't optimize each collection right before BACKUP, the backup
fails (see the attached solr_backup_error.json).
2) If I don't specify a replicationFactor during RESTORE, the admin
interface's Cloud diagram only shows one active node per collection. Is
this expected? Am I required to specify the replicationFactor unless I'm
using a shared HDFS volume for solr data?
3) If I don't specify maxShardsPerNode=1 during RESTORE, I get a warning
message in the response, even though the restore seems to succeed.
4) Aside from the replicationFactor parameter on the CREATE/RESTORE, I do
not currently have any replication stuff configured (as it seems I should
not).
5) At the time my "1-in-3 requests are failing" issue occurred, the Cloud
diagram looked like the attached solr_admin_cloud_diagram.png. It seemed to
think all replicas were live and synced and happy, and because I was
accessing solr through a round-robin load balancer, I was never able to
tell which node was out of sync.

If it happens again, I'll make node-by-node requests and try to figure out
what's different about the failing one. But the fact that this happened
(and the way it happened) is making me wonder if/how I can automate this
automated staging environment scaling reliably and with confidence that it
will Just Work™.

Comments and suggestions would be GREATLY appreciated.

Michael



On Tue, Aug 1, 2017 at 8:14 PM, Erick Erickson 
wrote:

> And please do not use optimize unless your index is
> totally static. I only recommend it when the pattern is
> to update the index periodically, like every day or
> something and not update any docs in between times.
>
> Implied in Shawn's e-mail was that you should undo
> anything you've done in terms of configuring replication,
> just go with the defaults.
>
> Finally, my bet is that your problematic Solr node is misconfigured.
>
> Best,
> Erick
>
> On Tue, Aug 1, 2017 at 2:36 PM, Shawn Heisey  wrote:
> > On 8/1/2017 12:09 PM, Michael B. Klein wrote:
> >> I have a 3-node solrcloud cluster orchestrated by zookeeper. Most stuff
> >> seems to be working OK, except that one of the nodes never seems to get
> its
> >> replica updated.
> >>
> >> Queries take place through a non-caching, round-robin load balancer. The
> >> collection looks fine, with one shard and a replicationFactor of 3.
> >> Everything in the cloud diagram is green.
> >>
> >> But if I (for example) select?q=id:hd76s004z, the results come up empty
> 1
> >> out of every 3 times.
> >>
> >> Even several minutes after a commit and optimize, one replica still
> isn’t
> >> returning the right info.
> >>
> >> Do I need to configure my `solrconfig.xml` with `replicateAfter`
> options on
> >> the `/replication` requestHandler, or is that a non-solrcloud,
> >> standalone-replication thing?
> >
> > This is one of the more confusing aspects of SolrCloud.
> >
> > When everything is working perfectly in a SolrCloud install, the feature
> > in Solr called "replication" is *never* used.  SolrCloud does require
> > the replication feature, though ... which is what makes this whole thing
> > very confusing.
> >
> > Replication is used to replicate an entire Lucene index (consisting of a
> > bunch of files on the disk) from a core on a master server to a core on
> > a slave server.  This is how replication was done before SolrCloud was
> > created.
> >
> > The way that SolrCloud keeps replicas in sync is *entirely* different.
> > SolrCloud has no masters and no slaves.  When you index or delete a
> > document in a SolrCloud collection, the request is forwarded to the
> > leader of the correct shard for that document.  The leader then sends a
> > copy of that request to all the other replicas, and each replica
> > (including the leader) independently handles the updates that are in the
> > request.  Since all replicas index the same content, they stay in sync.
> >
> > What SolrCloud does with the replication feature is index recovery.  In
> > some situations recovery can be done from the leader's transaction log,
> > but when a replica has gotten so far out of sync 

RE: Solr Index issue on string type while querying

2017-08-02 Thread padmanabhan
Thank you Matt for the reply. My apologies for the lack of clarity in the
problem statement.

The problem was with the source attribute value defined in the source
system.

The source system sends:

heightSquareTube_string_mv: &gt; 90 - 100 mm

The Solr index converts the XML/HTML entity to its symbol equivalent:

heightSquareTube_string_mv: > 90 - 100 mm





Custom Sort option to apply at SOLR index

2017-08-02 Thread padmanabhan
Hello Solr Geeks,

Am newbie to SOLR. I have a requirement as given below, Could any one please
provide some insights on how to go about on this. 

"Ascending by name" (#, 0 - 9, A - Z)

"Descending by name" (Z - A, 9 - 0, #)

Sample name value can be 

ABCD5678
1234ABCD
#2345ABCD
#1234ABCD
5678ABCD
#2345ACBD
5678EFGH
#2345DBCA
ABCD1234
1234#ABCD

*Expected Ascending order*

#2345ABCD
#2345ACBD
#2345DBCA
1234#ABCD
1234ABCD
5678ABCD
ABCD1234
ABCD5678

*Expected Descending order*

ABCD5678
ABCD1234
5678ABCD
1234ABCD
1234#ABCD
#2345DBCA
#2345ACBD
#2345ABCD

Thanks & Regards,
Paddy





Re: Arabic words search in solr

2017-08-02 Thread David Hastings
Perhaps change your default operator to AND instead of OR, if that's what
you are expecting for a result.
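
In Solr terms that's q.op=AND on the request (or set as the default in the
request handler), or writing the query explicitly as
bizNameAr:(شرطة AND ازكي).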

On Wed, Aug 2, 2017 at 8:57 AM, mohanmca01  wrote:

> Hi Phil Scadden,
>
>  Thank you for your reply,
>
> we tried your suggested solution by removing hyphen while indexing, but it
> was getting wrong results. i was searching for "شرطة ازكي" and it was
> showing me the result that am looking for, plus irrelevant result which
> either have the first or second word that i have typed while searching.
>
> First word: شرطة
> Second Word: ازكي
>
> results that we are getting:
>
>
> {
>   "responseHeader": {
> "status": 0,
> "QTime": 3,
> "params": {
>   "indent": "true",
>   "q": "bizNameAr:(شرطة ازكي)",
>   "_": "1501678260335",
>   "wt": "json"
> }
>   },
>   "response": {
> "numFound": 444,
> "start": 0,
> "docs": [
>   {
> "id": "28107",
> "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  -
> -
> مركز شرطة إزكي",
> "_version_": 1574621132849414100
>   },
>   {
> "id": "13937",
> "bizNameAr": "مؤسسةا الازكي للتجارة والمقاولات",
> "_version_": 157462113219720
>   },
>   {
> "id": "15914",
> "bizNameAr": "العلوي والازكي المتحدة ش.م.م",
> "_version_": 1574621132344000500
>   },
>   {
> "id": "20639",
> "bizNameAr": "سحائب ازكي للتجارة",
> "_version_": 1574621132574687200
>   },
>   {
> "id": "25108",
> "bizNameAr": "المستشفيات -  - مستشفى إزكي",
> "_version_": 1574621132737216500
>   },
>   {
> "id": "27629",
> "bizNameAr": "وزارة الداخلية -  -  - والي إزكي -",
> "_version_": 1574621132833685500
>   },
>   {
> "id": "36351",
> "bizNameAr": "طوارئ الكهرباء - إزكي",
> "_version_": 157462113318391
>   },
>   {
> "id": "61235",
> "bizNameAr": "اضواء ازكي للتجارة",
> "_version_": 1574621133785792500
>   },
>   {
> "id": "66821",
> "bizNameAr": "أطلال إزكي للتجارة",
> "_version_": 1574621133915816000
>   },
>   {
> "id": "67011",
> "bizNameAr": "بنك ظفار - فرع ازكي",
> "_version_": 1574621133920010200
>   }
> ]
>   }
> }
>
> Actually  we expecting the below results only since it has both the words
> that we typed while searching:
>
>   {
> "id": "28107",
> "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  -
> -
> مركز شرطة إزكي",
> "_version_": 1574621132849414100
>   },
>
>
> Configuration:
>
> In schema.xml we configured as below:
>
> <field name="bizNameAr" type="..." indexed="true" stored="true"/>
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="..."/>
>     <filter class="solr.StopFilterFactory" words="lang/stopwords_ar.txt"/>
>     <!-- further filters lost in archiving -->
>     <filter class="solr.PatternReplaceFilterFactory" pattern="ى" replacement="ئ"/>
>     <filter class="solr.PatternReplaceFilterFactory" pattern="ء" replacement=""/>
>   </analyzer>
> </fieldType>
>
>
> Thanks,
>
>
>
>
>
>


RE: Arabic words search in solr

2017-08-02 Thread mohanmca01
Hi Phil Scadden,

 Thank you for your reply,

We tried your suggested solution of removing the hyphen while indexing, but
it was returning wrong results. I was searching for "شرطة ازكي" and it
showed the result I am looking for, plus irrelevant results which have
either the first or the second word that I typed while searching.

First word: شرطة 
Second Word: ازكي

results that we are getting:


{
  "responseHeader": {
"status": 0,
"QTime": 3,
"params": {
  "indent": "true",
  "q": "bizNameAr:(شرطة ازكي)",
  "_": "1501678260335",
  "wt": "json"
}
  },
  "response": {
"numFound": 444,
"start": 0,
"docs": [
  {
"id": "28107",
"bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  -  -
مركز شرطة إزكي",
"_version_": 1574621132849414100
  },
  {
"id": "13937",
"bizNameAr": "مؤسسةا الازكي للتجارة والمقاولات",
"_version_": 157462113219720
  },
  {
"id": "15914",
"bizNameAr": "العلوي والازكي المتحدة ش.م.م",
"_version_": 1574621132344000500
  },
  {
"id": "20639",
"bizNameAr": "سحائب ازكي للتجارة",
"_version_": 1574621132574687200
  },
  {
"id": "25108",
"bizNameAr": "المستشفيات -  - مستشفى إزكي",
"_version_": 1574621132737216500
  },
  {
"id": "27629",
"bizNameAr": "وزارة الداخلية -  -  - والي إزكي -",
"_version_": 1574621132833685500
  },
  {
"id": "36351",
"bizNameAr": "طوارئ الكهرباء - إزكي",
"_version_": 157462113318391
  },
  {
"id": "61235",
"bizNameAr": "اضواء ازكي للتجارة",
"_version_": 1574621133785792500
  },
  {
"id": "66821",
"bizNameAr": "أطلال إزكي للتجارة",
"_version_": 1574621133915816000
  },
  {
"id": "67011",
"bizNameAr": "بنك ظفار - فرع ازكي",
"_version_": 1574621133920010200
  }
]
  }
}

Actually we expected only the result below, since it has both of the words
that we typed while searching:

  {
"id": "28107",
"bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية  -  -
مركز شرطة إزكي",
"_version_": 1574621132849414100
  },


Configuration:

In schema.xml we configured as below:

<field name="bizNameAr" type="..." indexed="true" stored="true"/>

<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="..."/>
    <filter class="solr.StopFilterFactory" words="lang/stopwords_ar.txt"/>
    <!-- further filters lost in archiving -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="ى" replacement="ئ"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="ء" replacement=""/>
  </analyzer>
</fieldType>

Thanks,







generate field name in query

2017-08-02 Thread Peter Kirk
Hi - is it possible to create a query (or fq) which generates the field to 
search on, based on whether or not the document has that field?

Eg. Search for documents with prices in the range 100 - 200, using either the 
field "price_owner_float" or "price_customer_float" (if a document has a field 
"price_owner_float" then use that, otherwise use the field 
"price_customer_float").

This gives a syntax error:
fq=if(exists(price_owner_float),price_owner_float,price_customer_float):[100 TO 
200]

Thanks,
Peter




Re: Problems retrieving large documents

2017-08-02 Thread Aman Tandon
Did you find any errors in the Solr logs?

On Sat, Jul 29, 2017, 23:13 Aman Tandon  wrote:

> Hello,
>
> Kindly check the Solr logs when you are hitting the query. Attach them
> here so that I can give more insight.
>
> To me it looks like an OOM, but check the Solr logs; I hope we can get
> more information from there.
>
> On Sat, Jul 29, 2017, 14:35 SOLR6932  wrote:
>
>> Hey all,
>> I am using Solr 4.10.3 and my collection consists around 2300 large
>> documents that are distributed across a number of shards. Each document is
>> estimated to be around 50-70 megabytes. The queries that I run are
>> sophisticated, involve a range of parameters and diverse query filters.
>> Whenever I wish to retrieve all the returned document fields (fl:* [around
>> 50 fields in my schema]), I receive an impossible exception - specifically
>> /org.apache.solr.common.SolrException: Impossible Exception/ that is
>> logged
>> by both SolrCore and SolrDispachFilter. Has anyone experienced a similar
>> problem and knows how to solve this issue?
>> Thanks in advance,
>> Louie.
>>
>>
>>
>>
>>
>


Re: edismax, pf2 and use of both AND and OR parameter

2017-08-02 Thread Aman Tandon
Hi,

Ideally it should, but from the debug query it seems it is not respecting
the Boolean clauses.

Could anyone else help here? Is this the intended behavior?

On Jul 31, 2017 5:47 PM, "Niraj Aswani"  wrote:

> Hi Aman,
>
> Thank you very much your reply.
>
> Let me elaborate my question a bit more using your example in this case.
>
> AFAIK, what the pf2 parameter is doing to the query is adding the following
> phrase queries:
>
> (_text_:"system memory") (_text_:"memory oem") (_text_:"oem retail")
>
> There are three phrases being checked here:
> - system memory
> - memory oem
> - oem retail
>
> However, what I actually expected it to look like is the following:
> - system memory
> - memory oem
> - memory retail
>
> My understanding of the edismax parser is that it interprets the AND / OR
> parameters correctly so it should generate the bi-gram phrases respecting
> the AND /OR parameters as well, right?
>
> Am I missing something here?
>
> Regards,
> Niraj
>
> On Mon, Jul 31, 2017 at 4:24 AM, Aman Tandon 
> wrote:
>
> > Hi Niraj,
> >
> > Should I expect it to check the following bigram phrases?
> >
> > Yes it will check.
> >
> > ex- documents & query is given below
> >
> > http://localhost:8983/solr/myfile/select?fl=name&indent=on&q=System
> > AND Memory AND (OEM OR Retail)&rows=50&wt=json&qf=_text_&pf2=_text_
> > &debug=true&defType=edismax
> >
> > <result name="response" numFound="3" start="0">
> >   <doc>
> >     <str name="name">A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered
> >     DDR 400 (PC 3200) System Memory - OEM</str>
> >   </doc>
> >   <doc>
> >     <str name="name">CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered
> >     DDR 400 (PC 3200) System Memory - Retail</str>
> >   </doc>
> >   <doc>
> >     <str name="name">CORSAIR XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM
> >     Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail</str>
> >   </doc>
> > </result>
> >
> >
> > *Below is the parsed query*
> >
> > +(+(_text_:system) +(_text_:memory) +((_text_:oem) (_text_:retail)))
> > ((_text_:"system memory") (_text_:"memory oem") (_text_:"oem retail"))
> >
> > In case you are in such a scenario where you need to know what query
> > will be formed, you could use debug=true to learn more about the query
> > & the timings of the different components.
> >
> > *And when ps2 is not specified, the default ps will be applied to pf2.*
> >
> > I hope this helps.
> >
> > With Regards
> > Aman Tandon
> >
> > On Mon, Jul 31, 2017 at 4:18 AM, Niraj Aswani 
> > wrote:
> >
> > > Hi,
> > >
> > > I am using solr 4.4 and bit confused about how does the edismax parser
> > > treat the pf2 parameter when both the AND and OR operators are used in
> > the
> > > query with ps2=0
> > >
> > > For example:
> > >
> > > pf2=title^100
> > > q=HDMI AND Video AND (Wire OR Cable)
> > >
> > > Should I expect it to check the following bigram phrases?
> > >
> > > hdmi video
> > > video wire
> > > video cable
> > >
> > > Regards
> > > Niraj
> > >
> >
>


RE: Solr Input and Output format

2017-08-02 Thread Ranganath B N
Hi,

I am not asking about the file formats. Rather, it is about the
SolrInputFormat and SolrOutputFormat interfaces, which deal with the
getSplits(), getRecordReader() and getRecordWriter() methods. Are there any
implementations of these interfaces?

Thanks,
Ranganath B. N.








From: Ranganath B N
Sent: Monday, July 31, 2017 6:33 PM
To: 'solr-user@lucene.apache.org'
Cc: Vadiraj Muradi
Subject: Solr Input and Output format


Hi All,

 Can you point me to some of the implementations of Solr Input and Output
format? I want to understand the distributed implementation approach.


Thanks,
Ranganath B. N.