Re: Null pointer exception in spell checker at addchecker method
Yes, it worked, and I got the reason for the error. Thanks a lot. -- View this message in context: http://lucene.472066.n3.nabble.com/Null-pointer-exception-in-spell-checker-at-addchecker-method-tp4105489p4105636.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Constantly increasing time of full data import
On Tue, 2013-12-03 at 17:09 +0100, michallos wrote:
> This occurs only on production environment so I can't profile it :-)

Sure you can [Smiley] If you use jvisualvm and stay away from the Profiler tab, then you should be fine. The Sampler performs non-intrusive profiling. Not as thorough as real profiling, but it might help.

So far it sounds like a classic merge issue though. This would probably not show up in the profiler. Have you tweaked the mergeFactor? http://wiki.apache.org/solr/SolrPerformanceFactors#mergeFactor

With 16 shards/node (guessing: same storage backend for all shards on a single node, different storage backends across the nodes) and a 15-second commit time, a segment will be created roughly every second (oversimplifying, as they will cluster, which makes matters worse for spinning drives). If the mergeFactor is 10, this means a merge will be going on every 10 seconds. Merges are bulk IO, and on spinning drives they get penalized by concurrent random access.

Consider doing non-intrusive IO load-logging (bulk as well as IO/sec) on a node. If you see bulk speed go down considerably when the IO/sec rises, then you have your problem. Some solutions are:
- Increase your maxTime for autoCommit
- Increase the mergeFactor
- Use SSDs
- Maybe lower the number of shards to reduce the thrashing triggered by concurrent merges
- More RUM (and more RAM)

Regards, Toke Eskildsen, State and University Library, Denmark
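Toke's segment arithmetic above can be sketched numerically. This is a back-of-the-envelope illustration, not Solr's actual merge scheduler, and the function name is made up for the example:

```python
def merge_interval_seconds(commit_interval_s, shards_per_node, merge_factor):
    # each commit flushes one new segment per shard, so with 16 shards and a
    # 15 s autoCommit, a new segment appears roughly every 15/16 ~ 1 second
    segment_interval = commit_interval_s / shards_per_node
    # once merge_factor segments of similar size accumulate, a merge fires
    return segment_interval * merge_factor

print(merge_interval_seconds(15, 16, 10))  # 9.375
```

With the numbers from the email (15 s commits, 16 shards, mergeFactor 10) this gives a merge roughly every 9-10 seconds, matching the estimate above.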
Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser
Hi Salman, I am confused because with surround, no analysis is applied at query time. I suspect that the surround query parser is not kicking in. You should see SrndQuery or something like it in the parsed query section.

On Monday, December 9, 2013 6:24 AM, Salman Akram salman.ak...@northbaysolutions.net wrote:

All, I posted this sub-issue with another issue a few days back, but maybe it was not obvious, so I am posting it on a separate thread. We recently migrated to Solr 4.6. We use CommonGrams, but queries with words in the CG list have slowed down. On debugging we found that for CG words the parser is adding the individual tokens of those words to the query too, which ends up slowing it down. Below is an example:

Query = only be

Here is what debug shows. I have highlighted the part which is different in the two versions, i.e. Solr 4.6 is making it a MultiPhraseQuery and adding individual tokens too. Can someone help?

SOLR 4.6 (takes 20 secs):
<str name="rawquerystring">{!surround}Contents:only be</str>
<str name="querystring">{!surround}Contents:only be</str>
<str name="parsedquery">MultiPhraseQuery(Contents:(only only_be) be)</str>
<str name="parsedquery_toString">Contents:(only only_be) be</str>

SOLR 1.4.1 (takes 1 sec):
<str name="rawquerystring">{!surround}Contents:only be</str>
<str name="querystring">{!surround}Contents:only be</str>
<str name="parsedquery">Contents:only_be</str>
<str name="parsedquery_toString">Contents:only_be</str>

-- Regards, Salman Akram
Re: Faceting within groups
Can you try setting group.truncate to true?

On Sunday, December 8, 2013 3:18 PM, Cool Techi cooltec...@outlook.com wrote: Any help here? From: cooltec...@outlook.com To: solr-user@lucene.apache.org Subject: Faceting within groups Date: Sat, 7 Dec 2013 14:00:20 +0530

Hi, I am not sure if faceting within groups is supported; the documents do seem to suggest it works, but I can't seem to get the intended results.

<str name="q">(Amazon Cloud OR (IBM Cloud)</str>
<str name="group.field">sourceId</str>
<str name="facet.field">sentiment</str>
<str name="group">true</str>
<str name="group.facet">true</str>

Also, if it works, does SolrCloud support it?

Regards, Ayush
Re: alternative to DisMaxRequestHandler needed for upgrade to solr 4.6.0
Thanks guys, that worked.

On 6 December 2013 23:55, Shawn Heisey s...@elyograg.org wrote: On 12/6/2013 8:58 AM, Peri Stracchino wrote: I'm trying to upgrade a Solr installation from 1.4 (yes, really) to 4.6.0, and I find our request handler was solr.DisMaxRequestHandler, which is now not only deprecated but deleted from solr-core-4.6.0.jar. Can anyone advise on suitable alternatives, or was there any form of direct replacement?

Erick is right, you should probably use edismax. In addition, it's important to note a critical distinction here ... it's the *handler* object that's deprecated and removed, not the parser. The old dismax query parser is still alive and well, alongside the new extended dismax query parser. You need to use a standard search request handler and set the defType parameter to dismax or edismax.

http://wiki.apache.org/solr/DisMaxQParserPlugin
http://wiki.apache.org/solr/ExtendedDisMax

I would recommend that you not use /dismax or /edismax for the handler name, just to avoid terminology clashes. I use /ncdismax for my handler name ... the string "nc" has meaning for our web application. Eventually I hope to move all searching to edismax and therefore just use /select or /search for the handler name. Right now we do almost everything with the standard query parser, and we are still tuning edismax. This is my handler definition:

<requestHandler name="/ncdismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="echoParams">all</str>
    <int name="rows">70</int>
    <str name="df">catchall</str>
    <xi:include href="shards.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
    <str name="shards.qt">/search</str>
    <str name="shards.info">true</str>
    <str name="shards.tolerant">true</str>
    <float name="tie">0.1</float>
    <int name="qs">3</int>
    <int name="ps">3</int>
    <str name="qf">catchall</str>
    <str name="pf">catchall^2</str>
    <str name="boost">min(recip(abs(ms(NOW/HOUR,pd)),1.92901e-10,1.5,1.5),0.85)</str>
    <str name="mm">100%</str>
    <str name="q.alt">*:*</str>
    <bool name="lowercaseOperators">false</bool>
  </lst>
</requestHandler>

Thanks, Shawn
Re: Constantly increasing time of full data import
On production, no, I can't profile it (because of the huge overhead) ... Maybe with dynamic tracing, but we can't do that right now. After a server restart, the delta time resets to 15-20 seconds, so it is not caused by the mergeFactor. We have SSDs and 70GB RAM (it is enough for us).
Indexing on plain text and binary data in a single HTTP POST request
Hi, I am using Solr for searching my email data. My application is in C++, so I am using the cURL library to POST the data to Solr for indexing. I am posting data in XML format; some of the XML fields are plain text and some are in binary format. I want to know what I should do so that Solr can index both types of data (plain text as well as binary) coming in a single XML file. For reference, my XML file looks like:

<add><doc>
<field name="mailbox-id"></field>
<field name="folder">INBOX</field>
<field name="from">solr solr s...@abc.com</field>
<field name="to">solr solr s...@abc.com</field>
<field name="email-body">HI I AM EMAIL BODY\r\n\r\nTHANKS</field>
<field name="email-attachment">Some binary data</field>
</doc></add>

I tried to use ExtractingUpdateProcessorFactory, but it seems to me that ExtractingUpdateProcessorFactory support is not in Solr 4.5 (which I am using), nor in any released Solr version. Also, I think I cannot use ExtractingRequestHandler for my problem, as the document is in XML format and has mixed types of data (text and binary). Am I right? If yes, please suggest how to proceed; if no, how can I extract text using ExtractingRequestHandler from some of the binary fields? Any help is highly appreciated.
Re: Difference between textfield and strfield
Hi Manju, Ahmet is me :) Faceting will be OK with the lowercased field type, even if it is a solr.TextField. KeywordTokenizer keeps its input as a single token, similar behavior to a string field. With solr.TextField + KeywordTokenizer you can add further token filters, for example a lowercase filter. With the string type you cannot add any token filters. As Erick suggested, you can play with field types on the Admin analysis page. It allows you to enter sample text and displays the generated tokens visually.

On Sunday, December 8, 2013 2:00 PM, manju16832003 manju16832...@gmail.com wrote: I don't understand. "Use the field type *Ahmet* recommended." Who is Ahmet?
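As a rough illustration of Ahmet's point, here is a Python simulation of that analysis chain (this is not Solr's implementation, just a sketch of its behavior): KeywordTokenizer emits the whole input as one token, and a lowercase filter then normalizes it, so faceting sees a single lowercased value per field value:

```python
def keyword_lowercase_analyzer(text):
    # KeywordTokenizer: the entire input is a single token (no splitting)
    tokens = [text]
    # LowerCaseFilter: lowercase each token
    return [t.lower() for t in tokens]

print(keyword_lowercase_analyzer("New York"))  # ['new york']
```

A plain string field would behave like the first step only, with no way to add the lowercasing step.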
Re: Constantly increasing time of full data import
On Mon, 2013-12-09 at 11:29 +0100, michallos wrote: on production - no I can't profile it (because of huge overhead) ... Maybe with dynamic tracing but we can't do it right now. https://blogs.oracle.com/nbprofiler/entry/visualvm_1_3_released First section: Sampler In The Core Tool. After server restart, delta time reset to 15-20 seconds so it is not caused by the mergeFactor. Unless your merges are cascading so that the amount of concurrent merges is growing. But with fast storage and a lot of RAM for write cache, that does not sound probable. We have SSD and 70GB RAM (it is enough for us). Sounds like more than enough for a 120GB index. - Toke Eskildsen, State and University Library, Denmark
Re: Indexing on plain text and binary data in a single HTTP POST request
Not a solution, but a couple of thoughts: 1) For your email address fields, you are escaping the angle brackets, right? Not just solr solr s...@abc.com as you show, but with the < and > escaped, right? Otherwise, those email addresses become part of the XML markup and mess it all up. 2) Your binary content is encoded in some way inside the XML, right? Not just raw binary, which would make it invalid XML? Like base64 or something? 3) I suspect you will need to use an UpdateRequestProcessor one way or another: to decode base64 as a first step, and to feed it through whatever you want to process the actual binary with as a second step. So, it might be a custom URP with similar functionality to ExtractingRequestHandler, with the difference that you already have a document object and you are mapping one (binary) field in it into a bunch of other fields, with some conventions on names, overrides, etc. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Mon, Dec 9, 2013 at 5:55 PM, neerajp neeraj_star2...@yahoo.com wrote: Hi, I am using Solr for searching my email data. My application is in C++ so I am using the cURL library to POST the data to Solr for indexing. I am posting data in XML format and some of the XML fields are in plain text and some of the fields are in binary format. I want to know what should I do so that Solr can index both types of data (plain text as well as binary data) coming in a single XML file. 
For reference, my XML file looks like:

<add><doc>
<field name="mailbox-id"></field>
<field name="folder">INBOX</field>
<field name="from">solr solr s...@abc.com</field>
<field name="to">solr solr s...@abc.com</field>
<field name="email-body">HI I AM EMAIL BODY\r\n\r\nTHANKS</field>
<field name="email-attachment">Some binary data</field>
</doc></add>

I tried to use ExtractingUpdateProcessorFactory, but it seems to me that ExtractingUpdateProcessorFactory support is not in Solr 4.5 (which I am using), nor in any released Solr version. Also, I think I cannot use ExtractingRequestHandler for my problem, as the document is in XML format and has mixed types of data (text and binary). Am I right? If yes, please suggest how to proceed; if no, how can I extract text using ExtractingRequestHandler from some of the binary fields? Any help is highly appreciated.
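Following Alexandre's point (2) above: a minimal sketch of encoding a binary attachment so it can be carried inside an XML field. Base64 is one common choice; the helper name here is made up for the example:

```python
import base64

def attachment_to_xml_text(raw_bytes):
    # raw binary is not valid XML character data; base64 turns it into
    # plain ASCII that can sit safely inside a <field> element
    return base64.b64encode(raw_bytes).decode("ascii")

print(attachment_to_xml_text(b"hi"))  # aGk=
```

A custom UpdateRequestProcessor on the Solr side would then base64-decode the field value before handing the bytes to whatever extraction step follows.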
Searching for document by id in a sharded environment
Hi, I'm in the process of migrating an application that queries Solr to use a new sharded SolrCloud, and as part of this I'm adding the shard key to the document id when we index documents (as we're using grouping and we need to ensure that grouped documents end up on the same shard), e.g. 156a05d1-8ebe-4f3c-b548-60a84d167a16!643fd57c-c65e-4929-bc0e-029aa4f07475 I'm having a problem with my application when searching by id with SolrJ CloudSolrServer: the exclamation point is misinterpreted as a boolean negation, and the matching document is not returned in the search results. I just wanted to check if the only way to make this work would be to escape the exclamation point (i.e. prefix it with a backslash, or enclose the id within quotes). We're keen to avoid this, as it would require lots of modifications throughout the code on a series of applications that interact with Solr. If anyone has any better suggestions on how to achieve this it would be very much appreciated! Best wishes, Daniel -- *Daniel Bryant | Software Development Consultant | www.tai-dev.co.uk http://www.tai-dev.co.uk/* daniel.bry...@tai-dev.co.uk mailto:daniel.bry...@tai-dev.co.uk | +44 (0) 7799406399 | Twitter: @taidevcouk https://twitter.com/taidevcouk
Re: Searching for document by id in a sharded environment
Hi Daniel, TermQueryParser comes in handy when you don't want to escape:

q={!term f=id}156a05d1-8ebe-4f3c-b548-60a84d167a16!643fd57c-c65e-4929-bc0e-029aa4f07475

On Monday, December 9, 2013 2:14 PM, Daniel Bryant daniel.bry...@tai-dev.co.uk wrote: Hi, I'm in the process of migrating an application that queries Solr to use a new sharded SolrCloud, and as part of this I'm adding the shard key to the document id when we index documents (as we're using grouping and we need to ensure that grouped documents end up on the same shard) e.g. 156a05d1-8ebe-4f3c-b548-60a84d167a16!643fd57c-c65e-4929-bc0e-029aa4f07475 I'm having a problem with my application when searching by id with SolrJ CloudSolrServer - the exclamation point is misinterpreted as a boolean negation, and the matching document is not returned in the search results. I just wanted to check if the only way to make this work would be to escape the exclamation point (i.e. prefix with a backslash, or enclose the id within quotes). We're keen to avoid this, as this will require lots of modifications throughout the code on a series of applications that interact with Solr. If anyone has any better suggestions on how to achieve this it would be very much appreciated! Best wishes, Daniel -- *Daniel Bryant | Software Development Consultant | www.tai-dev.co.uk http://www.tai-dev.co.uk/* daniel.bry...@tai-dev.co.uk mailto:daniel.bry...@tai-dev.co.uk | +44 (0) 7799406399 | Twitter: @taidevcouk https://twitter.com/taidevcouk
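For completeness, if one did want to escape instead of using {!term}, the special query characters can be backslash-escaped client-side. A small sketch modeled loosely on SolrJ's ClientUtils.escapeQueryChars (the Python function itself is hypothetical, written for illustration):

```python
# characters with special meaning to the Lucene query parser
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/ ')

def escape_query_chars(value):
    # prefix every special character with a backslash
    return ''.join('\\' + ch if ch in SPECIAL_CHARS else ch for ch in value)

print(escape_query_chars("abc!def"))  # abc\!def
```

The {!term} approach avoids this entirely, since the term query parser takes the value verbatim with no query-syntax interpretation.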
Re: Resolve an issuse with SOLR
Hi Munusamy, A typical core directory contains a conf/ folder and a data/ folder. The conf directory should contain solrconfig.xml and schema.xml. You should have a folder with the same name as the instanceDir parameter on the admin UI. Inside this folder the conf/ and data/ directories should exist. So you first need to have the directory present, with the solrconfig and schema files, and then go to the core admin page and create a core.

On Mon, Dec 9, 2013 at 12:45 PM, Munusamy, Kannan kannan.munus...@capgemini.com wrote: Hi, I have used the "+add core" option in the admin UI, but I am not able to add a core. It then showed: "HTTP Status 500 - {msg=SolrCore 'new_core' is not available due to init failure: Path must not end with /" ... Once I restarted the Solr service, I am now getting this error in the UI: "Unable to load environment info from /solr/collection1_shard1_replica1/admin/system?wt=json. This interface requires that you activate the admin request handlers in all SolrCores by adding the following configuration to your solrconfig.xml:" PFA error image. Please provide suggestions and help us resolve the issue. Thanks & Regards, Kannan Munusamy | Capgemini India | Bangalore | kannan.munus...@capgemini.com | www.in.capgemini.com
-- Regards, Varun Thacker http://www.vthacker.in/
Is it possible to retain the indexed data from solr
I am implementing Solr search in my application. I am indexing the data from a MySQL server to an XML file, using Solr 1.4. My questions are: 1. Is it possible to retain the indexed XML data in a CSV or PDF file? 2. Is it possible to save the data from the indexed XML to the MySQL server? For example, if I am indexing an XML file manually (not from the MySQL server), is there any chance to save the indexed data to the MySQL server? Thanks
Multi part field
I am trying to implement a ranged field type in a booking system. The price structure is variable between two dates (determined by the property owner), so it looks like this:

Date A - Date B = Price Value

I've been looking through a lot of docs, but so far have not been able to find how I could implement such an object within Solr. The only thing I have so far thought of is to have two fields, DATE PRICE RANGE and PRICE RANGE VAL, then get the index of the DATE PRICE RANGE array element that matches and apply that to PRICE RANGE VAL to get the value. Any help would be very much appreciated, as this is make or break for the new search system for our site just now.
Re: Multi part field - EXAMPLE DATA
"prices": [
  {"start-date": "05-01-2013", "end-date": "02-03-2013", "price": 760},
  {"start-date": "02-03-2013", "end-date": "06-04-2013", "price": 800},
  {"start-date": "06-04-2013", "end-date": "01-06-2013", "price": 1028},
  {"start-date": "01-06-2013", "end-date": "29-06-2013", "price": 1240},
  {"start-date": "29-06-2013", "end-date": "06-07-2013", "price": 1340},
  {"start-date": "06-07-2013", "end-date": "10-08-2013", "price": 1678},
  {"start-date": "10-08-2013", "end-date": "24-08-2013", "price": 1578},
  {"start-date": "24-08-2013", "end-date": "31-08-2013", "price": 1340},
  {"start-date": "31-08-2013", "end-date": "21-09-2013", "price": 1240},
  {"start-date": "21-09-2013", "end-date": "19-10-2013", "price": 1028},
  {"start-date": "19-10-2013", "end-date": "02-11-2013", "price": 800},
  {"start-date": "02-11-2013", "end-date": "11-01-2014", "price": 760}
]
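One way to realize the two-parallel-fields idea from the earlier message: index start dates, end dates, and prices as aligned multivalued fields, and resolve the price for a given date by finding the range that contains it. A client-side Python sketch under that assumption (dates in the sample data appear to be dd-mm-yyyy; the function name is made up for the example):

```python
from datetime import datetime

def price_for(date_str, starts, ends, prices):
    # parallel arrays: the i-th start/end pair owns the i-th price
    d = datetime.strptime(date_str, "%d-%m-%Y")
    for start, end, price in zip(starts, ends, prices):
        s = datetime.strptime(start, "%d-%m-%Y")
        e = datetime.strptime(end, "%d-%m-%Y")
        if s <= d < e:  # the ranges abut, so treat the end date as exclusive
            return price
    return None

starts = ["05-01-2013", "02-03-2013", "06-04-2013"]
ends   = ["02-03-2013", "06-04-2013", "01-06-2013"]
prices = [760, 800, 1028]
print(price_for("15-03-2013", starts, ends, prices))  # 800
```

The same lookup could of course live in Solr itself (e.g. via a custom function), but aligning the multivalued fields by position is the part that makes the structure recoverable.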
Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser
Yup, on debugging I found that it's coming from the analyzer. We are using StandardAnalyzer. It seems to be a Solr 4 issue with CommonGrams; not sure if it's a bug or if I am missing some config.

On Mon, Dec 9, 2013 at 2:03 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi Salman, I am confused because with surround no analysis is applied at query time. I suspect that the surround query parser is not kicking in. You should see SrndQuery or something like it in the parsed query section.

-- Regards, Salman Akram
Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser
But again, as Ahmet mentioned, it doesn't look like the surround query parser is actually being used. The debug output also mentions which query parser was used, but that part wasn't provided below. One thing to note here: the surround query parser is not available in 1.4.1. It also looks like you're surrounding your query with angle brackets, as it says the query string is {!surround}Contents:only be, which is not correct syntax. And one of the most important things to note here is that the surround query parser does NOT use the analysis chain of the field, see http://wiki.apache.org/solr/SurroundQueryParser#Limitations. In short, you're going to have to do some work to get common grams factored into a surround query (such as calling the analysis request handler to parse the query before sending it to the surround query parser).

Erik

On Dec 9, 2013, at 9:36 AM, Salman Akram salman.ak...@northbaysolutions.net wrote: Yup, on debugging I found that it's coming from the analyzer. We are using StandardAnalyzer. It seems to be a Solr 4 issue with CommonGrams. Not sure if it's a bug or if I am missing some config.

-- Regards, Salman Akram
Getting Solr Document Attributes from a Custom Function
Hi All, I have written a custom Solr function and I would like to read a property of the document inside my custom function. Is it possible to get that using Solr? For example, inside the floatVal method I would like to get the value of the attribute "name":

public class CustomValueSource extends ValueSource {
    @Override
    public FunctionValues getValues(Map context, AtomicReaderContext readerContext) throws IOException {
        return new FloatDocValues(this) {
            @Override
            public float floatVal(int doc) {
                /* want: getDocument(doc).getAttribute("name") */
            }
        };
    }
}

Thanks & Regards, Mukund
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
I have a new question about this issue. I create filter queries of the form:

fq=start_time:[* TO NOW/5MINUTE]

This is used to restrict the set of documents to only items that have a start time within the next 5 minutes. Most of my indexes have millions of documents, with few documents that start sometime in the future. Nearly all of my queries include this. Would this cause every other search thread to block until the filter query is re-cached every 5 minutes, and if so, is there a better way to do it? Thanks for any continued help with this issue!

We have a webapp running with a very large heap (24GB) and we have no problems with it AFTER we enabled the new GC that is meant to eventually replace the CMS GC, but you need a sufficiently recent Java 6 update (I couldn't find the exact update number, but the latest should cover it). To use it:

1. Remove all the GC options you have, and...
2. Replace them with -XX:+UseG1GC -XX:MaxGCPauseMillis=50

As a test, of course; more information in the following (interesting) article. We also have Solr running with these options: no more pauses or heap size hitting the sky. Don't get bored reading the first (and small) introduction page of the article; pages 2 and 3 will make a lot of sense: http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061 HTH, Guido.

On 26/11/13 21:59, Patrick O'Lone wrote: We do perform a lot of sorting, on multiple fields in fact. We have different kinds of Solr configurations: our news searches do little with regard to faceting but sort heavily, while our classified ad searches heavily use faceting. I might try reducing the JVM memory some and the amount of perm generation as suggested earlier. It feels like a GC issue, and loading the cache just happens to be the victim of a stop-the-world event at the worst possible time.

My gut instinct is that your heap size is way too high. Try decreasing it to like 5-10G. 
I know you say it uses more than that, but that just seems bizarre unless you're doing something like faceting and/or sorting on every field. -Michael

-----Original Message----- From: Patrick O'Lone [mailto:pol...@townnews.com] Sent: Tuesday, November 26, 2013 11:59 AM To: solr-user@lucene.apache.org Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache

I've been tracking a problem in our Solr environment for a while with periodic stalls of Solr 3.6.1. I'm running up against a wall on ideas to try and thought I might get some insight from others on this list. The load on the server is normally anywhere between 1-3. It's an 8-core machine with 40GB of RAM. I have about 25GB of index data that is replicated to this server every 5 minutes. It's taking about 200 connections per second, and roughly every 5-10 minutes it will stall for about 30 seconds to a minute. The stall causes the load to go as high as 90. It is all CPU bound in user space; all cores go to 99% utilization (spinlock?). When doing a thread dump, the following line is blocked in all running Tomcat threads:

org.apache.lucene.search.FieldCacheImpl$Cache.get ( FieldCacheImpl.java:230 )

Looking at the source code in 3.6.1, that is a call into a synchronized() block, which blocks all threads and causes the backlog. I've tried to correlate these events to the replication events, but even with replication disabled this still happens. We run multiple data centers using Solr, and comparing garbage collection between them I noted that the old generation is collected very differently on this data center versus the others. Here the old generation is collected in one massive collection event (several gigabytes' worth); the other data center is more sawtoothed and collects only 500MB-1GB at a time. 
Here's my parameters to java (the same in all environments): /usr/java/jre/bin/java \ -verbose:gc \ -XX:+PrintGCDetails \ -server \ -Dcom.sun.management.jmxremote \ -XX:+UseConcMarkSweepGC \ -XX:+UseParNewGC \ -XX:+CMSIncrementalMode \ -XX:+CMSParallelRemarkEnabled \ -XX:+CMSIncrementalPacing \ -XX:NewRatio=3 \ -Xms30720M \ -Xmx30720M \ -Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \ -classpath /usr/local/share/apache-tomcat/bin/bootstrap.jar \ -Dcatalina.base=/usr/local/share/apache-tomcat \ -Dcatalina.home=/usr/local/share/apache-tomcat \ -Djava.io.tmpdir=/tmp \ org.apache.catalina.startup.Bootstrap start I've tried a few GC option changes from this (been running this way for a couple of years now) - primarily removing CMS Incremental mode as we have 8 cores and remarks on the internet suggest that it is only for smaller SMP setups. Removing CMS did not fix anything. I've considered that the heap is way too large (30GB from 40GB) and may not leave enough memory for mmap operations (MMap appears to be used in the field cache). Based on active memory utilization in Java, seems like I might be able to reduce down to 22GB safely
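On the fq=start_time:[* TO NOW/5MINUTE] question earlier in this thread: rounding NOW to a 5-minute boundary is what makes the filter cacheable at all, since every query issued inside the same bucket produces the identical filter string and can hit the same filterCache entry; only the first query after a boundary (or after a new searcher opens) pays the cost of rebuilding the filter. A toy illustration of the bucketing arithmetic (plain Python, not Solr code):

```python
def five_minute_bucket(epoch_seconds):
    # NOW/5MINUTE: round down to the enclosing 5-minute boundary, so all
    # queries in the same window share one cached filter
    return epoch_seconds - (epoch_seconds % 300)

print(five_minute_bucket(1001))  # 900
print(five_minute_bucket(1199))  # 900  (same bucket, same cached filter)
print(five_minute_bucket(1200))  # 1200 (new bucket, filter rebuilt once)
```

An unrounded NOW would make every fq string unique, so nothing would ever be reused from the filter cache.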
Re: [Solr Wiki] Your wiki account data
Hello, Is this email address still valid? Kind Regards

2013/12/4 Mehdi Burgy gla...@gmail.com: Hello, We've recently launched a job search engine using Solr, and would like to add it here: https://wiki.apache.org/solr/PublicServers Would it be possible to allow me to be part of the publishing group? Thank you for your help. Kind Regards, Mehdi Burgy New Job Search Engine: www.jobreez.com

-- Forwarded message -- From: Apache Wiki wikidi...@apache.org Date: 2013/12/4 Subject: [Solr Wiki] Your wiki account data To: Apache Wiki wikidi...@apache.org Somebody has requested to email you a password recovery token. If you lost your password, please go to the password reset URL below or go to the password recovery page again and enter your username and the recovery token. Login Name: madeinch
SolrCloud 4.6.0 - leader election issue
Hello, I am using SolrCloud 4.6.0 with two shards, two replicas per shard, and two collections.

collection fr_blue:
- shard1: server-01 (replica1), server-01 (replica2)
- shard2: server-02 (replica1), server-02 (replica2)

collection fr_green:
- shard1: server-01 (replica1), server-01 (replica2)
- shard2: server-02 (replica1), server-02 (replica2)

If I start the four Solr instances without a delay between each start, it is not possible to connect to them and it is not possible to access the Solr Admin page. If I get the clusterstate.json with zkCli, the statuses are:
- active for the leaders of the first collection
- recovering for the other replicas of the first collection
- down for all replicas of the second collection (no leader)

The logs loop on the following messages:

server-01:
2013-12-09 14:41:28,634 [main-SendThread(dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:2181)] DEBUG org.apache.zookeeper.ClientCnxn:readResponse:815 - Reading reply sessionid:0x142d770be010063, packet:: clientPath:null serverPath:null finished:false header:: 568,4 replyHeader:: 568,483813,-101 request:: '/s6fr/collections/fr_green/leaders/shard1,F response::
2013-12-09 14:41:28,635 [main-SendThread(dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:2181)] DEBUG org.apache.zookeeper.ClientCnxn:readResponse:815 - Reading reply sessionid:0x142d770be010064, packet:: clientPath:null serverPath:null finished:false header:: 372,4 replyHeader:: 372,483813,-101 request:: '/s6fr/collections/fr_green/leaders/shard2,F response::

server-02:
2013-12-09 14:41:51,381 [main-SendThread(dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:2181)] DEBUG org.apache.zookeeper.ClientCnxn:readResponse:815 - Reading reply sessionid:0x142d770be01005e, packet:: clientPath:null serverPath:null finished:false header:: 1014,4 replyHeader:: 1014,483813,0 request:: '/s6fr/overseer_elect/leader,F response:: 
#7b226964223a2239303837313832313732343837363839342d6463312d76742d6465762d78656e2d30362d766d2d30362e6465762e6463312e6b656c6b6f6f2e6e65743a383038375f736561726368736f6c726e6f646566722d6e5f30303030303030303634227d,s{483632,483632,1386599789203,1386599789203,0,0,0,90871821724876894,104,0,483632} 2013-12-09 14:41:51,383 [main-SendThread(dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:2181)] DEBUG org.apache.zookeeper.ClientCnxn:readResponse:815 - Reading reply sessionid:0x142d770be01005e, packet:: clientPath:null serverPath:null finished:false header:: 1015,8 replyHeader:: 1015,483813,0 request:: '/s6fr/overseer/queue,F response:: v{} 2013-12-09 14:41:51,385 [main-SendThread(dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:2181)] DEBUG org.apache.zookeeper.ClientCnxn:readResponse:815 - Reading reply sessionid:0x142d770be01005e, packet:: clientPath:null serverPath:null finished:false header:: 1016,8 replyHeader:: 1016,483813,0 request:: '/s6fr/overseer/queue-work,F response:: v{} After 10 minutes, there is a WARN message, a leader is found for the second collection and it is possible to connect to the solr instances: 2013-12-06 21:17:57,635 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader:process:212 - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... 
(live nodes size: 4) 2013-12-06 21:27:58,719 [coreLoadExecutor-4-thread-2] WARN org.apache.solr.update.PeerSync:handleResponse:322 - PeerSync: core=fr_green url=http://dc1-vt-dev-xen-06-vm-07.dev.dc1.kelkoo.net:8080/searchsolrnodefr exception talking to http://dc1-vt-dev-xen-06-vm-06.dev.dc1.kelkoo.net:8080/searchsolrnodefr/fr_green/, failed org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: incref on a closed log: tlog{file=/opt/kookel/data/searchSolrNode/solrindex/fr1_green/tlog/tlog.001 refcount=1} at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:491) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:156) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:118) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) 2013-12-06 21:27:58,730 [coreLoadExecutor-4-thread-2] INFO org.apache.solr.cloud.SyncStrategy:syncReplicas:134 - Leader's attempt to sync with shard failed,
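One low-tech mitigation implied above ("without a delay between each start") can be scripted. The sketch below only echoes hypothetical start commands (the ports, the zkHost value, and the start.jar path are placeholders, not taken from the original post), so the staggering logic is visible without actually launching anything:

```shell
# Dry-run sketch: print a staggered start sequence for four instances.
# 'sleep 1' keeps the demo fast; in practice you would wait considerably
# longer between starts, or until each node reports itself as live.
for port in 8081 8082 8083 8084; do
  echo "java -Djetty.port=${port} -DzkHost=zkhost:2181 -jar start.jar"
  sleep 1
done
```

In a real deployment the echoed command would be executed instead of printed, with each instance's own port and data directory.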
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
I was trying to locate the release notes for 3.6.x, but it is too old. If I were you I would update to 3.6.2 (from 3.6.1); it shouldn't affect you since it is a minor release. Locate the release notes and see if something that is affecting you got fixed. I would also think about moving to 4.x, which is quite stable and fast. Like anything with Java and concurrency, it will just get better (and faster) as concurrency frameworks become more reliable, standard and stable. Regards, Guido. On 09/12/13 15:07, Patrick O'Lone wrote: I have a new question about this issue - I create filter queries of the form: fq=start_time:[* TO NOW/5MINUTE] This is used to restrict the set of documents to only items that have a start time within the next 5 minutes. Most of my indexes have millions of documents, with few documents that start sometime in the future. Nearly all of my queries include this; would this cause every other search thread to block until the filter query is re-cached every 5 minutes, and if so, is there a better way to do it? Thanks for any continued help with this issue! We have a webapp running with a very high HEAP size (24GB) and we have no problems with it AFTER we enabled the new GC that is meant to eventually replace the CMS GC, but you need a recent Java 6 update (I couldn't find the exact number, but the latest should cover it): 1. Remove all the GC options you have and... 2. Replace them with -XX:+UseG1GC -XX:MaxGCPauseMillis=50 As a test, of course; you can read more in the following (and interesting) article. We also have Solr running with these options: no more pauses or heap size hitting the sky. Don't get bored reading the first (and small) introduction page of the article; pages 2 and 3 will make a lot of sense: http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061 HTH, Guido. On 26/11/13 21:59, Patrick O'Lone wrote: We do perform a lot of sorting - on multiple fields in fact. 
We have different kinds of Solr configurations - our news searches do little with regards to faceting, but heavily sort. We provide classified ad searches and that heavily uses faceting. I might try reducing the JVM memory some and the amount of perm generation as suggested earlier. It feels like a GC issue and loading the cache just happens to be the victim of a stop-the-world event at the worst possible time. My gut instinct is that your heap size is way too high. Try decreasing it to like 5-10G. I know you say it uses more than that, but that just seems bizarre unless you're doing something like faceting and/or sorting on every field. -Michael -Original Message- From: Patrick O'Lone [mailto:pol...@townnews.com] Sent: Tuesday, November 26, 2013 11:59 AM To: solr-user@lucene.apache.org Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache I've been tracking a problem in our Solr environment for a while with periodic stalls of Solr 3.6.1. I'm running into a wall on ideas to try and thought I might get some insight from some others on this list. The load on the server is normally anywhere between 1-3. It's an 8-core machine with 40GB of RAM. I have about 25GB of index data that is replicated to this server every 5 minutes. It's taking about 200 connections per second and roughly every 5-10 minutes it will stall for about 30 seconds to a minute. The stall causes the load to go as high as 90. It is all CPU bound in user space - all cores go to 99% utilization (spinlock?). When doing a thread dump, the following line is blocked in all running Tomcat threads: org.apache.lucene.search.FieldCacheImpl$Cache.get ( FieldCacheImpl.java:230 ) Looking at the source code in 3.6.1, that is a call into a synchronized block which blocks all threads and causes the backlog. I've tried to correlate these events to the replication events - but even with replication disabled - this still happens. 
We run multiple data centers using Solr and I was comparing garbage collection processes between them and noted that the old generation is collected very differently on this data center versus the others. The old generation is collected as a massive collect event (several gigabytes worth) - the other data center is more saw-toothed and collects only 500MB-1GB at a time. Here are my parameters to java (the same in all environments): /usr/java/jre/bin/java \ -verbose:gc \ -XX:+PrintGCDetails \ -server \ -Dcom.sun.management.jmxremote \ -XX:+UseConcMarkSweepGC \ -XX:+UseParNewGC \ -XX:+CMSIncrementalMode \ -XX:+CMSParallelRemarkEnabled \ -XX:+CMSIncrementalPacing \ -XX:NewRatio=3 \ -Xms30720M \ -Xmx30720M \ -Djava.endorsed.dirs=/usr/local/share/apache-tomcat/endorsed \ -classpath /usr/local/share/apache-tomcat/bin/bootstrap.jar \ -Dcatalina.base=/usr/local/share/apache-tomcat \ -Dcatalina.home=/usr/local/share/apache-tomcat \ -Djava.io.tmpdir=/tmp \ org.apache.catalina.startup.Bootstrap start I've tried a few GC option changes
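Guido's two-step suggestion earlier in the thread, applied to the launch command quoted above, amounts to swapping the five CMS/ParNew flags for two G1 flags. A minimal sketch (the 50 ms pause target is his example value, not a tuned recommendation):

```shell
# G1 options from Guido's suggestion, collected in one variable so they can
# be spliced into the java launch command in place of:
#   -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSIncrementalMode \
#   -XX:+CMSParallelRemarkEnabled -XX:+CMSIncrementalPacing
JAVA_GC_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=50"
echo "$JAVA_GC_OPTS"
```

For example: /usr/java/jre/bin/java -verbose:gc -XX:+PrintGCDetails -server $JAVA_GC_OPTS -Xms30720M -Xmx30720M ... (keeping the rest of the original command unchanged).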
Re: Indexing on plain text and binary data in a single HTTP POST request
Hi Alexandre, Thanks very much for responding to my post. Please find my responses in-line: 1) For your email address fields, you are escaping the brackets, right? Not just solr solr [hidden email] as you show, but with the < and > escaped, right? Otherwise, those email addresses become part of the XML markup and mess it all up. [Neeraj]: Yes, you are right. I used CDATA for escaping < and > and any other special characters in the XML. 2) Your binary content is encoded in some way inside the XML, right? Not just raw binary, which would make it invalid XML? Like base64 or something? [Neeraj]: I want to use raw binary (*not base64 encoded*) in some of the XML fields inside a CDATA section so that the XML will not become invalid. I hope I can do this. 3) Decode the base64 as a first step and feed it through whatever you want to process the actual binary with as a second step. So, it might be a custom URP, with similar functionality to ExtractingRequestHandler, with the difference that you already have a document object and you are mapping one - binary - field in it into a bunch of other fields, with some conventions on names, overrides, etc. [Neeraj]: My XML document contains some fields in plain text and some fields in raw binary format. I tried to use ExtractingUpdateProcessor but soon came to know that the same is not rolled out in Solr 4.5. I am not sure how to use ExtractingRequestHandler for an XML document having some fields in plain text and some fields in raw binary format. It seems to me that ExtractingRequestHandler is used to extract text from a binary file input, but my input document is in XML format, not binary. I am new to Solr so I need your valuable suggestion. -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-on-plain-text-and-binary-data-in-a-single-HTTP-POST-request-tp4105661p4105706.html Sent from the Solr - User mailing list archive at Nabble.com.
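A note of caution on the plan above: CDATA does not make arbitrary binary safe, because XML 1.0 forbids most control characters even inside a CDATA section, and the byte sequence ]]> would terminate the section early. Base64-encoding the field, as Alexandre assumed, is the usual approach. A minimal sketch (the field name is hypothetical):

```shell
# base64-encode binary content before embedding it in a Solr <add> document;
# 'hello' stands in for real binary data here.
payload=$(printf 'hello' | base64)
echo "<field name=\"attachment_bin\">${payload}</field>"
```

The receiving side (a custom URP, as suggested in point 3) would then base64-decode that field before handing the bytes to whatever does the binary processing.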
Re: JVM crashed when start solr
What are your Solr startup parameters (Java options)? You can assign more memory to the JVM by specifying -Xmx10g or whichever value works for you. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/JVM-crashed-when-start-solr-tp4105702p4105705.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: SolrCloud 4.6.0 - leader election issue
I can confirm I've seen this issue as well on trunk, a very recent build. -Original message- From: Elodie Sannier elodie.sann...@kelkoo.fr Sent: Monday 9th December 2013 16:43 To: solr-user@lucene.apache.org Cc: search5t...@lists.kelkoo.com Subject: SolrCloud 4.6.0 - leader election issue [...]
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
Unfortunately, in a test environment, this happens in version 4.4.0 of Solr as well. [...]
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
If you want a start time within the next 5 minutes, I think your filter is not the right one; * will be replaced by the first date in your field. Try: fq=start_time:[NOW TO NOW+5MINUTE] Franck Brisbart On Monday, 9 December 2013 at 09:07 -0600, Patrick O'Lone wrote: I have a new question about this issue - I create filter queries of the form: fq=start_time:[* TO NOW/5MINUTE] [...]
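For reference, the two filter forms in this exchange interact differently with Solr's filterCache: NOW/5MINUTE rounds down, so the rounded form produces one cache entry per 5-minute window, with a rebuild against the whole index at every window boundary (which lines up with the periodic stalls described earlier). An unrounded NOW would instead produce a brand-new, effectively uncacheable entry on every request. A hedged sketch of the variants (the {!cache=false} form is an assumption worth testing on this Solr version; it bypasses the filterCache so no shared rebuild is triggered at the boundary):

```text
# One filterCache entry per 5-minute window; rebuilt at each boundary
fq=start_time:[* TO NOW/5MINUTE]

# Franck's variant: only items starting within the next 5 minutes
fq=start_time:[NOW TO NOW+5MINUTE]

# Assumed mitigation: skip the filterCache for this clause entirely
fq={!cache=false}start_time:[* TO NOW/5MINUTE]
```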
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
Did you add the garbage collection JVM options I suggested? -XX:+UseG1GC -XX:MaxGCPauseMillis=50 Guido. On 09/12/13 16:33, Patrick O'Lone wrote: Unfortunately, in a test environment, this happens in version 4.4.0 of Solr as well. [...]
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
Yeah, I tried G1, but it did not help - I don't think it is a garbage collection issue. I've made various changes to iCMS as well and the issue ALWAYS happens - no matter what I do. If I'm taking heavy traffic (200 requests per second) - as soon as I hit a 5 minute mark - the world stops - garbage collection would be less predictable. Nearly all of my requests have this 5 minute windowing behavior on time though, which is why I have it as a strong suspect now. If it blocks on that - even for a couple of seconds, my traffic backlog will be 600-800 requests. Did you add the Garbage collection JVM options I suggested you? -XX:+UseG1GC -XX:MaxGCPauseMillis=50 Guido. On 09/12/13 16:33, Patrick O'Lone wrote: Unfortunately, in a test environment, this happens in version 4.4.0 of Solr as well. I was trying to locate the release notes for 3.6.x it is too old, if I were you I would update to 3.6.2 (from 3.6.1), it shouldn't affect you since it is a minor release, locate the release notes and see if something that is affecting you got fixed, also, I would be thinking on moving on to 4.x which is quite stable and fast. Like anything with Java and concurrency, it will just get better (and faster) with bigger numbers and concurrency frameworks becoming more and more reliable, standard and stable. Regards, Guido. On 09/12/13 15:07, Patrick O'Lone wrote: I have a new question about this issue - I create a filter queries of the form: fq=start_time:[* TO NOW/5MINUTE] This is used to restrict the set of documents to only items that have a start time within the next 5 minutes. Most of my indexes have millions of documents with few documents that start sometime in the future. Nearly all of my queries include this, would this cause every other search thread to block until the filter query is re-cached every 5 minutes and if so, is there a better way to do it? Thanks for any continued help with this issue! 
We have a webapp running with a very high HEAP size (24GB) and we have no problems with it AFTER we enabled the new GC that is meant to replace sometime in the future the CMS GC, but you have to have Java 6 update Some number I couldn't find but latest should cover to be able to use: 1. Remove all GC options you have and... 2. Replace them with /-XX:+UseG1GC -XX:MaxGCPauseMillis=50/ As a test of course, more information you can read on the following (and interesting) article, we also have Solr running with these options, no more pauses or HEAP size hitting the sky. Don't get bored reading the 1st (and small) introduction page of the article, page 2 and 3 will make lot of sense: http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061 HTH, Guido. On 26/11/13 21:59, Patrick O'Lone wrote: We do perform a lot of sorting - on multiple fields in fact. We have different kinds of Solr configurations - our news searches do little with regards to faceting, but heavily sort. We provide classified ad searches and that heavily uses faceting. I might try reducing the JVM memory some and amount of perm generation as suggested earlier. It feels like a GC issue and loading the cache just happens to be the victim of a stop-the-world event at the worse possible time. My gut instinct is that your heap size is way too high. Try decreasing it to like 5-10G. I know you say it uses more than that, but that just seems bizarre unless you're doing something like faceting and/or sorting on every field. -Michael -Original Message- From: Patrick O'Lone [mailto:pol...@townnews.com] Sent: Tuesday, November 26, 2013 11:59 AM To: solr-user@lucene.apache.org Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache I've been tracking a problem in our Solr environment for awhile with periodic stalls of Solr 3.6.1. I'm running up to a wall on ideas to try and thought I might get some insight from some others on this list. 
The load on the server is normally anywhere between 1-3. It's an 8-core machine with 40GB of RAM. I have about 25GB of index data that is replicated to this server every 5 minutes. It's taking about 200 connections per second and roughly every 5-10 minutes it will stall for about 30 seconds to a minute. The stall causes the load to go as high as 90. It is all CPU bound in user space - all cores go to 99% utilization (spinlock?). When doing a thread dump, the following line is blocked in all running Tomcat threads: org.apache.lucene.search.FieldCacheImpl$Cache.get ( FieldCacheImpl.java:230 ) Looking at the source code in 3.6.1, that is a call into a synchronized block which blocks all threads and causes the backlog. I've tried to correlate these events to the replication events - but even with replication disabled - this still happens. We run multiple data centers using Solr and I was comparing garbage collection processes between them and noted that the old generation is collected very differently on this data center versus others.
Re: Indexing on plain text and binary data in a single HTTP POST request
On 12/9/2013 9:20 AM, neerajp wrote: I tried to use ExtractingUpdateProcessor but soon came to know that the same is not rolled out in solr 4.5 I am not sure how to use ExtractingRequestHandler for an XML document having some of the fields in plain text and some of the fields in random binary format. It seems to me that ExtractingRequestHandler is used to extract text from a binary file input but my input document is in XML format not binary. ExtractingRequestHandler is a contrib module. It's not included in the Solr application war itself, but it IS in the download. You can find the jars in contrib/extraction/lib in all 4.x versions, including 4.5, 4.5.1, and 4.6. Thanks, Shawn
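For reference, a sketch of how the contrib jars are usually wired in via lib directives in solrconfig.xml (the dir paths here are placeholders relative to the core's instance dir; adjust them to your actual layout):

```xml
<!-- load Solr Cell (ExtractingRequestHandler) and its Tika dependencies;
     paths are illustrative and depend on where your Solr download lives -->
<lib dir="../../../contrib/extraction/lib" regex=".*\.jar"/>
<lib dir="../../../dist/" regex="solr-cell-\d.*\.jar"/>
```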
Re: JVM crashed when start solr
Hi Michael, Thank you for your response. I start Solr with the following command line: java -Xms10240m -Xmx20480m -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf -DzkRun -DzkHost=node4:9983 -DnumShards=3 -jar start.jar It doesn't work any more; the Solr server crashes when the memory usage of the server rises to 5G. 2013/12/10 michael.boom my_sky...@yahoo.com What are your Solr startup parameters (java options)? You can assign more memory to the JVM by specifying -Xmx10g or whichever value works for you. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/JVM-crashed-when-start-solr-tp4105702p4105705.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Bad fieldNorm when using morphologic synonyms
In order to set discountOverlaps to true you must have added <similarity class="solr.DefaultSimilarityFactory"/> to the schema.xml, which is commented out by default! As by default this param is false, the above situation is expected even with correct positioning, as said. In order to fix the field norms you'd have to reindex with the similarity class that initializes the param to true. Cheers, Manu
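For concreteness, the explicit declaration being discussed would look something like this in schema.xml (a sketch; the discountOverlaps line just states the intended value explicitly, and per later messages in this thread it is the explicit declaration that makes the factory's init run at all):

```xml
<!-- declare the similarity factory explicitly so its init() is called
     and discountOverlaps is actually applied -->
<similarity class="solr.DefaultSimilarityFactory">
  <bool name="discountOverlaps">true</bool>
</similarity>
```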
RE: JVM crashed when start solr
you may want to start by updating both your solr and JVM to more recent releases. looks like you are running solr 4.3.0 and java 6 u31 in your trace. i would suggest trying with solr 4.5.1 and java 7 u45. From: Wukang Lin vboylin1...@gmail.com Sent: Monday, December 09, 2013 09:19 To: solr-user@lucene.apache.org Subject: Re: JVM crashed when start solr Hi michael, Thank you for you response. I start solr with follow command line: java -Xms10240m -Xmx20480m -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf -DzkRun -DzkHost=node4:9983 -DnumShards=3 -jar start.jar It doesn't work any more. the solr server crashed when the memory usage of the server raise up to 5G. 2013/12/10 michael.boom my_sky...@yahoo.com Which are you solr startup parameters (java options) ? You can assign more memory to the JVM by specifying -Xmx=10G or whichever value works for you. - Thanks, Michael -- View this message in context: http://lucene.472066.n3.nabble.com/JVM-crashed-when-start-solr-tp4105702p4105705.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
Well, I want to include everything that will start in the next 5 minute interval and everything that came before. The query is more like: fq=start_time:[* TO NOW+5MINUTE/5MINUTE] so that it rounds to the nearest 5 minute interval on the right-hand side. But, as soon as 1 second past that 5 minute window, everything pauses waiting for the filter cache (at least that's my working theory based on observation). Is it possible to do something like: fq=start_time:[* TO NOW+1DAY/DAY]&q=start_time:[* TO NOW/MINUTE] where it would use the filter cache to narrow down by day resolution and then filter as part of the standard query, or something like that? My thought is that this would still gain a benefit from a query cache, but somewhat slower since it must remove results for things appearing later in the day. If you want a start time within the next 5 minutes, I think your filter is not the right one. * will be replaced by the first date in your field. Try: fq=start_time:[NOW TO NOW+5MINUTE] Franck Brisbart On Monday, December 9, 2013 at 09:07 -0600, Patrick O'Lone wrote: I have a new question about this issue - I create filter queries of the form: fq=start_time:[* TO NOW/5MINUTE] This is used to restrict the set of documents to only items that have a start time within the next 5 minutes. Most of my indexes have millions of documents with few documents that start sometime in the future. Nearly all of my queries include this, would this cause every other search thread to block until the filter query is re-cached every 5 minutes and if so, is there a better way to do it? Thanks for any continued help with this issue! We have a webapp running with a very high HEAP size (24GB) and we have no problems with it AFTER we enabled the new GC that is meant to replace sometime in the future the CMS GC, but you have to have Java 6 update Some number I couldn't find but latest should cover to be able to use: 1. Remove all GC options you have and... 2.
Replace them with /-XX:+UseG1GC -XX:MaxGCPauseMillis=50/ As a test of course, more information you can read on the following (and interesting) article, we also have Solr running with these options, no more pauses or HEAP size hitting the sky. Don't get bored reading the 1st (and small) introduction page of the article, page 2 and 3 will make lot of sense: http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061 HTH, Guido. On 26/11/13 21:59, Patrick O'Lone wrote: We do perform a lot of sorting - on multiple fields in fact. We have different kinds of Solr configurations - our news searches do little with regards to faceting, but heavily sort. We provide classified ad searches and that heavily uses faceting. I might try reducing the JVM memory some and amount of perm generation as suggested earlier. It feels like a GC issue and loading the cache just happens to be the victim of a stop-the-world event at the worse possible time. My gut instinct is that your heap size is way too high. Try decreasing it to like 5-10G. I know you say it uses more than that, but that just seems bizarre unless you're doing something like faceting and/or sorting on every field. -Michael -Original Message- From: Patrick O'Lone [mailto:pol...@townnews.com] Sent: Tuesday, November 26, 2013 11:59 AM To: solr-user@lucene.apache.org Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache I've been tracking a problem in our Solr environment for awhile with periodic stalls of Solr 3.6.1. I'm running up to a wall on ideas to try and thought I might get some insight from some others on this list. The load on the server is normally anywhere between 1-3. It's an 8-core machine with 40GB of RAM. I have about 25GB of index data that is replicated to this server every 5 minutes. It's taking about 200 connections per second and roughly every 5-10 minutes it will stall for about 30 seconds to a minute. The stall causes the load to go to as high as 90. 
It is all CPU bound in user space - all cores go to 99% utilization (spinlock?). When doing a thread dump, the following line is blocked in all running Tomcat threads: org.apache.lucene.search.FieldCacheImpl$Cache.get ( FieldCacheImpl.java:230 ) Looking at the source code in 3.6.1, that is a call into a synchronized block which blocks all threads and causes the backlog. I've tried to correlate these events to the replication events - but even with replication disabled - this still happens. We run multiple data centers using Solr and I was comparing garbage collection processes between them and noted that the old generation is collected very differently on this data center versus others. The old generation is collected as a massive collect event (several gigabytes worth) - the other data center is more saw-toothed and collects only 500MB-1GB at a time. Here are my parameters to java (the same in all environments): /usr/java/jre/bin/java \
Re: [Solr Wiki] Your wiki account data
: Is this email address still valid? : : Kind Regards Mehdi: i don't understand your question, particularly in the context of the thread you are replying to. On Dec 4, you asked if your wiki id (madeinch) could be added to the editing group for the solr wiki, and Erick Erickson replied on the same day that he did that. You now have the ability to edit the wiki using that wiki account, but if you are having problems logging into that account that may be a separate problem? (It's not clear what you were asking about when you forwarded the password recovery email below) : 2013/12/4 Mehdi Burgy gla...@gmail.com : : Hello, : : We've recently launched a job search engine using Solr, and would like to : add it here: https://wiki.apache.org/solr/PublicServers : : Would it be possible to allow me be part of the publishing group? : : Thank you for your help : : Kind Regards, : : Mehdi Burgy : New Job Search Engine: : www.jobreez.com : : -- Forwarded message -- : From: Apache Wiki wikidi...@apache.org : Date: 2013/12/4 : Subject: [Solr Wiki] Your wiki account data : To: Apache Wiki wikidi...@apache.org : : : : Somebody has requested to email you a password recovery token. : : If you lost your password, please go to the password reset URL below or : go to the password recovery page again and enter your username and the : recovery token. : : Login Name: madeinch : : : : -Hoss http://www.lucidworks.com/
Displaying actual field values and searching lowercase ignoring spaces
Values of the field [street] in my DB may be "Castle Road". However, I want to be able to find these values using lowercase including dashes, so "castle-road" would be a match. When I use fieldtype text_lower_space, which holds a solr.WhitespaceTokenizerFactory, the value is split into 2 tokens, "Castle" and "Road". When I use type string of fieldtype solr.StrField, I cannot search lowercase and still find values which hold uppercase characters, such as "Castle Road". I need to be able to find values (regardless of their casing) using a lowercase query. I will be using the [street] field to display facets, so the text displayed to the user should be the exact value including casing from field [street]; however, when I search on the field, "castle-road" should return a match.

original value / query that should find it:
Castle Road / castle-road
Oak-tree lane / oak-tree-lane

The problem now is that I don't know which tokenizer I need to use, both for index and query.

<fieldType name="text_lower_space" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

-- View this message in context: http://lucene.472066.n3.nabble.com/Displaying-actual-field-values-and-searching-lowercase-ignoring-spaces-tp4105723.html Sent from the Solr - User mailing list archive at Nabble.com.
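One possible direction, sketched under my own assumptions (not a tested answer from the thread): normalize whitespace and hyphens to one canonical separator at analysis time, so "Castle Road" and "castle-road" index and query to the same single token. The stored value is untouched by analysis, so the original casing survives for display:

```xml
<!-- hypothetical fieldType: collapse spaces and hyphens into one
     separator, keep the value as a single lowercased token -->
<fieldType name="street_exact" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[\s-]+" replacement="-"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Since faceting works on indexed terms rather than stored values, you would typically keep a separate string copy of [street] for facet display and search against a field of this type via copyField.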
Re: JVM crashed when start solr
On 12/9/2013 10:29 AM, Boogie Shafer wrote: you may want to start by updating both your solr and JVM to more recent releases. looks like you are running solr 4.3.0 and java 6 u31 in your trace. i would suggest trying with solr 4.5.1 and java 7 u45. There are bugs in Java 7 which make using 7u40 and 7u45 problematic. The 7u25 version works OK. Here's an issue that mentions 7u40, but it's still an issue with 7u45. https://issues.apache.org/jira/browse/LUCENE-5212 This bug has been fixed and should be in 7u60 when that gets released. https://bugs.openjdk.java.net/browse/JDK-8024830 I thought there was another issue specific for 7u45, but I can't seem to locate it. Thanks, Shawn
Re: passing SYS_REFCURSOR as out parameter for Oracle stored procedure
I would probably do something like create a function that calls your stored procedure and returns the result, and then call TABLE() on the result of your function so that DataImportHandler gets something that looks like a table to it. I'm not sure that DataImportHandler is set up to deal with cursors or out parameters. Michael Della Bitta Applications Developer o: +1 646 532 3062 | c: +1 917 477 7906 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinionshttps://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Fri, Dec 6, 2013 at 5:18 AM, aniljayanti aniljaya...@yahoo.co.in wrote: Hi, I am using solr 3.3 for index generation with sql server, generating index successfully, now I am trying to generate with Oracle DB. I am using the *UDP_Getdetails* procedure to generate the required indexes. This procedure takes 2 input and 1 output parameter. *input params : id name output params : cv_1 IN OUT SYS_REFCURSOR* In solr, data-config.xml, below is my configuration. *entity name=index query=UDP_Getdetails(32,'GT', ); * I do not know how to pass a *SYS_REFCURSOR* to the procedure in solr. Please help me out of this. Thanks in Advance, Aniljayanti -- View this message in context: http://lucene.472066.n3.nabble.com/passing-SYS-REFCURSOR-as-out-parameter-for-Oracle-stored-procedure-tp4105307.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: ANNOUNCE: Apache Solr Reference Guide 4.6
: But it still has the error about TrimFilterFactory in it, which I reported a couple of days back. Bernd, thanks for reporting this -- I did not notice your email when you initially sent it, but it was after the vote for the RC began anyway, and was not brought up in the VOTE thread as a blocker. I've updated the docs to fix this... https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions In the future, if you have comments/suggestions about doc improvements, please post them as comments in the ref guide -- that not only makes them directly accessible by people reviewing the online copy, but also helps them stand out better when folks are reviewing the docs for bugs just prior to release. thanks again for catching this. -Hoss http://www.lucidworks.com/
RE: JVM crashed when start solr
aah good to know. i hadn't seen any issues on our solr 4.5.1 setups with 7u45 yet but perhaps we've just been lucky so far. From: Shawn Heisey s...@elyograg.org Sent: Monday, December 09, 2013 09:46 To: solr-user@lucene.apache.org Subject: Re: JVM crashed when start solr On 12/9/2013 10:29 AM, Boogie Shafer wrote: you may want to start by updating both your solr and JVM to more recent releases. looks like you are running solr 4.3.0 and java 6 u31 in your trace. i would suggest trying with solr 4.5.1 and java 7 u45. There are bugs in Java 7 which make using 7u40 and 7u45 problematic. The 7u25 version works OK. Here's an issue that mentions 7u40, but it's still an issue with 7u45. https://issues.apache.org/jira/browse/LUCENE-5212 This bug has been fixed and should be in 7u60 when that gets released. https://bugs.openjdk.java.net/browse/JDK-8024830 I thought there was another issue specific for 7u45, but I can't seem to locate it. Thanks, Shawn
Re: Indexing on plain text and binary data in a single HTTP POST request
On 09 Dec 2013, at 17:20 , neerajp neeraj_star2...@yahoo.com wrote: 2) Your binary content is encoded in some way inside XML, right? Not just random binary, which would make it invalid XML? Like base64 or something? [Neeraj]: I want to use random binary(*not base64 encoded*) in some of the XML fields inside CDATA tag so that XML will not become invalid. I hope I can do this. You can't – there are binary values that are simply not acceptable in an XML stream. Encoding the binary is the canonical way around this. That said, the obvious alternative is to use /update/extract instead of /update – this gives you a way of handling up to one binary stream in addition to any number of fields that can be represented as text. In that case, you need to construct a POST request that sends the binary content as a file stream, and the other parameters as ordinary form data (actually, it may be possible to send some/all of the other fields as url parameters, but that does not really simplify things).
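The point about binary values bears a quick illustration (a generic sketch, nothing Solr-specific): base64 turns arbitrary bytes, including ones like 0x00 that no XML document may contain even inside CDATA, into plain ASCII that round-trips safely through an XML field:

```python
import base64

raw = bytes([0x00, 0xFF, 0x10, 0x80])           # arbitrary binary; 0x00 is never valid XML text
encoded = base64.b64encode(raw).decode("ascii")  # safe to embed as an XML field value
decoded = base64.b64decode(encoded)              # the consumer reverses it after retrieval
```

The cost is roughly a 4/3 size increase, which is why sending the binary as a separate multipart stream to /update/extract can be preferable for large payloads.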
Re: ANNOUNCE: Apache Solr Reference Guide 4.6
Can we please give some thought to producing these manuals in ebook formats? On Mon, Dec 2, 2013 at 12:28 PM, Chris Hostetter hoss...@apache.org wrote: The Lucene PMC is pleased to announce the release of the Apache Solr Reference Guide for Solr 4.6. This 347 page PDF serves as the definitive users manual for Solr 4.6. The Solr Reference Guide is available for download from the Apache mirror network: https://www.apache.org/dyn/closer.cgi/lucene/solr/ref-guide/ (If you have followup questions, please send them only to solr-user@lucene.apache.org) -Hoss
Re: Getting Solr Document Attributes from a Custom Function
Smells like an XY problem ... Can you please describe what your end goal is in writing a custom function, and what you would do with things like the name field inside your function? In general, accessing stored field values for indexed documents can be prohibitively expensive; it rather defeats the entire point of the inverted index data structure. If you help us understand what your goal is, people may be able to offer performant suggestions. https://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 : Date: Mon, 9 Dec 2013 20:24:15 +0530 : From: Mukundaraman valakumaresan muk...@8kmiles.com : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: Getting Solr Document Attributes from a Custom Function : : Hi All, : : I have a written a custom solr function and I would like to read a property : of the document inside my custom function. Is it possible to get that using : Solr? : : For eg. inside the floatVal method, I would like to get the value of the : attribute name : : public class CustomValueSource extends ValueSource { : : @Override : public FunctionValues getValues(Map context, : AtomicReaderContext readerContext) throws IOException { : return new FloatDocValues(this) { @Override public float floatVal(int doc) : { : /*** : getDocument(doc).getAttribute(name) : : / }}} : : Thanks Regards : Mukund : -Hoss http://www.lucidworks.com/
Re: ANNOUNCE: Apache Solr Reference Guide 4.6
: Can we please give some thought to producing these manuals in ebook formats? People have given it thought, but it's not as simple as just snapping our fingers and making it happen. If you would like to contribute to the effort of figuring out the how/where/what to make this happen, there is an existing jira for discussing it. https://issues.apache.org/jira/browse/SOLR-5467 -Hoss http://www.lucidworks.com/
Re: Bad fieldNorm when using morphologic synonyms
no, it's turned on by default in the default similarity. as i said, all that is necessary is to fix your analyzer to emit the proper position increments. On Mon, Dec 9, 2013 at 12:27 PM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: In order to set discountOverlaps to true you must have added the similarity class=solr.DefaultSimilarityFactory to the schema.xml, which is commented out by default! As by default this param is false, the above situation is expected with correct positioning, as said. In order to fix the field norms you'd have to reindex with the similarity class which initializes the param to true. Cheers, Manu
Re: Bad fieldNorm when using morphologic synonyms
Hi Robert and Manuel. The DefaultSimilarity indeed sets discountOverlap to true by default. BUT, the *factory*, aka DefaultSimilarityFactory, when called by IndexSchema (the getSimilarity method), explicitly sets this value to the value of its corresponding class member. This class member is initialized to be FALSE when the instance is created (like every boolean variable in the world). It should be set when init method is called. If the parameter is not set in schema.xml, the default is true. Everything seems to be alright, but the issue is that init method is NOT called, if the similarity is not *explicitly* declared in schema.xml. In that case, init method is not called, the discountOverlaps member (of the factory class) remains FALSE, and getSimilarity explicitly calls setDiscountOverlaps with value of FALSE. This is very easy to reproduce and debug. On Mon, Dec 9, 2013 at 9:19 PM, Robert Muir rcm...@gmail.com wrote: no, its turned on by default in the default similarity. as i said, all that is necessary is to fix your analyzer to emit the proper position increments. On Mon, Dec 9, 2013 at 12:27 PM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: In order to set discountOverlaps to true you must have added the similarity class=solr.DefaultSimilarityFactory to the schema.xml, which is commented out by default! As by default this param is false, the above situation is expected with correct positioning, as said. In order to fix the field norms you'd have to reindex with the similarity class which initializes the param to true. Cheers, Manu
Re: ANNOUNCE: Apache Solr Reference Guide 4.6
Is it possible to export the doc into markdown? - Original Message - From: Chris Hostetter hossman_luc...@fucit.org To: solr-user@lucene.apache.org Sent: Monday, December 9, 2013 14:00:34 Subject: Re: ANNOUNCE: Apache Solr Reference Guide 4.6 : Can we please give some thought to producing these manuals in ebook formats? People have given it thought, but it's not as simple as just snapping our fingers and making it happen. If you would like to contribute to the effort of figuring out the how/where/what to make this happen, there is an existing jira for discussing it. https://issues.apache.org/jira/browse/SOLR-5467 -Hoss http://www.lucidworks.com/ III International Winter School at UCI, February 17-28, 2014. See www.uci.cu
Re: LocalParam for nested query without escaping?
If so, can someone suggest how a query should be escaped (securely and correctly)? Should I escape the quote mark (and the backslash mark itself) only? On Fri, Dec 6, 2013 at 2:59 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Obviously, there is the option of an external parameter ({... v=$nestedq}&nestedq=...) This is a good solution, but it is not practical when having a lot of such nested queries. Any ideas? On Friday, December 6, 2013, Isaac Hebsh wrote: We want to set a LocalParam on a nested query. When querying with the v inline parameter, it works fine: http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1 AND {!lucene df=text v="TERM2 TERM3 \"TERM4 TERM5\""} the parsedquery_toString is +id:TERM1 +(text:term2 text:term3 text:"term4 term5") Query using _query_ also works fine: http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1 AND _query_:"{!lucene df=text}TERM2 TERM3 \"TERM4 TERM5\"" (parsedquery is exactly the same). BUT, when trying to put the nested query in place, it yields a syntax error: http://localhost:8983/solr/collection1/select?debugQuery=true&defType=lucene&df=id&q=TERM1 AND {!lucene df=text}(TERM2 TERM3 "TERM4 TERM5") org.apache.solr.search.SyntaxError: Cannot parse '(TERM2' The previous options are less preferred because of the escaping that has to be done on the nested query. Can't I set a LocalParam on a nested query without escaping the query?
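For the escaping question, one plausible rule (my own assumption, not confirmed by anyone in the thread): inside a quoted v="..." local param, escape backslashes first and then double quotes, so neither can terminate the quoted value early. The order matters; escaping quotes first would double-escape the backslashes you add.

```python
def escape_local_param(value: str) -> str:
    # Hypothetical helper: double backslashes before escaping quotes,
    # otherwise the added backslashes get re-escaped.
    return value.replace("\\", "\\\\").replace('"', '\\"')

nested = 'TERM2 TERM3 "TERM4 TERM5"'
q = '{!lucene df=text v="%s"}' % escape_local_param(nested)
```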
Re: Global query parameters to facet query
Created SOLR-5542. Anyone else want it? On Thu, Dec 5, 2013 at 8:55 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi, It seems that a facet query does not use the global query parameters (for example, field aliasing for the edismax parser). We make intensive use of facet queries (in some cases we have a lot of facet.query parameters for a single q), and using LocalParams for each facet.query is not convenient. Did I miss a normal way to solve this? Has anyone else encountered this requirement?
Re: Bad fieldNorm when using morphologic synonyms
Isaac, is there an easy way to recognize this problem? We also index synonym tokens in the same position (like you do, and I'm sure that our positions are set correctly). I could test whether the default similarity factory in solrconfig.xml had any effect (before/after reindexing). --roman On Mon, Dec 9, 2013 at 2:42 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi Robert and Manuel. The DefaultSimilarity indeed sets discountOverlap to true by default. BUT, the *factory*, aka DefaultSimilarityFactory, when called by IndexSchema (the getSimilarity method), explicitly sets this value to the value of its corresponding class member. This class member is initialized to be FALSE when the instance is created (like every boolean variable in the world). It should be set when init method is called. If the parameter is not set in schema.xml, the default is true. Everything seems to be alright, but the issue is that init method is NOT called, if the similarity is not *explicitly* declared in schema.xml. In that case, init method is not called, the discountOverlaps member (of the factory class) remains FALSE, and getSimilarity explicitly calls setDiscountOverlaps with value of FALSE. This is very easy to reproduce and debug. On Mon, Dec 9, 2013 at 9:19 PM, Robert Muir rcm...@gmail.com wrote: no, its turned on by default in the default similarity. as i said, all that is necessary is to fix your analyzer to emit the proper position increments. On Mon, Dec 9, 2013 at 12:27 PM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: In order to set discountOverlaps to true you must have added the similarity class=solr.DefaultSimilarityFactory to the schema.xml, which is commented out by default! As by default this param is false, the above situation is expected with correct positioning, as said. In order to fix the field norms you'd have to reindex with the similarity class which initializes the param to true. Cheers, Manu
Replicating from the correct collections in SolrCloud on solr start
I have a Solr configuration that I am trying to replicate on several machines as part of a package installation. I have a cluster of machines that will run the SolrCloud, with 3 machines in the cluster running a zookeeper ensemble. As part of the installation of each machine, Solr is started with the desired configuration uploaded (java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkHost=ipaddress1:2181,ipaddress2:2181,ipaddress3:2181 -jar start.jar). My problem is that when I add a new machine to my SolrCloud cluster, I expect it to replicate data from the collections I have in SolrCloud. This doesn't appear to be happening. Instead, each new machine just replicates the default collection1 collection. I'd added the collection in question with this command: http://localhost:8983/solr/admin/collections?action=CREATE&name=SolrCloudTest&numShards=1&replicationFactor=2&collection.configName=myconf So my question is simple: Why is it that when I start a new Solr instance on the same zookeeper ensemble, it does not replicate the data from the SolrCloudTest collection, and instead only replicates collection1? -- View this message in context: http://lucene.472066.n3.nabble.com/Replicating-from-the-correct-collections-in-SolrCloud-on-solr-start-tp4105754.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Bad fieldNorm when using morphologic synonyms
You can see the norm value, in the explain text, when setting debugQuery=true. If the same item gets different norm before/after, that's it. Note that this configuration is in schema.xml (not solrconfig.xml...) On Monday, December 9, 2013, Roman Chyla wrote: Isaac, is there an easy way to recognize this problem? We also index synonym tokens in the same position (like you do, and I'm sure that our positions are set correctly). I could test whether the default similarity factory in solrconfig.xml had any effect (before/after reindexing). --roman On Mon, Dec 9, 2013 at 2:42 PM, Isaac Hebsh isaac.he...@gmail.com wrote: Hi Robert and Manuel. The DefaultSimilarity indeed sets discountOverlap to true by default. BUT, the *factory*, aka DefaultSimilarityFactory, when called by IndexSchema (the getSimilarity method), explicitly sets this value to the value of its corresponding class member. This class member is initialized to be FALSE when the instance is created (like every boolean variable in the world). It should be set when init method is called. If the parameter is not set in schema.xml, the default is true. Everything seems to be alright, but the issue is that init method is NOT called, if the similarity is not *explicitly* declared in schema.xml. In that case, init method is not called, the discountOverlaps member (of the factory class) remains FALSE, and getSimilarity explicitly calls setDiscountOverlaps with value of FALSE. This is very easy to reproduce and debug. On Mon, Dec 9, 2013 at 9:19 PM, Robert Muir rcm...@gmail.com wrote: no, its turned on by default in the default similarity. as i said, all that is necessary is to fix your analyzer to emit the proper position increments. On Mon, Dec 9, 2013 at 12:27 PM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: In order to set discountOverlaps to true you must have added the similarity class=solr.DefaultSimilarityFactory to the schema.xml, which is commented out by default!
As by default this param is false, the above situation is expected with correct positioning, as said. In order to fix the field norms you'd have to reindex with the similarity class which initializes the param to true. Cheers, Manu
Re: JVM crashed when start solr
And it was only reproduced with JVM 32 bits, not 64 bits. Guido. On 09/12/13 17:46, Shawn Heisey wrote: On 12/9/2013 10:29 AM, Boogie Shafer wrote: you may want to start by updating both your solr and JVM to more recent releases. looks like you are running solr 4.3.0 and java 6 u31 in your trace. i would suggest trying with solr 4.5.1 and java 7 u45. There are bugs in Java 7 which make using 7u40 and 7u45 problematic. The 7u25 version works OK. Here's an issue that mentions 7u40, but it's still an issue with 7u45. https://issues.apache.org/jira/browse/LUCENE-5212 This bug has been fixed and should be in 7u60 when that gets released. https://bugs.openjdk.java.net/browse/JDK-8024830 I thought there was another issue specific for 7u45, but I can't seem to locate it. Thanks, Shawn
Use of Deprecated Classes: SortableIntField SortableFloatField SortableDoubleField
I am attempting to migrate from Solr 4.3 to Solr 4.6. When I run the example in 4.6, I get warnings about SortableIntField etc. asking me to consult the documentation to replace them accordingly. If these classes are deprecated, I think it would not be a good idea to use them in the examples as in: http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_6/solr/example/example-DIH/solr/db/conf/schema.xml Here, weight, price and popularity seem to use the deprecated sfloat and sint. Does anyone know where I can find documentation on replacing these classes in my schema file? Thank you, O. O.
Re: Solr 3.6.1 stalling with high CPU and blocking on field cache
Patrick, Are you getting these stalls following a commit? If so then the issue is most likely fieldCache warming pauses. To stop your users from seeing this pause you'll need to add static warming queries to your solrconfig.xml to warm the fieldCache before it's registered. On Mon, Dec 9, 2013 at 12:33 PM, Patrick O'Lone pol...@townnews.com wrote: Well, I want to include everything that will start in the next 5 minute interval and everything that came before. The query is more like: fq=start_time:[* TO NOW+5MINUTE/5MINUTE] so that it rounds to the nearest 5 minute interval on the right-hand side. But, as soon as 1 second after that 5 minute window, everything pauses waiting for the filter cache (at least that's my working theory based on observation). Is it possible to do something like: fq=start_time:[* TO NOW+1DAY/DAY]&q=start_time:[* TO NOW/MINUTE] where it would use the filter cache to narrow down by day resolution and then filter as part of the standard query, or something like that? My thought is that this would still gain a benefit from a query cache, but somewhat slower since it must remove results for things appearing later in the day. If you want a start time within the next 5 minutes, I think your filter is not the right one. * will be replaced by the first date in your field. Try: fq=start_time:[NOW TO NOW+5MINUTE] Franck Brisbart Le lundi 09 décembre 2013 à 09:07 -0600, Patrick O'Lone a écrit : I have a new question about this issue - I create filter queries of the form: fq=start_time:[* TO NOW/5MINUTE] This is used to restrict the set of documents to only items that have a start time within the next 5 minutes. Most of my indexes have millions of documents with few documents that start sometime in the future. Nearly all of my queries include this; would this cause every other search thread to block until the filter query is re-cached every 5 minutes and if so, is there a better way to do it? Thanks for any continued help with this issue! 
We have a webapp running with a very high HEAP size (24GB) and we have no problems with it AFTER we enabled the new GC that is meant to eventually replace the CMS GC, but you have to have Java 6 update (some number I couldn't find, but the latest should cover it) to be able to use it: 1. Remove all GC options you have and... 2. Replace them with -XX:+UseG1GC -XX:MaxGCPauseMillis=50 As a test of course; more information you can read in the following (and interesting) article. We also have Solr running with these options, no more pauses or HEAP size hitting the sky. Don't get bored reading the 1st (and small) introduction page of the article; pages 2 and 3 will make a lot of sense: http://www.drdobbs.com/jvm/g1-javas-garbage-first-garbage-collector/219401061 HTH, Guido. On 26/11/13 21:59, Patrick O'Lone wrote: We do perform a lot of sorting - on multiple fields in fact. We have different kinds of Solr configurations - our news searches do little with regards to faceting, but heavily sort. We provide classified ad searches and that heavily uses faceting. I might try reducing the JVM memory some and the amount of perm generation as suggested earlier. It feels like a GC issue and loading the cache just happens to be the victim of a stop-the-world event at the worst possible time. My gut instinct is that your heap size is way too high. Try decreasing it to like 5-10G. I know you say it uses more than that, but that just seems bizarre unless you're doing something like faceting and/or sorting on every field. -Michael -Original Message- From: Patrick O'Lone [mailto:pol...@townnews.com] Sent: Tuesday, November 26, 2013 11:59 AM To: solr-user@lucene.apache.org Subject: Solr 3.6.1 stalling with high CPU and blocking on field cache I've been tracking a problem in our Solr environment for awhile with periodic stalls of Solr 3.6.1. I'm running up against a wall on ideas to try and thought I might get some insight from some others on this list. 
The load on the server is normally anywhere between 1-3. It's an 8-core machine with 40GB of RAM. I have about 25GB of index data that is replicated to this server every 5 minutes. It's taking about 200 connections per second and roughly every 5-10 minutes it will stall for about 30 seconds to a minute. The stall causes the load to go as high as 90. It is all CPU bound in user space - all cores go to 99% utilization (spinlock?). When doing a thread dump, the following line is blocked in all running Tomcat threads: org.apache.lucene.search.FieldCacheImpl$Cache.get ( FieldCacheImpl.java:230 ) Looking at the source code in 3.6.1, that is a call into a synchronized() block which blocks all threads and causes the backlog. I've tried to correlate these events to the replication events - but even with replication disabled - this still happens. We run
Re: Use of Deprecated Classes: SortableIntField SortableFloatField SortableDoubleField
Javadoc for these deprecated classes suggests using TrieIntField, TrieFloatField and TrieDoubleField respectively instead. 10.12.2013, 01:19, O. Olson olson_...@yahoo.it: Use of Deprecated Classes: SortableIntField SortableFloatField SortableDoubleField I am attempting to migrate from Solr 4.3 to Solr 4.6. When I run the example in 4.6, I get warnings SortableIntField etc. asking me to consult the documentation to replace them accordingly. If these classes are deprecated, I think it would not be a good idea to use them in the examples as in: http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_6/solr/example/example-DIH/solr/db/conf/schema.xml Here, weight, price and popularity seem to use the deprecated sfloat and sint. Does anyone know where I can find documentation to replace these classes in my schema file. Thank you, O. O.
Re: Replicating from the correct collections in SolrCloud on solr start
This is currently as designed / expected. The reason that collection is replicated is because it's configured by default in a default Solr install. When you use the collections API, it only takes into account the current nodes. Eventually, there will be a mode where the Overseer will create/remove SolrCores based on the replicationFactor, etc. as you add and remove nodes, but that is not yet supported. If you add a node after the fact and want a replica on it, you have to preconfigure the SolrCore as is done with collection1 before starting the node, or use the Core Admin API to add the new SolrCore and make sure its collection param matches the collection you want to add it to and the shard param matches the shard you want to add it to. On Mon, Dec 9, 2013 at 12:40 PM, cwhi chris.whi...@gmail.com wrote: I have a Solr configuration that I am trying to replicate on several machines as part of a package installation. I have a cluster of machines that will run the SolrCloud, with 3 machines in the cluster running a zookeeper ensemble. As part of the installation of each machine, Solr is started with the desired configuration uploaded (java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkHost=ipaddress1:2181,ipaddress2:2181,ipaddress3:2181 -jar start.jar). My problem is that when I add a new machine to my SolrCloud cluster, I expect it to replicate data from the collections I have in SolrCloud. This doesn't appear to be happening. Instead, each new machine just replicates the default collection1 collection. I'd added the collection in question with this command: http://localhost:8983/solr/admin/collections?action=CREATE&name=SolrCloudTest&numShards=1&replicationFactor=2&collection.configName=myconf So my question is simple: Why is it that when I start a new Solr instance on the same zookeeper ensemble, it does not replicate the data from the SolrCloudTest collection, and instead only replicates collection1? 
-- View this message in context: http://lucene.472066.n3.nabble.com/Replicating-from-the-correct-collections-in-SolrCloud-on-solr-start-tp4105754.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Mark
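The Core Admin call Mark describes could be sketched as an HTTP request against the new node (hostname and core name here are placeholders, not from the thread):

```
# Hypothetical: add a replica core on the new node for an existing
# collection/shard via the Core Admin API. The collection and shard
# params tie the new core into the SolrCloudTest collection.
http://newnode:8983/solr/admin/cores?action=CREATE
    &name=SolrCloudTest_shard1_replica2
    &collection=SolrCloudTest
    &shard=shard1
```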
Re: solr.xml
Sounds like a bug. If you are seeing this happen in 4.6, I'd file a JIRA issue. - Mark On Sun, Dec 8, 2013 at 3:49 PM, William Bell billnb...@gmail.com wrote: Any thoughts? Why are we getting duplicate items in solr.xml? -- Forwarded message -- From: William Bell billnb...@gmail.com Date: Sat, Dec 7, 2013 at 1:48 PM Subject: solr.xml To: solr-user@lucene.apache.org We are having issues with SWAP CoreAdmin in 4.5.1 and 4.6. Using legacy solr.xml we issue a SWAP, and we want it persistent. It has been running flawlessly since 4.5. Now it creates duplicate lines in solr.xml. Even the example multi core schema in 4.5.1 doesn't work with persistent=true - it creates duplicate lines in solr.xml:
<cores adminPath="/admin/cores">
  <core name="autosuggest" loadOnStartup="true" instanceDir="autosuggest" transient="false"/>
  <core name="citystateprovider" loadOnStartup="true" instanceDir="citystateprovider" transient="false"/>
  <core name="collection1" loadOnStartup="true" instanceDir="collection1" transient="false"/>
  <core name="facility" loadOnStartup="true" instanceDir="facility" transient="false"/>
  <core name="inactiveproviders" loadOnStartup="true" instanceDir="inactiveproviders" transient="false"/>
  <core name="linesvcgeo" instanceDir="linesvcgeo" loadOnStartup="true" transient="false"/>
  <core name="linesvcgeofull" instanceDir="linesvcgeofull" loadOnStartup="true" transient="false"/>
  <core name="locationgeo" loadOnStartup="true" instanceDir="locationgeo" transient="false"/>
  <core name="market" loadOnStartup="true" instanceDir="market" transient="false"/>
  <core name="portalprovider" loadOnStartup="true" instanceDir="portalprovider" transient="false"/>
  <core name="practice" loadOnStartup="true" instanceDir="practice" transient="false"/>
  <core name="provider" loadOnStartup="true" instanceDir="provider" transient="false"/>
  <core name="providersearch" loadOnStartup="true" instanceDir="providersearch" transient="false"/>
  <core name="tridioncomponents" loadOnStartup="true" instanceDir="tridioncomponents" transient="false"/>
  <core name="linesvcgeo" instanceDir="linesvcgeo" loadOnStartup="true" transient="false"/>
  <core name="linesvcgeofull" instanceDir="linesvcgeofull" loadOnStartup="true" transient="false"/>
</cores>
-- Bill Bell billnb...@gmail.com cell 720-256-8076 -- Bill Bell billnb...@gmail.com cell 720-256-8076 -- - Mark
Re: Prioritize search returns by URL path?
1) i would strongly advise you against falling into the trap of thinking things like Wiki posts should always be returned higher than blog posts ... unless you truly want *any* wiki post that matches your keywords, no matter how tangentially and how poorly, to come back higher on the list of results than any blog post -- even if that blog post is 100% dedicated to the keywords the user searched for. if that's really what you want, then all you need is sort=doc_type desc, score desc where you assign a numeric doc_type value at index time -- but i assure you, it's a terrible idea. 2) in general, what you are interested in is domain boosting ... where because of the specifics of your domain knowledge, you know that certain documents should generally score higher -- how much higher is an art form, that again is going to largely depend on the specifics of your domain, but you will most likely want it to be something you can tweak and tune. 3) regardless of the specifics of the website you are dealing with, and the URL structure used, what really matters is how you convert the raw data on your website into documents to be indexed -- when you do that, however you do that, is when you can add fields to your documents to convey information like this document is from the wiki or this document is from the forum or this document is a verified forum answer. If the only way you can conceptually know this information is by parsing the URL, then so be it -- but more than likely, if you are reading this data directly from an authoritative source (instead of just crawling URLs), there are easy methods to determine this stuff. . . . My initial suggestion would be to create a simple field called doc_type containing values like wiki, blog, forum, forum_verified, and forum_suggested ... with those values *indexed* for each doc, you can then use the ExternalFileField to associate a numeric value with each of those special values, and you can tune and tweak those numeric values w/o re-indexing. 
Then you should look into how boost functions work to make those numeric values an input into the final score calculations. In the long run however, you may want to consider indexing a general importance value for each doc that you re-compute periodically based not just on the *type* of the document, but also things like the number of page views, the number of votes for forum answers to be verified, etc... More information about domain boosting... https://people.apache.org/~hossman/ac2012eu/ http://vimeopro.com/user11514798/apache-lucene-eurocon-2012/video/55822630 On Fri, 6 Dec 2013, Jim Glynn wrote: : Date: Fri, 6 Dec 2013 13:10:59 -0800 (PST) : From: Jim Glynn jrgl...@hotmail.com : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: Re: Prioritize search returns by URL path? : : Thanks all. Yes, we can differentiate between content types by URL. : Everything else being equal, Wiki posts should always be returned higher : than blog posts, and blog posts should always be returned higher than forum : posts. : : Within forum posts, we want to rank Verified answered and Suggested answered : posts higher than unanswered posts. These cannot be identified via path - : only via metadata attached to the individual post. Any suggestions? : : @Alex, I'll investigate the references you provided. Thanks! : : : : -- : View this message in context: http://lucene.472066.n3.nabble.com/Prioritize-search-returns-by-URL-path-tp4105023p4105426.html : Sent from the Solr - User mailing list archive at Nabble.com. : -Hoss http://www.lucidworks.com/
Re: Use of Deprecated Classes: SortableIntField SortableFloatField SortableDoubleField
Thank you kydryavtsev andrey. Could you please suggest some examples? There is no documentation on this. Also, is there a reason why these deprecated classes are still used in the examples? I am looking for examples like below: Should I put the following in my schema.xml file to use the TrieIntField: <fieldType name="sint" class="solr.TrieIntField" sortMissingLast="true" omitNorms="true"/> Is this specification correct? Should it also have the sortMissingLast and omitNorms, because I want something that I can use for sorting? I have no clue how you get these. Thank you again, O. O. -- View this message in context: http://lucene.472066.n3.nabble.com/Use-of-Deprecated-Classes-SortableIntField-SortableFloatField-SortableDoubleField-tp4105762p4105781.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Searching for document by id in a sharded environment
Daniel, What version of Solr are you using? I'll see if I can recreate this. On Mon, Dec 9, 2013 at 7:21 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi Daniel, TermQueryParser comes in handy when you don't want to escape. q = {!term f=id}156a05d1-8ebe-4f3c-b548-60a84d167a16!643fd57c-c65e-4929-bc0e-029aa4f07475 On Monday, December 9, 2013 2:14 PM, Daniel Bryant daniel.bry...@tai-dev.co.uk wrote: Hi, I'm in the process of migrating an application that queries Solr to use a new sharded SolrCloud, and as part of this I'm adding the shard key to the document id when we index documents (as we're using grouping and we need to ensure that grouped documents end up on the same shard) e.g. 156a05d1-8ebe-4f3c-b548-60a84d167a16!643fd57c-c65e-4929-bc0e-029aa4f07475 I'm having a problem with my application when searching by id with SolrJ CloudSolrServer - the exclamation point is misinterpreted as a boolean negation, and the matching document is not returned in the search results. I just wanted to check if the only way to make this work would be to escape the exclamation point (i.e. prefix with a backslash, or enclose the id within quotes). We're keen to avoid this, as this will require lots of modifications throughout the code on a series of applications that interact with Solr. If anyone has any better suggestions on how to achieve this it would be very much appreciated! Best wishes, Daniel -- *Daniel Bryant | Software Development Consultant | www.tai-dev.co.uk http://www.tai-dev.co.uk/* daniel.bry...@tai-dev.co.uk mailto:daniel.bry...@tai-dev.co.uk | +44 (0) 7799406399 | Twitter: @taidevcouk https://twitter.com/taidevcouk -- Joel Bernstein Search Engineer at Heliosearch
Newbie to SOLR with ridiculously simple questions
OK... I'm a Windows guy who is being forced to learn Solr on Ubuntu for the whole organization. I fancy myself somewhat capable of following directions but this Solr concept is puzzling. Here is what I think I know: Solr houses indexes. Each index record (usually based on a document) needs to be added to the Solr collection. This seems fairly simple and I can run the post.jar and various xml and json files FROM THE UBUNTU TERMINAL. I doubt you have to use the Terminal every time you want to add to an index. My guess is that you have to feed Solr from third party systems using the http update url into the solr server. Is this correct? Let's say I have (god forbid) a sharepoint site and I want to move all the document text and document metadata into Solr. Do I simply run a script (say in .NET or Coldfusion) that loops through the SP doc records and sends the http update url to Solr for each doc??? How does Tika fit in? thanks steve -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-SOLR-with-ridiculously-simple-questions-tp4105788.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Newbie to SOLR with ridiculously simple questions
Hi Steve, Good luck. I would start by doing the online tutorial if you haven't already (do it on Windows) and then reading a book. There are several on the market, including my own for beginners ( http://blog.outerthoughts.com/2013/06/my-book-on-solr-is-now-published/ ). For SharePoint, I would look at http://manifoldcf.apache.org/en_US/ ; they seem to be covering that use case specifically and sending information to Solr. For the more general case, I would look at SolrNet ( https://github.com/mausch/SolrNet/blob/master/Documentation/README.md ). To use Solr 4 with SolrNet, you would need to get the latest build or build it yourself from source; it is not terribly complicated. Tika is a separate Apache project bundled with Solr and is used to parse binary files (e.g. PDFs, MSWord, etc) and extract whatever is possible, usually structured metadata and some sort of internal text. For the interface, there are a couple of options, though most people are rolling their own. The main reason is that you should NOT expose Solr directly to the web (not secure), so there is a need for Solr middleware. Solr middleware is usually custom with project-specific enhancements, etc. But you could have a look at Hue for internal/intermediate usage. Hue is for the Hadoop ecosystem, but does include Solr support too: http://gethue.tumblr.com/tagged/search The most important point to remember when you are understanding Solr is that it is there for _search_. You shape your data to match that purpose. If that breaks relationships and duplicates data in Solr, that's fine. You still have your primary data safe in relational/document storage. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Dec 10, 2013 at 6:13 AM, smetzger smetz...@msi-inc.com wrote: OK... 
Im a Windows guy who is being forced to learn SoLR on Ubuntu for the whole organizations. I fancy myself somewhat capable of following directions but this Solr concept is puzzling. Here is what I think i know. Solr houses indexes. Each index record (usually based on a document) need to be added to the Solr collection. This seems fairly simple and I can run the post.jar and various xml and json files FROM THE UBUNTU TERMINAL. I doubt you have to use the Terminal every time you want to add an index. My guess is that you have to feed Solr from third party systems using the http: update url into the solr server. Is this correct? Lets say i have a (god forbid) a sharepoint site and I want to move all the document text and document metadata into Solr. Do I simply run a script (say in .NET or Coldfusion) that loops through the SP doc records and sends out the http update url to Solr for each doc??? How does Tika fit in ? thanks steve -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-SOLR-with-ridiculously-simple-questions-tp4105788.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Global query parameters to facet query
: It seems that a facet query does not use the global query parameters (for : example, field aliasing for edismax parser). can you please give a specific example of a query that isn't working for you? Using this query against the example data, things work exactly as i would expect, showing that the QParsers used for facet.queries inherit the global params (unless overridden by local params of course)... http://localhost:8983/solr/select?q=*:*&wt=json&indent=true&facet=true&facet.query={!dismax}solr+bogus&facet.query={!dismax%20mm=1}solr+bogus&facet.query={!dismax%20mm=1%20qf=%27foo_t%27}solr+bogus&rows=0&mm=2&qf=name { responseHeader:{ status:0, QTime:2, params:{ mm:2, facet:true, indent:true, facet.query:[{!dismax}solr bogus, {!dismax mm=1}solr bogus, {!dismax mm=1 qf='foo_t'}solr bogus], q:*:*, qf:name, wt:json, rows:0}}, response:{numFound:32,start:0,docs:[] }, facet_counts:{ facet_queries:{ {!dismax}solr bogus:0, {!dismax mm=1}solr bogus:1, {!dismax mm=1 qf='foo_t'}solr bogus:0}, facet_fields:{}, facet_dates:{}, facet_ranges:{}}} -Hoss http://www.lucidworks.com/
Re: Newbie to SOLR with ridiculously simple questions
Thanks for the reply Alex... in fact I am using your book! The book seems like a good tutorial... My bitnami solr instance however already includes Solr (running in background) and a directory structure: root --opt bitnami --apache-solr solr --collection1 I assume that the apache-solr directory is the same as the universal example directory mentioned in many tutorials. If I follow your book I create a new directory under apache-solr called SOLR-INDEXING with the collection1/conf/ and .xml files per your instructions. But now I have two instances running and somehow I need to point solr from the solr/collection1 core to the SOLR-INDEXING/collection1 core. I would think this could be done on the Solr Admin page but can't see how. If I try and restart the jetty with java -Dsolr.solr.home=SOLR-INDEXING -jar start.jar, it runs and does some install but I think it does not shut down the prior one first. In fact once I run that I lose all my solr and have to reinstall the VMWARE snapshot. Any guidance would be useful so I can continue with your book. Thanks steve -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-SOLR-with-ridiculously-simple-questions-tp4105788p4105812.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Newbie to SOLR with ridiculously simple questions
I think you might be complicating your life with the BitNami stack during learning. I would just download the latest Solr to your Windows desktop and go through the examples there. Still, you can try moving the collection1 directory under 'solr' and putting my examples there instead. Then, you don't need to change any scripts. Or rename collection1 to another name and add it to solr.xml as per instructions in the book to have it as a second core. Basically, change the content of the 'solr' directory rather than the scripts that make it work. But then you still need to know where the libraries are, as I bet the file path would be different from my book's instructions. Use the 'locate' command on unix to find where a jar might be. Just make sure the BitNami stack Solr is at least 4.3 (4.3.1?) as per the book's minimum requirements. Otherwise, more advanced examples will fail in strange ways. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Dec 10, 2013 at 8:22 AM, smetzger smetz...@msi-inc.com wrote: Thanks for the reply Alex... in fact I am using your book! the book seems like a good tutorial ... My bitnami solr instance however already includes Solr (running in background) and a directory structure : root --opt bitnami --apache-solr solr --collection1 I assume that the apache-solr directory is the same as the universal example directory mentioned in many tutorials. If I follow your book I create a new directory under apache-solr called SOLR-INDEXING with the collection1/conf/ and .xml files per your instruction. but now i have two instances running and somehow I need to point solr from the solr/collection1 core to the SOLR-INDEXING/collection1 core I would think this could be done on the Solr Admin page but can't see how. 
If i try and restart the jetty with java -Dsolr.solr.home=SOLR-INDEXING -jar start.jarit runs and does some install but I think it does not shut down the prior one first. In fact once i run that i lose all my solr and have to reinstall the VMWARE snapshot. Any guidance would be useful so I can continue with your book. Thanks steve -- View this message in context: http://lucene.472066.n3.nabble.com/Newbie-to-SOLR-with-ridiculously-simple-questions-tp4105788p4105812.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing on plain text and binary data in a single HTTP POST request
Thanks everybody for throwing in your ideas. So, I came to know that XML cannot carry random binary data, so I will encode the data in base64 format. Yes, I can write a custom URP which can convert the base64 encoded fields to binary fields. Now, I have binary fields in my document. My question is: how can I convert those binary fields to text so that Solr can index them? -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-on-plain-text-and-binary-data-in-a-single-HTTP-POST-request-tp4105661p4105826.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing on plain text and binary data in a single HTTP POST request
Hi, Pls. find my response in-line: That said, the obvious alternative is to use /update/extract instead of /update – this gives you a way of handling up to one binary stream in addition to any number of fields that can be represented as text. In that case, you need to construct a POST request that sends the binary content as a file stream, and the other parameters as ordinary form data (actually, it may be possible to send some/all of the other fields as url parameters, but that does not really simplify things). [Neeraj]: I thought about this solution but it won't work for me as there are a lot of text fields and their size is also very significant. I am looking for some other suggestion. -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-on-plain-text-and-binary-data-in-a-single-HTTP-POST-request-tp4105661p4105827.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing on plain text and binary data in a single HTTP POST request
On 12/9/2013 11:13 PM, neerajp wrote: Hi, Pls. find my response in-line: That said, the obvious alternative is to use /update/extract instead of /update – this gives you a way of handling up to one binary stream in addition to any number of fields that can be represented as text. In that case, you need to construct a POST request that sends the binary content as a file stream, and the other parameters as ordinary form data (actually, it may be possible to send some/all of the other fields as url parameters, but that does not really simplify things). [Neeraj]: I thought about this solution but it won't work in my solution as there are a lot text fields and size is also very significant. I am looking for some other suggestion -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-on-plain-text-and-binary-data-in-a-single-HTTP-POST-request-tp4105661p4105827.html Sent from the Solr - User mailing list archive at Nabble.com. Assuming that your binary fields are mime attachments to email messages, they will probably already be encoded as base 64. Why not just leave them that way in solr too? You can't do much with them other than store them right? Or do you have some kind of image processing going on? You can always decode them in your client when you pull them out. -Mike
Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser
We used that syntax in 1.4.1 when Surround was not part of SOLR and we had to register it. Didn't know that it is now part of SOLR. Anyway, this is a red herring since I have totally removed Surround and the issue remains. Below is the debug info when I give a simple phrase query having common words with the default Query Parser. What I don't understand is why it is including single tokens as well. I have also included the relevant config part below. "rawquerystring": "Contents:\"only be\"", "querystring": "Contents:\"only be\"", "parsedquery": "MultiPhraseQuery(Contents:\"(only only_be) be\")", "parsedquery_toString": "Contents:\"(only only_be) be\"", "QParser": "LuceneQParser", = <fieldtype name="text" class="solr.TextField"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.StandardFilterFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/> </analyzer> </fieldtype> On Mon, Dec 9, 2013 at 7:46 PM, Erik Hatcher erik.hatc...@gmail.com wrote: But again, as Ahmet mentioned… it doesn't look like the surround query parser is actually being used. The debug output also mentioned the query parser used, but that part wasn't provided below. One thing to note here, the surround query parser is not available in 1.4.1. It also looks like you're surrounding your query with angle brackets, as it says query string is {!surround}Contents:only be, which is not correct syntax. And one of the most important things to note here is that the surround query parser does NOT use the analysis chain of the field, see http://wiki.apache.org/solr/SurroundQueryParser#Limitations. In short, you're going to have to do some work to get common grams factored into a surround query (such as maybe calling to the analysis request hander to parse the query before sending it to the surround query parser). 
Erik On Dec 9, 2013, at 9:36 AM, Salman Akram salman.ak...@northbaysolutions.net wrote: Yup on debugging I found that its coming in Analyzer. We are using Standard Analyzer. It seems to be a SOLR 4 issue with Common Grams. Not sure if its a bug or I am missing some config. On Mon, Dec 9, 2013 at 2:03 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi Salman, I am confused because with surround no analysis is applied at query time. I suspect that surround query parser is not kicking in. You should see SrndQuery or something like at parser query section. On Monday, December 9, 2013 6:24 AM, Salman Akram salman.ak...@northbaysolutions.net wrote: All, I posted this sub-issue with another issue few days back but maybe it was not obvious so posting it on a separate thread. We recently migrated to SOLR 4.6. We use Common Grams but queries with words in the CG list have slowed down. On debugging we found that for CG words the parser is adding individual tokens of those words in the query too which ends up slowing it. Below is an example: Query = only be Here is what debug shows. I have highlighted the red part which is different in both versions i.e. SOLR 4.6 is making it a multiphrasequery and adding individual tokens too. Can someone help? SOLR 4.6 (takes 20 secs) str name=rawquerystring{!surround}Contents:only be/str str name=querystring{!surround}Contents:only be/str str name=parsedqueryMultiPhraseQuery(Contents:(only only_be) be)/str str name=parsedquery_toStringContents:(only only_be) be/str SOLR 1.4.1 (takes 1 sec) str name=rawquerystring{!surround}Contents:only be/str str name=querystring{!surround}Contents:only be/str str name=parsedqueryContents:only_be/str str name=parsedquery_toStringContents:only_be/str-- Regards, Salman Akram -- Regards, Salman Akram -- Regards, Salman Akram
Re: SOLR 4 - Query Issue in Common Grams with Surround Query Parser
Hi Salman, I never used the common grams filter but I remember there are two classes in this family: CommonGramsFilter and CommonGramsQueryFilter. It seems that CommonGramsQueryFilter is what you are after. http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/commongrams/CommonGramsQueryFilter.html http://khaidoan.wikidot.com/solr-common-gram-filter

On Tuesday, December 10, 2013 6:43 AM, Salman Akram salman.ak...@northbaysolutions.net wrote: We used that syntax in 1.4.1 when Surround was not part of SOLR and we had to register it. Anyway, this is a red herring since I have totally removed Surround and the issue remains. [...]
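Following Ahmet's pointer, one way this could be wired up is to keep CommonGramsFilterFactory at index time and use CommonGramsQueryFilterFactory at query time, so the query analyzer emits only the common-gram tokens instead of the single words too. This is a sketch based on the field type quoted in the thread, not a verified fix:

```xml
<fieldtype name="text" class="solr.TextField">
  <!-- Index side: emit both single tokens and common grams -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="commonwords.txt" ignoreCase="true"/>
  </analyzer>
  <!-- Query side: emit only the common grams for phrases of common words -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.CommonGramsQueryFilterFactory" words="commonwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldtype>
```

With a single `<analyzer>` element (as in the config above) the same CommonGramsFilterFactory runs at query time too, which would explain the MultiPhraseQuery containing both the single tokens and the grams.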
Re: Getting Solr Document Attributes from a Custom Function
Hi Hoss, Thanks a lot for your response. The actual problem is: for every record that I query, I have to execute a formula and sort the records based on the value of the formula. The formula has elements from the record. For example, for the following document I need to apply the formula (maxprice - solrprice)/(maxprice - minprice) + count(cities)/totalcities, where maxprice, minprice and totalcities will be available at run time. So for the following record, it has to execute as (1 - *5000*)/(1 - 2000) + *2*/5 (where 5000 and 2, which are in bold, are from the document):

<doc>
  <field name="id">apartment_1</field>
  <field name="name">Casa Grande</field>
  <field name="locality">chennai</field>
  <field name="locality">bangalore</field>
  <field name="price">5000</field>
</doc>

Thanks & Regards, Mukund

On Tue, Dec 10, 2013 at 12:22 AM, Chris Hostetter hossman_luc...@fucit.org wrote: Smells like an XY problem ... Can you please describe what your end goal is in writing a custom function, and what you would do with things like the name field inside your function? In general, accessing stored field values for indexed documents can be prohibitively expensive; it rather defeats the entire point of the inverted index data structure. If you help us understand what your goal is, people may be able to offer performant suggestions. https://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341

: Date: Mon, 9 Dec 2013 20:24:15 +0530
: From: Mukundaraman valakumaresan muk...@8kmiles.com
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Getting Solr Document Attributes from a Custom Function
:
: Hi All,
:
: I have written a custom Solr function and I would like to read a property
: of the document inside my custom function. Is it possible to get that
: using Solr?
:
: For eg. inside the floatVal method, I would like to get the value of the
: attribute "name":
:
: public class CustomValueSource extends ValueSource {
:
:     @Override
:     public FunctionValues getValues(Map context,
:             AtomicReaderContext readerContext) throws IOException {
:         return new FloatDocValues(this) {
:             @Override
:             public float floatVal(int doc) {
:                 /***
:                  getDocument(doc).getAttribute("name")
:                  */
:             }
:         };
:     }
: }
:
: Thanks & Regards
: Mukund

-Hoss http://www.lucidworks.com/
Re: Use of Deprecated Classes: SortableIntField SortableFloatField SortableDoubleField
Could you please suggest some examples. There is no documentation on this.

You can find examples with these field types in the Solr codebase (like this: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/example/solr/collection1/conf/schema.xml?view=co&content-type=text%2Fplain). You can find more details about Solr field types here http://wiki.apache.org/solr/SchemaXml or here https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr

<fieldType name="sint" class="solr.TrieIntField" sortMissingLast="true" omitNorms="true"/>

Yes, that looks like a correct specification.

Should it also have the sortMissingLast and omitNorms, because I want something that I can use for sorting?

Only the name and class parameters are mandatory, but the optional parameters can also be useful for your field type. You can find what they actually mean here: https://cwiki.apache.org/confluence/display/solr/Field+Type+Definitions+and+Properties

10.12.2013, 02:19, O. Olson olson_...@yahoo.it: Thank you kydryavtsev andrey. Could you please suggest some examples. There is no documentation on this. Also, is there a reason why these classes are not used in the examples even though they are deprecated? I am looking for examples like below. Should I put the following in my schema.xml file to use the TrieIntField:

<fieldType name="sint" class="solr.TrieIntField" sortMissingLast="true" omitNorms="true"/>

Is this specification correct? Should it also have the sortMissingLast and omitNorms, because I want something that I can use for sorting? I have no clue how you get these. Thank you again, O. O. -- View this message in context: http://lucene.472066.n3.nabble.com/Use-of-Deprecated-Classes-SortableIntField-SortableFloatField-SortableDoubleField-tp4105762p4105781.html Sent from the Solr - User mailing list archive at Nabble.com.
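For reference, the Trie-based replacements for all three deprecated sortable types can be declared along these lines (a sketch, not taken from the thread; `precisionStep="0"` indexes a single precision, which is typically what you want when the field is used only for sorting rather than range queries):

```xml
<fieldType name="sint"    class="solr.TrieIntField"    precisionStep="0" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sfloat"  class="solr.TrieFloatField"  precisionStep="0" sortMissingLast="true" omitNorms="true"/>
<fieldType name="sdouble" class="solr.TrieDoubleField" precisionStep="0" sortMissingLast="true" omitNorms="true"/>
```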
Solr standard score
Hi, I have a requirement to standardize Solr scores. For example: docs with score > 7 are most relevant, docs with score between 4 and 7 are moderately relevant, and docs with score < 4 are less relevant. But in the real scenario this does not happen; in some scenarios the top document may have a score of 3.5. Can I have the scores standardized in some way (by index/query boosting) so that I can achieve this? Thanks, Prasi
Re: Difference between textfield and strfield
Hey Iori, Apologies for the misunderstanding :-). Yes, I agree with you, faceting will be OK with the TextField type; however, I'm concerned about the performance impact of running the facets if we have millions of documents. I wish in future we could apply tokenizers and filters to String fields :-). Thanks for your inputs. -- View this message in context: http://lucene.472066.n3.nabble.com/Difference-between-textfield-and-strfield-tp3986916p4105841.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Getting Solr Document Attributes from a Custom Function
You can implement it this way: index the number of cities as a new int field (like <field name="numberOfCities">2</field>) and implement a user function like customFunction(price, numberOfCities, 1, 2000, 5). A custom parser should parse this into a list of value sources. From the first two field sources we can get the per-doc values for those particular fields; the other three will be ConstValueSource instances (just constants), so we can access all 5 values and implement the custom formula per doc id. Find examples in ValueSourceParser and Solr functions like DefFunction or MinFloatFunction.

10.12.2013, 09:31, Mukundaraman valakumaresan muk...@8kmiles.com: Hi Hoss, Thanks a lot for your response. The actual problem is: for every record that I query, I have to execute a formula and sort the records based on the value of the formula. [...]
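Setting the ValueSource plumbing aside, the per-document arithmetic itself is simple. A minimal sketch in plain Java (the class and parameter names here are hypothetical, not Solr API; in a real custom function the price and city count would come from field value sources and the other three values would be constants parsed from the query):

```java
// Sketch of the thread's formula:
//   (maxprice - price) / (maxprice - minprice) + numberOfCities / totalCities
public class RelevanceFormula {

    public static float score(float price, float maxPrice, float minPrice,
                              int numberOfCities, int totalCities) {
        // Price term: how close the doc's price is to the run-time maximum.
        float priceTerm = (maxPrice - price) / (maxPrice - minPrice);
        // City term: fraction of the run-time total cities this doc covers.
        float cityTerm = (float) numberOfCities / totalCities;
        return priceTerm + cityTerm;
    }

    public static void main(String[] args) {
        // The thread's example document: price=5000, 2 cities,
        // with maxprice=1, minprice=2000, totalCities=5.
        System.out.println(score(5000f, 1f, 2000f, 2, 5));
    }
}
```

Inside a custom FunctionValues, floatVal(int doc) would call something like this with the per-doc values pulled from the wrapped value sources.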
Re: Solr standard score
The scores cannot be normalized that way. You can try, but it just isn't going to work the way you expect. Tell the people who wrote this requirement that it isn't possible. http://wiki.apache.org/lucene-java/ScoresAsPercentages wunder

On Dec 9, 2013, at 10:21 PM, Prasi S prasi1...@gmail.com wrote: Hi, I have a requirement to standardize Solr scores. [...]
Re: solr.xml
Thanks Mark. https://issues.apache.org/jira/browse/SOLR-5543

On Mon, Dec 9, 2013 at 2:39 PM, Mark Miller markrmil...@gmail.com wrote: Sounds like a bug. If you are seeing this happen in 4.6, I'd file a JIRA issue. - Mark

On Sun, Dec 8, 2013 at 3:49 PM, William Bell billnb...@gmail.com wrote: Any thoughts? Why are we getting duplicate items in solr.xml?

-- Forwarded message -- From: William Bell billnb...@gmail.com Date: Sat, Dec 7, 2013 at 1:48 PM Subject: solr.xml To: solr-user@lucene.apache.org

We are having issues with SWAP CoreAdmin in 4.5.1 and 4.6. Using legacy solr.xml we issue a SWAP, and we want it persistent. It had been running flawlessly since 4.5. Now it creates duplicate lines in solr.xml. Even the example multi-core schema in 4.5.1 doesn't work with persistent="true" - it creates duplicate lines in solr.xml.

<cores adminPath="/admin/cores">
  <core name="autosuggest" loadOnStartup="true" instanceDir="autosuggest" transient="false"/>
  <core name="citystateprovider" loadOnStartup="true" instanceDir="citystateprovider" transient="false"/>
  <core name="collection1" loadOnStartup="true" instanceDir="collection1" transient="false"/>
  <core name="facility" loadOnStartup="true" instanceDir="facility" transient="false"/>
  <core name="inactiveproviders" loadOnStartup="true" instanceDir="inactiveproviders" transient="false"/>
  <core name="linesvcgeo" instanceDir="linesvcgeo" loadOnStartup="true" transient="false"/>
  <core name="linesvcgeofull" instanceDir="linesvcgeofull" loadOnStartup="true" transient="false"/>
  <core name="locationgeo" loadOnStartup="true" instanceDir="locationgeo" transient="false"/>
  <core name="market" loadOnStartup="true" instanceDir="market" transient="false"/>
  <core name="portalprovider" loadOnStartup="true" instanceDir="portalprovider" transient="false"/>
  <core name="practice" loadOnStartup="true" instanceDir="practice" transient="false"/>
  <core name="provider" loadOnStartup="true" instanceDir="provider" transient="false"/>
  <core name="providersearch" loadOnStartup="true" instanceDir="providersearch" transient="false"/>
  <core name="tridioncomponents" loadOnStartup="true" instanceDir="tridioncomponents" transient="false"/>
  <core name="linesvcgeo" instanceDir="linesvcgeo" loadOnStartup="true" transient="false"/>
  <core name="linesvcgeofull" instanceDir="linesvcgeofull" loadOnStartup="true" transient="false"/>
</cores>

-- Bill Bell billnb...@gmail.com cell 720-256-8076
Re: Indexing on plain text and binary data in a single HTTP POST request
Please find my response in-line:

Assuming that your binary fields are mime attachments to email messages, they will probably already be encoded as base64. Why not just leave them that way in Solr too? You can't do much with them other than store them, right? Or do you have some kind of image processing going on? You can always decode them in your client when you pull them out.

[Neeraj]: Yes, the binary fields are mime attachments to email messages, but I want to index the attachments. For that I need to convert the base64-encoded data to binary format on the Solr side, and then, using some technique, extract text out of it so that the text can be indexed and I can search inside the attachments. -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-on-plain-text-and-binary-data-in-a-single-HTTP-POST-request-tp4105661p4105860.html Sent from the Solr - User mailing list archive at Nabble.com.
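The first half of that pipeline (base64 back to raw bytes) is standard JDK; a minimal sketch, assuming the attachment body arrives as a base64 string (the class name is hypothetical, and the text-extraction step, e.g. handing the bytes to a library like Apache Tika, is omitted):

```java
import java.util.Base64;

// Decode a base64-encoded MIME attachment body back to raw bytes so the
// bytes can be handed to a text-extraction library before indexing.
public class AttachmentDecoder {

    public static byte[] decode(String base64Body) {
        // Base64.getDecoder() is the JDK's RFC 4648 decoder (Java 8+).
        return Base64.getDecoder().decode(base64Body);
    }

    public static void main(String[] args) {
        byte[] raw = decode("aGVsbG8=");     // base64 for "hello"
        System.out.println(new String(raw));
    }
}
```

The extracted text would then go into an indexed field; alternatively, Solr's extracting request handler (Solr Cell) can do the extraction server-side if the attachment is sent as a separate binary upload.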