Re: Does sorting skip everything having to do with relevancy?
Hi, By default solr is using the sort parameter over the score field. So if you overwrite it using other sort field, yes solr will use the parameter you've provided. Remember, you can use multiple fields for sortinghttp://wiki.apache.org/solr/CommonQueryParameters#sort so you can make something like: sort score desc, your_field1 asc, your_field2 desc The score of documents is calculated on every query (it does not depend on the sort parameter or the debugQueryParameter) and the debubQuery is only a mechanism for showing (or hidding) how score was calculated. If you want to see a document score for a particular query (apart from the debugQuery) you can ask for it in the solr response adding the parameter *fl=*,score* to your request. Regards. On Fri, Apr 4, 2014 at 4:42 AM, Shawn Heisey s...@elyograg.org wrote: If I provide a sort parameter, will Solr (4.6.1) skip score/boost processing? In particular I would like to know what happens if I have a boost parameter (with a complex function) for edismax search, but I include a sort parameter on one of my fields. I am using distributed search. I do know that if debugQuery is used, the score IS calculated, but I'm talking about when debugQuery is not used. Thanks, Shawn
Re: Boosing Basic
Hi, If I were you, I would start reading the edismax documentationhttps://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser. Apart from the wiki, you can find in every distribution a full example with the configuration of the edismax query parser (check the xml node requestHandler name=/browse in the next file: $YOUR_SOLR_DISTRIBUTION_DIRECTORY/solr/example/solr/collection1/conf/solrconfig.xml). Regards. On Thu, Apr 3, 2014 at 6:55 PM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Hello, I am trying to implement boosting but I am not able to find a good example, Some places asking to add ^10 to boost score in some places it says use bf . I have query with condition (Name OR Description OR ProductType) but I like to show the results first Name and need to boost the condition. Thanks Ravi
Re: Solr join and lucene scoring
Hi, The defect you are referencing is closed with a resolution of *Invalid*, so it seems the scoring is working fine with the join. I've made the next two tests on my own data and seems it is working: *TestA* - fl=id,score - q=notebook - fq={!join from=product_list to=id fromIndex=product}id:* - rows=2 Gives me the next result with the score calculated: doc str name=id4ADCBA5F-B532-4154-8E12-47311DC0FD50/str float name=score*2.6598556*/float /doc doc str name=idC861CC4A-6481-4754-946F-EA3903371C80/str float name=score*2.6598551*/float /doc /result *TESTB* - fl=id,score - q=notebook AND _query_:{!join from=product_list to=id fromIndex=product}id:* - rows=2 Gives me the next result with the score calcualted: doc str name=id5C449525-8A69-409B-829C-671E147BF6BB/str float name=score*0.1573925*/float /doc doc str name=idD1A719E8-F843-4E8D-AD82-64AA88D78BBB/str float name=score*0.1571764*/float /doc Regards. On Thu, Apr 3, 2014 at 11:42 AM, m...@preselect-media.com wrote: Hello, referencing to this issue: https://issues.apache.org/jira/browse/SOLR-4307 Is it still not possible with the solr query time join to use scoring? Do I still have to write my own plugin or is there a plugin somewhere I could use? I never wrote a plugin for solr before, so I would prefer if I don't have to start from scratch. THX, Moritz
How to reduce the search speed of solrcloud
Hi All, Hi All, I am new to Solr. And i dont know how to increase the search speed of solrcloud. I have indexed nearly 4 GB of data. When i am searching a document using java with solrj, solr takes more 6 seconds to return a query result. Any one please help me to reduce the search query time to less than 500 ms. i have allocate the 4 GB ram for solr. Please let me know for further details about solrcloud config. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to reduce the search speed of solrcloud
Show a sample query string that does that (takes 6 seconds to return). Including all defaults you may have put in solrconfig.xml (if any). That might give us a hint which features you are using and what possible direction you could go in next. For the bonus points, enable debug flag and rows=1 parameter to see how big your documents themselves are. You may have issues with a particular non-cloud-friendly feature, with caches, with not reusing parts of your queries as 'fq', returning too many fields or a bunch of other things. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 4, 2014 at 2:31 PM, Sathya sathia.blacks...@gmail.com wrote: Hi All, Hi All, I am new to Solr. And i dont know how to increase the search speed of solrcloud. I have indexed nearly 4 GB of data. When i am searching a document using java with solrj, solr takes more 6 seconds to return a query result. Any one please help me to reduce the search query time to less than 500 ms. i have allocate the 4 GB ram for solr. Please let me know for further details about solrcloud config. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to reduce the search speed of solrcloud
Hi Alex, str name=id33026985/str str name=subjectComponent Audio\:A Shopping List/str str name=download_date2012-01-11 09:02:42.96/str This is what i am indexed in solr. I have only 3 fields in index. And i am just indexing id, subject and date of the news articles. Nearly 5 crore documents. Also i have attached my solrconfig and solr.xml file. If u need more information, pls let me know. On Fri, Apr 4, 2014 at 1:15 PM, Alexandre Rafalovitch [via Lucene] ml-node+s472066n4129068...@n3.nabble.com wrote: Show a sample query string that does that (takes 6 seconds to return). Including all defaults you may have put in solrconfig.xml (if any). That might give us a hint which features you are using and what possible direction you could go in next. For the bonus points, enable debug flag and rows=1 parameter to see how big your documents themselves are. You may have issues with a particular non-cloud-friendly feature, with caches, with not reusing parts of your queries as 'fq', returning too many fields or a bunch of other things. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 4, 2014 at 2:31 PM, Sathya [hidden email] wrote: Hi All, Hi All, I am new to Solr. And i dont know how to increase the search speed of solrcloud. I have indexed nearly 4 GB of data. When i am searching a document using java with solrj, solr takes more 6 seconds to return a query result. Any one please help me to reduce the search query time to less than 500 ms. i have allocate the 4 GB ram for solr. Please let me know for further details about solrcloud config. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067.html Sent from the Solr - User mailing list archive at Nabble.com. If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129068.html To unsubscribe from How to reduce the search speed of solrcloud, click here. NAML solrconfig.xml (101K) http://lucene.472066.n3.nabble.com/attachment/4129073/0/solrconfig.xml solr.xml (1K) http://lucene.472066.n3.nabble.com/attachment/4129073/1/solr.xml -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129073.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to reduce the search speed of solrcloud
What does your Solr query looks like (check the Solr backend log if you don't know)? And how many document is that? 50 million? Does not sound like much for 3 fields. And what's the definitions (schema.xml rather than solr.xml). And what happens if you issue the query directly to Solr rather than through the client? Is the speed much different? Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 4, 2014 at 3:12 PM, Sathya sathia.blacks...@gmail.com wrote: Hi Alex, str name=id33026985/str str name=subjectComponent Audio\:A Shopping List/str str name=download_date2012-01-11 09:02:42.96/str This is what i am indexed in solr. I have only 3 fields in index. And i am just indexing id, subject and date of the news articles. Nearly 5 crore documents. Also i have attached my solrconfig and solr.xml file. If u need more information, pls let me know. On Fri, Apr 4, 2014 at 1:15 PM, Alexandre Rafalovitch [via Lucene] ml-node+s472066n4129068...@n3.nabble.com wrote: Show a sample query string that does that (takes 6 seconds to return). Including all defaults you may have put in solrconfig.xml (if any). That might give us a hint which features you are using and what possible direction you could go in next. For the bonus points, enable debug flag and rows=1 parameter to see how big your documents themselves are. You may have issues with a particular non-cloud-friendly feature, with caches, with not reusing parts of your queries as 'fq', returning too many fields or a bunch of other things. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 4, 2014 at 2:31 PM, Sathya [hidden email] wrote: Hi All, Hi All, I am new to Solr. And i dont know how to increase the search speed of solrcloud. I have indexed nearly 4 GB of data. When i am searching a document using java with solrj, solr takes more 6 seconds to return a query result. Any one please help me to reduce the search query time to less than 500 ms. i have allocate the 4 GB ram for solr. Please let me know for further details about solrcloud config. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067.html Sent from the Solr - User mailing list archive at Nabble.com. If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129068.html To unsubscribe from How to reduce the search speed of solrcloud, click here. NAML solrconfig.xml (101K) http://lucene.472066.n3.nabble.com/attachment/4129073/0/solrconfig.xml solr.xml (1K) http://lucene.472066.n3.nabble.com/attachment/4129073/1/solr.xml -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129073.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to reduce the search speed of solrcloud
Hi, I have attached my schema.xml file too. And you are right. I have 50 million documents. When i use solr browser to search a document, it will return within 1000 to 2000 ms. My query looks like this: http://10.10.1.14:5050/solr/set_recent_shard1_replica5/select?q=subjectindent=true On 4/4/14, Alexandre Rafalovitch [via Lucene] ml-node+s472066n4129074...@n3.nabble.com wrote: What does your Solr query looks like (check the Solr backend log if you don't know)? And how many document is that? 50 million? Does not sound like much for 3 fields. And what's the definitions (schema.xml rather than solr.xml). And what happens if you issue the query directly to Solr rather than through the client? Is the speed much different? Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 4, 2014 at 3:12 PM, Sathya sathia.blacks...@gmail.com wrote: Hi Alex, str name=id33026985/str str name=subjectComponent Audio\:A Shopping List/str str name=download_date2012-01-11 09:02:42.96/str This is what i am indexed in solr. I have only 3 fields in index. And i am just indexing id, subject and date of the news articles. Nearly 5 crore documents. Also i have attached my solrconfig and solr.xml file. If u need more information, pls let me know. On Fri, Apr 4, 2014 at 1:15 PM, Alexandre Rafalovitch [via Lucene] ml-node+s472066n4129068...@n3.nabble.com wrote: Show a sample query string that does that (takes 6 seconds to return). Including all defaults you may have put in solrconfig.xml (if any). That might give us a hint which features you are using and what possible direction you could go in next. For the bonus points, enable debug flag and rows=1 parameter to see how big your documents themselves are. You may have issues with a particular non-cloud-friendly feature, with caches, with not reusing parts of your queries as 'fq', returning too many fields or a bunch of other things. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 4, 2014 at 2:31 PM, Sathya [hidden email] wrote: Hi All, Hi All, I am new to Solr. And i dont know how to increase the search speed of solrcloud. I have indexed nearly 4 GB of data. When i am searching a document using java with solrj, solr takes more 6 seconds to return a query result. Any one please help me to reduce the search query time to less than 500 ms. i have allocate the 4 GB ram for solr. Please let me know for further details about solrcloud config. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067.html Sent from the Solr - User mailing list archive at Nabble.com. If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129068.html To unsubscribe from How to reduce the search speed of solrcloud, click here. NAML solrconfig.xml (101K) http://lucene.472066.n3.nabble.com/attachment/4129073/0/solrconfig.xml solr.xml (1K) http://lucene.472066.n3.nabble.com/attachment/4129073/1/solr.xml -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129073.html Sent from the Solr - User mailing list archive at Nabble.com. ___ If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129074.html To unsubscribe from How to reduce the search speed of solrcloud, visit http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=4129067code=c2F0aGlhLmJsYWNrc3RhckBnbWFpbC5jb218NDEyOTA2N3wtMjEyNDcwMTI5OA== schema.xml (81K) http://lucene.472066.n3.nabble.com/attachment/4129075/0/schema.xml -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129075.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to reduce the search speed of solrcloud
Well, if the direct browser query is 1000ms and your client query is 6seconds, then it is not Solr itself you need to worry about first. Something must be wrong at the client. Trying timing that bit. Maybe it is writing from the client to your ultimate consumer that's the problem. Regards, Alex. P.s. You should probably trim your schema to get rid of all the example fields. Keep _version_ and _root_ but delete all the rest you don't actually use. Same with dynamic fields and all fieldType definitions you do not actually use. You can always reintroduce them later from the example schemas if something is missing. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 4, 2014 at 3:41 PM, Sathya sathia.blacks...@gmail.com wrote: Hi, I have attached my schema.xml file too. And you are right. I have 50 million documents. When i use solr browser to search a document, it will return within 1000 to 2000 ms. My query looks like this: http://10.10.1.14:5050/solr/set_recent_shard1_replica5/select?q=subjectindent=true On 4/4/14, Alexandre Rafalovitch [via Lucene] ml-node+s472066n4129074...@n3.nabble.com wrote: What does your Solr query looks like (check the Solr backend log if you don't know)? And how many document is that? 50 million? Does not sound like much for 3 fields. And what's the definitions (schema.xml rather than solr.xml). And what happens if you issue the query directly to Solr rather than through the client? Is the speed much different? Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 4, 2014 at 3:12 PM, Sathya sathia.blacks...@gmail.com wrote: Hi Alex, str name=id33026985/str str name=subjectComponent Audio\:A Shopping List/str str name=download_date2012-01-11 09:02:42.96/str This is what i am indexed in solr. I have only 3 fields in index. And i am just indexing id, subject and date of the news articles. Nearly 5 crore documents. Also i have attached my solrconfig and solr.xml file. If u need more information, pls let me know. On Fri, Apr 4, 2014 at 1:15 PM, Alexandre Rafalovitch [via Lucene] ml-node+s472066n4129068...@n3.nabble.com wrote: Show a sample query string that does that (takes 6 seconds to return). Including all defaults you may have put in solrconfig.xml (if any). That might give us a hint which features you are using and what possible direction you could go in next. For the bonus points, enable debug flag and rows=1 parameter to see how big your documents themselves are. You may have issues with a particular non-cloud-friendly feature, with caches, with not reusing parts of your queries as 'fq', returning too many fields or a bunch of other things. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 4, 2014 at 2:31 PM, Sathya [hidden email] wrote: Hi All, Hi All, I am new to Solr. And i dont know how to increase the search speed of solrcloud. I have indexed nearly 4 GB of data. When i am searching a document using java with solrj, solr takes more 6 seconds to return a query result. Any one please help me to reduce the search query time to less than 500 ms. i have allocate the 4 GB ram for solr. Please let me know for further details about solrcloud config. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067.html Sent from the Solr - User mailing list archive at Nabble.com. If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129068.html To unsubscribe from How to reduce the search speed of solrcloud, click here. NAML solrconfig.xml (101K) http://lucene.472066.n3.nabble.com/attachment/4129073/0/solrconfig.xml solr.xml (1K) http://lucene.472066.n3.nabble.com/attachment/4129073/1/solr.xml -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129073.html Sent from the Solr - User mailing list archive at Nabble.com. ___ If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129074.html To unsubscribe from How to reduce the search speed of solrcloud, visit http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=4129067code=c2F0aGlhLmJsYWNrc3RhckBnbWFpbC5jb218NDEyOTA2N3wtMjEyNDcwMTI5OA== schema.xml (81K) http://lucene.472066.n3.nabble.com/attachment/4129075/0/schema.xml --
Query and field name with wildcard
In my index I have some fields which have the same prefix(rmDocumentTitle, rmDocumentClass, rmDocumentSubclass, rmDocumentArt). Apparently it is not possible to specify a query like this: q = rm* : some_word Is there a way to do this without having to write a long list of ORs? Another question is if it is really not possible to search a word over the entire index. Something like this: q = * : some_word Thank you Francesco
Re: Query and field name with wildcard
Are you using eDisMax. That gives a lot of options, including field aliasing, including a single name to multiple fields: http://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2F_renaming (with example on p77 of my book http://www.packtpub.com/apache-solr-for-indexing-data/book :-) Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 4, 2014 at 3:52 PM, Croci Francesco Luigi (ID SWS) fcr...@id.ethz.ch wrote: In my index I have some fields which have the same prefix(rmDocumentTitle, rmDocumentClass, rmDocumentSubclass, rmDocumentArt). Apparently it is not possible to specify a query like this: q = rm* : some_word Is there a way to do this without having to write a long list of ORs? Another question is if it is really not possible to search a word over the entire index. Something like this: q = * : some_word Thank you Francesco
Re: How to reduce the search speed of solrcloud
Hi, Sorry, i cant get u alex. Can you please explain me(if you can). Because now only i entered into solr. On Fri, Apr 4, 2014 at 2:20 PM, Alexandre Rafalovitch [via Lucene] ml-node+s472066n4129077...@n3.nabble.com wrote: Well, if the direct browser query is 1000ms and your client query is 6seconds, then it is not Solr itself you need to worry about first. Something must be wrong at the client. Trying timing that bit. Maybe it is writing from the client to your ultimate consumer that's the problem. Regards, Alex. P.s. You should probably trim your schema to get rid of all the example fields. Keep _version_ and _root_ but delete all the rest you don't actually use. Same with dynamic fields and all fieldType definitions you do not actually use. You can always reintroduce them later from the example schemas if something is missing. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 4, 2014 at 3:41 PM, Sathya [hidden email]http://user/SendEmail.jtp?type=nodenode=4129077i=0 wrote: Hi, I have attached my schema.xml file too. And you are right. I have 50 million documents. When i use solr browser to search a document, it will return within 1000 to 2000 ms. My query looks like this: http://10.10.1.14:5050/solr/set_recent_shard1_replica5/select?q=subjectindent=true On 4/4/14, Alexandre Rafalovitch [via Lucene] [hidden email] http://user/SendEmail.jtp?type=nodenode=4129077i=1 wrote: What does your Solr query looks like (check the Solr backend log if you don't know)? And how many document is that? 50 million? Does not sound like much for 3 fields. And what's the definitions (schema.xml rather than solr.xml). And what happens if you issue the query directly to Solr rather than through the client? Is the speed much different? Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 4, 2014 at 3:12 PM, Sathya [hidden email]http://user/SendEmail.jtp?type=nodenode=4129077i=2 wrote: Hi Alex, str name=id33026985/str str name=subjectComponent Audio\:A Shopping List/str str name=download_date2012-01-11 09:02:42.96/str This is what i am indexed in solr. I have only 3 fields in index. And i am just indexing id, subject and date of the news articles. Nearly 5 crore documents. Also i have attached my solrconfig and solr.xml file. If u need more information, pls let me know. On Fri, Apr 4, 2014 at 1:15 PM, Alexandre Rafalovitch [via Lucene] [hidden email] http://user/SendEmail.jtp?type=nodenode=4129077i=3 wrote: Show a sample query string that does that (takes 6 seconds to return). Including all defaults you may have put in solrconfig.xml (if any). That might give us a hint which features you are using and what possible direction you could go in next. For the bonus points, enable debug flag and rows=1 parameter to see how big your documents themselves are. You may have issues with a particular non-cloud-friendly feature, with caches, with not reusing parts of your queries as 'fq', returning too many fields or a bunch of other things. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 4, 2014 at 2:31 PM, Sathya [hidden email] wrote: Hi All, Hi All, I am new to Solr. And i dont know how to increase the search speed of solrcloud. I have indexed nearly 4 GB of data. When i am searching a document using java with solrj, solr takes more 6 seconds to return a query result. Any one please help me to reduce the search query time to less than 500 ms. i have allocate the 4 GB ram for solr. Please let me know for further details about solrcloud config. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067.html Sent from the Solr - User mailing list archive at Nabble.com. If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129068.html To unsubscribe from How to reduce the search speed of solrcloud, click here. NAML solrconfig.xml (101K) http://lucene.472066.n3.nabble.com/attachment/4129073/0/solrconfig.xml solr.xml (1K) http://lucene.472066.n3.nabble.com/attachment/4129073/1/solr.xml -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129073.html Sent from the Solr - User mailing list archive at Nabble.com. ___ If you
Re: How to reduce the search speed of solrcloud
You said your request is 6 seconds when going through the SolrJ client. But it is 1 second (1000 ms) when going directly to Solr bypassing the SolrJ. So, the other 5 seconds must be added outside of Solr. Concentrate on that. Regarding your schema, you used example schema that has a lot of stuff you do not need. Here is what a very small schema looks like: https://github.com/arafalov/solr-indexing-book/blob/master/published/collection1/conf/schema.xml , so you can compare. That's an example from my book. You may find the book a fast way to get from your current state to early intermediate (no cloud examples, though). Contact me directly if you need a discount. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 4, 2014 at 4:11 PM, Sathya sathia.blacks...@gmail.com wrote: Hi, Sorry, i cant get u alex. Can you please explain me(if you can). Because now only i entered into solr. On Fri, Apr 4, 2014 at 2:20 PM, Alexandre Rafalovitch [via Lucene] ml-node+s472066n4129077...@n3.nabble.com wrote: Well, if the direct browser query is 1000ms and your client query is 6seconds, then it is not Solr itself you need to worry about first. Something must be wrong at the client. Trying timing that bit. Maybe it is writing from the client to your ultimate consumer that's the problem. Regards, Alex. P.s. You should probably trim your schema to get rid of all the example fields. Keep _version_ and _root_ but delete all the rest you don't actually use. Same with dynamic fields and all fieldType definitions you do not actually use. You can always reintroduce them later from the example schemas if something is missing. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 4, 2014 at 3:41 PM, Sathya [hidden email]http://user/SendEmail.jtp?type=nodenode=4129077i=0 wrote: Hi, I have attached my schema.xml file too. And you are right. I have 50 million documents. When i use solr browser to search a document, it will return within 1000 to 2000 ms. My query looks like this: http://10.10.1.14:5050/solr/set_recent_shard1_replica5/select?q=subjectindent=true On 4/4/14, Alexandre Rafalovitch [via Lucene] [hidden email] http://user/SendEmail.jtp?type=nodenode=4129077i=1 wrote: What does your Solr query looks like (check the Solr backend log if you don't know)? And how many document is that? 50 million? Does not sound like much for 3 fields. And what's the definitions (schema.xml rather than solr.xml). And what happens if you issue the query directly to Solr rather than through the client? Is the speed much different? Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 4, 2014 at 3:12 PM, Sathya [hidden email]http://user/SendEmail.jtp?type=nodenode=4129077i=2 wrote: Hi Alex, str name=id33026985/str str name=subjectComponent Audio\:A Shopping List/str str name=download_date2012-01-11 09:02:42.96/str This is what i am indexed in solr. I have only 3 fields in index. And i am just indexing id, subject and date of the news articles. Nearly 5 crore documents. Also i have attached my solrconfig and solr.xml file. If u need more information, pls let me know. On Fri, Apr 4, 2014 at 1:15 PM, Alexandre Rafalovitch [via Lucene] [hidden email] http://user/SendEmail.jtp?type=nodenode=4129077i=3 wrote: Show a sample query string that does that (takes 6 seconds to return). Including all defaults you may have put in solrconfig.xml (if any). That might give us a hint which features you are using and what possible direction you could go in next. For the bonus points, enable debug flag and rows=1 parameter to see how big your documents themselves are. You may have issues with a particular non-cloud-friendly feature, with caches, with not reusing parts of your queries as 'fq', returning too many fields or a bunch of other things. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 4, 2014 at 2:31 PM, Sathya [hidden email] wrote: Hi All, Hi All, I am new to Solr. And i dont know how to increase the search speed of solrcloud. I have indexed nearly 4 GB of data. When i am searching a document using java with solrj, solr takes more 6 seconds to return a query result. Any one please help me to reduce the search query time to less than 500 ms. i have allocate the 4 GB ram for solr. Please let me know for further details about solrcloud config. -- View this message in context:
RE: tf and very short text fields
Hi - In this case Walter, iirc, was looking for two things: no normalization and no flat TF (1f for tf(float freq) 0). We know that k1 controls TF saturation but in BM25Similarity you can see that k1 is multiplied by the encoded norm value, taking b also into account. So setting k1 to zero effectively disabled length normalization and results in flat or binary TF. Here's an example output of k1 = 0 and k1 = 0.2. Norms or enabled on the field, term occurs three times in the field: 28.203003 = score(doc=0,freq=1.5 = phraseFreq=1.5 ), product of: 6.4 = boost 4.406719 = idf(docFreq=1, docCount=122) 1.0 = tfNorm, computed from: 1.5 = phraseFreq=1.5 0.0 = parameter k1 0.75 = parameter b 8.721312 = avgFieldLength 16.0 = fieldLength 27.813797 = score(doc=0,freq=1.5 = phraseFreq=1.5 ), product of: 6.4 = boost 4.406719 = idf(docFreq=1, docCount=122) 0.98619986 = tfNorm, computed from: 1.5 = phraseFreq=1.5 0.2 = parameter k1 0.75 = parameter b 8.721312 = avgFieldLength 16.0 = fieldLength You can clearly see the final TF norm being 1, despite the term frequency and length. Please correct my wrongs :) Markus -Original message- From:Tom Burton-West tburt...@umich.edu Sent: Thursday 3rd April 2014 20:18 To: solr-user@lucene.apache.org Subject: Re: tf and very short text fields Hi Markus and Wunder, I'm missing the original context, but I don't think BM25 will solve this particular problem. The k1 parameter sets how quickly the contribution of tf to the score falls off with increasing tf. It would be helpful for making sure really long documents don't get too high a score, but I don't think it would help for very short documents without messing up its original design purpose. For BM25, if you want to turn off length normalization, you set b to 0. However, I don't think that will do what you want, since turning off normalization will mean that the score for new york, new york will be twice that of the score for new york since without normalization the tf in new york new york is twice that of new york. I think the earlier suggestion to override tfidfsimilarity and emit 1f in tf() is probably the best way to switch to eliminate using tf counts, assumming that is really what you want. Tom On Tue, Apr 1, 2014 at 4:17 PM, Walter Underwood wun...@wunderwood.orgwrote: Thanks! We'll try that out and report back. I keep forgetting that I want to try BM25, so this is a good excuse. wunder On Apr 1, 2014, at 12:30 PM, Markus Jelsma markus.jel...@openindex.io wrote: Also, if i remember correctly, k1 set to zero for bm25 automatically omits norms in the calculation. So thats easy to play with without reindexing. Markus Jelsma markus.jel...@openindex.io schreef:Yes, override tfidfsimilarity and emit 1f in tf(). You can also use bm25 with k1 set to zero in your schema. Walter Underwood wun...@wunderwood.org schreef:And here is another peculiarity of short text fields. The movie New York, New York should not be twice as relevant for the query new york. Is there a way to use a binary term frequency rather than a count? wunder -- Walter Underwood wun...@wunderwood.org -- Walter Underwood wun...@wunderwood.org
Cannot run program svnversion when building lucene 4.7.1
Hi all. I am trying to build lucene 4.7.1 from the sources. I can compile without any issues but when I try to build the dist, lucene gives me Cannot run program svnversion ... The system cannot find the specified file. I am compiling on Windows 7 64-bit using java version 1.7.0.45 64-bit. Where can I get this svnversion ? Thanks Puneet
Solr Search on Fields name
Hi, Thank for giving your important time. Problem : I am unable to find a way how can I search Key with OR operator like if I search Items having RuleA OR RuleE. Format of Indexed Data: result name=response numFound=27 start=0 maxScore=1.0 doc float name=score1.0/float . int name=RuleA4/int int name=RuleD2/int int name=RuleE2/int int name=RuleF2/int /doc Can any one help me out how can prepare SearchQuery for key search. Regards Anurag -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Search-on-Fields-name-tp4129119.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Cannot run program svnversion when building lucene 4.7.1
Hi, When you install subversion, svnversion executable comes with that too. Did you install any svn client for Windows? On Friday, April 4, 2014 3:38 PM, Puneet Pawaia puneet.paw...@gmail.com wrote: Hi all. I am trying to build lucene 4.7.1 from the sources. I can compile without any issues but when I try to build the dist, lucene gives me Cannot run program svnversion ... The system cannot find the specified file. I am compiling on Windows 7 64-bit using java version 1.7.0.45 64-bit. Where can I get this svnversion ? Thanks Puneet
Re: Solr Search on Fields name
Hi Anurag, It seems that RuleA and RuleB are field names? in that case try this query q=RuleA:[* TO *] OR RuleB:[* TO *] Ahmet On Friday, April 4, 2014 4:15 PM, anuragwalia anuwaliaha...@gmail.com wrote: Hi, Thank for giving your important time. Problem : I am unable to find a way how can I search Key with OR operator like if I search Items having RuleA OR RuleE. Format of Indexed Data: result name=response numFound=27 start=0 maxScore=1.0 doc float name=score1.0/float . int name=RuleA4/int int name=RuleD2/int int name=RuleE2/int int name=RuleF2/int /doc Can any one help me out how can prepare SearchQuery for key search. Regards Anurag -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Search-on-Fields-name-tp4129119.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Full Indexing is Causing a Java Heap Out of Memory Exception
The ramBufferSizeMB was set to 6MB only on the test system to make the system crash sooner. In production that tag is commented out which I believe forces the default value to be used. On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, out of curiosity, why did you set ramBufferSizeMB to 6? Ahmet On Friday, April 4, 2014 3:27 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: *Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception *SOLR/Lucene version: *4.2.1* *JVM version: Java(TM) SE Runtime Environment (build 1.7.0_07-b11) Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode) *Indexer startup command: set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m java %JVMARGS% ^ -Dcom.sun.management.jmxremote.port=1092 ^ -Dcom.sun.management.jmxremote.ssl=false ^ -Dcom.sun.management.jmxremote.authenticate=false ^ -jar start.jar *SOLR indexing HTTP parameters request: webapp=/solr path=/dataimport params={clean=falsecommand=full-importwt=javabinversion=2} We are getting a Java heap OOM exception when indexing (updating) 27 million records. If we increase the Java heap memory settings the problem goes away but we believe the problem has not been fixed and that we will eventually get the same OOM exception. We have other processes on the server that also require resources so we cannot continually increase the memory settings to resolve the OOM issue. We are trying to find a way to configure the SOLR instance to reduce or preferably eliminate the possibility of an OOM exception. We can reproduce the problem on a test machine. We set the Java heap memory size to 64MB to accelerate the exception. If we increase this setting the same problems occurs, just hours later. In the test environment, we are using the following parameters: JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m Normally we use the default solrconfig.xml file with only the following jar file references added: lib path=../../../../default/lib/common.jar / lib path=../../../../default/lib/webapp.jar / lib path=../../../../default/lib/commons-pool-1.4.jar / Using these values and trying to index 6 million records from the database, the Java Heap Out of Memory exception is thrown very quickly. We were able to complete a successful indexing by further modifying the solrconfig.xml and removing all or all but one copyfield tags from the schema.xml file. The following solrconfig.xml values were modified: ramBufferSizeMB6/ramBufferSizeMB mergePolicy class=org.apache.lucene.index.TieredMergePolicy int name=maxMergeAtOnce2/int int name=maxMergeAtOnceExplicit2/int int name=segmentsPerTier10/int int name=maxMergedSegmentMB150/int /mergePolicy autoCommit maxDocs15000/maxDocs !-- This tag was maxTime, before this -- openSearcherfalse/openSearcher /autoCommit Using our customized schema.xml file with two or more copyfield tags, the OOM exception is always thrown. Based on the errors, the problem occurs when the process was trying to do the merge. The error is provided below: Exception in thread Lucene Merge Thread #156 org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514) Caused by: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:180) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:146) at org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301) at org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:259) at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:233) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:137) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3693) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3296) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:401) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:478) Mar 12, 2014 12:17:40 AM org.apache.solr.common.SolrException log SEVERE: auto commit error...:java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3971)
Re: Does sorting skip everything having to do with relevancy?
On 4/4/2014 12:48 AM, Alvaro Cabrerizo wrote: By default solr is using the sort parameter over the score field. So if you overwrite it using other sort field, yes solr will use the parameter you've provided. Remember, you can use multiple fields for sortinghttp://wiki.apache.org/solr/CommonQueryParameters#sort so you can make something like: sort score desc, your_field1 asc, your_field2 desc The score of documents is calculated on every query (it does not depend on the sort parameter or the debugQueryParameter) and the debubQuery is only a mechanism for showing (or hidding) how score was calculated. If you want to see a document score for a particular query (apart from the debugQuery) you can ask for it in the solr response adding the parameter *fl=*,score* to your request. These are things that I already know. What I want to know is whether Solr has code in place that will avoid wasting CPU cycles calculating the score that will never be displayed or used, *especially* the complex boost parameter that's in the request handler definition (solrconfig.xml). str name=boostmin(recip(abs(ms(NOW/HOUR,registered_date)),1.92901e-10,1.5,1.5),0.85)/str Do I need to send 'boost=' as a parameter (along with my sort) to get it to avoid that calculation? Thanks, Shawn
Re: Query and field name with wildcard
Hi, bq. possible to search a word over the entire index. You can a get list of all searchable fields (indexed=true) programmatically by https://wiki.apache.org/solr/LukeRequestHandler And then you can fed this list to qf parameter of (e)dismax. This could be implemented as a custom query parser plugin that searches a word over the entire index. Ahmet On Friday, April 4, 2014 12:08 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Are you using eDisMax. That gives a lot of options, including field aliasing, including a single name to multiple fields: http://wiki.apache.org/solr/ExtendedDisMax#Field_aliasing_.2F_renaming (with example on p77 of my book http://www.packtpub.com/apache-solr-for-indexing-data/book :-) Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 4, 2014 at 3:52 PM, Croci Francesco Luigi (ID SWS) fcr...@id.ethz.ch wrote: In my index I have some fields which have the same prefix(rmDocumentTitle, rmDocumentClass, rmDocumentSubclass, rmDocumentArt). Apparently it is not possible to specify a query like this: q = rm* : some_word Is there a way to do this without having to write a long list of ORs? Another question is if it is really not possible to search a word over the entire index. Something like this: q = * : some_word Thank you Francesco
Re: Full Indexing is Causing a Java Heap Out of Memory Exception
Hi, Which database are you using? Can you send us data-config.xml? What happens when you use default merge policy settings? What happens when you dump your table to Comma Separated File and fed that file to solr? Ahmet On Friday, April 4, 2014 5:10 PM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: The ramBufferSizeMB was set to 6MB only on the test system to make the system crash sooner. In production that tag is commented out which I believe forces the default value to be used. On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, out of curiosity, why did you set ramBufferSizeMB to 6? Ahmet On Friday, April 4, 2014 3:27 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: *Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception *SOLR/Lucene version: *4.2.1* *JVM version: Java(TM) SE Runtime Environment (build 1.7.0_07-b11) Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode) *Indexer startup command: set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m java %JVMARGS% ^ -Dcom.sun.management.jmxremote.port=1092 ^ -Dcom.sun.management.jmxremote.ssl=false ^ -Dcom.sun.management.jmxremote.authenticate=false ^ -jar start.jar *SOLR indexing HTTP parameters request: webapp=/solr path=/dataimport params={clean=falsecommand=full-importwt=javabinversion=2} We are getting a Java heap OOM exception when indexing (updating) 27 million records. If we increase the Java heap memory settings the problem goes away but we believe the problem has not been fixed and that we will eventually get the same OOM exception. We have other processes on the server that also require resources so we cannot continually increase the memory settings to resolve the OOM issue. We are trying to find a way to configure the SOLR instance to reduce or preferably eliminate the possibility of an OOM exception. We can reproduce the problem on a test machine. We set the Java heap memory size to 64MB to accelerate the exception. If we increase this setting the same problems occurs, just hours later. In the test environment, we are using the following parameters: JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m Normally we use the default solrconfig.xml file with only the following jar file references added: lib path=../../../../default/lib/common.jar / lib path=../../../../default/lib/webapp.jar / lib path=../../../../default/lib/commons-pool-1.4.jar / Using these values and trying to index 6 million records from the database, the Java Heap Out of Memory exception is thrown very quickly. We were able to complete a successful indexing by further modifying the solrconfig.xml and removing all or all but one copyfield tags from the schema.xml file. The following solrconfig.xml values were modified: ramBufferSizeMB6/ramBufferSizeMB mergePolicy class=org.apache.lucene.index.TieredMergePolicy int name=maxMergeAtOnce2/int int name=maxMergeAtOnceExplicit2/int int name=segmentsPerTier10/int int name=maxMergedSegmentMB150/int /mergePolicy autoCommit maxDocs15000/maxDocs !-- This tag was maxTime, before this -- openSearcherfalse/openSearcher /autoCommit Using our customized schema.xml file with two or more copyfield tags, the OOM exception is always thrown. Based on the errors, the problem occurs when the process was trying to do the merge. The error is provided below: Exception in thread Lucene Merge Thread #156 org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514) Caused by: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:180) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:146) at org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301) at org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:259) at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:233) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:137) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3693) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3296) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:401) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:478) Mar 12, 2014 12:17:40 AM
Re: tf and very short text fields
Hi, Another dimple approach is: If you don't use phrase query or phrase boosting, you can set omitTermFreqAndPositions=true Ahmet On Friday, April 4, 2014 2:38 PM, Markus Jelsma markus.jel...@openindex.io wrote: Hi - In this case Walter, iirc, was looking for two things: no normalization and no flat TF (1f for tf(float freq) 0). We know that k1 controls TF saturation but in BM25Similarity you can see that k1 is multiplied by the encoded norm value, taking b also into account. So setting k1 to zero effectively disabled length normalization and results in flat or binary TF. Here's an example output of k1 = 0 and k1 = 0.2. Norms or enabled on the field, term occurs three times in the field: 28.203003 = score(doc=0,freq=1.5 = phraseFreq=1.5 ), product of: 6.4 = boost 4.406719 = idf(docFreq=1, docCount=122) 1.0 = tfNorm, computed from: 1.5 = phraseFreq=1.5 0.0 = parameter k1 0.75 = parameter b 8.721312 = avgFieldLength 16.0 = fieldLength 27.813797 = score(doc=0,freq=1.5 = phraseFreq=1.5 ), product of: 6.4 = boost 4.406719 = idf(docFreq=1, docCount=122) 0.98619986 = tfNorm, computed from: 1.5 = phraseFreq=1.5 0.2 = parameter k1 0.75 = parameter b 8.721312 = avgFieldLength 16.0 = fieldLength You can clearly see the final TF norm being 1, despite the term frequency and length. Please correct my wrongs :) Markus -Original message- From:Tom Burton-West tburt...@umich.edu Sent: Thursday 3rd April 2014 20:18 To: solr-user@lucene.apache.org Subject: Re: tf and very short text fields Hi Markus and Wunder, I'm missing the original context, but I don't think BM25 will solve this particular problem. The k1 parameter sets how quickly the contribution of tf to the score falls off with increasing tf. It would be helpful for making sure really long documents don't get too high a score, but I don't think it would help for very short documents without messing up its original design purpose. For BM25, if you want to turn off length normalization, you set b to 0. However, I don't think that will do what you want, since turning off normalization will mean that the score for new york, new york will be twice that of the score for new york since without normalization the tf in new york new york is twice that of new york. I think the earlier suggestion to override tfidfsimilarity and emit 1f in tf() is probably the best way to switch to eliminate using tf counts, assumming that is really what you want. Tom On Tue, Apr 1, 2014 at 4:17 PM, Walter Underwood wun...@wunderwood.orgwrote: Thanks! We'll try that out and report back. I keep forgetting that I want to try BM25, so this is a good excuse. wunder On Apr 1, 2014, at 12:30 PM, Markus Jelsma markus.jel...@openindex.io wrote: Also, if i remember correctly, k1 set to zero for bm25 automatically omits norms in the calculation. So thats easy to play with without reindexing. Markus Jelsma markus.jel...@openindex.io schreef:Yes, override tfidfsimilarity and emit 1f in tf(). You can also use bm25 with k1 set to zero in your schema. Walter Underwood wun...@wunderwood.org schreef:And here is another peculiarity of short text fields. The movie New York, New York should not be twice as relevant for the query new york. Is there a way to use a binary term frequency rather than a count? wunder -- Walter Underwood wun...@wunderwood.org -- Walter Underwood wun...@wunderwood.org
Re: How to reduce the search speed of solrcloud
On 4/4/2014 1:31 AM, Sathya wrote: Hi All, Hi All, I am new to Solr. And i dont know how to increase the search speed of solrcloud. I have indexed nearly 4 GB of data. When i am searching a document using java with solrj, solr takes more 6 seconds to return a query result. Any one please help me to reduce the search query time to less than 500 ms. i have allocate the 4 GB ram for solr. Please let me know for further details about solrcloud config. How much total RAM do you have on the system, and how much total index data is on that system (adding up all the Solr cores)? You've already said that you have allocated 4GB of RAM for Solr. Later you said you had 50 million documents, and then you showed us a URL that looks like SolrCloud. I suspect that you don't have enough RAM left over to cache your index effectively -- the OS Disk Cache is too small. http://wiki.apache.org/solr/SolrPerformanceProblems Another possible problem, also discussed on that page, is that your Java heap is too small. Thanks, Shawn
Difference between [ TO *] and [* TO *] at Solr?
Hİ; What is the difference between [ TO *] and [* TO *] at Solr? (I tested it at 4.5.1 and numFounds are different. Thanks; Furkan KAMACI
Solr Search For Documents That Has Empty Content For a Given Particular Field
Hi; How can I find the documents that has empty content for a given field. I don't mean something like: -field:[* TO *] because it returns the documents that has not given particular field. I have documents something like: field1:some text, field2:some text, field : // this is the field that I want to learn which document has it. Thanks; Furkan KAMACI
Re: tf and very short text fields
Thanks Marcus, I was thinking about normalization and was absolutely wrong about setting K1 to zero. I should have taken a look at the algorithm and walked through setting K=0. (This is easier to do looking at the formula in wikipedia http://en.wikipedia.org/wiki/Okapi_BM25 than walking though the code.) When you set k1 to 0 it does just what you said i.e provides binary tf. That part of the formula returns 1 if the term is present and 0 if not. Which is I think what Wunder was trying to accomplish. Sorry about jumping in without double checking things first. Tom On Fri, Apr 4, 2014 at 7:38 AM, Markus Jelsma markus.jel...@openindex.iowrote: Hi - In this case Walter, iirc, was looking for two things: no normalization and no flat TF (1f for tf(float freq) 0). We know that k1 controls TF saturation but in BM25Similarity you can see that k1 is multiplied by the encoded norm value, taking b also into account. So setting k1 to zero effectively disabled length normalization and results in flat or binary TF. Here's an example output of k1 = 0 and k1 = 0.2. Norms or enabled on the field, term occurs three times in the field: 28.203003 = score(doc=0,freq=1.5 = phraseFreq=1.5 ), product of: 6.4 = boost 4.406719 = idf(docFreq=1, docCount=122) 1.0 = tfNorm, computed from: 1.5 = phraseFreq=1.5 0.0 = parameter k1 0.75 = parameter b 8.721312 = avgFieldLength 16.0 = fieldLength 27.813797 = score(doc=0,freq=1.5 = phraseFreq=1.5 ), product of: 6.4 = boost 4.406719 = idf(docFreq=1, docCount=122) 0.98619986 = tfNorm, computed from: 1.5 = phraseFreq=1.5 0.2 = parameter k1 0.75 = parameter b 8.721312 = avgFieldLength 16.0 = fieldLength You can clearly see the final TF norm being 1, despite the term frequency and length. Please correct my wrongs :) Markus -Original message- From:Tom Burton-West tburt...@umich.edu Sent: Thursday 3rd April 2014 20:18 To: solr-user@lucene.apache.org Subject: Re: tf and very short text fields Hi Markus and Wunder, I'm missing the original context, but I don't think BM25 will solve this particular problem. The k1 parameter sets how quickly the contribution of tf to the score falls off with increasing tf. It would be helpful for making sure really long documents don't get too high a score, but I don't think it would help for very short documents without messing up its original design purpose. For BM25, if you want to turn off length normalization, you set b to 0. However, I don't think that will do what you want, since turning off normalization will mean that the score for new york, new york will be twice that of the score for new york since without normalization the tf in new york new york is twice that of new york. I think the earlier suggestion to override tfidfsimilarity and emit 1f in tf() is probably the best way to switch to eliminate using tf counts, assumming that is really what you want. Tom On Tue, Apr 1, 2014 at 4:17 PM, Walter Underwood wun...@wunderwood.org wrote: Thanks! We'll try that out and report back. I keep forgetting that I want to try BM25, so this is a good excuse. wunder On Apr 1, 2014, at 12:30 PM, Markus Jelsma markus.jel...@openindex.io wrote: Also, if i remember correctly, k1 set to zero for bm25 automatically omits norms in the calculation. So thats easy to play with without reindexing. Markus Jelsma markus.jel...@openindex.io schreef:Yes, override tfidfsimilarity and emit 1f in tf(). You can also use bm25 with k1 set to zero in your schema. Walter Underwood wun...@wunderwood.org schreef:And here is another peculiarity of short text fields. The movie New York, New York should not be twice as relevant for the query new york. Is there a way to use a binary term frequency rather than a count? wunder -- Walter Underwood wun...@wunderwood.org -- Walter Underwood wun...@wunderwood.org
Re: Solr Search For Documents That Has Empty Content For a Given Particular Field
Hi Furkan, q=fiel:fl=field works for me (4.7.0). Ahmet On Friday, April 4, 2014 5:50 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi; How can I find the documents that has empty content for a given field. I don't mean something like: -field:[* TO *] because it returns the documents that has not given particular field. I have documents something like: field1:some text, field2:some text, field : // this is the field that I want to learn which document has it. Thanks; Furkan KAMACI
Re: Solr Search For Documents That Has Empty Content For a Given Particular Field
Hi; II tried it before but does not work 2014-04-04 18:08 GMT+03:00 Ahmet Arslan iori...@yahoo.com: Hi Furkan, q=fiel:fl=field works for me (4.7.0). Ahmet On Friday, April 4, 2014 5:50 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi; How can I find the documents that has empty content for a given field. I don't mean something like: -field:[* TO *] because it returns the documents that has not given particular field. I have documents something like: field1:some text, field2:some text, field : // this is the field that I want to learn which document has it. Thanks; Furkan KAMACI
Re: Solr Search For Documents That Has Empty Content For a Given Particular Field
Hi, Weird, for type=string it works for me. What is the field type you are using? On Friday, April 4, 2014 6:25 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi; II tried it before but does not work 2014-04-04 18:08 GMT+03:00 Ahmet Arslan iori...@yahoo.com: Hi Furkan, q=fiel:fl=field works for me (4.7.0). Ahmet On Friday, April 4, 2014 5:50 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi; How can I find the documents that has empty content for a given field. I don't mean something like: -field:[* TO *] because it returns the documents that has not given particular field. I have documents something like: field1:some text, field2:some text, field : // this is the field that I want to learn which document has it. Thanks; Furkan KAMACI
Strange behavior of edismax and mm=0 with long queries (bug?)
Hey, I am currently using solr to recognize songs and people from a list of user comments. My index stores the titles of the songs. At the moment my application builds word ngrams and fires a search with that query, which works well but is quite inefficient. So my thought was to simply use the collated comments as query. So it is a case where the query is much longer. I need to use mm=0 or mm=1. My plan was to use edismax as the pf2 and pf3 parameters should work well for my usecase. However when using longer queries, I get a strange behavior which can be seen in debugQuery. Here is an example: Collated Comments (used as query) I love Henry so much. It is hard to tear your eyes away from Maria, but watch just his feet. You'll be amazed. sometimes pure skill can will a comp, sometimes pure joy can win... put them both together and there is no competition This video clip makes me smile. Pure joy! so good! Who's the person that gave this a thumbs down?!? This is one of the best routines I've ever seen. Period. And it's a competitionl! How is that possible? They're so good it boggles my mind. It's gorgeous. Flawless victory. Great number! Does anybody know the name of the piece? I believe it's called Sunny side of the street Maria is like, the best 'follow' I've ever seen. She's so amazing. Thanks so much Johnathan! Song name in Index Louis Armstrong - Sunny Side of The Street parsedquery_toString: +(((text:I) (text:love) (text:Henry) (text:so) (text:much.) (text:It) (text:is) (text:hard) (text:to) (text:tear) (text:your) (text:eyes) (text:away) (text:from) (text:Maria,) (text:but) (text:watch) (text:just) (text:his) (text:feet.) (text:You'll) (text:be) (text:amazed.) (text:sometimes) (text:pure) (text:skill) (text:can) (text:will) (text:a) (text:comp,) (text:sometimes) (text:pure) (text:joy) (text:can) (text:win...) (text:put) (text:them) (text:both) +(text:together) +(text:there) (text:is) (text:no) (text:competition) (text:This) (text:video) (text:clip) (text:makes) (text:me) (text:smile.) (text:Pure) (text:joy!) (text:so) (text:good!) (text:Who's) (text:the) (text:person) (text:that) (text:gave) (text:this) (text:a) (text:thumbs) (text:down?!?) (text:This) (text:is) (text:one) (text:of) (text:the) (text:best) (text:routines) (text:I've) (text:ever) (text:seen.) +(text:Period.) +(text:it's) (text:a) (text:competitionl!) (text:How) (text:is) (text:that) (text:possible?) (text:They're) (text:so) (text:good) (text:it) (text:boggles) (text:my) (text:mind.) (text:It's) (text:gorgeous.) (text:Flawless) (text:victory.) (text:Great) (text:number!) (text:Does) (text:anybody) (text:know) (text:the) (text:name) (text:of) (text:the) (text:piece?) (text:I) (text:believe) (text:it's) (text:called) (text:Sunny) (text:side) (text:of) (text:the) (text:street) (text:Maria) (text:is) (text:like,) (text:the) (text:best) (text:'follow') (text:I've) (text:ever) (text:seen.) (text:She's) (text:so) (text:amazing.) (text:Thanks) (text:so) (text:much) (text:Johnathan!))~1)/str This query generates 0 results. The reason is it expects terms together, there, Period., it's to be part of the document (see parsedquery above, all other terms are optional, those terms are must). Is there any reason for this behavior? If I use shorter queries it works flawlessly and returns the document. I've appended the whole query. Best, Nils ?xml version=1.0 encoding=UTF-8? response lst name=responseHeader int name=status0/int int name=QTime11/int /lst result name=response numFound=0 start=0 /result lst name=debug str name=rawquerystringI love Henry so much. It is hard to tear your eyes away from Maria, but watch just his feet. You'll be amazed. sometimes pure skill can will a comp, sometimes pure joy can win... put them both together and there is no competition This video clip makes me smile. Pure joy! so good! Who's the person that gave this a thumbs down?!? This is one of the best routines I've ever seen. Period. And it's a competitionl! How is that possible? They're so good it boggles my mind. It's gorgeous. Flawless victory. Great number! Does anybody know the name of the piece? I believe it's called Sunny side of the street Maria is like, the best 'follow' I've ever seen. She's so amazing. Thanks so much Johnathan! /str str name=querystringI love Henry so much. It is hard to tear your eyes away from Maria, but watch just his feet. You'll be amazed. sometimes pure skill can will a comp, sometimes pure joy can win... put them both together and there is no competition This video clip makes me smile. Pure joy! so good! Who's the person that gave this a thumbs down?!? This is one of the best routines I've ever seen. Period. And it's a competitionl! How is that possible? They're so good it boggles my mind. It's gorgeous. Flawless victory. Great number! Does anybody know the name of the piece? I believe it's called Sunny side of the street Maria is like, the best 'follow' I've ever seen. She's so amazing. Thanks so much Johnathan!
Re: Solr Search For Documents That Has Empty Content For a Given Particular Field
: field : // this is the field that I want to learn which document has : it. How you (can) query for a field value like that is going to depend entirely on the FieldTYpe/Analyzer ... if it's a string field, of uses KeywordTokenizer then q=field: should find it -- if you use a more traditional analyzer then it probably didn't produce any terms for hte input and from Solr's perspective a document that was indexed using an empty string value is exactly the same as a document that had no value when index. In essenc,e your question is equivilent to asking How can i search for doc1, but not doc2, evne though i'm using LowerCaseAnalyzer which produces exactly the same indexe terms or both... doc1: Quick Fox doc2: quick fox -Hoss http://www.lucidworks.com/
Re: How to reduce the search speed of solrcloud
Hi shawn, I have indexed 50 million data in 5 servers. 3 servers have 8gb ram. One have 24gb and another one have 64gb ram. I was allocate 4 gb ram to solr in each machine. I am using solrcloud. My total index size is 50gb including 5 servers. Each server have 3 zookeepers. Still I didnt check about OS disk cache and heap memory. I will check and let u know shawn. If anything, pls let me know. Thank u shawn. On Friday, April 4, 2014, Shawn Heisey-4 [via Lucene] ml-node+s472066n4129150...@n3.nabble.com wrote: On 4/4/2014 1:31 AM, Sathya wrote: Hi All, Hi All, I am new to Solr. And i dont know how to increase the search speed of solrcloud. I have indexed nearly 4 GB of data. When i am searching a document using java with solrj, solr takes more 6 seconds to return a query result. Any one please help me to reduce the search query time to less than 500 ms. i have allocate the 4 GB ram for solr. Please let me know for further details about solrcloud config. How much total RAM do you have on the system, and how much total index data is on that system (adding up all the Solr cores)? You've already said that you have allocated 4GB of RAM for Solr. Later you said you had 50 million documents, and then you showed us a URL that looks like SolrCloud. I suspect that you don't have enough RAM left over to cache your index effectively -- the OS Disk Cache is too small. http://wiki.apache.org/solr/SolrPerformanceProblems Another possible problem, also discussed on that page, is that your Java heap is too small. Thanks, Shawn If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129150.html To unsubscribe from How to reduce the search speed of solrcloud, click here. NAML -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129173.html Sent from the Solr - User mailing list archive at Nabble.com.
AUTO: Saravanan Chinnadurai is out of the office (returning 08/04/2014)
I am out of the office until 08/04/2014. Please email itsta...@actionimages.com for any urgent queries. Note: This is an automated response to your message Cannot run program svnversion when building lucene 4.7.1 sent on 4/4/2014 13:38:22. This is the only notification you will receive while this person is away. Action Images is a division of Reuters Limited and your data will therefore be protected in accordance with the Reuters Group Privacy / Data Protection notice which is available in the privacy footer at www.reuters.com Registered in England No. 145516 VAT REG: 397000555
Re: How to reduce the search speed of solrcloud
I am not sure if you setup your SolrCloud right. Can you also provide me with the version of Solr that you're running. Also, if you could tell me about how did you setup your SolrCloud cluster. Are the times consistent? Is this the only collection on the cluster? Also, if I am getting it right, you have 15 ZKs running. Correct me if I'm wrong, but if I'm not, you don't need that kind of a zk setup. On Fri, Apr 4, 2014 at 9:39 AM, Sathya sathia.blacks...@gmail.com wrote: Hi shawn, I have indexed 50 million data in 5 servers. 3 servers have 8gb ram. One have 24gb and another one have 64gb ram. I was allocate 4 gb ram to solr in each machine. I am using solrcloud. My total index size is 50gb including 5 servers. Each server have 3 zookeepers. Still I didnt check about OS disk cache and heap memory. I will check and let u know shawn. If anything, pls let me know. Thank u shawn. On Friday, April 4, 2014, Shawn Heisey-4 [via Lucene] ml-node+s472066n4129150...@n3.nabble.com wrote: On 4/4/2014 1:31 AM, Sathya wrote: Hi All, Hi All, I am new to Solr. And i dont know how to increase the search speed of solrcloud. I have indexed nearly 4 GB of data. When i am searching a document using java with solrj, solr takes more 6 seconds to return a query result. Any one please help me to reduce the search query time to less than 500 ms. i have allocate the 4 GB ram for solr. Please let me know for further details about solrcloud config. How much total RAM do you have on the system, and how much total index data is on that system (adding up all the Solr cores)? You've already said that you have allocated 4GB of RAM for Solr. Later you said you had 50 million documents, and then you showed us a URL that looks like SolrCloud. I suspect that you don't have enough RAM left over to cache your index effectively -- the OS Disk Cache is too small. http://wiki.apache.org/solr/SolrPerformanceProblems Another possible problem, also discussed on that page, is that your Java heap is too small. Thanks, Shawn If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129150.html To unsubscribe from How to reduce the search speed of solrcloud, click here. NAML -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129173.html Sent from the Solr - User mailing list archive at Nabble.com. -- Anshum Gupta http://www.anshumgupta.net
SOLR Jetty Server on Windows 2003
Hi , I am trying to install solr on the Windows 2003 with Jetty server. Form browser everything works , but when I try to acesss from another javascript Code in other machine I am not getting reponse. I am using Xmlhttprequest to get the response from server using javascript. Any Help...? --Ravi
RE: SOLR Jetty Server on Windows 2003
Are the requests cross domain? Is your browser giving errors about cross domain scripting restrictions in the browser? If you're doing cross domain browser stuff, Solr gives you the ability to do requests over JSONP which is a sneaky hack that gets around these issues. Check out my blog post for an example that uses angular: http://www.opensourceconnections.com/2013/08/25/instant-search-with-solr-and-angular/ Sent from my Windows Phone From: EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) Sent: 4/4/2014 1:51 PM To: solr-user@lucene.apache.org Subject: SOLR Jetty Server on Windows 2003 Hi , I am trying to install solr on the Windows 2003 with Jetty server. Form browser everything works , but when I try to acesss from another javascript Code in other machine I am not getting reponse. I am using Xmlhttprequest to get the response from server using javascript. Any Help...? --Ravi
Re: Cannot run program svnversion when building lucene 4.7.1
Hi. Yes I installed Tortoise svn. Regards Puneet On 4 Apr 2014 19:35, Ahmet Arslan iori...@yahoo.com wrote: Hi, When you install subversion, svnversion executable comes with that too. Did you install any svn client for Windows? On Friday, April 4, 2014 3:38 PM, Puneet Pawaia puneet.paw...@gmail.com wrote: Hi all. I am trying to build lucene 4.7.1 from the sources. I can compile without any issues but when I try to build the dist, lucene gives me Cannot run program svnversion ... The system cannot find the specified file. I am compiling on Windows 7 64-bit using java version 1.7.0.45 64-bit. Where can I get this svnversion ? Thanks Puneet
Re: Cannot run program svnversion when building lucene 4.7.1
Hi, I am not a windows user but if you installed that svnversion should be somewhere on disk. Probably right next to svn. Find/locate it by file search, and add its folder to your path. Once you do that you can invoke svnversion in command line. For example here is the executables in my computer under /opt/subversion/bin: svn svnadminsvnlook svnservesvnversion svn-tools svndumpfilter svnrdumpsvnsync On Friday, April 4, 2014 9:18 PM, Puneet Pawaia puneet.paw...@gmail.com wrote: Hi. Yes I installed Tortoise svn. Regards Puneet On 4 Apr 2014 19:35, Ahmet Arslan iori...@yahoo.com wrote: Hi, When you install subversion, svnversion executable comes with that too. Did you install any svn client for Windows? On Friday, April 4, 2014 3:38 PM, Puneet Pawaia puneet.paw...@gmail.com wrote: Hi all. I am trying to build lucene 4.7.1 from the sources. I can compile without any issues but when I try to build the dist, lucene gives me Cannot run program svnversion ... The system cannot find the specified file. I am compiling on Windows 7 64-bit using java version 1.7.0.45 64-bit. Where can I get this svnversion ? Thanks Puneet
How to see the value of long type (solr) ?
Hi, We use solr 3.6 to index a field of long type: fieldType name=long class=solr.TrieLongField ... Now for debugging purpose we need to see the original value (the field is not stored), but in luke we cannot see. 1/ is there a way to see original long type value (using luke or not) ? 2/ if we need to use lucene to search this field, what analyzer should we use ? Thanks very much for helps, Lisheng
Re: Full Indexing is Causing a Java Heap Out of Memory Exception
In this case we are indexing an Oracle database. We do not include the data-config.xml in our distribution. We store the database information in the database.xml file. I have attached the database.xml file. When we use the default merge policy settings, we get the same results. We have not tried to dump the table to a comma separated file. We think that dumping this size table to disk will introduce other memory problems with big file management. We have not tested that case. On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Which database are you using? Can you send us data-config.xml? What happens when you use default merge policy settings? What happens when you dump your table to Comma Separated File and fed that file to solr? Ahmet On Friday, April 4, 2014 5:10 PM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: The ramBufferSizeMB was set to 6MB only on the test system to make the system crash sooner. In production that tag is commented out which I believe forces the default value to be used. On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, out of curiosity, why did you set ramBufferSizeMB to 6? Ahmet On Friday, April 4, 2014 3:27 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: *Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception *SOLR/Lucene version: *4.2.1* *JVM version: Java(TM) SE Runtime Environment (build 1.7.0_07-b11) Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode) *Indexer startup command: set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m java %JVMARGS% ^ -Dcom.sun.management.jmxremote.port=1092 ^ -Dcom.sun.management.jmxremote.ssl=false ^ -Dcom.sun.management.jmxremote.authenticate=false ^ -jar start.jar *SOLR indexing HTTP parameters request: webapp=/solr path=/dataimport params={clean=falsecommand=full-importwt=javabinversion=2} We are getting a Java heap OOM exception when indexing (updating) 27 million records. If we increase the Java heap memory settings the problem goes away but we believe the problem has not been fixed and that we will eventually get the same OOM exception. We have other processes on the server that also require resources so we cannot continually increase the memory settings to resolve the OOM issue. We are trying to find a way to configure the SOLR instance to reduce or preferably eliminate the possibility of an OOM exception. We can reproduce the problem on a test machine. We set the Java heap memory size to 64MB to accelerate the exception. If we increase this setting the same problems occurs, just hours later. In the test environment, we are using the following parameters: JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m Normally we use the default solrconfig.xml file with only the following jar file references added: lib path=../../../../default/lib/common.jar / lib path=../../../../default/lib/webapp.jar / lib path=../../../../default/lib/commons-pool-1.4.jar / Using these values and trying to index 6 million records from the database, the Java Heap Out of Memory exception is thrown very quickly. We were able to complete a successful indexing by further modifying the solrconfig.xml and removing all or all but one copyfield tags from the schema.xml file. The following solrconfig.xml values were modified: ramBufferSizeMB6/ramBufferSizeMB mergePolicy class=org.apache.lucene.index.TieredMergePolicy int name=maxMergeAtOnce2/int int name=maxMergeAtOnceExplicit2/int int name=segmentsPerTier10/int int name=maxMergedSegmentMB150/int /mergePolicy autoCommit maxDocs15000/maxDocs !-- This tag was maxTime, before this -- openSearcherfalse/openSearcher /autoCommit Using our customized schema.xml file with two or more copyfield tags, the OOM exception is always thrown. Based on the errors, the problem occurs when the process was trying to do the merge. The error is provided below: Exception in thread Lucene Merge Thread #156 org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:541) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:514) Caused by: java.lang.OutOfMemoryError: Java heap space at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:180) at org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:146) at org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)
[ JOB ] - Search Specialist, Bloomberg LP [ NY and London ]
http://jobs.bloomberg.com/job/New-York-Search-Technology-Specialist-Job-NY/45497500/ http://jobs.bloomberg.com/job/London-RD-News-Search-Backend-Developer-Job/50463600/ keeping it short here , feel free to talk to me with more questions -- Anirudha P. Jadhav
Re: Does sorting skip everything having to do with relevancy?
Hi, If you dont want to waste your cpu time, then comment the boost parameter in the query parser defined in your solrconfig.xml. If you cant do that, then you can overwrite it sending the boost parameter for example using the constant function (e.g. http:///...boost=1sort=your_sort). The parameter boost will be overwritten if it is not defined as an invariant. Regards. On Fri, Apr 4, 2014 at 4:12 PM, Shawn Heisey s...@elyograg.org wrote: On 4/4/2014 12:48 AM, Alvaro Cabrerizo wrote: By default solr is using the sort parameter over the score field. So if you overwrite it using other sort field, yes solr will use the parameter you've provided. Remember, you can use multiple fields for sortinghttp://wiki.apache.org/solr/CommonQueryParameters#sort so you can make something like: sort score desc, your_field1 asc, your_field2 desc The score of documents is calculated on every query (it does not depend on the sort parameter or the debugQueryParameter) and the debubQuery is only a mechanism for showing (or hidding) how score was calculated. If you want to see a document score for a particular query (apart from the debugQuery) you can ask for it in the solr response adding the parameter *fl=*,score* to your request. These are things that I already know. What I want to know is whether Solr has code in place that will avoid wasting CPU cycles calculating the score that will never be displayed or used, *especially* the complex boost parameter that's in the request handler definition (solrconfig.xml). str name=boostmin(recip(abs(ms(NOW/HOUR,registered_date)),1.92901e-10,1.5,1.5),0.85)/str Do I need to send 'boost=' as a parameter (along with my sort) to get it to avoid that calculation? Thanks, Shawn
Re: Does sorting skip everything having to do with relevancy?
On 4/4/2014 1:48 PM, Alvaro Cabrerizo wrote: If you dont want to waste your cpu time, then comment the boost parameter in the query parser defined in your solrconfig.xml. If you cant do that, then you can overwrite it sending the boost parameter for example using the constant function (e.g. http:///...boost=1sort=your_sort). The parameter boost will be overwritten if it is not defined as an invariant. Thank you for responding. I know how I can override the behavior, what I want to find out is whether or not it's necessary to do so -- if it's not necessary because Solr skips it, then everything is good. If it is necessary, I can open an issue in Jira asking for Solr to get smarter. That way everyone benefits and they don't have to do anything except upgrade Solr. Thanks, Shawn
Distributed tracing for Solr via adding HTTP headers?
We have some metadata -- e.g. a request UUID -- that we log to every log line using Log4J's MDC [1]. The UUID logging allows us to connect any log lines we have for a given request across servers. Sort of like Zipkin [2]. Currently we're using EmbeddedSolrServer without sharding, so adding the UUID is fairly simple, since everything is in one process and one thread. But, we're testing a sharded HTTP implementation and running into some difficulties getting this data passed around in a way that lets us trace all log lines generated by a request to its UUID. The first thing I tried was to add the UUID by adding it to the SolrParams. This achieves the goal of getting those values logged on the shards if a request is successful, but we miss having those values in the MDC if there are other log lines before the final log line. E.g. an Exception in a custom component. My current thought is that sending HTTP headers with diagnostic information would be very useful. Those could be placed in the MDC even before handing off to work to SolrDispatchFilter, so that any Solr problem will have the proper logging. I.e. every additional header added to a Solr request gets a Solr- prefix. On the server, we look for those headers and add them to the SLF4J MDC[3]. Here's a patch [4] that does this that we're testing out. Is this a good idea? Would anyone else find this useful? If so, I'll open a ticket. --Gregg [1] http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/MDC.html [2] http://twitter.github.io/zipkin/ [3] http://www.slf4j.org/api/org/slf4j/MDC.html [4] https://gist.github.com/greggdonovan/9982327
Re: Full Indexing is Causing a Java Heap Out of Memory Exception
In case the attached database.xml file didn't show up, I have pasted in the contents below: dataConfig dataSource name=org_only type=JdbcDataSource driver=oracle.jdbc.OracleDriver url=jdbc:oracle:thin:@test2.abc.com:1521:ORCL user=admin password=admin readOnly=false batchSize=100 / document entity name=full-index query= select NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null') as SOLR_ID, 'ORCL.ADDRESS_ACCT_ALL' as SOLR_CATEGORY, NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as ADDRESSALLROWID, NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as ADDRESSALLADDRTYPECD, NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as ADDRESSALLLONGITUDE, NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as ADDRESSALLLATITUDE, NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as ADDRESSALLADDRNAME, NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as ADDRESSALLCITY, NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as ADDRESSALLSTATE, NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as ADDRESSALLEMAILADDR from ORCL.ADDRESS_ACCT_ALL field column=SOLR_ID name=id / field column=SOLR_CATEGORY name=category / field column=ADDRESSALLROWID name=ADDRESS_ACCT_ALL.RECORD_ID_abc / field column=ADDRESSALLADDRTYPECD name=ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc / field column=ADDRESSALLLONGITUDE name=ADDRESS_ACCT_ALL.LONGITUDE_abc / field column=ADDRESSALLLATITUDE name=ADDRESS_ACCT_ALL.LATITUDE_abc / field column=ADDRESSALLADDRNAME name=ADDRESS_ACCT_ALL.ADDR_NAME_abc / field column=ADDRESSALLCITY name=ADDRESS_ACCT_ALL.CITY_abc / field column=ADDRESSALLSTATE name=ADDRESS_ACCT_ALL.STATE_abc / field column=ADDRESSALLEMAILADDR name=ADDRESS_ACCT_ALL.EMAIL_ADDR_abc / /entity !-- Varaibles -- !-- '${dataimporter.last_index_time}' -- /document /dataConfig On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: In this case we are indexing an Oracle database. We do not include the data-config.xml in our distribution. We store the database information in the database.xml file. I have attached the database.xml file. When we use the default merge policy settings, we get the same results. We have not tried to dump the table to a comma separated file. We think that dumping this size table to disk will introduce other memory problems with big file management. We have not tested that case. On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Which database are you using? Can you send us data-config.xml? What happens when you use default merge policy settings? What happens when you dump your table to Comma Separated File and fed that file to solr? Ahmet On Friday, April 4, 2014 5:10 PM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: The ramBufferSizeMB was set to 6MB only on the test system to make the system crash sooner. In production that tag is commented out which I believe forces the default value to be used. On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, out of curiosity, why did you set ramBufferSizeMB to 6? Ahmet On Friday, April 4, 2014 3:27 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: *Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception *SOLR/Lucene version: *4.2.1* *JVM version: Java(TM) SE Runtime Environment (build 1.7.0_07-b11) Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode) *Indexer startup command: set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m java %JVMARGS% ^ -Dcom.sun.management.jmxremote.port=1092 ^ -Dcom.sun.management.jmxremote.ssl=false ^ -Dcom.sun.management.jmxremote.authenticate=false ^ -jar start.jar *SOLR indexing HTTP parameters request: webapp=/solr path=/dataimport params={clean=falsecommand=full-importwt=javabinversion=2} We are getting a Java heap OOM exception when indexing (updating) 27 million records. If we increase the Java heap memory settings the problem goes away but we believe the problem has not been fixed and that we will eventually get the same OOM exception. We have other processes on the server that also require resources so we cannot continually increase the memory settings to resolve the OOM issue. We are trying to find a way to configure the SOLR instance to reduce or preferably eliminate the possibility of an OOM exception. We can reproduce the problem on a test machine. We set the Java heap memory size to 64MB to accelerate the exception. If we increase this setting the same problems occurs, just hours later. In the test environment, we are using the following parameters: JVMARGS=-XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m Normally we use the default solrconfig.xml file with only the following jar file references added:
Re: Does sorting skip everything having to do with relevancy?
Hello Shawn, I suppose SolrIndexSearcher.buildTopDocsCollector() doesn't create a Collector which calls score() in this case. Hence, it shouldn't waste CPU. Just my impression. Haven't you tried to check it supplying some weird formula, which throws exception? On Sat, Apr 5, 2014 at 12:02 AM, Shawn Heisey s...@elyograg.org wrote: On 4/4/2014 1:48 PM, Alvaro Cabrerizo wrote: If you dont want to waste your cpu time, then comment the boost parameter in the query parser defined in your solrconfig.xml. If you cant do that, then you can overwrite it sending the boost parameter for example using the constant function (e.g. http:///...boost=1sort=your_sort). The parameter boost will be overwritten if it is not defined as an invariant. Thank you for responding. I know how I can override the behavior, what I want to find out is whether or not it's necessary to do so -- if it's not necessary because Solr skips it, then everything is good. If it is necessary, I can open an issue in Jira asking for Solr to get smarter. That way everyone benefits and they don't have to do anything except upgrade Solr. Thanks, Shawn -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Solr join and lucene scoring
On Thu, Apr 3, 2014 at 1:42 PM, m...@preselect-media.com wrote: Hello, referencing to this issue: https://issues.apache.org/jira/browse/SOLR-4307 Is it still not possible with the solr query time join to use scoring? It's not implemented still. https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/JoinQParserPlugin.java#L549 Do I still have to write my own plugin or is there a plugin somewhere I could use? I never wrote a plugin for solr before, so I would prefer if I don't have to start from scratch. The right approach from my POV is to use Lucene's join https://github.com/apache/lucene-solr/blob/trunk/lucene/join/src/java/org/apache/lucene/search/join/JoinUtil.javain new QParser, but solving the impedance between Lucene and Solr, might be tricky. THX, Moritz -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Filter query with multiple raw/literal ORs
On Fri, Apr 4, 2014 at 4:08 AM, Yonik Seeley yo...@heliosearch.com wrote: Try adding a space before the first term, so the default lucene query parser will be used: Yonik, I'm curious, whether it a feature? -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Full Indexing is Causing a Java Heap Out of Memory Exception
Hi, Can you remove auto commit for bulk import. Commit at the very end? Ahmet On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: In case the attached database.xml file didn't show up, I have pasted in the contents below: dataConfig dataSource name=org_only type=JdbcDataSource driver=oracle.jdbc.OracleDriver url=jdbc:oracle:thin:@test2.abc.com:1521:ORCL user=admin password=admin readOnly=false batchSize=100 / document entity name=full-index query= select NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null') as SOLR_ID, 'ORCL.ADDRESS_ACCT_ALL' as SOLR_CATEGORY, NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as ADDRESSALLROWID, NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as ADDRESSALLADDRTYPECD, NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as ADDRESSALLLONGITUDE, NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as ADDRESSALLLATITUDE, NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as ADDRESSALLADDRNAME, NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as ADDRESSALLCITY, NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as ADDRESSALLSTATE, NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as ADDRESSALLEMAILADDR from ORCL.ADDRESS_ACCT_ALL field column=SOLR_ID name=id / field column=SOLR_CATEGORY name=category / field column=ADDRESSALLROWID name=ADDRESS_ACCT_ALL.RECORD_ID_abc / field column=ADDRESSALLADDRTYPECD name=ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc / field column=ADDRESSALLLONGITUDE name=ADDRESS_ACCT_ALL.LONGITUDE_abc / field column=ADDRESSALLLATITUDE name=ADDRESS_ACCT_ALL.LATITUDE_abc / field column=ADDRESSALLADDRNAME name=ADDRESS_ACCT_ALL.ADDR_NAME_abc / field column=ADDRESSALLCITY name=ADDRESS_ACCT_ALL.CITY_abc / field column=ADDRESSALLSTATE name=ADDRESS_ACCT_ALL.STATE_abc / field column=ADDRESSALLEMAILADDR name=ADDRESS_ACCT_ALL.EMAIL_ADDR_abc / /entity !-- Varaibles -- !-- '${dataimporter.last_index_time}' -- /document /dataConfig On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: In this case we are indexing an Oracle database. We do not include the data-config.xml in our distribution. We store the database information in the database.xml file. I have attached the database.xml file. When we use the default merge policy settings, we get the same results. We have not tried to dump the table to a comma separated file. We think that dumping this size table to disk will introduce other memory problems with big file management. We have not tested that case. On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Which database are you using? Can you send us data-config.xml? What happens when you use default merge policy settings? What happens when you dump your table to Comma Separated File and fed that file to solr? Ahmet On Friday, April 4, 2014 5:10 PM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: The ramBufferSizeMB was set to 6MB only on the test system to make the system crash sooner. In production that tag is commented out which I believe forces the default value to be used. On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, out of curiosity, why did you set ramBufferSizeMB to 6? Ahmet On Friday, April 4, 2014 3:27 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: *Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception *SOLR/Lucene version: *4.2.1* *JVM version: Java(TM) SE Runtime Environment (build 1.7.0_07-b11) Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode) *Indexer startup command: set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m java %JVMARGS% ^ -Dcom.sun.management.jmxremote.port=1092 ^ -Dcom.sun.management.jmxremote.ssl=false ^ -Dcom.sun.management.jmxremote.authenticate=false ^ -jar start.jar *SOLR indexing HTTP parameters request: webapp=/solr path=/dataimport params={clean=falsecommand=full-importwt=javabinversion=2} We are getting a Java heap OOM exception when indexing (updating) 27 million records. If we increase the Java heap memory settings the problem goes away but we believe the problem has not been fixed and that we will eventually get the same OOM exception. We have other processes on the server that also require resources so we cannot continually increase the memory settings to resolve the OOM issue. We are trying to find a way to configure the SOLR instance to reduce or preferably eliminate the possibility of an OOM exception. We can reproduce the problem on a test machine. We set the Java heap memory size to 64MB to accelerate the exception. If we increase this setting the same problems occurs, just hours later. In the test environment, we are using the following
Re: Filter query with multiple raw/literal ORs
On Fri, Apr 4, 2014 at 5:28 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: On Fri, Apr 4, 2014 at 4:08 AM, Yonik Seeley yo...@heliosearch.com wrote: Try adding a space before the first term, so the default lucene query parser will be used: Yonik, I'm curious, whether it a feature? Yep, it was completely on purpose that I required local parameters to be left-justified. It left an easy way to escape the normal local params processing when looking for the query type. For example, if you want to ensure that your custom parser is used, and you have defType=myCustomQParser then all you have to do is add a space before the query parameter (which shouldn't mess up any sort of natural language query parser). -Yonik http://heliosearch.org - solve Solr GC pauses with off-heap filters and fieldcache
Re: Does sorting skip everything having to do with relevancy?
On 4/4/2014 3:13 PM, Mikhail Khludnev wrote: I suppose SolrIndexSearcher.buildTopDocsCollector() doesn't create a Collector which calls score() in this case. Hence, it shouldn't waste CPU. Just my impression. Haven't you tried to check it supplying some weird formula, which throws exception? I didn't think of that. That's a good idea -- as long as there's not independent code that checks the function in addition to the code that actually runs it. With the following parameters added to an edismax query that otherwise works, I get an exception. It works if I change the e to 5. sort=registered_date ascboost=sum(5,e) I will take Alvaro's suggestion and add boost=1 to queries that use a sort parameter. It's probably a good idea to file that Jira. Thanks, Shawn
Re: Full Indexing is Causing a Java Heap Out of Memory Exception
We would be happy to try that. That sounds counter intuitive for the high volume of records we have. Can you help me understand how that might solve our problem? On Fri, Apr 4, 2014 at 2:34 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Can you remove auto commit for bulk import. Commit at the very end? Ahmet On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: In case the attached database.xml file didn't show up, I have pasted in the contents below: dataConfig dataSource name=org_only type=JdbcDataSource driver=oracle.jdbc.OracleDriver url=jdbc:oracle:thin:@test2.abc.com:1521:ORCL user=admin password=admin readOnly=false batchSize=100 / document entity name=full-index query= select NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null') as SOLR_ID, 'ORCL.ADDRESS_ACCT_ALL' as SOLR_CATEGORY, NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as ADDRESSALLROWID, NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as ADDRESSALLADDRTYPECD, NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as ADDRESSALLLONGITUDE, NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as ADDRESSALLLATITUDE, NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as ADDRESSALLADDRNAME, NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as ADDRESSALLCITY, NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as ADDRESSALLSTATE, NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as ADDRESSALLEMAILADDR from ORCL.ADDRESS_ACCT_ALL field column=SOLR_ID name=id / field column=SOLR_CATEGORY name=category / field column=ADDRESSALLROWID name=ADDRESS_ACCT_ALL.RECORD_ID_abc / field column=ADDRESSALLADDRTYPECD name=ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc / field column=ADDRESSALLLONGITUDE name=ADDRESS_ACCT_ALL.LONGITUDE_abc / field column=ADDRESSALLLATITUDE name=ADDRESS_ACCT_ALL.LATITUDE_abc / field column=ADDRESSALLADDRNAME name=ADDRESS_ACCT_ALL.ADDR_NAME_abc / field column=ADDRESSALLCITY name=ADDRESS_ACCT_ALL.CITY_abc / field column=ADDRESSALLSTATE name=ADDRESS_ACCT_ALL.STATE_abc / field column=ADDRESSALLEMAILADDR name=ADDRESS_ACCT_ALL.EMAIL_ADDR_abc / /entity !-- Varaibles -- !-- '${dataimporter.last_index_time}' -- /document /dataConfig On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: In this case we are indexing an Oracle database. We do not include the data-config.xml in our distribution. We store the database information in the database.xml file. I have attached the database.xml file. When we use the default merge policy settings, we get the same results. We have not tried to dump the table to a comma separated file. We think that dumping this size table to disk will introduce other memory problems with big file management. We have not tested that case. On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Which database are you using? Can you send us data-config.xml? What happens when you use default merge policy settings? What happens when you dump your table to Comma Separated File and fed that file to solr? Ahmet On Friday, April 4, 2014 5:10 PM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: The ramBufferSizeMB was set to 6MB only on the test system to make the system crash sooner. In production that tag is commented out which I believe forces the default value to be used. On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, out of curiosity, why did you set ramBufferSizeMB to 6? Ahmet On Friday, April 4, 2014 3:27 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: *Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception *SOLR/Lucene version: *4.2.1* *JVM version: Java(TM) SE Runtime Environment (build 1.7.0_07-b11) Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode) *Indexer startup command: set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m java %JVMARGS% ^ -Dcom.sun.management.jmxremote.port=1092 ^ -Dcom.sun.management.jmxremote.ssl=false ^ -Dcom.sun.management.jmxremote.authenticate=false ^ -jar start.jar *SOLR indexing HTTP parameters request: webapp=/solr path=/dataimport params={clean=falsecommand=full-importwt=javabinversion=2} We are getting a Java heap OOM exception when indexing (updating) 27 million records. If we increase the Java heap memory settings the problem goes away but we believe the problem has not been fixed and that we will eventually get the same OOM exception. We have other processes on the server that also require resources so we cannot continually increase the memory settings to resolve the OOM
Re: Full Indexing is Causing a Java Heap Out of Memory Exception
I might have forgot to mention that we are using the DataImportHandler. I think we know how to remove auto commit. How would we force a commit at the end? On Fri, Apr 4, 2014 at 3:18 PM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: We would be happy to try that. That sounds counter intuitive for the high volume of records we have. Can you help me understand how that might solve our problem? On Fri, Apr 4, 2014 at 2:34 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Can you remove auto commit for bulk import. Commit at the very end? Ahmet On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: In case the attached database.xml file didn't show up, I have pasted in the contents below: dataConfig dataSource name=org_only type=JdbcDataSource driver=oracle.jdbc.OracleDriver url=jdbc:oracle:thin:@test2.abc.com:1521:ORCL user=admin password=admin readOnly=false batchSize=100 / document entity name=full-index query= select NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null') as SOLR_ID, 'ORCL.ADDRESS_ACCT_ALL' as SOLR_CATEGORY, NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as ADDRESSALLROWID, NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as ADDRESSALLADDRTYPECD, NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as ADDRESSALLLONGITUDE, NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as ADDRESSALLLATITUDE, NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as ADDRESSALLADDRNAME, NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as ADDRESSALLCITY, NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as ADDRESSALLSTATE, NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as ADDRESSALLEMAILADDR from ORCL.ADDRESS_ACCT_ALL field column=SOLR_ID name=id / field column=SOLR_CATEGORY name=category / field column=ADDRESSALLROWID name=ADDRESS_ACCT_ALL.RECORD_ID_abc / field column=ADDRESSALLADDRTYPECD name=ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc / field column=ADDRESSALLLONGITUDE name=ADDRESS_ACCT_ALL.LONGITUDE_abc / field column=ADDRESSALLLATITUDE name=ADDRESS_ACCT_ALL.LATITUDE_abc / field column=ADDRESSALLADDRNAME name=ADDRESS_ACCT_ALL.ADDR_NAME_abc / field column=ADDRESSALLCITY name=ADDRESS_ACCT_ALL.CITY_abc / field column=ADDRESSALLSTATE name=ADDRESS_ACCT_ALL.STATE_abc / field column=ADDRESSALLEMAILADDR name=ADDRESS_ACCT_ALL.EMAIL_ADDR_abc / /entity !-- Varaibles -- !-- '${dataimporter.last_index_time}' -- /document /dataConfig On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: In this case we are indexing an Oracle database. We do not include the data-config.xml in our distribution. We store the database information in the database.xml file. I have attached the database.xml file. When we use the default merge policy settings, we get the same results. We have not tried to dump the table to a comma separated file. We think that dumping this size table to disk will introduce other memory problems with big file management. We have not tested that case. On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Which database are you using? Can you send us data-config.xml? What happens when you use default merge policy settings? What happens when you dump your table to Comma Separated File and fed that file to solr? Ahmet On Friday, April 4, 2014 5:10 PM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: The ramBufferSizeMB was set to 6MB only on the test system to make the system crash sooner. In production that tag is commented out which I believe forces the default value to be used. On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, out of curiosity, why did you set ramBufferSizeMB to 6? Ahmet On Friday, April 4, 2014 3:27 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: *Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception *SOLR/Lucene version: *4.2.1* *JVM version: Java(TM) SE Runtime Environment (build 1.7.0_07-b11) Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode) *Indexer startup command: set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m java %JVMARGS% ^ -Dcom.sun.management.jmxremote.port=1092 ^ -Dcom.sun.management.jmxremote.ssl=false ^ -Dcom.sun.management.jmxremote.authenticate=false ^ -jar start.jar *SOLR indexing HTTP parameters request: webapp=/solr path=/dataimport params={clean=falsecommand=full-importwt=javabinversion=2} We are getting a Java heap OOM exception when indexing (updating) 27 million records. If we increase the Java heap memory settings the problem goes
Re: Full Indexing is Causing a Java Heap Out of Memory Exception
Hi, This may not solve your problem but generally it is recommended to disable auto commit and transaction logs for bulk indexing. And issue one commit at the very end. Do you tlogs enabled? I see commit failed in the error message thats why I am offering this. And regarding comma separated values, with this approach you focus on just solr importing process. You separate data acquisition phrase. And it is very fast load even big csv files http://wiki.apache.org/solr/UpdateCSV I have never experienced OOM during indexing, I suspect data acquisition has role in it. Ahmet On Saturday, April 5, 2014 1:18 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: We would be happy to try that. That sounds counter intuitive for the high volume of records we have. Can you help me understand how that might solve our problem? On Fri, Apr 4, 2014 at 2:34 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Can you remove auto commit for bulk import. Commit at the very end? Ahmet On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: In case the attached database.xml file didn't show up, I have pasted in the contents below: dataConfig dataSource name=org_only type=JdbcDataSource driver=oracle.jdbc.OracleDriver url=jdbc:oracle:thin:@test2.abc.com:1521:ORCL user=admin password=admin readOnly=false batchSize=100 / document entity name=full-index query= select NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null') as SOLR_ID, 'ORCL.ADDRESS_ACCT_ALL' as SOLR_CATEGORY, NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as ADDRESSALLROWID, NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as ADDRESSALLADDRTYPECD, NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as ADDRESSALLLONGITUDE, NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as ADDRESSALLLATITUDE, NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as ADDRESSALLADDRNAME, NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as ADDRESSALLCITY, NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as ADDRESSALLSTATE, NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as ADDRESSALLEMAILADDR from ORCL.ADDRESS_ACCT_ALL field column=SOLR_ID name=id / field column=SOLR_CATEGORY name=category / field column=ADDRESSALLROWID name=ADDRESS_ACCT_ALL.RECORD_ID_abc / field column=ADDRESSALLADDRTYPECD name=ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc / field column=ADDRESSALLLONGITUDE name=ADDRESS_ACCT_ALL.LONGITUDE_abc / field column=ADDRESSALLLATITUDE name=ADDRESS_ACCT_ALL.LATITUDE_abc / field column=ADDRESSALLADDRNAME name=ADDRESS_ACCT_ALL.ADDR_NAME_abc / field column=ADDRESSALLCITY name=ADDRESS_ACCT_ALL.CITY_abc / field column=ADDRESSALLSTATE name=ADDRESS_ACCT_ALL.STATE_abc / field column=ADDRESSALLEMAILADDR name=ADDRESS_ACCT_ALL.EMAIL_ADDR_abc / /entity !-- Varaibles -- !-- '${dataimporter.last_index_time}' -- /document /dataConfig On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: In this case we are indexing an Oracle database. We do not include the data-config.xml in our distribution. We store the database information in the database.xml file. I have attached the database.xml file. When we use the default merge policy settings, we get the same results. We have not tried to dump the table to a comma separated file. We think that dumping this size table to disk will introduce other memory problems with big file management. We have not tested that case. On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Which database are you using? Can you send us data-config.xml? What happens when you use default merge policy settings? What happens when you dump your table to Comma Separated File and fed that file to solr? Ahmet On Friday, April 4, 2014 5:10 PM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: The ramBufferSizeMB was set to 6MB only on the test system to make the system crash sooner. In production that tag is commented out which I believe forces the default value to be used. On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, out of curiosity, why did you set ramBufferSizeMB to 6? Ahmet On Friday, April 4, 2014 3:27 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: *Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception *SOLR/Lucene version: *4.2.1* *JVM version: Java(TM) SE Runtime Environment (build 1.7.0_07-b11) Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode) *Indexer startup command: set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m java %JVMARGS% ^ -Dcom.sun.management.jmxremote.port=1092 ^ -Dcom.sun.management.jmxremote.ssl=false ^ -Dcom.sun.management.jmxremote.authenticate=false ^ -jar start.jar *SOLR indexing HTTP
Re: Full Indexing is Causing a Java Heap Out of Memory Exception
Hi, To disable auto commit remove both autoCommit and autoSoftCommit parts/definitions from solrconfig.xml To disable tlog remove updateLog str name=dir${solr.ulog.dir:}/str /updateLog from solrconfig.xml To commit at the end use commit=true parameter. ?commit=truecommand=full-import There is a checkbox for this in data import admin page. On Saturday, April 5, 2014 1:27 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: I might have forgot to mention that we are using the DataImportHandler. I think we know how to remove auto commit. How would we force a commit at the end? On Fri, Apr 4, 2014 at 3:18 PM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: We would be happy to try that. That sounds counter intuitive for the high volume of records we have. Can you help me understand how that might solve our problem? On Fri, Apr 4, 2014 at 2:34 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Can you remove auto commit for bulk import. Commit at the very end? Ahmet On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: In case the attached database.xml file didn't show up, I have pasted in the contents below: dataConfig dataSource name=org_only type=JdbcDataSource driver=oracle.jdbc.OracleDriver url=jdbc:oracle:thin:@test2.abc.com:1521:ORCL user=admin password=admin readOnly=false batchSize=100 / document entity name=full-index query= select NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null') as SOLR_ID, 'ORCL.ADDRESS_ACCT_ALL' as SOLR_CATEGORY, NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as ADDRESSALLROWID, NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as ADDRESSALLADDRTYPECD, NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as ADDRESSALLLONGITUDE, NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as ADDRESSALLLATITUDE, NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as ADDRESSALLADDRNAME, NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as ADDRESSALLCITY, NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as ADDRESSALLSTATE, NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as ADDRESSALLEMAILADDR from ORCL.ADDRESS_ACCT_ALL field column=SOLR_ID name=id / field column=SOLR_CATEGORY name=category / field column=ADDRESSALLROWID name=ADDRESS_ACCT_ALL.RECORD_ID_abc / field column=ADDRESSALLADDRTYPECD name=ADDRESS_ACCT_ALL.ADDR_TYPE_CD_abc / field column=ADDRESSALLLONGITUDE name=ADDRESS_ACCT_ALL.LONGITUDE_abc / field column=ADDRESSALLLATITUDE name=ADDRESS_ACCT_ALL.LATITUDE_abc / field column=ADDRESSALLADDRNAME name=ADDRESS_ACCT_ALL.ADDR_NAME_abc / field column=ADDRESSALLCITY name=ADDRESS_ACCT_ALL.CITY_abc / field column=ADDRESSALLSTATE name=ADDRESS_ACCT_ALL.STATE_abc / field column=ADDRESSALLEMAILADDR name=ADDRESS_ACCT_ALL.EMAIL_ADDR_abc / /entity !-- Varaibles -- !-- '${dataimporter.last_index_time}' -- /document /dataConfig On Fri, Apr 4, 2014 at 11:55 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: In this case we are indexing an Oracle database. We do not include the data-config.xml in our distribution. We store the database information in the database.xml file. I have attached the database.xml file. When we use the default merge policy settings, we get the same results. We have not tried to dump the table to a comma separated file. We think that dumping this size table to disk will introduce other memory problems with big file management. We have not tested that case. On Fri, Apr 4, 2014 at 7:25 AM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Which database are you using? Can you send us data-config.xml? What happens when you use default merge policy settings? What happens when you dump your table to Comma Separated File and fed that file to solr? Ahmet On Friday, April 4, 2014 5:10 PM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: The ramBufferSizeMB was set to 6MB only on the test system to make the system crash sooner. In production that tag is commented out which I believe forces the default value to be used. On Thu, Apr 3, 2014 at 5:46 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, out of curiosity, why did you set ramBufferSizeMB to 6? Ahmet On Friday, April 4, 2014 3:27 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: *Main issue: Full Indexing is Causing a Java Heap Out of Memory Exception *SOLR/Lucene version: *4.2.1* *JVM version: Java(TM) SE Runtime Environment (build 1.7.0_07-b11) Java HotSpot(TM) 64-Bit Server VM (build 23.3-b01, mixed mode) *Indexer startup command: set JVMARGS=-XX:MaxPermSize=364m -Xss256K -Xmx6144m -Xms6144m java %JVMARGS% ^
Re: Searching multivalue fields.
I had already tested with omitTermFreqAndPositions=false . I still got the same error. Is there something that I am overlooking? On Fri, Apr 4, 2014 at 2:45 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi Vijay, Add omitTermFreqAndPositions=false attribute to fieldType definitions. fieldType name=string class=solr.StrField omitTermFreqAndPositions=false sortMissingLast=true / fieldType name=int class=solr.TrieIntField omitTermFreqAndPositions=false precisionStep=0 positionIncrementGap=0/ You don't need termVectors for this. 1.2: omitTermFreqAndPositions attribute introduced, true by default except for text fields. And please reply to solr user mail, so others can use the threat later on. Ahmet On Saturday, April 5, 2014 12:18 AM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote: Hey Ahmet, Sorry it took some time to test this. But schema definition seem to conflict with SpanQuery. I get following error when I use Spans field OrderLineType was indexed without position data; cannot run SpanTermQuery (term=11) I changed field definition in the schema but can't find the right attribute to set this. My last attempt was with following definition field name=OrderLineType type=string indexed=true stored=true multiValued=true *termVectors=true termPositions=true termOffsets=true*/ Any ideas what I am doing wrong? Thanks, -Vijay On Wed, Mar 26, 2014 at 1:54 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi Vijay, After reading the documentation it seems that following query is what you are after. It will return OrderId:345 without matching OrderId:123 SpanQuery q1 = new SpanTermQuery(new Term(BookingRecordId, 234)); SpanQuery q2 = new SpanTermQuery(new Term(OrderLineType, 11)); SpanQuery q2m new FieldMaskingSpanQuery(q2, BookingRecordId); Query q = new SpanNearQuery(new SpanQuery[]{q1, q2m}, -1, false); Ahmet On Wednesday, March 26, 2014 10:39 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi Vijay, I personally don't understand joins very well. Just a guess may be FieldMaskingSpanQuery could be used? http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html Ahmet On Wednesday, March 26, 2014 9:46 PM, Vijay Kokatnur kokatnur.vi...@gmail.com wrote: Hi, I am bumping this thread again one last time to see if anyone has a solution. In it's current state, our application is storing child items as multivalue fields. Consider some orders, for example - { OrderId:123 BookingRecordId : [145, 987, *234*] OrderLineType : [11, 12, *13*] . } { OrderId:345 BookingRecordId : [945, 882, *234*] OrderLineType : [1, 12, *11*] . } { OrderId:678 BookingRecordId : [444] OrderLineType : [11] . } Here, If you look up for an Order with BookingRecordId: 234 And OrderLineType:11. You will get two orders with orderId : 123 and 345, which is correct. You have two arrays in both the orders that satisfy this condition. However, for OrderId:123, the value at 3rd index of OrderLineType array is 13 and not 11( this is for OrderId:345). So orderId 123 should be excluded. This is what I am trying to achieve. I got some suggestions from a solr-user to use FieldsCollapsing, Join, Block-join or string concatenation. None of these approaches can be used without re-indexing schema. Has anyone found a non-invasive solution for this? Thanks, -Vijay
Re: Difference between [ TO *] and [* TO *] at Solr?
What kind of field are you using? Not quite sure what would happen with a date or numeric field for instance. On Fri, Apr 4, 2014 at 10:28 AM, Furkan KAMACI furkankam...@gmail.com wrote: Hİ; What is the difference between [ TO *] and [* TO *] at Solr? (I tested it at 4.5.1 and numFounds are different. Thanks; Furkan KAMACI
Re: Full Indexing is Causing a Java Heap Out of Memory Exception
Guessing that the attachments won't work, I am pasting one file in each of four separate emails. database.xml dataConfig dataSource name=org_only type=JdbcDataSource driver=oracle.jdbc.OracleDriver url=jdbc:oracle:thin:@test.abcdata.com:1521:ORCL user=admin password=admin readOnly=false / document entity name=full-index query= select NVL(cast(ORACLE.ADDRESS_ALL.RECORD_ID as varchar2(100)), 'null') as SOLR_ID, 'ORACLE.ADDRESS_ALL' as SOLR_CATEGORY, NVL(cast(ORACLE.ADDRESS_ALL.RECORD_ID as varchar2(255)), ' ') as ADDRESSALLROWID, NVL(cast(ORACLE.ADDRESS_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as ADDRESSALLADDRTYPECD, NVL(cast(ORACLE.ADDRESS_ALL.LONGITUDE as varchar2(255)), ' ') as ADDRESSALLLONGITUDE, NVL(cast(ORACLE.ADDRESS_ALL.LATITUDE as varchar2(255)), ' ') as ADDRESSALLLATITUDE, NVL(cast(ORACLE.ADDRESS_ALL.ADDR_NAME as varchar2(255)), ' ') as ADDRESSALLADDRNAME, NVL(cast(ORACLE.ADDRESS_ALL.CITY as varchar2(255)), ' ') as ADDRESSALLCITY, NVL(cast(ORACLE.ADDRESS_ALL.STATE as varchar2(255)), ' ') as ADDRESSALLSTATE, NVL(cast(ORACLE.ADDRESS_ALL.EMAIL_ADDR as varchar2(255)), ' ') as ADDRESSALLEMAILADDR from ORACLE.ADDRESS_ALL field column=SOLR_ID name=id / field column=SOLR_CATEGORY name=category / field column=ADDRESSALLROWID name=ADDRESS_ALL.RECORD_ID_abc / field column=ADDRESSALLADDRTYPECD name=ADDRESS_ALL.ADDR_TYPE_CD_abc / field column=ADDRESSALLLONGITUDE name=ADDRESS_ALL.LONGITUDE_abc / field column=ADDRESSALLLATITUDE name=ADDRESS_ALL.LATITUDE_abc / field column=ADDRESSALLADDRNAME name=ADDRESS_ALL.ADDR_NAME_abc / field column=ADDRESSALLCITY name=ADDRESS_ALL.CITY_abc / field column=ADDRESSALLSTATE name=ADDRESS_ALL.STATE_abc / field column=ADDRESSALLEMAILADDR name=ADDRESS_ALL.EMAIL_ADDR_abc / /entity !-- Varaibles -- !-- '${dataimporter.last_index_time}' -- /document /dataConfig On Fri, Apr 4, 2014 at 4:57 PM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: Does this user list allow attachments? I have four files attached (database.xml, error.txt, schema.xml, solrconfig.xml). We just ran the process again using the parameters you suggested, but not to a csv file. It errored out quickly. We are working on the csv file run. Removed both autoCommit and autoSoftCommit parts/definitions from solrconfig.xml Disabled tlog by removing updateLog str name=dir${solr.ulog.dir:}/str /updateLog from solrconfig.xml Used commit=true parameter. ?commit=truecommand=full-import On Fri, Apr 4, 2014 at 3:29 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, This may not solve your problem but generally it is recommended to disable auto commit and transaction logs for bulk indexing. And issue one commit at the very end. Do you tlogs enabled? I see commit failed in the error message thats why I am offering this. And regarding comma separated values, with this approach you focus on just solr importing process. You separate data acquisition phrase. And it is very fast load even big csv files http://wiki.apache.org/solr/UpdateCSV I have never experienced OOM during indexing, I suspect data acquisition has role in it. Ahmet On Saturday, April 5, 2014 1:18 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: We would be happy to try that. That sounds counter intuitive for the high volume of records we have. Can you help me understand how that might solve our problem? On Fri, Apr 4, 2014 at 2:34 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Can you remove auto commit for bulk import. Commit at the very end? Ahmet On Saturday, April 5, 2014 12:16 AM, Candygram For Mongo candygram.for.mo...@gmail.com wrote: In case the attached database.xml file didn't show up, I have pasted in the contents below: dataConfig dataSource name=org_only type=JdbcDataSource driver=oracle.jdbc.OracleDriver url=jdbc:oracle:thin:@test2.abc.com:1521:ORCL user=admin password=admin readOnly=false batchSize=100 / document entity name=full-index query= select NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(100)), 'null') as SOLR_ID, 'ORCL.ADDRESS_ACCT_ALL' as SOLR_CATEGORY, NVL(cast(ORCL.ADDRESS_ACCT_ALL.RECORD_ID as varchar2(255)), ' ') as ADDRESSALLROWID, NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_TYPE_CD as varchar2(255)), ' ') as ADDRESSALLADDRTYPECD, NVL(cast(ORCL.ADDRESS_ACCT_ALL.LONGITUDE as varchar2(255)), ' ') as ADDRESSALLLONGITUDE, NVL(cast(ORCL.ADDRESS_ACCT_ALL.LATITUDE as varchar2(255)), ' ') as ADDRESSALLLATITUDE, NVL(cast(ORCL.ADDRESS_ACCT_ALL.ADDR_NAME as varchar2(255)), ' ') as ADDRESSALLADDRNAME, NVL(cast(ORCL.ADDRESS_ACCT_ALL.CITY as varchar2(255)), ' ') as ADDRESSALLCITY, NVL(cast(ORCL.ADDRESS_ACCT_ALL.STATE as varchar2(255)), ' ') as ADDRESSALLSTATE, NVL(cast(ORCL.ADDRESS_ACCT_ALL.EMAIL_ADDR as varchar2(255)), ' ') as ADDRESSALLEMAILADDR from ORCL.ADDRESS_ACCT_ALL field column=SOLR_ID name=id / field column=SOLR_CATEGORY name=category / field column=ADDRESSALLROWID
Re: Full Indexing is Causing a Java Heap Out of Memory Exception
error.txt below Java Platform Detected x64 Java Platform Detected -XX:MaxPermSize=64m -Xss256K -Xmx64m -Xms64m -XX:+HeapDumpOnOutOfMemoryError -XX:+CreateMinidumpOnCrash 2014-04-04 15:49:43.341:INFO:oejs.Server:jetty-8.1.8.v20121106 2014-04-04 15:49:43.353:INFO:oejdp.ScanningAppProvider:Deployment monitor D:\AbcData\V12\application server\server\indexer\example\contexts at interval 0 2014-04-04 15:49:43.358:INFO:oejd.DeploymentManager:Deployable added: D:\AbcData\V12\application server\server\indexer\example\contexts\solr-jetty-context.xml 2014-04-04 15:49:43.989:INFO:oejw.StandardDescriptorProcessor:NO JSP Support for /solr, did not find org.apache.jasper.servlet.JspServlet Null identity service, trying login service: null Finding identity service: null 2014-04-04 15:49:44.011:INFO:oejsh.ContextHandler:started o.e.j.w.WebAppContext{/solr,file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr-webapp/webapp/},D:\AbcData\V12\application server\server\indexer\example/webapps/solr.war 2014-04-04 15:49:44.012:INFO:oejsh.ContextHandler:started o.e.j.w.WebAppContext{/solr,file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr-webapp/webapp/},D:\AbcData\V12\application server\server\indexer\example/webapps/solr.war Apr 04, 2014 3:49:44 PM org.apache.solr.servlet.SolrDispatchFilter init INFO: SolrDispatchFilter.init() Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader locateSolrHome INFO: JNDI not configured for solr (NoInitialContextEx) Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader locateSolrHome INFO: solr home defaulted to 'solr/' (could not find system property or JNDI) Apr 04, 2014 3:49:44 PM org.apache.solr.core.CoreContainer$Initializer initialize INFO: looking for solr.xml: D:\AbcData\V12\application server\server\indexer\example\solr\solr.xml Apr 04, 2014 3:49:44 PM org.apache.solr.core.CoreContainer init INFO: New CoreContainer 1879341237 Apr 04, 2014 3:49:44 PM org.apache.solr.core.CoreContainer load INFO: Loading CoreContainer using Solr Home: 'solr/' Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader init INFO: new SolrResourceLoader for directory: 'solr/' Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/apache-log4j-extras-1.2.17.jar' to classloader Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/jtds-1.2.5.jar' to classloader Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/log4j-1.2.17.jar' to classloader Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/msbase.jar' to classloader Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/mssqlserver.jar' to classloader Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/msutil.jar' to classloader Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/ojdbc6.jar' to classloader Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/slf4j-api-1.7.5.jar' to classloader Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/slf4j-nop-1.7.5.jar' to classloader Apr 04, 2014 3:49:44 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader INFO: Adding 'file:/D:/AbcData/V12/applicationserver/server/indexer/example/solr/lib/sqljdbc4.jar' to classloader Apr 04, 2014 3:49:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter INFO: Setting socketTimeout to: 0 Apr 04, 2014 3:49:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter INFO: Setting urlScheme to: http:// Apr 04, 2014 3:49:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter INFO: Setting connTimeout to: 0 Apr 04, 2014 3:49:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter INFO: Setting maxConnectionsPerHost to: 20 Apr 04, 2014 3:49:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter INFO: Setting corePoolSize to: 0 Apr 04, 2014 3:49:44 PM org.apache.solr.handler.component.HttpShardHandlerFactory getParameter INFO: Setting maximumPoolSize to: 2147483647 Apr 04, 2014 3:49:44 PM
Re: Distributed tracing for Solr via adding HTTP headers?
I like the idea. No comments about implementation, leave it to others. But if it is done, maybe somebody very familiar with logging can also review Solr's current logging config. I suspect it is not optimized for troubleshooting at this point. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Sat, Apr 5, 2014 at 3:16 AM, Gregg Donovan gregg...@gmail.com wrote: We have some metadata -- e.g. a request UUID -- that we log to every log line using Log4J's MDC [1]. The UUID logging allows us to connect any log lines we have for a given request across servers. Sort of like Zipkin [2]. Currently we're using EmbeddedSolrServer without sharding, so adding the UUID is fairly simple, since everything is in one process and one thread. But, we're testing a sharded HTTP implementation and running into some difficulties getting this data passed around in a way that lets us trace all log lines generated by a request to its UUID.
Re: SOLR Jetty Server on Windows 2003
You might be hitting http://en.wikipedia.org/wiki/Cross-origin_resource_sharing . Something like http://www.telerik.com/fiddler or Wireshark may allow you to see network traffic if you don't have other means. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Sat, Apr 5, 2014 at 12:49 AM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Hi , I am trying to install solr on the Windows 2003 with Jetty server. Form browser everything works , but when I try to acesss from another javascript Code in other machine I am not getting reponse. I am using Xmlhttprequest to get the response from server using javascript. Any Help...? --Ravi
Re: How to reduce the search speed of solrcloud
And 50 million records of 3 fields each should not become 50Gb of data. Something smells wrong there. Do you have unique IDs setup? Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Sat, Apr 5, 2014 at 12:48 AM, Anshum Gupta ans...@anshumgupta.net wrote: I am not sure if you setup your SolrCloud right. Can you also provide me with the version of Solr that you're running. Also, if you could tell me about how did you setup your SolrCloud cluster. Are the times consistent? Is this the only collection on the cluster? Also, if I am getting it right, you have 15 ZKs running. Correct me if I'm wrong, but if I'm not, you don't need that kind of a zk setup. On Fri, Apr 4, 2014 at 9:39 AM, Sathya sathia.blacks...@gmail.com wrote: Hi shawn, I have indexed 50 million data in 5 servers. 3 servers have 8gb ram. One have 24gb and another one have 64gb ram. I was allocate 4 gb ram to solr in each machine. I am using solrcloud. My total index size is 50gb including 5 servers. Each server have 3 zookeepers. Still I didnt check about OS disk cache and heap memory. I will check and let u know shawn. If anything, pls let me know. Thank u shawn. On Friday, April 4, 2014, Shawn Heisey-4 [via Lucene] ml-node+s472066n4129150...@n3.nabble.com wrote: On 4/4/2014 1:31 AM, Sathya wrote: Hi All, Hi All, I am new to Solr. And i dont know how to increase the search speed of solrcloud. I have indexed nearly 4 GB of data. When i am searching a document using java with solrj, solr takes more 6 seconds to return a query result. Any one please help me to reduce the search query time to less than 500 ms. i have allocate the 4 GB ram for solr. Please let me know for further details about solrcloud config. How much total RAM do you have on the system, and how much total index data is on that system (adding up all the Solr cores)? You've already said that you have allocated 4GB of RAM for Solr. Later you said you had 50 million documents, and then you showed us a URL that looks like SolrCloud. I suspect that you don't have enough RAM left over to cache your index effectively -- the OS Disk Cache is too small. http://wiki.apache.org/solr/SolrPerformanceProblems Another possible problem, also discussed on that page, is that your Java heap is too small. Thanks, Shawn If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129150.html To unsubscribe from How to reduce the search speed of solrcloud, click here. NAML -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-reduce-the-search-speed-of-solrcloud-tp4129067p4129173.html Sent from the Solr - User mailing list archive at Nabble.com. -- Anshum Gupta http://www.anshumgupta.net
Re: Solr Search For Documents That Has Empty Content For a Given Particular Field
And one solution is to use UpdateRequestProcessor that will create a separate binary field for presence/absence and query on that instead. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Fri, Apr 4, 2014 at 11:13 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : field : // this is the field that I want to learn which document has : it. How you (can) query for a field value like that is going to depend entirely on the FieldTYpe/Analyzer ... if it's a string field, of uses KeywordTokenizer then q=field: should find it -- if you use a more traditional analyzer then it probably didn't produce any terms for hte input and from Solr's perspective a document that was indexed using an empty string value is exactly the same as a document that had no value when index. In essenc,e your question is equivilent to asking How can i search for doc1, but not doc2, evne though i'm using LowerCaseAnalyzer which produces exactly the same indexe terms or both... doc1: Quick Fox doc2: quick fox -Hoss http://www.lucidworks.com/
Re: Cannot run program svnversion when building lucene 4.7.1
: I am trying to build lucene 4.7.1 from the sources. I can compile without : any issues but when I try to build the dist, lucene gives me : Cannot run program svnversion ... The system cannot find the specified : file. : : I am compiling on Windows 7 64-bit using java version 1.7.0.45 64-bit. That's ... strange. the build system *attempts* to include the svnversion info in the build artifacts, but it is explicitly designed to not fail if svnversion can't be run. Can you pleae file a bug, note in the description your specific OS setup, and include as an attachment the full build logs you get from ant that give you this error? ideally run ant using the -v option. worst case scenerio: you should be able to override the svnversion.exe build property to some simple command that doesn't output much (not sure what a good command to use on windows might be - i would use something like whoami on linux if i din't have svn installed). the command would be something like this... ant -Dsvnversion.exe=whoami dist -Hoss http://www.lucidworks.com/
Re: Difference between [ TO *] and [* TO *] at Solr?
And we can debate what it should or shouldn't be (and just check the code!) - and a clear contract is quite desirable, but this is starting to smell like an XY Problem - what is the user really trying to query - stated simply in English. -- Jack Krupansky -Original Message- From: Erick Erickson Sent: Friday, April 4, 2014 5:38 PM To: solr-user@lucene.apache.org Subject: Re: Difference between [ TO *] and [* TO *] at Solr? What kind of field are you using? Not quite sure what would happen with a date or numeric field for instance. On Fri, Apr 4, 2014 at 10:28 AM, Furkan KAMACI furkankam...@gmail.com wrote: Hİ; What is the difference between [ TO *] and [* TO *] at Solr? (I tested it at 4.5.1 and numFounds are different. Thanks; Furkan KAMACI