Re: DIH ConcurrentModificationException
This is fixed in trunk.

2009/5/5 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com:
Hi Walter, it needs synchronization. I shall open a bug.

On Mon, May 4, 2009 at 7:31 PM, Walter Ferrara walters...@gmail.com wrote:
I got a ConcurrentModificationException during a cron-ed delta-import with DIH. I'm using the multicore Solr nightly from Hudson, 2009-04-02_08-06-47. I don't know whether this stack trace will be useful to you, but here it is:

java.util.ConcurrentModificationException
    at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(Unknown Source)
    at java.util.LinkedHashMap$EntryIterator.next(Unknown Source)
    at java.util.LinkedHashMap$EntryIterator.next(Unknown Source)
    at org.apache.solr.handler.dataimport.DataImporter.getStatusMessages(DataImporter.java:384)
    at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:210)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Of course, given the nature of this exception, I doubt it can be reproduced easily (this is the only one I've seen, and the cron job has run many times), but perhaps a synchronized block should be added somewhere?
ciao,
Walter

--
- Noble Paul | Principal Engineer | AOL | http://aol.com

--
Regards,
Shalin Shekhar Mangar.
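For background, here is a minimal sketch of the failure mode and of the kind of synchronization that avoids it — an illustration only, not the actual trunk patch:

    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class StatusMapDemo {
      // A shared status map like DataImporter's; the synchronized wrapper
      // serializes individual get/put calls from the import thread.
      private static final Map<String, Object> STATUS =
          Collections.synchronizedMap(new LinkedHashMap<String, Object>());

      public static void main(String[] args) {
        STATUS.put("Total Rows Fetched", 100);
        // Iteration must additionally hold the map's own lock; otherwise a
        // concurrent put triggers exactly the CME in the stack trace above.
        synchronized (STATUS) {
          for (Map.Entry<String, Object> e : STATUS.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
          }
        }
      }
    }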
Re: Getting access to current core's conf dir
On Tue, May 5, 2009 at 11:11 AM, Amit Nithian anith...@gmail.com wrote:
I am trying to get at the configuration directory in an implementation of SolrEventListener.

Implement SolrCoreAware and use solrCore.getResourceLoader().getConfigDir().

--
Regards,
Shalin Shekhar Mangar.
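Concretely, that might look like the sketch below, written against the Solr 1.3-era plugin API (the class name and the use made of the directory are assumptions):

    import org.apache.solr.common.util.NamedList;
    import org.apache.solr.core.SolrCore;
    import org.apache.solr.core.SolrEventListener;
    import org.apache.solr.search.SolrIndexSearcher;
    import org.apache.solr.util.plugin.SolrCoreAware;

    public class ConfDirAwareListener implements SolrEventListener, SolrCoreAware {
      private String configDir;

      // Called once the core is initialized; this is where the conf dir
      // becomes available through the core's resource loader.
      public void inform(SolrCore core) {
        configDir = core.getResourceLoader().getConfigDir();
      }

      public void init(NamedList args) {}

      public void postCommit() {
        // use configDir here as needed
      }

      public void newSearcher(SolrIndexSearcher newSearcher,
                              SolrIndexSearcher currentSearcher) {}
    }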
Externalize database parameters from data-config.xml
I have a Spring/iBATIS project running in my development environment, and I am now adding Solr search to the application. Everything works as expected and Solr is providing good results. The only problem I have is that I must set the database parameters, including the username and password, in data-config.xml. There are two drawbacks to this approach: 1) the DB parameters are exposed to the outside; 2) I cannot dynamically load the database params on demand (I have multiple databases). I think this could be solved by using the DB configuration already defined in the Spring configs. Is there any way to achieve this? Or at least, can I externalize these parameters from data-config.xml, so that I can encrypt the password?
Thanks
con
Spellcheck.build
Hi

I have imported/indexed around half a million rows from my database into Solr and then rebuilt the spellchecker. I've also set up a delta-import to handle any new or changed rows from the database. Do I need to rebuild the spellchecker each time I run the delta-import?

Regards
Andrew
Re: Spellcheck.build
Hi,

I suppose that if the new records contain terms not yet found in the spellcheck index/dictionary, it should be rebuilt.

Cheers,

On Tue, 2009-05-05 at 11:49 +0100, Andrew McCombe wrote:
Do I need to rebuild the spellchecker each time I run the delta-import? ...
Re: Externalize database parameters from data-config.xml
If Solr is a part of your application, then why not have tokens in your data-config.xml as placeholders for the DB username, password, etc., which can be replaced with the actual values as part of your project's build/deploy task?

Cheers
Avlesh

On Tue, May 5, 2009 at 3:32 PM, con convo...@gmail.com wrote:
... can I externalize these parameters from data-config.xml, so that I can encrypt the password? ...
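For instance, with an Ant-based build (the paths and token names here are assumptions), data-config.xml can be kept as a template and the credentials substituted at deploy time:

    <!-- data-config.xml template: placeholders instead of credentials -->
    <dataSource driver="com.mysql.jdbc.Driver" url="@db.url@"
                user="@db.user@" password="@db.password@"/>

    <!-- build.xml: replace the tokens while copying into the conf dir -->
    <copy file="templates/data-config.xml" todir="solr/conf" overwrite="true">
      <filterset>
        <filter token="db.url" value="${db.url}"/>
        <filter token="db.user" value="${db.user}"/>
        <filter token="db.password" value="${db.password}"/>
      </filterset>
    </copy>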
Re: Spellcheck.build
On Tue, May 5, 2009 at 4:19 PM, Andrew McCombe eupe...@gmail.com wrote:
Do I need to rebuild the spellchecker each time I run the delta-import?

Yes, you'd need to rebuild the spellcheck index. Also look at the buildOnCommit and buildOnOptimize configuration parameters in the spellcheck configuration:
http://wiki.apache.org/solr/SpellCheckComponent#head-4375b11a78463f5f8b70967074d0787ea3778592

--
Regards,
Shalin Shekhar Mangar.
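For reference, a minimal sketch of that option in solrconfig.xml (the spellchecker name, source field, and index directory are assumptions):

    <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
      <lst name="spellchecker">
        <str name="name">default</str>
        <str name="field">spell</str>
        <str name="spellcheckIndexDir">./spellchecker</str>
        <!-- rebuild the dictionary automatically after every commit -->
        <str name="buildOnCommit">true</str>
      </lst>
    </searchComponent>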
Re: Externalize database parameters from data-config.xml
There are two options:

1) Pass the username and password as request parameters, and use those parameters in the data source definition:

    <dataSource user="x" password="${dataimporter.request.pwd}" />

where pwd is a request parameter passed to the import command.

2) If you can create JNDI data sources in the app server, use the jndiName attribute of dataSource.

On Tue, May 5, 2009 at 3:32 PM, con convo...@gmail.com wrote:
... Or at least, can I externalize these parameters from data-config.xml, so that I can encrypt the password? ...

--
- Noble Paul | Principal Engineer | AOL | http://aol.com
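Fleshed out, the first option above might look like this (the JDBC driver, URL, and the extra "user" parameter are assumptions):

    <dataConfig>
      <dataSource driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost/mydb"
                  user="${dataimporter.request.user}"
                  password="${dataimporter.request.pwd}"/>
      ...
    </dataConfig>

The credentials are then supplied only at import time, e.g.:

    http://localhost:8983/solr/dataimport?command=full-import&user=scott&pwd=tiger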
Wildcard with Double Quotes Query
Hi,

I am searching for "English Portal" using double quotes, and I get all the records which contain English Portal together anywhere in any field. For example, records appear which have English Portal, English Portal Sacromanto, Core English Portal, etc. The problem is, if I pass only "nglish Portal", it does not return any of these results. Is there any way I can add wildcards as prefix and suffix to this search string and get the desired results? Please suggest.

Thanks,
Amit Garg
Re: Wildcard with Double Quotes Query
I don't remember the answer, but I'm sure this has been discussed many times on the mailing list. Have you tried searching it? You're essentially asking about wildcarded phrase queries.

Best
Erick

On Tue, May 5, 2009 at 9:52 AM, dabboo ag...@sapient.com wrote:
Is there any way I can add wildcards as prefix and suffix to this search string and get the desired results? ...
Re: Wildcard with Double Quotes Query
Hi Erick,

I searched but couldn't find anything related. I am still looking through some threads to see if I can find something. I would appreciate it if you could provide me some pointers.

Thanks,
Amit Garg

Erick Erickson wrote:
You're essentially asking about wildcarded phrase queries ...
Multi-index Design
Hi All,

I'm [still!] evaluating Solr and setting up a PoC. The requirements are to index the following objects:

- people: name, status, date added, address, profile, and other people-specific fields like group...
- organisations: name, status, date added, address, profile, and other organisation-specific fields like size...
- products: name, status, date added, profile, and other product-specific fields like product groups...

AND... I need to isolate indexes for a number of dynamic domains (customerA, customerB, ...) that will grow over time. So, my initial thoughts are to do the following:

- flatten the searchable objects as much as I can - using a type field to distinguish them - into a single index
- use a multi-core approach to segregate domains of data

So, a couple of questions on this:

1) Is this approach/design sensible, and do others use it?
2) By flattening the data we will only index common fields; is it unreasonable to do a second database search and union the results when doing advanced searches on non-indexed fields? Do others do this?
3) I've read that I can dynamically add a new core - this fits well with the ability to dynamically add new domains; how scalable is this approach? Would it be unreasonable to have 20-30 dynamically created cores? I guess, redundancy aside, and given our one-core-per-domain approach, we could easily spill onto other physical servers without the need for replication?

Thanks again for your help!
rotis
Re: Wildcard with Double Quotes Query
I am using a dismax request to achieve this. Though I am able to do a wildcard search with dismax, I am not sure whether I can combine a wildcard with a phrase. Please suggest.

Amit

Erick Erickson wrote:
You're essentially asking about wildcarded phrase queries ...
Re: Multi-index Design
That is how we do it at Netflix.
--wunder

On 5/5/09 7:59 AM, Chris Masters roti...@yahoo.com wrote:
1) Is this approach/design sensible and do others use it?
Re: Multi-index Design
More precisely, we use a single core, flat schema, with a type field.
wunder

On 5/5/09 8:48 AM, Walter Underwood wunderw...@netflix.com wrote:
That is how we do it at Netflix. --wunder
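For example (field name and value assumed), queries against such a flat schema select one object type with a filter query:

    q=matrix&fq=type:movie

The fq clause narrows the results to that type without affecting relevance scoring.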
Lucene/Solr Meetup / May 20th, Reston VA, 6-8:30 pm
Lucene/Solr Meetup / May 20th, Reston VA, 6-8:30 pm
http://www.meetup.com/NOVA-Lucene-Solr-Meetup/

Join us for an evening of presentations and discussion on Lucene/Solr, the Apache open source search engine/platform, featuring:

- Erik Hatcher, Lucid Imagination, Apache Lucene/Solr PMC: Solr power your data: How to get up and running in 20 minutes or less
- Ryan McKinley, Apache Lucene/Solr PMC: Geo Search with Solr and Voyager
- Dan Chudnov, Library of Congress: The World Digital Library -- Solr searches across time and space
- Aaron McCurry, Near Infinity: Using Lucene as the primary store for a structured data store that horizontally scales to billions of records

4 presentations, followed by Q&A / panel discussion. We'll have some food and beverages.

RSVP -- seats are limited -- at http://www.meetup.com/NOVA-Lucene-Solr-Meetup/

Hosted by: Near Infinity
Sponsored by: Lucid Imagination
Questions: ta...@lucidimagination.co
Master Slave data distribution | rsync fail issue
Hi,

I am facing an issue while pulling snapshots via the snappuller script from a slave server. We have a multicore setup on both the master and slave Solr servers, with 2 cores: i) CORE_WWW.ABCD.COM, ii) CORE_WWW.XYZ.COM.

The rsync-enable and rsync-start scripts were run from CORE_WWW.ABCD.COM on the master server. Thus the rsyncd.conf file got generated on CORE_WWW.ABCD.COM only, but not on CORE_WWW.XYZ.COM.

rsyncd.conf of CORE_WWW.ABCD.COM:

    uid = webuser
    gid = webuser
    use chroot = no
    list = no
    pid file = /opt/apache-tomcat-6.0.18/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.ABCD.COM/logs/rsyncd.pid
    log file = /opt/apache-tomcat-6.0.18/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.ABCD.COM/logs/rsyncd.log
    [solr]
    path = /opt/apache-tomcat-6.0.18/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.ABCD.COM/data
    comment = Solr

An rsync error is generated when pulling the master's snapshot of core CORE_WWW.XYZ.COM from the slave end; for core CORE_WWW.ABCD.COM the snappull completes without any error. Also, this issue only appears when snapshots are generated on the master in the way given below:

A) Snapshots are generated automatically by editing ${SOLR_HOME}/solr/conf/solrconfig.xml to let either a commit or an optimize trigger the snapshooter (search for "postCommit" and "postOptimize" to find the configuration section). Sample solrconfig.xml entry on the master server:

I)
    <listener event="postCommit" class="solr.RunExecutableListener">
      <str name="exe">/opt/apache-tomcat-6.0.18/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.ABCD.COM/bin/snapshooter</str>
      <str name="dir">/opt/apache-tomcat-6.0.18/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.ABCD.COM/bin</str>
      <bool name="wait">true</bool>
      <arr name="args"><str>arg1</str><str>arg2</str></arr>
      <arr name="env"><str>MYVAR=val1</str></arr>
    </listener>

The same was done in the solrconfig.xml of core CORE_WWW.XYZ.COM.

II) The dataDir tag remains commented out in both cores' XML on the master server.

Log sample for more clarity - rsyncd.log of core CORE_WWW.XYZ.COM:

    2009/05/01 15:48:40 command: ./rsyncd-start
    2009/05/01 15:48:40 [15064] rsyncd version 2.6.3 starting, listening on port 18983
    2009/05/01 15:48:40 rsyncd started with data_dir=/opt/apache-tomcat-6.0.18/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.XYZ.COm/data and accepting requests
    2009/05/01 15:50:36 [15195] rsync on solr/snapshot.20090501153311/ from deltrialmac.mac1.com (10.210.7.191)
    2009/05/01 15:50:36 [15195] rsync: link_stat snapshot.20090501153311/. (in solr) failed: No such file or directory (2)
    2009/05/01 15:50:36 [15195] rsync error: some files could not be transferred (code 23) at main.c(442)
    2009/05/01 15:52:23 [15301] rsync on solr/snapshot.20090501155030/ from delpearsondm.sapient.com (10.210.7.191)
    2009/05/01 15:52:23 [15301] wrote 3438 bytes read 290 bytes total size 2779
    2009/05/01 16:03:31 [15553] rsync on solr/snapshot.20090501160112/ from deltrialmac.mac1.com (10.210.7.191)
    2009/05/01 16:03:31 [15553] rsync: link_stat snapshot.20090501160112/. (in solr) failed: No such file or directory (2)
    2009/05/01 16:03:31 [15553] rsync error: some files could not be transferred (code 23) at main.c(442)
    2009/05/01 16:04:27 [15674] rsync on solr/snapshot.20090501160054/ from deltrialmac.mac1.com (10.210.7.191)
    2009/05/01 16:04:27 [15674] wrote 4173214 bytes read 290 bytes total size 4174633

I'm unable to figure out where the "/." gets appended at the end of snapshot.20090501153311.
Snappuller.log:

    2009/05/04 16:55:43 started by solrUser
    2009/05/04 16:55:43 command: /opt/apache-solr-1.3.0/example/solr/multicore/CORE_WWW.PUFFINBOOKS.CA/bin/snappuller -u webuser
    2009/05/04 16:55:52 pulling snapshot snapshot.20090504164935
    2009/05/04 16:56:09 rsync failed
    2009/05/04 16:56:24 failed (elapsed time: 41 sec)

Error shown on console:

    rsync: link_stat snapshot.20090504164935/. (in solr) failed: No such file or directory (2)
    client: nothing to do: perhaps you need to specify some filenames or the --recursive option?
    rsync error: some files could not be transferred (code 23) at main.c(723)

B) The same issue does not occur when the snapshooter script is run manually at regular intervals on the master and the snappuller script is then run at the slave end for multiple cores, with the postCommit/postOptimize part of solrconfig.xml commented out. Here too the rsync scripts were run from core CORE_WWW.ABCD.COM; snappuller and snapinstaller completed successfully.

Thanks in advance.
OutOfMemory error
I am having frequent OutOfMemory errors on our slave servers:

SEVERE: Error during auto-warming of key:org.apache.solr.search.queryresult...@aca6b9cb:java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 34279632, Num elements: 8569904
SEVERE: Error during auto-warming of key:org.apache.solr.search.queryresult...@f9947c35:java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 34431488, Num elements: 8607868
SEVERE: Error during auto-warming of key:org.apache.solr.search.queryresult...@d938cfa3:java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 34431488, Num elements: 8607868
Exception in thread [ACTIVE] ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
Exception in thread [ACTIVE] ExecuteThread: '5' for queue: 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
Exception in thread [ACTIVE] ExecuteThread: '8' for queue: 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
Exception in thread [STANDBY] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208, Num elements: 8192
Exception in thread [ACTIVE] ExecuteThread: '13' for queue: 'weblogic.kernel.Default (self-tuning)' java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object size: 8208, Num elements: 8192

We are running WebLogic, and the Java version is 1.5. We set the heap size to 1.5GB. What's the recommendation for this issue?

Thanks
Francis
Upgrading from 1.2.0 to 1.3.0
What's the best way to upgrade Solr from 1.2.0 to 1.3.0? We have the current index that our users search running on Solr 1.2.0, and we would like to upgrade it to 1.3.0. We have a master/slaves environment. What's the best way to upgrade without affecting search? Do we need to do it on the master or the slaves first?

Thanks
Francis
MoreLikeThis sort
Hello,

I am trying to sort MoreLikeThis results by a date field instead of relevance. Regular sort parameters don't seem to have any effect on the results, and I can't find any mlt.sort or similar parameter in the MoreLikeThis handler. My conclusion is that MoreLikeThis does not have a sort alternative to relevance - is that the correct conclusion?

Thanks,
Yogy
Re: OutOfMemory error
I'm guessing (and it's only a guess) that you have some field that's a datestamp and that you're sorting on it in your warmup queries. If so, there are possibilities. It would help a lot if you'd tell us more about the structure of your index and what your autowarm queries look like; otherwise there's not much information here to go on.

Best
Erick

On Tue, May 5, 2009 at 1:00 PM, Francis Yakin fya...@liquid.com wrote:
I am having frequent OutOfMemory errors on our slave servers. ... We are running WebLogic, and the Java version is 1.5. We set the heap size to 1.5GB. What's the recommendation for this issue?
Re: OutOfMemory error
Hi Francis,

How big are your caches? Please paste the relevant part of the config. Which of your fields do you sort by? Paste the definitions of those fields from schema.xml, too.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Francis Yakin fya...@liquid.com
To: solr-user@lucene.apache.org
Sent: Tuesday, May 5, 2009 1:00:07 PM
Subject: OutOfMemory error

I am having frequent OutOfMemory errors on our slave servers. ... We are running WebLogic, and the Java version is 1.5. We set the heap size to 1.5GB. What's the recommendation for this issue?
Re: Wildcard with Double Quotes Query
I don't think you can do a wildcard within a phrase. A patch for that is sitting in Lucene's JIRA.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: dabboo ag...@sapient.com
To: solr-user@lucene.apache.org
Sent: Tuesday, May 5, 2009 11:35:40 AM
Subject: Re: Wildcard with Double Quotes Query

I am using a dismax request to achieve this. Though I am able to do a wildcard search with dismax, I am not sure whether I can combine a wildcard with a phrase. ...
Re: Multi-index Design
Chris,

1) I'd put different types of data in different cores/instances, unless you really need to search them all together. By using only the common attributes you are kind of killing the richness of the data and your ability to do something useful with it.
2) I'd triple-check the "do a second database search and union the results when doing advanced searches on non-indexed fields" part if you are dealing with a non-trivial query rate.
3) Some people have thousands of Solr cores. I'm not sure on how many machines, but it's all a function of data size, hardware specs, query complexity and rate.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Chris Masters roti...@yahoo.com
To: solr-user@lucene.apache.org
Sent: Tuesday, May 5, 2009 10:59:40 AM
Subject: Multi-index Design

... 1) Is this approach/design sensible and do others use it? 2) Is it unreasonable to do a second database search and union the results when doing advanced searches on non-indexed fields? 3) Would it be unreasonable to have 20-30 dynamically created cores? ...
Using UUID for unique key
Hi,

I have distributed Solr instances. I'm using Java's UUID (UUID.randomUUID()) to generate the unique id for my documents. Before adding the unique key I was able to commit 50K records in 15 sec (pretty constant as the index grew); after adding the unique key it's taking over 35 sec for 50K, and the time increases as the index size grows. Here is my schema setting for the unique key:

    <field name="id" type="string" indexed="true" stored="true" required="true" omitNorms="true" compressed="false"/>

Why is commit taking so long? Should I not be using a UUID for unique keys? What are the other options - timestamp, etc.?

Thanks,
-vivek
RE: OutOfMemory error
Here is the cache section of solrconfig.xml:

    <!-- Cache used by SolrIndexSearcher for filters (DocSets), unordered
         sets of *all* documents that match a query. When a new searcher is
         opened, its caches may be prepopulated or "autowarmed" using data
         from caches in the old searcher. autowarmCount is the number of
         items to prepopulate. For LRUCache, the autowarmed items will be
         the most recently accessed items.
         Parameters:
           class - the SolrCache implementation (currently only LRUCache)
           size - the maximum number of entries in the cache
           initialSize - the initial capacity (number of entries) of the
             cache. (see java.util.HashMap)
           autowarmCount - the number of entries to prepopulate from an
             old cache. -->
    <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>

    <!-- queryResultCache caches results of searches - ordered lists of
         document ids (DocList) based on a query, a sort, and the range of
         documents requested. -->
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>

    <!-- documentCache caches Lucene Document objects (the stored fields
         for each document). Since Lucene internal document ids are
         transient, this cache will not be autowarmed. -->
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

    <!-- If true, stored fields that are not requested will be loaded
         lazily. This can result in a significant speed improvement if the
         usual case is to not load all stored fields, especially if the
         skipped fields are large compressed text fields. -->
    <enableLazyFieldLoading>true</enableLazyFieldLoading>

    <!-- Example of a generic cache. These caches may be accessed by name
         through SolrIndexSearcher.getCache(), cacheLookup(), and
         cacheInsert(). The purpose is to enable easy caching of
         user/application level data. The regenerator argument should be
         specified as an implementation of solr.search.CacheRegenerator
         if autowarming is desired.
    <cache name="myUserCache" class="solr.LRUCache" size="4096"
           initialSize="1024" autowarmCount="1024"
           regenerator="org.mycompany.mypackage.MyRegenerator"/>
    -->

    <!-- An optimization that attempts to use a filter to satisfy a
         search. If the requested sort does not include score, then the
         filterCache will be checked for a filter matching the query. If
         found, the filter will be used as the source of document ids, and
         then the sort will be applied to that.
    <useFilterForSortedQuery>true</useFilterForSortedQuery>
    -->

    <!-- An optimization for use with the queryResultCache. When a search
         is requested, a superset of the requested number of document ids
         are collected. For example, if a search for a particular query
         requests matching documents 10 through 19, and queryWindowSize is
         50, then documents 0 through 50 will be collected and cached. Any
         further requests in that range can be satisfied via the cache. -->
    <queryResultWindowSize>10</queryResultWindowSize>

    <!-- This entry enables an int hash representation for filters
         (DocSets) when the number of items in the set is less than
         maxSize. For smaller sets, this representation is more memory
         efficient, more efficient to iterate over, and faster to take
         intersections. -->
    <HashDocSet maxSize="3000" loadFactor="0.75"/>

    <!-- boolToFilterOptimizer converts boolean clauses with zero boost
         into cached filters if the number of docs selected by the clause
         exceeds the threshold (represented as a fraction of the total
         index) -->
    <boolTofilterOptimizer enabled="true" cacheSize="32" threshold=".05"/>

    <!-- a newSearcher event is fired whenever a new searcher is being
         prepared and there is a current searcher handling requests
         (aka registered). -->
    <!-- QuerySenderListener takes an array of NamedList and executes a
         local query request for each NamedList in sequence.
    <listener event="newSearcher" class="solr.QuerySenderListener"> ...
    -->
Re: OutOfMemory error
Hi,

The timestamp is your most likely source of the problem. Round it as much as you can, or use the tdate field type (you'll need to grab a nightly build). How many documents are in this index? 1.5GB is a relatively large heap.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Francis Yakin fya...@liquid.com
To: solr-user@lucene.apache.org
Sent: Tuesday, May 5, 2009 1:50:07 PM
Subject: RE: OutOfMemory error

Here is cache in solrconfig.xml: ... (quoted above) And here is in schema.xml: Sort artist name used by mp3 store to sort artist title for search ... default="NOW" multiValued="false" ... omitNorms="true" ...
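To illustrate the rounding Otis suggests (the field name is an assumption): coarser date values mean far fewer unique terms to sort on and cache. With Solr date math, a filter rounded to the day looks like:

    fq=timestamp:[NOW/DAY-1MONTH TO NOW/DAY]

Rounding the value at index time (e.g. storing only the day) shrinks the sort structures the same way.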
Re: Using UUID for unique key
You really had nothing in the uniqueKey element in schema.xml at first? I'm not looking at the Solr code right now, but it could be that the absence of the cost of that lookup is what made things faster. Now you have a lookup + generation + more data to pass through the analyzer + write out, though I can't imagine how that would make things 2x slower. You didn't say whether you cleared the old index after adding the UUID key - did you do that?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: vivek sar vivex...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, May 5, 2009 1:49:21 PM
Subject: Using UUID for unique key

... Before adding the unique key I was able to commit 50K records in 15 sec; after adding the unique key it's taking over 35 sec for 50K, and the time increases as the index size grows. ...
Re: Multi-index Design
Chris Masters wrote:
- flatten the searchable objects as much as I can - use a type field to distinguish - into a single index
- use multi-core approach to segregate domains of data

Some newbie questions:

(1) What is a "type field"? Is it to designate different types of documents, e.g. product descriptions and forum postings?
(2) Would I include such a type field in the data I send to the update facility, and maybe configure Solr to take special action depending on the value of that field?
(3) Like, write the processing results to a domain dedicated to that type of data that I could limit my search to, as per Otis' post?
(4) And is that what's called a "core" here?
(5) Or, failing (3), and lumping everything together in one search domain (core?), would I use that type field to limit my search to a particular type of data?

Michael Ludwig
Re: Using UUID for unique key
On Tue, May 5, 2009 at 1:49 PM, vivek sar vivex...@gmail.com wrote:
Before adding the unique key I was able to commit 50K records in 15 sec (pretty constant as the index grew); after adding the unique key it's taking over 35 sec for 50K, and the time is increasing as the index size grows.

Using unique keys will be slower than not using them... it's extra work that Lucene needs to do - internally it needs to do searches on the ids to delete any previous versions.

-Yonik
http://www.lucidimagination.com
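At the Lucene level, the difference looks roughly like this (an illustration, not Solr's actual code path):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    public class UniqueKeyCost {
      // With a uniqueKey, each add is effectively an update: Lucene must
      // first look up and delete any earlier document with the same key.
      static void addWithKey(IndexWriter writer, String id, Document doc)
          throws Exception {
        writer.updateDocument(new Term("id", id), doc); // delete-by-term + add
      }

      // Without a uniqueKey, documents are simply appended.
      static void addWithoutKey(IndexWriter writer, Document doc)
          throws Exception {
        writer.addDocument(doc);
      }
    }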
Re: Multi-index Design
1 - A field called "type", probably a string field, in which you index values such as people, organization, product.
2 - Yes, for each document you are indexing, you will include its type, i.e. "person".
3, 4, 5 - You would have a core for each domain. Each domain will then have its own index that contains documents of all types. See http://wiki.apache.org/solr/MultipleIndexes.

Thanks,
Matt Weber

On May 5, 2009, at 11:14 AM, Michael Ludwig wrote:
(1) What is a "type field"? ... (4) And is that what's called a "core" here? ...
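As a sketch, a document indexed under this scheme might look like the following (all field names are assumptions):

    <add>
      <doc>
        <field name="id">person-42</field>
        <field name="type">person</field>
        <field name="name">Jane Doe</field>
        <field name="status">active</field>
      </doc>
    </add>

Searches can then be restricted per type with a filter query such as fq=type:person.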
Re: Using UUID for unique key
I did clean up the indexes and restarted the index process from scratch (a new index file). As another test, if I use a simple numeric counter for the unique id, the index speed is fast (within 20 sec to commit 50K records). I'm thinking a UUID might not be the way to go for the unique id - I'll look into using a sequence # instead.

Thanks,
-vivek

On Tue, May 5, 2009 at 11:03 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:
You didn't say whether you cleared the old index after adding the UUID key - did you do that? ...
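If a sequence number does pan out, one simple way to issue one from a single feeder process (an assumption about the indexing setup) is an AtomicLong:

    import java.util.concurrent.atomic.AtomicLong;

    public class IdSequence {
      private static final AtomicLong SEQ = new AtomicLong();

      // Thread-safe, monotonically increasing ids; much cheaper to index
      // and look up than 36-character UUID strings.
      public static String nextId() {
        return Long.toString(SEQ.incrementAndGet());
      }
    }

Note the counter would need to be persisted (or prefixed per shard) to stay unique across restarts and across distributed indexers.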
Jira issue Solr-948
Hi all,

I was wondering if anyone had used the new helper methods in SolrPluginUtils added as part of SOLR-948 (https://issues.apache.org/jira/browse/SOLR-948). I tried the same implementation with Solr 1.3 and everything works correctly except for one issue: in the response XML, numFound is always 0 even though the result shows the docs. I fixed this by setting numFound on the SolrDocumentList. Let me know if anyone has come across this issue.

Thanks,
Kalyan Manepalli
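For anyone hitting the same thing, a sketch of that workaround (method and variable names assumed):

    import org.apache.solr.common.SolrDocumentList;
    import org.apache.solr.request.SolrQueryResponse;

    public class Solr948Workaround {
      // rsp is the handler's SolrQueryResponse; docs holds the results
      // assembled via the SOLR-948 helper methods.
      static void writeResults(SolrQueryResponse rsp, SolrDocumentList docs) {
        docs.setNumFound(docs.size()); // otherwise numFound stays 0 in the XML
        rsp.add("response", docs);
      }
    }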
RE: Multi-index Design
That's how we do it at Orbitz. We use a type field to separate content, review, and promotional information in one single index, and then we use last-components to plug these data sources together. The only thing we haven't yet tested is the scalability of this model, since our data is small.

Thanks,
Kalyan Manepalli

-----Original Message-----
From: Chris Masters [mailto:roti...@yahoo.com]
Sent: Tuesday, May 05, 2009 10:00 AM
To: solr-user@lucene.apache.org
Subject: Multi-index Design

... 1) Is this approach/design sensible and do others use it? ...
Re: Lucene/Solr Meetup / May 20th, Reston VA, 6-8:30 pm
Dear Erik,

It would be great if you could upload the presentations online - it would help all of us. And if possible, video too.

Warm Regards,
Allahbaksh

On Tue, May 5, 2009 at 11:40 PM, Lukáš Vlček lukas.vl...@gmail.com wrote:
Hello, any plans to upload these presentations on the web (or, even better, release video recordings)? Lukas

On Tue, May 5, 2009 at 6:49 PM, Erik Hatcher e...@ehatchersolutions.com wrote:
Lucene/Solr Meetup / May 20th, Reston VA, 6-8:30 pm
http://www.meetup.com/NOVA-Lucene-Solr-Meetup/ ...

--
Allahbaksh Mohammedali Asadullah, Software Engineering & Technology Labs, Infosys Technologies Limited, Electronic City, Hosur Road, Bangalore 560 100, India. Board: 91-80-28520261 | Extn: 73927 | Direct: 41173927 | Fax: 91-80-28520362 | Mobile: 91-9845505322.