Re: Solr 4.2.1 + Distribution scripts (rsync) Issue
Hi Hoss,

Thanks for your reply. Please find answers to your questions below.

*Well, for starters -- have you considered at least looking into using the java based ReplicationHandler instead of the rsync scripts?*

- There was an attempt to implement Java-based replication, but it was very slow, so that option was discarded and rsync was used instead. This was done a couple of years ago, and until February of this year we were using Solr 1.4. I upgraded Solr to 4.0 with rsync; however, due to time and resource constraints, an rsync alternative was not evaluated, and it can't be done even today - only in the next release will we move to SolrCloud.

My setup looks like below - this was working correctly with Solr 1.4 and Solr 4.0:

1) Index feeder applications feed index updates to the indexer boxes.
2) A cron job that runs every minute on the indexer boxes (committer) commits the indexes (commit) and invokes snapshooter to create a snapshot. An rsync daemon runs on the indexer boxes.
3) Another cron job runs on the search boxes every minute, which pulls the snapshot (using snappuller) and installs it on the search boxes (snapinstaller), which also notifies search to open a new searcher (commit).

Additionally, there is a cron job that runs every morning at 4 am on the indexer boxes, which optimizes the index (optimize) and cleans up snapshots older than a day (snapcleaner). This is as per http://wiki.apache.org/solr/SolrCollectionDistributionScripts

*Which config is this, your indexer or your searcher? (I'm assuming it's the searcher since I don't see any postCommit commands to exec snapshooter, but I wanted to sanity check that wasn't a simple explanation for your problem)*

- Because of this setup, I do not have any postCommit setup in solrconfig.xml.
- This solrconfig.xml is used for both indexer and searcher boxes.

I can see that after my upgrade to Solr 4.2.1, all these scripts behave normally; it's just that I do not see the updates getting refreshed on the search boxes unless I restart.

*What exactly does your manual commit command look like?*

- This is by using the commit script under the bin directory (commit -h localhost -p 8983).
- I have also tried a URL-based commit as you had mentioned, but no luck.

*Are you doing this on the indexer box or the searcher boxes?*

- I executed the manual commit on the searcher boxes; the indexer boxes do show the commit and updates correctly.

*What is the HTTP response from this command? What do the logs show when you do this?*

- I have attached the logs; please note that I have enabled openSearcher for testing.

Thanks, please let me know if I'm missing something. I remember people not getting their deletes, and the workaround was to add a _version_ field in the schema, which I had done, but no luck. I know it might be unrelated, but I am just trying all my options.

Thanks again,
Sandeep

On 5 June 2013 00:41, Chris Hostetter hossman_luc...@fucit.org wrote:

: However, we haven't yet implemented SolrCloud and still relying on
: distribution scripts - rsync, indexpuller mechanism.

Well, for starters -- have you considered at least looking into using the java based ReplicationHandler instead of the rsync scripts? Script based replication has not been actively maintained since java replication was added back in Solr 1.4!

: I see that the indexes are getting created on indexer boxes, snapshots
: being created and then pulled across to search boxes. The snapshots are
: getting installed on search boxes as well. There are no errors in the
: scripts logs and this process works well.
: However, when I check the update in solr console (on search boxes), I do
: not see the updated result. The updates do not appear in search boxes even
: after manual commit. Only after a *restart* of the search application
: (deployed in tomcat) I can see the updated results.

What exactly does your manual commit command look like? Are you doing this on the indexer box or the searcher boxes? What is the HTTP response from this command? What do the logs show when you do this?

It's possible that some internal changes in Solr relating to NRT improvements may have optimized away re-opening on commit if Solr doesn't think the index has changed -- but I doubt it, because I just tried a simple test using the 4.3.0 example where I manually simulated snapinstaller replacing the index files with a newer index and issued http://localhost:8983/solr/update?commit=true and Solr loaded up that new index and started searching it -- so I suspect the devil is in the details of your setup. You're sure each of the snapshooter, snappuller, and snapinstaller scripts is executing properly?

: I have done minimal changes for the upgrade in solrconfig.xml and is pasted
: below. Please can someone take a look and let me know what the issue is.
: The same config was working fine on Solr 4.0 (as well as Solr 1.4.1).

Which config is this, your indexer or your searcher? (I'm assuming it's the searcher since I don't see any postCommit commands to exec snapshooter, but I wanted to sanity check that wasn't a simple explanation for your problem.)
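(For reference, the postCommit hook Hoss is checking for is the classic example from the script-based replication wiki page - a hedged sketch, not Sandeep's actual config; the exe path would need adjusting to the local install:)

    <listener event="postCommit" class="solr.RunExecutableListener">
      <str name="exe">solr/bin/snapshooter</str>
      <str name="dir">.</str>
      <bool name="wait">true</bool>
    </listener>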
Re: Setting up Solr
On 6/4/2013 11:48 PM, Aaron Greenspan wrote:

I thought I'd document my process of getting set up with Solr 4.3.0 on a Linux server in case it's of use to anyone. I'm a moderately experienced Linux system administrator, so without passing judgment (at least for now), let me just say that I found getting Solr to work to be extremely difficult--more difficult than just about any other package I've ever dealt with, including ones I've built from source.

Thank you for your feedback. Solr has always had a high learning curve, and you've pointed out a lot of places we can improve things. We have a number of Jira issues that specifically deal with something called Developer Curb Appeal. I think it's pretty clear that we need to tackle a bunch of things we could call Newcomer Curb Appeal. I can work on filing some issues, some of which will address code, some of which will address the docs included with Solr and the wiki pages referenced there.

I realize that the software is at version 4.3, but the UI isn't - it is brand new. The old UI in 3.x and earlier versions was a place to go for information, but you couldn't actually DO anything that would make changes. Historically, this is the reason for the admin UI - making test queries, watching statistics, and gathering information. The ability to make changes is very recent. The UI you've seen first appeared in 4.0.0, released last October. It was a complete rewrite in an entirely new language. The old one was JSP; the new one is JavaScript.

On requiring a username/password: Solr doesn't include any security mechanisms. We leave that to other software written by people who do security really well. It can be handled by the servlet container, or a proxy. Solr should not be directly reachable by users. The intended usage is to have your website process user-entered text to turn it into a query and make sure it's clean before sending it to your Solr server(s), which should be reachable only from behind the firewall. Even if you use the servlet container's security features to really lock things down - block access to the admin UI, the update handler, and anything else that might get you into trouble - if someone can get directly to the query interface, it's relatively easy to send denial-of-service queries. Most attempts to detect and block DoS would also block legitimate queries that just happen to be slow.

Solr is a Java servlet. Servlet containers have historically used XML config files, so as a natural consequence, Solr uses XML config files. XML does allow for very precise and multi-layered configuration, but it can be very confusing. Version 4.4 will take the first tentative steps towards moving away from XML. The central config is still XML, but the individual cores won't be:

http://wiki.apache.org/solr/Core%20Discovery%20%284.4%20and%20beyond%29
http://wiki.apache.org/solr/Solr.xml%204.4%20and%20beyond

There is only one specific problem that I will attempt to address in this reply. At this point, any advice I might give is probably too little, too late. If I'm wrong and you do want some additional specific help, let me know. When you duplicate collection1 to make a new core, it is enough to simply duplicate the main directory and the conf subdirectory. I am aware as I write this that there is probably no documentation that states this clearly.

Thanks,
Shawn
Re: Two instances of solr - the same datadir?
Hi,

We use this very same scenario to great effect - 2 instances using the same dataDir with many cores - 1 is a writer (no caching), the other is a searcher (lots of caching).

To get the searcher to see the index changes from the writer, you need the searcher to do an empty commit - i.e. you invoke a commit with 0 documents. This will refresh the caches (including autowarming), [re]build the relevant searchers etc. and make any index changes visible to the RO instance. Also, make sure to use <lockType>native</lockType> in solrconfig.xml to ensure the two instances don't try to commit at the same time.

There are several ways to trigger a commit:
- Call commit() periodically within your own code.
- Use autoCommit in solrconfig.xml.
- Use an RPC/IPC mechanism between the 2 instance processes to tell the searcher the index has changed, then commit when called (more complex coding, but good if the index changes on an ad-hoc basis).

Note, doing things this way isn't really suitable for an NRT environment.

HTH,
Peter

On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla roman.ch...@gmail.com wrote:

Replication is fine, I am going to use it, but I wanted it for instances *distributed* across several (physical) machines - but here I have one physical machine, and it has many cores. I want to run 2 instances of solr because I think it has these benefits:

1) I can give less RAM to the writer (4GB), and use more RAM for the searcher (28GB)
2) I can deactivate warming for the writer and keep it for the searcher (this considerably speeds up indexing - each time we commit, the server is rebuilding a citation network of 80M edges)
3) saving disk space and better OS caching (the OS should be able to use more RAM for caching, which should result in faster operations - the two processes are accessing the same index)

Maybe I should just forget it and go with replication, but it doesn't 'feel right' IFF it is on the same physical machine. And Lucene specifically has a method for discovering changes and re-opening the index (DirectoryReader.openIfChanged). Am I not seeing something?

roman

On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman jhell...@innoventsolutions.com wrote:

Roman,

Could you be more specific as to why replication doesn't meet your requirements? It was geared explicitly for this purpose, including the automatic discovery of changes to the data on the index master.

Jason

On Jun 4, 2013, at 1:50 PM, Roman Chyla roman.ch...@gmail.com wrote:

OK, so I have verified the two instances can run alongside, sharing the same datadir.

All update handlers are made inaccessible in the read-only master:

    <updateHandler class="solr.DirectUpdateHandler2" enable="${solr.can.write:true}">

    java -Dsolr.can.write=false ...

And I can reload the index manually:

    curl "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"

But this is not an ideal solution; I'd like the read-only server to discover index changes on its own. Any pointers?

Thanks,

roman

On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla roman.ch...@gmail.com wrote:

Hello,

I need your expert advice. I am thinking about running two instances of solr that share the same data directory. The *reason* being: the indexing instance is constantly rebuilding its cache after every commit (we have a big cache) and this slows it down. But indexing doesn't need much RAM; only the search does (and the server has lots of CPUs).

So, it is like having two solr instances:

1. solr-indexing-master
2. solr-read-only-master

In the solrconfig.xml I can disable update components. It should be fine.
However, I don't know how to 'trigger' index re-opening on (2) after the commit happens on (1). Ideally, the second instance could monitor the disk and re-open the index after new files appear there. Do I have to implement a custom IndexReaderFactory? Or something else?

Please note: I know about replication; this use case is IMHO slightly different - in fact, the write-only master (1) is also a replication master.

Googling turned up only this http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/71912 - no pointers there. But if I am approaching the problem wrongly, please don't hesitate to 're-educate' me :)

Thanks!

roman
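(For reference, the 'empty commit' Peter describes at the top of the thread can be issued over plain HTTP - a minimal sketch, assuming the read-only instance is the one on port 5005 from Roman's earlier message; running this from cron would approximate the automatic discovery Roman is after:)

    curl "http://localhost:5005/solr/collection1/update?commit=true"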
Indexing Heavy dataset
Hi,

I am trying to index a heavy dataset with 1 particular field that is really too heavy. However, as I start, I get a memory warning and rollback (OutOfMemoryError). So, I have learned that we can use the -Xmx1024m option with the java command to start Solr and allocate more memory to the heap.

My question is: since this could also become insufficient later, is the issue related to caching? Here is my cache block in solrconfig:

    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

I am thinking that maybe I need to turn off the cache for documentCache. Anyone got a better idea? Or perhaps there is another issue here?

Just to let you know, until I added that very heavy db field for indexing, everything was just fine...

--
Regards,
Raheel Hasan
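(For reference, with the stock Solr 4.x example distribution the heap flag mentioned above goes on the command that launches the bundled Jetty - a minimal sketch; adjust the size to taste:)

    cd example
    java -Xmx1024m -jar start.jar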
Heap space problem with mlt query
Hi,

I have a solr index of 80GB with 1 million documents, each document approx. 500KB. I have a machine with 16GB RAM. I am running mlt queries on 3-5 fields of these documents. I am getting a solr out of memory problem:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

My Solr config is:

    <ramBufferSizeMB>128</ramBufferSizeMB>
    <maxMergeDocs>100</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>1</commitLockTimeout>

I also checked with a ramBuffer size of 256MB.

Please provide me suggestions regarding this.

Thanks
Varsha

-- View this message in context: http://lucene.472066.n3.nabble.com/Heap-space-problem-with-mlt-query-tp4068278.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Heap space problem with mlt query
and I just asked a similar question just 1 sec ago

On Wed, Jun 5, 2013 at 2:07 PM, Varsha Rani varsha.ya...@orkash.com wrote:

Hi,

I have a solr index of 80GB with 1 million documents, each document approx. 500KB. I have a machine with 16GB RAM. I am running mlt queries on 3-5 fields of these documents. I am getting a solr out of memory problem:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

My Solr config is:

    <ramBufferSizeMB>128</ramBufferSizeMB>
    <maxMergeDocs>100</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>1</commitLockTimeout>

I also checked with a ramBuffer size of 256MB.

Please provide me suggestions regarding this.

Thanks
Varsha

-- View this message in context: http://lucene.472066.n3.nabble.com/Heap-space-problem-with-mlt-query-tp4068278.html Sent from the Solr - User mailing list archive at Nabble.com.

--
Regards,
Raheel Hasan
Re: Heap space problem with mlt query
Varsha,

Unless I'm mistaken, the ramBufferSizeMB param is used to buffer documents before writing them to disk. Can you post the cache config that you have in solrconfig.xml? What version are you using?

--
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Wednesday, June 5, 2013 at 10:09 AM, Raheel Hasan wrote:

and I just asked a similar question just 1 sec ago

On Wed, Jun 5, 2013 at 2:07 PM, Varsha Rani varsha.ya...@orkash.com wrote:

Hi,

I have a solr index of 80GB with 1 million documents, each document approx. 500KB. I have a machine with 16GB RAM. I am running mlt queries on 3-5 fields of these documents. I am getting a solr out of memory problem:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

My Solr config is:

    <ramBufferSizeMB>128</ramBufferSizeMB>
    <maxMergeDocs>100</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>1</commitLockTimeout>

I also checked with a ramBuffer size of 256MB.

Please provide me suggestions regarding this.

Thanks
Varsha

-- View this message in context: http://lucene.472066.n3.nabble.com/Heap-space-problem-with-mlt-query-tp4068278.html Sent from the Solr - User mailing list archive at Nabble.com.

--
Regards,
Raheel Hasan
Re: Heap space problem with mlt query
Hi yriveiro,

I am using Solr version 3.6. My cache config is below:

    <filterCache class="solr.FastLRUCache" size="131072" initialSize="4096" autowarmCount="2048" cleanupThread="true"/>
    <queryResultCache class="solr.FastLRUCache" size="131072" initialSize="4096" autowarmCount="2048" cleanupThread="true"/>
    <documentCache class="solr.FastLRUCache" size="131072" initialSize="4096" autowarmCount="2048" cleanupThread="true"/>

-- View this message in context: http://lucene.472066.n3.nabble.com/Heap-space-problem-with-mlt-query-tp4068278p4068282.html Sent from the Solr - User mailing list archive at Nabble.com.
different Solr Logging for CONSOLE and FILE
Hi,

I have a small question about solr logging. In resources/log4j.properties, we have:

    log4j.rootLogger=INFO, file, CONSOLE

However, what I want is:

    log4j.rootLogger=INFO, file

and

    log4j.rootLogger=WARN, CONSOLE

(both simultaneously). Is it possible?

--
Regards,
Raheel Hasan
Re: Heap space problem with mlt query
Varsha,

How big is your JVM heap?

The other issue is the documentCache. The documentCache holds document objects fetched from disk (http://wiki.apache.org/solr/SolrCaching#documentCache). If each document is approx. 500KB and you configure a cache of size 131072, you are caching 131072 * (document object size), and that can be a lot of RAM. Try decreasing the documentCache size.

--
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Wednesday, June 5, 2013 at 10:28 AM, Varsha Rani wrote:

Hi yriveiro,

I am using Solr version 3.6. My cache config is below:

    <filterCache class="solr.FastLRUCache" size="131072" initialSize="4096" autowarmCount="2048" cleanupThread="true"/>
    <queryResultCache class="solr.FastLRUCache" size="131072" initialSize="4096" autowarmCount="2048" cleanupThread="true"/>
    <documentCache class="solr.FastLRUCache" size="131072" initialSize="4096" autowarmCount="2048" cleanupThread="true"/>

-- View this message in context: http://lucene.472066.n3.nabble.com/Heap-space-problem-with-mlt-query-tp4068278p4068282.html Sent from the Solr - User mailing list archive at Nabble.com.
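(A rough back-of-the-envelope illustration of Yago's point, assuming every cached entry really holds a ~500KB document:)

    131072 entries * 500 KB/entry = 65,536,000 KB ≈ 62.5 GB

That is roughly four times what the whole 16GB machine has, before counting the other caches or the index itself.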
Re: different Solr Logging for CONSOLE and FILE
Am 05.06.2013 11:28, schrieb Raheel Hasan:

Hi, I have a small question about solr logging. In resources/log4j.properties, we have *log4j.rootLogger=INFO, file, CONSOLE*. However, what I want is *log4j.rootLogger=INFO, file* and *log4j.rootLogger=WARN, CONSOLE* (both simultaneously). Is it possible?

You can use:

    log4j.rootLogger=INFO, file, CONSOLE
    log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
    log4j.appender.CONSOLE.Threshold=WARN
Re: different Solr Logging for CONSOLE and FILE
OK thanks... it works... :D

Also I found that we could put both of them and it will also work:

    log4j.rootLogger=INFO, file
    log4j.rootLogger=WARN, CONSOLE

On Wed, Jun 5, 2013 at 2:42 PM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote:

Am 05.06.2013 11:28, schrieb Raheel Hasan:

Hi, I have a small question about solr logging. In resources/log4j.properties, we have *log4j.rootLogger=INFO, file, CONSOLE*. However, what I want is *log4j.rootLogger=INFO, file* and *log4j.rootLogger=WARN, CONSOLE* (both simultaneously). Is it possible?

You can use:

    log4j.rootLogger=INFO, file, CONSOLE
    log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
    log4j.appender.CONSOLE.Threshold=WARN

--
Regards,
Raheel Hasan
Files included from the default SolrConfig
Hi,

I am trying to optimize solr. The default solrconfig that comes with solr/collection1 has a lot of libs included that I don't really need. Perhaps someone could help me identify their purpose. (I only import from DIH.) Please tell me what's in these:

    contrib/extraction/lib (solr-cell-*)
    contrib/clustering/lib (solr-clustering-*)
    contrib/langid/lib (solr-langid-*)

--
Regards,
Raheel Hasan
Solr instance state is down in cloud mode
Hi,

When I start a core in solr-cloud, I'm getting the below message in the log. I have set up zookeeper separately and uploaded the config files. When I start the solr instance in cloud mode, the state is down.

    INFO: Update state numShards=null message={
      "operation":"state",
      "numShards":null,
      "shard":"shard1",
      "roles":null,
      "state":"down",
      "core":"core1",
      "collection":"core1",
      "node_name":"x:9980_solr",
      "base_url":"http://x:9980/solr"}
    Jun 5, 2013 6:10:48 AM org.apache.solr.common.cloud.ZkStateReader$2 process
    INFO: A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 1)

When I hit the URL, I get the left pane of the solr admin, and the right side keeps on loading. Any help?

Thanks,
Sathish

-- View this message in context: http://lucene.472066.n3.nabble.com/Sole-instance-state-is-down-in-cloud-mode-tp4068298.html Sent from the Solr - User mailing list archive at Nabble.com.
data-import problem
Hello Solr-Friends,

I have a problem with my current solr configuration. I want to import two tables into solr. I got it to work for the first table, but the second table doesn't get imported (no error message, 0 rows skipped).

I have two tables called name and title, and I want to load their fields: id and name, and id and title (two id columns that have nothing to do with each other).

This is in my data-config.xml:

    <document>
      <entity name="name" query="SELECT id, name FROM name"></entity>
    </document>
    <document>
      <entity name="title" query="SELECT id AS titleid, title FROM name"></entity>
    </document>

and this is in my schema.xml:

    <field name="id" type="string" indexed="true" stored="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name="titleid" type="string" indexed="true" stored="true" />
    <field name="title" type="text_general" indexed="true" stored="true" />
    <dynamicField name="*" type="ignored" multiValued="true" />
    </fields>
    <uniqueKey>id</uniqueKey>
    </schema>

I chose that unique key only because solr asked for it. In my SolrAdmin Schema Browser I can see three fields (id, name and title), but titleid is missing and title itself is empty with no entries. I don't know how to get it to work to index two separate lists. I hope someone can help, thank you!
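(For comparison, DIH data-config files normally nest all entities under a single <document> element inside <dataConfig> - a hedged sketch of that shape, reusing the queries above verbatim; the dataSource line is a placeholder, and whether this alone fixes the import here is untested:)

    <dataConfig>
      <dataSource type="JdbcDataSource" ... />
      <document>
        <entity name="name" query="SELECT id, name FROM name"/>
        <entity name="title" query="SELECT id AS titleid, title FROM name"/>
      </document>
    </dataConfig>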
Re: Query Elevation Component
davers wrote:

I want to elevate certain documents differently depending on a certain fq parameter in the request. I've read of somebody coding solr to do this, but no code was shared. Where would I start looking to implement this feature myself?

Davers, I am also looking into this feature. Care to tell where you saw this discussed? I could not find anything. Also, did you manage to implement it somehow?

thanks

-- View this message in context: http://lucene.472066.n3.nabble.com/Query-Elevation-Component-tp4056856p4068308.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Heap space problem with mlt query
Hi yriveiro,

When I was using a documentCache size of 131072, I got the exception after 5000-6000 mlt queries. But once I set the documentCache size to 16384, I got the same problem after 1500-2000 mlt queries.

-- View this message in context: http://lucene.472066.n3.nabble.com/Heap-space-problem-with-mlt-query-tp4068278p4068313.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Files included from the default SolrConfig
1. SolrCell (ExtractingRequestHandler) - extracts and indexes content from rich documents such as PDF, Office docs, and HTML (uses Tika).
2. Clustering - for result clustering.
3. Language identification (two update processors) - analyzes the text of fields to determine a language code.

None of those is mandatory, which is why they have separate libs.

-- Jack Krupansky

-Original Message- From: Raheel Hasan Sent: Wednesday, June 05, 2013 5:57 AM To: solr-user@lucene.apache.org Subject: Files included from the default SolrConfig

Hi,

I am trying to optimize solr. The default solrconfig that comes with solr/collection1 has a lot of libs included that I don't really need. Perhaps someone could help me identify their purpose. (I only import from DIH.) Please tell me what's in these:

    contrib/extraction/lib (solr-cell-*)
    contrib/clustering/lib (solr-clustering-*)
    contrib/langid/lib (solr-langid-*)

--
Regards,
Raheel Hasan
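(For reference, these contribs are pulled in by <lib/> directives near the top of the default solrconfig.xml - roughly like the lines below, which can be removed or commented out if unused; the exact relative paths vary by release:)

    <lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
    <lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />
    <lib dir="../../../contrib/clustering/lib/" regex=".*\.jar" />
    <lib dir="../../../dist/" regex="solr-clustering-\d.*\.jar" />
    <lib dir="../../../contrib/langid/lib/" regex=".*\.jar" />
    <lib dir="../../../dist/" regex="solr-langid-\d.*\.jar" />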
Receiving unexpected Faceting results.
Consider the following Solr query:

    select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&rows=0

The 'tags' field is a multivalued field. I would expect the previous query to return only tags that begin with the string 'dotan-' such as:

dotan-home
dotan-work

...but not strings which do not begin with (or even contain) the string in question. However, I am getting these results:

    <lst name="discoapi_tags">
      <int name="dotan-home">14</int>
      <int name="dotan-work">13</int>
      <int name="beer">0</int>
      <int name="beatles">0</int>
    </lst>

It _may_ be that the 'beer' and 'beatles' tags were once attached to the same documents as the 'dotan-home' and/or 'dotan-work' tags. I've done a bit of experimenting on this Solr install, so I cannot be sure. However, considering that there are in fact 0 results for those two, I would not expect them to show up at all, even if they were ever attached to (i.e. once a value in the multivalued field of) any of the results that match the filter query.

So, the questions are:

1) How can I check whether the multivalued field for a particular document (given its uniqueKey id) ever contained a specific value? Alternatively, how can I see all the values that the document ever had for the field? I don't expect this to actually be possible, but I ask if it is, i.e. by examining certain aspects of the Solr index with a text editor.
2) If those spurious results are appearing, does that necessarily mean that those values were in fact once in the multivalued field for documents matching the filter query? If so, the answer to the previous question would be to simply run a query for the id of the document in question, and facet on the multivalued field with a large limit.
3) How can I have Solr return only those facet values for the field that in fact begin with 'dotan-', even if a document has other tags such as 'beatles'?
4) How can I have Solr return only those facet values whose counts are larger than 0?

Thank you!

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Re: Receiving unexpected Faceting results.
3) Use the parameter facet.prefix, e.g., facet.prefix=dotan-. Note: this particular case will not work if the field you're faceting on is tokenised (with - being used as a token separator).

4) Use the parameter facet.mincount - it looks like you want to set it to 1, instead of the default, which is 0.
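(Putting both parameters together against the query from the original post - a minimal sketch:)

    select?q=*:*&facet=true&facet.field=tags&facet.prefix=dotan-&facet.mincount=1&rows=0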
Re: Receiving unexpected Faceting results.
Hi Dotan,

I think all you need to do is add facet.mincount=1, i.e.:

    select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&rows=0&facet.mincount=1

Note that you can do it per field as well:

    select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&rows=0&f.tags.facet.mincount=1

http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount

On Wed, Jun 5, 2013 at 8:27 AM, Dotan Cohen dotanco...@gmail.com wrote:

Consider the following Solr query:

    select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&rows=0

The 'tags' field is a multivalued field. I would expect the previous query to return only tags that begin with the string 'dotan-' such as:

dotan-home
dotan-work

...but not strings which do not begin with (or even contain) the string in question. However, I am getting these results:

    <lst name="discoapi_tags">
      <int name="dotan-home">14</int>
      <int name="dotan-work">13</int>
      <int name="beer">0</int>
      <int name="beatles">0</int>
    </lst>

It _may_ be that the 'beer' and 'beatles' tags were once attached to the same documents as the 'dotan-home' and/or 'dotan-work' tags. I've done a bit of experimenting on this Solr install, so I cannot be sure. However, considering that there are in fact 0 results for those two, I would not expect them to show up at all, even if they were ever attached to any of the results that match the filter query.

So, the questions are:

1) How can I check whether the multivalued field for a particular document (given its uniqueKey id) ever contained a specific value? Alternatively, how can I see all the values that the document ever had for the field?
2) If those spurious results are appearing, does that necessarily mean that those values were in fact once in the multivalued field for documents matching the filter query?
3) How can I have Solr return only those facet values for the field that in fact begin with 'dotan-', even if a document has other tags such as 'beatles'?
4) How can I have Solr return only those facet values whose counts are larger than 0?

Thank you!

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com

--
Brendan Grainger
www.kuripai.com
Search for misspelled words in corpus
Hi,

I have a problem where the text corpus we need to search contains many misspelled words. The same word could also be misspelled in several different ways, and the corpus could also contain documents with correct spellings. However, the search term given in the query will always be the correct spelling.

Now when we search on a term, we would like to get all the documents that contain both correct and misspelled forms of the search term. We tried fuzzy search, but it doesn't work as per our expectations. It returns any close match, not specifically misspelled words. For example, if I'm searching for a word like fight, I would like to return the documents that have words like figth and feight, not documents with words like sight and light.

Is there any suggested approach for doing this?

regards,
Kamesh
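(For context on why fuzzy search behaves that way - this worked example is mine, not from Kamesh's mail: a Lucene fuzzy query such as text:fight~1 matches every indexed term within edit distance 1 of fight, and all four of the words above sit at exactly 1 edit, assuming transpositions count as a single edit as in Damerau-Levenshtein distance:)

    figth  = fight with 'h' and 't' transposed   (1 edit)
    feight = fight with 'e' inserted             (1 edit)
    sight  = fight with 'f' -> 's' substituted   (1 edit)
    light  = fight with 'f' -> 'l' substituted   (1 edit)

So edit distance alone cannot separate misspellings of the query term from legitimately different words.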
Re: Receiving unexpected Faceting results.
On Wed, Jun 5, 2013 at 3:38 PM, Raymond Wiker rwi...@gmail.com wrote:

3) Use the parameter facet.prefix, e.g., facet.prefix=dotan-. Note: this particular case will not work if the field you're faceting on is tokenised (with - being used as a token separator).

4) Use the parameter facet.mincount - it looks like you want to set it to 1, instead of the default, which is 0.

Perfect, thank you Raymond!

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Re: Heap space problem with mlt query
Did you try reducing the filter and query caches? They are fairly large too, unless you really need them to be cached for your use case. Do you have that many distinct filter queries hitting solr, given the size you have defined for filterCache?

Are you doing any sorting? That will chew up a lot of memory because of Lucene's internal field cache.

-- View this message in context: http://lucene.472066.n3.nabble.com/Heap-space-problem-with-mlt-query-tp4068278p4068326.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Receiving unexpected Faceting results.
On Wed, Jun 5, 2013 at 3:41 PM, Brendan Grainger brendan.grain...@gmail.com wrote:

Hi Dotan,

I think all you need to do is add facet.mincount=1, i.e.:

    select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&rows=0&facet.mincount=1

Note that you can do it per field as well:

    select?q=*:*&fq=tags:dotan-*&facet=true&facet.field=tags&rows=0&f.tags.facet.mincount=1

http://wiki.apache.org/solr/SimpleFacetParameters#facet.mincount

Thanks, Brendan. I will review the available Facet Parameters, which I really should have thought to do before posting, as it is already bookmarked!
Re: different Solr Logging for CONSOLE and FILE
On 6/5/2013 3:46 AM, Raheel Hasan wrote:

OK thanks... it works... :D

Also I found that we could put both of them and it will also work:

    log4j.rootLogger=INFO, file
    log4j.rootLogger=WARN, CONSOLE

If this completely separates INFO from WARN and ERROR, then you would want to rethink and probably use what Bernd suggested. I don't know if this is what happens.

It's easier to understand a logfile if you can see errors, warnings, and informational messages together in context. If the more severe messages are only logged to CONSOLE, then you lose them. Even if you then redirect the console to a file outside of Solr, you would need to try to piece the full log together based on timestamps from two files, and sometimes things happen too fast for that, even if you're logging with millisecond accuracy.

Thanks,
Shawn
Re: Indexing Heavy dataset
On 6/5/2013 3:08 AM, Raheel Hasan wrote:

Hi,

I am trying to index a heavy dataset with 1 particular field that is really too heavy. However, as I start, I get a memory warning and rollback (OutOfMemoryError). So, I have learned that we can use the -Xmx1024m option with the java command to start Solr and allocate more memory to the heap.

My question is: since this could also become insufficient later, is the issue related to caching? Here is my cache block in solrconfig:

    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

I am thinking that maybe I need to turn off the cache for documentCache. Anyone got a better idea? Or perhaps there is another issue here?

Exactly how big is this field? Do you need this giant field returned with your results, or is it just there for searching?

Caches of size 512, especially with autowarm disabled, are probably not a major cause for concern, unless the big field is big enough that 512 of them is really, really huge. If that's the case, I would reduce the size of your documentCache, not turn it off.

The value of ramBufferSizeMB elsewhere in your config is more likely to affect how much RAM gets used during indexing. The default for this setting as of Solr 4.1.0 is 100. Most people can reduce this value.

I'm writing a reply to another thread where you are participating, with info that will likely be useful for you too. Look for that.

Thanks,
Shawn
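(A hedged sketch of the two knobs Shawn mentions, with illustrative values rather than recommendations:)

    <ramBufferSizeMB>32</ramBufferSizeMB>
    <documentCache class="solr.LRUCache" size="64" initialSize="64" autowarmCount="0"/>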
Re: Heap space problem with mlt query
On 6/5/2013 3:07 AM, Varsha Rani wrote:

Hi,

I have a solr index of 80GB with 1 million documents, each document approx. 500KB. I have a machine with 16GB RAM. I am running mlt queries on 3-5 fields of these documents. I am getting a solr out of memory problem:

    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

This wiki page has relevant info for your situation. As you are reading it, it might not seem relevant, but I'll try to point things out.

http://wiki.apache.org/solr/SolrPerformanceProblems

The memory that is getting exhausted here is heap memory. You probably need a larger java heap. The settings that your other replies have talked about do affect how much heap gets used, but they do not increase it. That is a java commandline option that must be applied to the command that starts the servlet container which runs Solr.

For 500KB documents, you probably want a ramBufferSizeMB of 64-128. You probably want to greatly reduce the size of your documentCache, and possibly the other caches as well. Your autowarm counts are very high - you'll want to reduce those so that your cache warming time is low when you commit and open a new searcher.

With an index size of 80GB, you'll probably need a heap size of 8GB. Depending on how you use Solr, you might need more. If you read the wiki page carefully, you'll also realize that in addition to this heap memory, you need additional memory to cache your index - between 40 and 80GB of additional memory. The absolute minimum server size you want here is 48GB, and 128GB would be *much* better.

Reducing your index size might be a critical step. Do you need to store all fields? Most people don't need all the fields in order to display the top N search results. When showing a detail page to the user, most people can get the bulk of their data from another data store by using an ID value retrieved from Solr.

The performance problems that come from your disk cache being too small can carry over into OutOfMemory exceptions that you wouldn't otherwise get, because it makes indexing and queries take too long. When they take too long, you can end up doing too many of them at the same time, chewing up additional memory.

Thanks,
Shawn
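(For example, if the container is Tomcat, the heap flag usually goes in an environment variable that the startup script picks up - a hedged sketch assuming a stock Tomcat, where catalina.sh reads bin/setenv.sh; the file name and variable are Tomcat conventions, not anything Varsha has described:)

    # bin/setenv.sh (created if it doesn't exist)
    export CATALINA_OPTS="$CATALINA_OPTS -Xmx8g"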
Re: Setting up Solr
On Wed, Jun 5, 2013 at 1:48 AM, Aaron Greenspan aar...@thinkcomputer.com wrote:

I say this not because I enjoy starting flame wars or because I have the time to participate in them--I don't. I realize that there's a long history to Solr and I am the new kid who doesn't get it. Nonetheless, that doesn't change the way it works, and many users will be just like me. So just know that I'd just like to see Solr improve--frankly, I need it to--and if these issues were not already glaringly obvious, they should be now.

This!

Seriously, I think this feedback is valuable, and I have recently gone through a similar experience. This is why I have written a book specifically targeting people who basically got their first (example) collection running and are now stuck on how to get the second one (the first 'real' one) to do what they want.

The book is available for pre-orders at: http://www.packtpub.com/apache-solr-for-indexing-data/book (out in a couple more days) and a bunch of sample configurations that go with it are at: https://github.com/arafalov/solr-indexing-book

On specific points, I do agree that we need to make the Admin WebUI pre-select the first/only core. If nobody has created a JIRA for this yet, I will. And, I think, perhaps we need an absolutely minimal solr configuration shipping in the Solr distribution, with a single '*' field and so on.

Regards,
Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: Indexing Heavy dataset
ok thanks for the reply.

The field has values of around 60KB each.

Furthermore, I have realized that the issue is with MySQL, as it's not processing this table when a WHERE clause is applied. Secondly, I have turned this field to *stored=false* and now the *select* is fast and working again.

On Wed, Jun 5, 2013 at 6:56 PM, Shawn Heisey s...@elyograg.org wrote:

On 6/5/2013 3:08 AM, Raheel Hasan wrote:

Hi,

I am trying to index a heavy dataset with 1 particular field that is really too heavy. However, as I start, I get a memory warning and rollback (OutOfMemoryError). So, I have learned that we can use the -Xmx1024m option with the java command to start Solr and allocate more memory to the heap.

My question is: since this could also become insufficient later, is the issue related to caching? Here is my cache block in solrconfig:

    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

I am thinking that maybe I need to turn off the cache for documentCache. Anyone got a better idea? Or perhaps there is another issue here?

Exactly how big is this field? Do you need this giant field returned with your results, or is it just there for searching?

Caches of size 512, especially with autowarm disabled, are probably not a major cause for concern, unless the big field is big enough that 512 of them is really, really huge. If that's the case, I would reduce the size of your documentCache, not turn it off.

The value of ramBufferSizeMB elsewhere in your config is more likely to affect how much RAM gets used during indexing. The default for this setting as of Solr 4.1.0 is 100. Most people can reduce this value.

I'm writing a reply to another thread where you are participating, with info that will likely be useful for you too. Look for that.

Thanks,
Shawn

--
Regards,
Raheel Hasan
Re: Indexing Heavy dataset
Some values in the field are up to 1MB as well.

On Wed, Jun 5, 2013 at 7:27 PM, Raheel Hasan raheelhasan@gmail.com wrote:

ok thanks for the reply.

The field has values of around 60KB each.

Furthermore, I have realized that the issue is with MySQL, as it's not processing this table when a WHERE clause is applied. Secondly, I have turned this field to *stored=false* and now the *select* is fast and working again.

On Wed, Jun 5, 2013 at 6:56 PM, Shawn Heisey s...@elyograg.org wrote:

On 6/5/2013 3:08 AM, Raheel Hasan wrote:

[I am trying to index a heavy dataset with 1 particular field that is really too heavy... Anyone got a better idea? Or perhaps there is another issue here?]

Exactly how big is this field? Do you need this giant field returned with your results, or is it just there for searching?

Caches of size 512, especially with autowarm disabled, are probably not a major cause for concern, unless the big field is big enough that 512 of them is really, really huge. If that's the case, I would reduce the size of your documentCache, not turn it off.

The value of ramBufferSizeMB elsewhere in your config is more likely to affect how much RAM gets used during indexing. The default for this setting as of Solr 4.1.0 is 100. Most people can reduce this value.

I'm writing a reply to another thread where you are participating, with info that will likely be useful for you too. Look for that.

Thanks,
Shawn

--
Regards,
Raheel Hasan

--
Regards,
Raheel Hasan
Re: Setting up Solr
If we look at the UI of other cloud-based software like Couchbase or Riak, they are more intuitive than Solr's UI. Of course the UI is brand new and needs a lot of improvements - for example, the possibility of selecting an existing config from zookeeper when using the wizard to create a collection. Even better, a section to upload a config from the UI without using the cryptic zkCli script.

Regards,

--
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Wednesday, June 5, 2013 at 3:21 PM, Alexandre Rafalovitch wrote:

On Wed, Jun 5, 2013 at 1:48 AM, Aaron Greenspan aar...@thinkcomputer.com wrote:

I say this not because I enjoy starting flame wars or because I have the time to participate in them--I don't. I realize that there's a long history to Solr and I am the new kid who doesn't get it. Nonetheless, that doesn't change the way it works, and many users will be just like me. So just know that I'd just like to see Solr improve--frankly, I need it to--and if these issues were not already glaringly obvious, they should be now.

This!

Seriously, I think this feedback is valuable, and I have recently gone through a similar experience. This is why I have written a book specifically targeting people who basically got their first (example) collection running and are now stuck on how to get the second one (the first 'real' one) to do what they want.

The book is available for pre-orders at: http://www.packtpub.com/apache-solr-for-indexing-data/book (out in a couple more days) and a bunch of sample configurations that go with it are at: https://github.com/arafalov/solr-indexing-book

On specific points, I do agree that we need to make the Admin WebUI pre-select the first/only core. If nobody has created a JIRA for this yet, I will. And, I think, perhaps we need an absolutely minimal solr configuration shipping in the Solr distribution, with a single '*' field and so on.

Regards,
Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
copyField generates multiple values encountered for non multiValued field
I have the exact same problem as the guy here:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201105.mbox/%3C3A2B3E42FCAA4BF496AE625426C5C6E4@Wurstsemmel%3E

AFAICS he did not get an answer. Is this a known issue? What can I do, other than doing what copyField should do in my application? I am using solr 4.0.0.

Thanks,
Robert
data-import problem
Hello Solr-Friends,

I have a problem with my current solr configuration. I want to import two tables into solr. I got it to work for the first table, but the second table doesn't get imported (no error message, 0 rows skipped).

I have two tables called name and title, and I want to load their fields: id and name, and id and title (two id columns that have nothing to do with each other).

This is in my data-config.xml:

    <document>
      <entity name="name" query="SELECT id, name FROM name"></entity>
    </document>
    <document>
      <entity name="title" query="SELECT id AS titleid, title FROM name"></entity>
    </document>

and this is in my schema.xml:

    <field name="id" type="string" indexed="true" stored="true" />
    <field name="name" type="text_general" indexed="true" stored="true" />
    <field name="titleid" type="string" indexed="true" stored="true" />
    <field name="title" type="text_general" indexed="true" stored="true" />
    <dynamicField name="*" type="ignored" multiValued="true" />
    </fields>
    <uniqueKey>id</uniqueKey>
    </schema>

I chose that unique key only because solr asked for it. In my SolrAdmin Schema Browser I can see three fields (id, name and title), but titleid is missing and title itself is empty with no entries. I don't know how to get it to work to index two separate lists. I hope someone can help, thank you!

PS: I am sorry if this mail reached you twice. I sent it the first time when I was not registered yet and don't know if the mail was received. Sending now again after registration to the mailing list.
Re: copyField generates multiple values encountered for non multiValued field
I think the suggestion I have seen is that a copyField destination should be index-only and - therefore - will not be returned. It is primarily there to make searching easier by aggregating fields, or to provide an alternative analyzer pipeline. Can you make your copyField destination not stored?

Regards,
Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, Jun 5, 2013 at 10:37 AM, Robert Krüger krue...@lesspain.de wrote:

I have the exact same problem as the guy here:

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201105.mbox/%3C3A2B3E42FCAA4BF496AE625426C5C6E4@Wurstsemmel%3E

AFAICS he did not get an answer. Is this a known issue? What can I do, other than doing what copyField should do in my application? I am using solr 4.0.0.

Thanks,
Robert
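(A minimal sketch of the arrangement Alexandre suggests above - the field and type names here are hypothetical, not taken from Robert's schema; the destination is multiValued so it can accept several sources, and stored="false" so it is indexed but never returned:)

    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="body" type="text_general" indexed="true" stored="true"/>
    <field name="text_all" type="text_general" indexed="true" stored="false" multiValued="true"/>
    <copyField source="title" dest="text_all"/>
    <copyField source="body" dest="text_all"/>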
Re: Setting up Solr
We have a number of Jira issues that specifically deal with something called Developer Curb Appeal. I think it's pretty clear that we need to tackle a bunch of things we could call Newcomer Curb Appeal. I can work on filing some issues, some of which will address code, some of which will address the docs included with Solr and the wiki pages referenced there.

I have filed the master issue. I will file some linked issues over the next few days. All ideas and patches welcome.

https://issues.apache.org/jira/browse/SOLR-4901

The wiki is our primary documentation. Updates are appreciated. In order to edit the wiki, you must create an account and ask on this mailing list for it to be added to the contributors group.

Thanks,
Shawn
Re: Solr - ORM like layer
Sorry for opening a new thread. As I sent the first message without subscribing to the mailing list, I couldn't find a way to reply to the original thread. The message stream is attached below.

Actually the requirement came up from such a scenario: we collect some xml documents from some external resources and need to parse those xml docs and index some parts of them. But those xml docs have different roots and attributes. So we generate all possible classes for each root type via JAXB. As each document has different informative values, each of them should be indexed into a separate solr instance.

The module we wrote simply generates a solr schema template with respect to all aggregated objects in the root object (recursively), except those annotated with @SolrIndexIgnore. We are also able to generate a SolrDocument from a given object and index it to a specified solr instance. While retrieving results from solr, we generate a list of these objects from the SolrDocument instances.

The Hibernate configuration for Lucene indexing is a bit different, I think, as we are able to generate the solr schema from a given object.

Best.

-Original Message- From: Tuğcem Oral Sent: Tuesday, June 04, 2013 8:57 AM To: solr-user@lucene.apache.org Subject: Solr - ORM like layer

Hi folks,

I wonder whether there exists an ORM-like layer for solr such that it generates the solr schema from a given complex object type and indexes a given list of corresponding objects. I wrote a simple module for that need in one of my projects and am happily ready to generalize it and contribute it to solr, if no such module exists or is in progress.

Thanks all.

--
TO

Solr doesn't support complex objects directly - you must flatten and otherwise denormalize them. If you do want to store something like a graph in Solr, make each node a separate document (and try to avoid the temptation to play games with dynamic and multivalued fields). But if you have a tool to automatically flatten and denormalize complex objects and graphs and database joins, great. Please describe what it actually does in a little more (but not excessive) detail.

-- Jack Krupansky

-Original Message- From: Tuğcem Oral Sent: Tuesday, June 04, 2013 8:57 AM To: solr-user@lucene.apache.org Subject: Solr - ORM like layer

Hi folks,

I wonder whether there exists an ORM-like layer for solr such that it generates the solr schema from a given complex object type and indexes a given list of corresponding objects. I wrote a simple module for that need in one of my projects and am happily ready to generalize it and contribute it to solr, if no such module exists or is in progress.

Thanks all.

--
TO

If by ORM you mean Object Relational Mapping, Hibernate has annotations for Lucene, and if my memory doesn't betray me, I think you can configure a Solr server in the Hibernate config. I have successfully mapped POJOs to Lucene and done text search; it all happens like magic once your annotations and configuration are right.

Hope that helps,

Guido.

On 04/06/13 13:57, Tuğcem Oral wrote:

Hi folks,

I wonder whether there exists an ORM-like layer for solr such that it generates the solr schema from a given complex object type and indexes a given list of corresponding objects. I wrote a simple module for that need in one of my projects and am happily ready to generalize it and contribute it to solr, if no such module exists or is in progress.

Thanks all.

--
TO
Phrase matching with set union as opposed to set intersection on query terms
How would one write a query which should perform set union on the search terms (term1 OR term2 OR term3), and yet also perform phrase matching if both terms are found? I tried a few variants of the following, but in every case I am getting set intersection on the search terms:

    select?q={!q.op=OR}text:"term1 term2"~10

Thus, if term1 matches 10 documents and term2 matches 20 documents, then SET UNION would include all of the documents that have either term1 and/or term2. That means that between 20-30 results should be returned. Conversely, SET INTERSECTION would return only results with _both_ term1 _and_ term2, which could be between 0-10 documents.

Note that in the application, users will be searching for any arbitrary number of terms; in fact, they will be entering phrases. I can limit these phrases to 140 characters if needed.

Thank you in advance!

--
Dotan Cohen
http://gibberish.co.il
http://what-is-what.com
Re: Phrase matching with set union as opposed to set intersection on query terms
On 6/5/2013 9:03 AM, Dotan Cohen wrote:

How would one write a query which should perform set union on the search terms (term1 OR term2 OR term3), and yet also perform phrase matching if both terms are found? I tried a few variants of the following, but in every case I am getting set intersection on the search terms:

    select?q={!q.op=OR}text:"term1 term2"~10

A phrase search by definition requires all terms to be present. Even though it is multiple terms, conceptually it is treated as a single term.

It sounds like what you are after is what edismax can do. If you define the pf parameter in addition to the qf parameter, Solr will do something pretty amazing - it will automatically construct a phrase query from a non-phrase query and search with it against multiple fields. Done correctly, this means that an exact match will be listed first in the results.

http://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29

Thanks,
Shawn
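(A minimal sketch of that edismax setup, assuming a single searchable field named text - the parameter values are illustrative; mm=1 gives the union behavior by requiring only one clause to match, while pf boosts documents containing the whole phrase:)

    select?defType=edismax&q=term1 term2&mm=1&qf=text&pf=text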
Re: SpatialRecursivePrefixTreeFieldType Spatial Searching
Everything is working great now. Thanks David

On Wed, Jun 5, 2013 at 12:07 AM, David Smiley (@MITRE.org) dsmi...@mitre.org wrote:

maxDistErr should be like 0.3 based on earlier parts of this discussion, since your data is precise to particular hours of the day, not whole days. If it was whole days, you would use 1. Changing this requires a re-index. So does changing worldBounds, if you do so.

distErrPct should be 0. Changing it does not require a re-index because you are indexing points, not other shapes. This only affects other shapes.

Speaking of that slight buffer to the query shape I mentioned in my last email: it should be half of maxDistErr, whatever you set that to. So use like 0.1.

~ David

Chris Atkinson wrote:

Hi David, Thanks for your continued help. I think you have hit the nail on the head for me. I'm 100% sure that I had previously tried that query without success. I'm not sure if perhaps I had the wrong distErrPct or maxDistErr values... It's getting late, so I'm going to call it a night (I'm on GMT), but I'll put your example into practice tomorrow and get confirmation that it's working as expected. I'll keep playing around with the distErrPct values as well. Do I need to do a reindex if I change these values? (I think yes?)

On Tue, Jun 4, 2013 at 10:44 PM, Smiley, David W. <dsmiley@...> wrote:

So availability is the absence of any other document's indexed time duration overlapping with your availability query duration. So I think you should negate an overlaps query. The overlaps query looks like: Intersects(-Inf start end Inf). And remember the slight buffering needed as described on the wiki. You'd add a small fraction to the start time and subtract a small fraction from the end time, so that you don't accidentally match a document that is adjacent.

    -availability_spatial:Intersects( 0 30.5 114.5 3650 )

Does that work against your data? If it doesn't, can you conjecture why it doesn't work based on a sample point in a document that it matched, or a document that should have matched but didn't?

~ David

On 6/4/13 3:31 PM, Chris Atkinson <chrisacky@...> wrote:

Here is an example I have tried. So let's assume that I want to check in on the 30th day, and leave on the 115th day. My query would be:

    -availability_spatial:Intersects( 30 0 3650 115 )

However, that wouldn't match anything. Here is an example document below so you can see. (I've not negated the spatial field in the filter query, so you can see the field coordinates.) In case the formatting is bad, see here: http://pastie.org/pastes/8006249/text

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">1</int>
        <lst name="params">
          <str name="fl">availability_spatial</str>
          <str name="indent">true</str>
          <str name="q">id:38197</str>
          <str name="_">1370374172298</str>
          <str name="wt">xml</str>
          <str name="fq">availability_spatial:Intersects( 30 0 3650 115 )</str>
        </lst>
      </lst>
      <result name="response" numFound="1" start="0">
        <doc>
          <arr name="availability_spatial">
            <str>147.6 163.4</str>
            <str>164.6 178.4</str>
            <str>192.6 220.4</str>
            <str>241.6 264.4</str>
          </arr>
        </doc>
      </result>
    </response>

On Tue, Jun 4, 2013 at 8:14 PM, Chris Atkinson <chrisacky@...> wrote:

Thanks David. Query times are really quick and my index is only 20MB now, which is about what I would expect. I'm having some problems figuring out what type of query I want to find *available* properties with this new points system. I'm storing bookings against each document. So I have X Y coordinates, where X will be the check-in of a previous booking, and Y will be the departure.
So, for illustrative purposes, a week's booking from the 10th of January to the 17th would be X Y = 10 17:

<field name="booking">10 17</field>
<field name="booking">22 27</field>

I might have several bookings. Now, I want to find available properties with my search, but I'm just not sure on the ordering of the end/start in the polygon Intersects. I've looked at this document very carefully and tried to draw it all out on paper: https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/

Here are the suggestions:

q=fieldX:Intersects(-Inf end start Inf)
q=fieldX:Intersects(-Inf start end Inf)
q=fieldX:Intersects(start -Inf Inf end)

All of these are great for finding the existence of a field coordinate, but I need to make sure that the property is available. So I thought I could use one of these three queries in the negative by using -fieldX:Intersects(...), but none of those work. Can you shine some light on what I might be missing?
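Pulling the pieces of this thread together, a minimal sketch of what the field type and the availability filter might look like (the type name, world bounds, and exact buffer are assumptions drawn from the discussion, not a verified config):

<fieldType name="day_duration" class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="false" worldBounds="0 0 3650 3650"
           maxDistErr="0.3" distErrPct="0" units="degrees"/>
<field name="availability_spatial" type="day_duration" multiValued="true"
       indexed="true" stored="true"/>

With maxDistErr=0.3, the buffer is half of that, 0.15, so a stay from day 30 to day 115 would be filtered as:

fq=-availability_spatial:"Intersects(0 30.15 114.85 3650)"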
Re: copyField generates multiple values encountered for non multiValued field
Try describing your own symptom in your own words - because his issue related to Solr 1.4. I mean, where exactly are you setting allowDuplicates=false?? And why do you think it has anything to do with adding documents to Solr? Solr 1.4 did not have atomic update, so sending the exact same document twice would not result in a change in the index (unless you had a date field with a value of NOW.) Copy field only uses values from the current document. -- Jack Krupansky -Original Message- From: Robert Krüger Sent: Wednesday, June 05, 2013 10:37 AM To: solr-user@lucene.apache.org Subject: copyField generates multiple values encountered for non multiValued field I have the exact same problem as the guy here: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201105.mbox/%3C3A2B3E42FCAA4BF496AE625426C5C6E4@Wurstsemmel%3E AFAICS he did not get an answer. Is this a known issue? What can I do other than doing what copyField should do in my application? I am using solr 4.0.0. Thanks, Robert
Re: Phrase matching with set union as opposed to set intersection on query terms
term1 OR term2 OR "term1 term2"^2

term1 OR term2 OR "term1 term2"~10^2

The latter would rank documents with the terms nearby higher, and the adjacent terms highest.

term1 OR term2 OR "term1 term2"~10^2 OR "term1 term2"^20 OR "term2 term1"^20

To further boost adjacent terms. But the edismax pf/pf2/pf3 options might be good enough for you.

-- Jack Krupansky

-Original Message- From: Shawn Heisey Sent: Wednesday, June 05, 2013 11:10 AM To: solr-user@lucene.apache.org Subject: Re: Phrase matching with set union as opposed to set intersection on query terms

On 6/5/2013 9:03 AM, Dotan Cohen wrote: How would one write a query which should perform set union on the search terms (term1 OR term2 OR term3), and yet also perform phrase matching if both terms are found? I tried a few variants of the following, but in every case I am getting set intersection on the search terms: select?q={!q.op=OR}text:"term1 term2"~10

A phrase search by definition will require all terms to be present. Even though it is multiple terms, conceptually it is treated as a single term. It sounds like what you are after is what edismax can do. If you define the pf field in addition to the qf field, Solr will do something pretty amazing - it will automatically construct a phrase query from a non-phrase query and search with it against multiple fields. Done correctly, this means that an exact match will be listed first in the results. http://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29

Thanks, Shawn
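For reference, the edismax route Jack mentions would look something like this as a request (the field name text is an assumption):

select?defType=edismax&q=term1 term2&qf=text&pf=text^20&pf2=text^10&ps=1

pf turns the whole query into one boosted phrase, pf2 builds boosted phrases from adjacent word pairs, and ps sets the slop allowed in those implicit phrases.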
Solr 4.3 with Internationalization.
Guys, I am going to use Solr 4.3 in my shopping cart project, so I need to support my website in two languages (English and French). I would like some guidance on implementing internationalization with Solr 4.3. Please guide me with some sample configuration to support the French language with Solr 4.3. Thanks in advance. Guru.
Re: Solr 4.2.1 higher memory footprint vs Solr 3.5
Shawn: You're right, I thought I'd seen it as a field option but I think I was confusing really old Solr. Thanks for catching it; having gotten it wrong once, I'm sure I'll remember it better for next time! Erick

On Tue, Jun 4, 2013 at 1:57 PM, SandeepM skmi...@hotmail.com wrote:

Thanks Erick and Shawn, Your explanations help me understand where Solr may be spending its time. Sounds like compression can be a CPU and heap hog. (I'll try to confirm this with the heap dumps.) Initially we tried to keep the JVM heap sizes the same on both Solr 3.5 and 4.2.1, which was around 3 GB, which 3.5 handled well even with a 200 QPS load. Moving to 4.2.1 with the same heap size instantly killed the server. Doubling the JVM heap to 6 GB did not help either. We were seeing higher CPU and higher heap usage. We later changed cache settings so as to reduce their sizes, increased the JVM heap to 8 GB, and we see an improvement. But over time, we do see that the heap utilization slowly climbs as the 200 QPS test is allowed to run, and sometimes leads to the max heap being exceeded, as seen from JConsole. So we see the jagged-edge waveform which keeps climbing (GC cycles don't completely collect memory over time). Our test has a short capture from real traffic and we are replaying that via solrmeter. Thanks. Regards, -- Sandeep
Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string
Hi, Is it possible to configure Solr to suggest the full indexed string for any search on a substring of that string? Thanks, Prathik
No files added to classloader from lib
Hi, I downloaded Solr 4.3 and I am attempting to run and configure a separate Solr instance under Jetty. I copied the Solr dist directory contents to a directory called solrDist under the single core db that I was running. I then attempted to get the DataImportHandler using the following in my solrconfig.xml:

<lib dir="solrDist/" regex="apache-solr-dataimporthandler-.*\.jar" />

In the log file, I see a lot of messages that the jar files in solrDist were added to the classloader. E.g.

...
534 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-clustering-4.3.0.jar' to classloader
534 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-core-4.3.0.jar' to classloader
535 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-dataimporthandler-4.3.0.jar' to classloader
535 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-dataimporthandler-extras-4.3.0.jar' to classloader
535 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-langid-4.3.0.jar' to classloader
535 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrResourceLoader - Adding 'file:/C:/Users/MyUsername/Documents/Jetty/Jetty9/solr/db/lib/solr-solrj-4.3.0.jar' to classloader
...

However, in the end I get the following warning:

570 [coreLoadExecutor-3-thread-1] WARN org.apache.solr.core.SolrResourceLoader - No files added to classloader from lib: solrDist/ (resolved as: C:\Users\MyUsername\Documents\Jetty\Jetty9\solr\db\solrDist).

Why is this? I thought the jar files were added to the classloader, but the last message seems to say that none were added. I know that this is a warning, but I am just curious. I'd be grateful to anyone who has an idea regarding this. Thank you, O. O.
Re: problem with zkcli.sh linkconfig
Sounds like a bug - we probably don't have a test that updates a link - if you can make a JIRA issue, I'll be happy to look into it soon. - Mark On Jun 4, 2013, at 8:16 AM, Shawn Heisey s...@elyograg.org wrote: I've got Solr 4.2.1 running SolrCloud. I need to change the config set associated with a collection. I'm having a problem with this. Here's the command that I'm running, domain name redacted: /opt/mbsolr4/cloud-scripts/zkcli.sh -cp /opt/mbsolr4/lib/ext/slf4j-api-1.7.2.jar:/opt/mbsolr4/lib/ext/slf4j-log4j12-1.7.2.jar -z mbzoo1.REDACTED.com:2181,mbzoo2.REDACTED.com:2181,mbzoo3.REDACTED.com:2181/mbsolr1 -collection twotest -confname mbtestcfg -cmd linkconfig Here's part of the resulting log: Jun 04, 2013 9:08:44 AM org.apache.solr.cloud.ZkController linkConfSet INFO: Load collection config from:/collections/p Jun 04, 2013 9:08:44 AM org.apache.solr.common.cloud.SolrZkClient makePath INFO: makePath: /collections/p It partially creates a new collection named p, which is not referenced on my commandline. This partial collection IS linked to the config set that I referenced. The same thing happens if I use -c and -n instead of -collection and -confname. Am I doing something wrong, or is this a bug? Will I need to recreate the collection as a workaround? Thanks, Shawn
Re: Phrase matching with set union as opposed to set intersection on query terms
On Wed, Jun 5, 2013 at 6:10 PM, Shawn Heisey s...@elyograg.org wrote:

On 6/5/2013 9:03 AM, Dotan Cohen wrote: How would one write a query which should perform set union on the search terms (term1 OR term2 OR term3), and yet also perform phrase matching if both terms are found? I tried a few variants of the following, but in every case I am getting set intersection on the search terms: select?q={!q.op=OR}text:"term1 term2"~10

A phrase search by definition will require all terms to be present. Even though it is multiple terms, conceptually it is treated as a single term. It sounds like what you are after is what edismax can do. If you define the pf field in addition to the qf field, Solr will do something pretty amazing - it will automatically construct a phrase query from a non-phrase query and search with it against multiple fields. Done correctly, this means that an exact match will be listed first in the results. http://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29

Thanks, Shawn

Thank you Shawn, this pretty much does what I need it to do:

select?defType=edismax&q={!q.op=OR}search_field:term1 term2&pf=search_field

I'm reviewing the Edismax page now. Is there any other documentation that I should review? I have found the Edismax page at the wonderful lucidworks site, but if there is any other documentation that I should review to squeeze the most out of Edismax then I would love to know about it. http://docs.lucidworks.com/display/solr/The+Extended+DisMax+Query+Parser

Thank you very much! -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Two instances of solr - the same datadir?
Hi Peter, Thank you, I am glad to read that this use case is not alien. I'd like to make the second instance (searcher) completely read-only, so I have disabled all the components that can write (being lazy ;)). I'll probably use http://wiki.apache.org/solr/CollectionDistribution to call the curl after commit, or write some IndexReaderFactory that checks for changes. The problem with calling the 'core reload' is that it seems like lots of work for just opening a new searcher; eeekkk... somewhere I read that it is cheap to reload a core, but re-opening the index searcher must be definitely cheaper... roman

On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge peter.stu...@gmail.com wrote:

Hi, We use this very same scenario to great effect - 2 instances using the same dataDir with many cores - 1 is a writer (no caching), the other is a searcher (lots of caching). To get the searcher to see the index changes from the writer, you need the searcher to do an empty commit - i.e. you invoke a commit with 0 documents. This will refresh the caches (including autowarming), [re]build the relevant searchers etc. and make any index changes visible to the RO instance. Also, make sure to use <lockType>native</lockType> in solrconfig.xml to ensure the two instances don't try to commit at the same time. There are several ways to trigger a commit:

- Call commit() periodically within your own code.
- Use autoCommit in solrconfig.xml.
- Use an RPC/IPC mechanism between the 2 instance processes to tell the searcher the index has changed, then call commit when called (more complex coding, but good if the index changes on an ad-hoc basis).

Note, doing things this way isn't really suitable for an NRT environment. HTH, Peter

On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla roman.ch...@gmail.com wrote:

Replication is fine, I am going to use it, but I wanted it for instances *distributed* across several (physical) machines - but here I have one physical machine, it has many cores. I want to run 2 instances of solr because I think it has these benefits:

1) I can give less RAM to the writer (4GB), and use more RAM for the searcher (28GB)
2) I can deactivate warming for the writer and keep it for the searcher (this considerably speeds up indexing - each time we commit, the server is rebuilding a citation network of 80M edges)
3) saving disk space and better OS caching (the OS should be able to use more RAM for caching, which should result in faster operations - the two processes are accessing the same index)

Maybe I should just forget it and go with the replication, but it doesn't 'feel right' IFF it is on the same physical machine. And Lucene specifically has a method for discovering changes and re-opening the index (DirectoryReader.openIfChanged). Am I not seeing something? roman

On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman jhell...@innoventsolutions.com wrote:

Roman, Could you be more specific as to why replication doesn't meet your requirements? It was geared explicitly for this purpose, including the automatic discovery of changes to the data on the index master. Jason

On Jun 4, 2013, at 1:50 PM, Roman Chyla roman.ch...@gmail.com wrote:

OK, so I have verified the two instances can run alongside, sharing the same datadir. All update handlers are inaccessible in the read-only master:

<updateHandler class="solr.DirectUpdateHandler2" enable="${solr.can.write:true}">

java -Dsolr.can.write=false .
And I can reload the index manually:

curl "http://localhost:5005/solr/admin/cores?wt=json&action=RELOAD&core=collection1"

But this is not an ideal solution; I'd like the read-only server to discover index changes on its own. Any pointers? Thanks, roman

On Tue, Jun 4, 2013 at 2:01 PM, Roman Chyla roman.ch...@gmail.com wrote:

Hello, I need your expert advice. I am thinking about running two instances of solr that share the same data directory. The *reason* being: the indexing instance is constantly building its cache after every commit (we have a big cache) and this slows it down. But indexing doesn't need much RAM; only the search does (and the server has lots of CPUs). So, it is like having two solr instances:

1. solr-indexing-master
2. solr-read-only-master

In the solrconfig.xml I can disable update components; it should be fine. However, I don't know how to 'trigger' index re-opening on (2) after the commit happens on (1). Ideally, the second instance could monitor the disk and re-open the index after new files appear there. Do I have to implement a custom IndexReaderFactory? Or something else? Please note: I know about the replication; this use case is IMHO slightly different - in fact, write-only-master (1) is also a replication
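For reference, the empty commit Peter describes can be triggered with a plain update request against the searcher instance (port and core name assumed from the thread):

curl "http://localhost:5005/solr/collection1/update?commit=true"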
Re: No files added to classloader from lib
apache-solr-dataimporthandler-.*\.jar - note that the apache- prefix has been removed from Solr jar files.

-- Jack Krupansky

-Original Message- From: O. Olson Sent: Wednesday, June 05, 2013 12:01 PM To: solr-user@lucene.apache.org Subject: No files added to classloader from lib
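In other words, given the 4.3 jar names shown in the log above, the directive would presumably need to be:

<lib dir="solrDist/" regex="solr-dataimporthandler-.*\.jar" />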
Re: Phrase matching with set union as opposed to set intersection on query terms
On Wed, Jun 5, 2013 at 6:23 PM, Jack Krupansky j...@basetechnology.com wrote:

term1 OR term2 OR "term1 term2"^2

term1 OR term2 OR "term1 term2"~10^2

The latter would rank documents with the terms nearby higher, and the adjacent terms highest.

term1 OR term2 OR "term1 term2"~10^2 OR "term1 term2"^20 OR "term2 term1"^20

To further boost adjacent terms. But the edismax pf/pf2/pf3 options might be good enough for you.

Thank you Jack. I suppose that I could write a script in PHP to create such a query string from an arbitrary-length phrase, but it wouldn't be pretty! Edismax does in fact meet my need, though. Thanks! -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string
ngrams? See: http://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/ngram/NGramFilterFactory.html -- Jack Krupansky -Original Message- From: Prathik Puthran Sent: Wednesday, June 05, 2013 11:59 AM To: solr-user@lucene.apache.org Subject: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string Hi, Is it possible to configure solr to suggest the indexed string for all the searches of the substring of the string? Thanks, Prathik
Re: Solr 4.2.1 higher memory footprint vs Solr 3.5
So we see the jagged-edge waveform which keeps climbing (GC cycles don't completely collect memory over time). Our test has a short capture from real traffic and we are replaying that via solrmeter.

Any idea why the memory climbs over time? The GC should clean up after data is shipped back. Could there be a memory leak in Solr? Appreciate any help. Thanks. -- Sandeep
Re: Phrase matching with set union as opposed to set intersection on query terms
Is there any other documentation that I should review?

It's in the works! Within a week or two.

-- Jack Krupansky

-Original Message- From: Dotan Cohen Sent: Wednesday, June 05, 2013 12:06 PM To: solr-user@lucene.apache.org Subject: Re: Phrase matching with set union as opposed to set intersection on query terms
Re: Create index on few unrelated table in Solr
Yes, my ID field is the uniqueKey. How can I keep the entities from overriding each other?
Re: data-import problem
Maybe the problem is the two document declarations in data-config.xml. Try changing it to this:

<document>
  <entity name="name" query="SELECT id, name FROM name"/>
  <entity name="title" query="SELECT id AS titleid, title FROM name"/>
</document>
Re: Multitable import - uniqueKey
Hehe. Yes, all my tables' ID field names are different. For example, I have 5 tables. These names are 'admin, account, group, checklist':

admin = id (unique key)
account = account_id (unique key)
group = group_id (unique key)
checklist = id (unique key)

Also, I thought the last entity overwrites the other entities. I'm sorry, I don't understand your example. Now I try to use the below config.

### data-config.xml

<entity name="admin" query="select * from admin" dataSource="ds-1">
  <field column="id" name="id"/>
</entity>
<entity name="checklist" query="select * from checklist" dataSource="ds-1">
  <field column="id" name="id"/>
</entity>
<entity name="groups" query="select * from groups" dataSource="ds-1">
  <field column="group_id" name="id"/>
</entity>
<entity name="account" query="select * from accounts" dataSource="ds-1">
  <field column="account_id" name="id"/>
</entity>

Then my schema.xml:

<field name="id" stored="true" type="string" multiValued="false" indexed="true"/>
<uniqueKey>id</uniqueKey>

How can I keep the entities from overwriting each other? Please assist me with this example.
Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string
ngrams won't work here. If I index all the ngrams of the string, then when I search for some string it would suggest all the ngrams as well. E.g.: the dictionary contains the word "Jason Bourne" and you index all the ngrams of that word. When I search for "Jason", Solr suggests all the ngrams containing the word "Jason". Instead of just suggesting "Jason Bourne", it suggests "Jason B", "Jason Bo", "Jason Bou", "Jason Bour", "Jason Bourn", "Jason Bourne". What should I do so that I only get "Jason Bourne" as the suggestion when the user searches for any substring of it ("Bour", "Bourne", etc.)?

On Wed, Jun 5, 2013 at 9:39 PM, Jack Krupansky j...@basetechnology.com wrote:

ngrams? See: http://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/ngram/NGramFilterFactory.html

-- Jack Krupansky

-Original Message- From: Prathik Puthran Sent: Wednesday, June 05, 2013 11:59 AM To: solr-user@lucene.apache.org Subject: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

Hi, Is it possible to configure solr to suggest the indexed string for all the searches of the substring of the string? Thanks, Prathik
Re: Multitable import - uniqueKey
: How can I don't overwrite other entities?
: Please assist me on this example.

I'm confused, you sent this in direct reply to my last message, which contained the following...

1) a paragraph describing the general approach to solving this type of problem...

You can use TemplateTransformer to create a synthetic ID for each entity using some constant value combined with the auto-increment value from your DB, for example...

2) a link to an article i wrote a while back discussing how to solve the exact problem you are having...

http://searchhub.org/2011/02/12/solr-powered-isfdb-part-4/

3) links to specific commits in a github repo where there is a working example of using DIH to index multiple types of documents from different tables in a single Solr index. The commits i linked to show *exactly* which changes are needed to go from indexing a single entity to indexing two entities w/o conflicting ids...

https://github.com/lucidimagination/isfdb-solr/commit/85d7caf19746399755f6f1c39f48a654da3c5b11
https://github.com/lucidimagination/isfdb-solr/commit/26e945747404125ce5b835e2157c6e2612ff2387

...did you look at any of this? did you try it? do you have any specific questions about this approach?

-Hoss
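As a sketch of that TemplateTransformer approach, using the entity and column names from earlier in this thread (assumed, not tested):

<entity name="admin" query="select * from admin" dataSource="ds-1"
        transformer="TemplateTransformer">
  <field column="id" template="admin-${admin.id}"/>
</entity>
<entity name="account" query="select * from accounts" dataSource="ds-1"
        transformer="TemplateTransformer">
  <field column="id" template="account-${account.account_id}"/>
</entity>

Each document then gets an id like admin-1 or account-1, so rows from different tables can never collide on the uniqueKey.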
Configuring seperate db-data-config.xml per shard
Hi, We have a setup where we have 3 shards in a collection, and each shard in the collection needs to load a different set of data. That is:

Shard1 - will contain data only for Entity1
Shard2 - will contain data for Entity2
Shard3 - will contain data for Entity3

So in this case the db-data-config.xml can't be the same for the three shards, so it can't be uploaded to ZooKeeper. Is there any way we can maintain a db-data-config.xml inside each shard's folder and make our shards refer to this db-data-config.xml (during data import), rather than looking for this file in ZooKeeper's repository? Thanks in advance, Radha
Re: Create index on few unrelated table in Solr
Please don't create new threads re-asking the same questions -- especially when the existing thread is only a day old, and still actively getting responses. it just increases the overall noise of the list, and results in multiple people wasting their time providing you with the same answers...

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3ccaldws-wknmwuralhhmmmtth+7noy1ewu0z-shtmwcoaxzes...@mail.gmail.com%3E
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3Calpine.DEB.2.02.1306041534070.2959@frisbee%3E

: Date: Tue, 4 Jun 2013 02:10:52 -0700 (PDT)
: From: sodoo first...@yahoo.com
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: Create index on few unrelated table in Solr
:
: I want to create index few tables. All tables not related.
:
: In data-config.xml, that I created to create index
:
: dataConfig
: dataSource type=JdbcDataSource
: name=ds-1
: driver=com.mysql.jdbc.Driver
: url=jdbc:mysql://localhost/testdb
: user=root
: password=***/
: document
: entity name=admin query=select *from admin dataSource=ds-1
: field column=id name=id /
: field column=name name=name /
: field column=mail name=mail /
: /entity
:
: entity name=checklist query=select *from checklist
: dataSource=ds-1
: field column=id name=id /
: field column=title name=title /
: field column=connect name=connect /
: /entity
:
: entity name=account query=select *from account dataSource=ds-1
: field column=id name=id /
: field column=name name=name /
: field column=code name=code /
: /entity
: /document
:
: And I have register schema.xml these fields.
: I tried to make full import but unfortunately only the last entity is
: indexed. Other entities are not index.
:
: What should I do to import all the entities?

-Hoss
Re: Phrase matching with set union as opposed to set intersection on query terms
select?defType=edismax&q={!q.op=OR}search_field:term1 term2&pf=search_field

Is there any way to perform a fuzzy search with this method? I have tried appending ~1 to every term in the search like so:

select?defType=edismax&q={!q.op=OR}search_field:term1~1%20term2~1&pf=search_field

However, two issues:
1) It doesn't work! The results are identical to the results given when not appending ~1 to every term (or ~3).
2) If at all possible, I would rather define the 'fuzziness' elsewhere. Right now I would have to mangle the user input in order to add the ~1 to the end of each term.

Note that the ExtendedDisMax page does in fact mention that fuzziness is supported: http://wiki.apache.org/solr/ExtendedDisMax#Query_Syntax

-- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Query Elevation Component
I have not implemented it yet, and I forget the exact webpage where I found it. But there was a person on that page discussing the same problem who said it was easy to implement a solution for it; he did not share his solution, though. If you figure it out, let me know.
Re: facet.missing=true returns null records with zero count also
Hoss, We rely heavily on facet.mincount because once a user has selected a facet, it doesn't make sense for us to show that facet field to him and let him filter again with the same facet. Also, when a facet has only one value, it doesn't make sense to show it to the user, since searching with that facet is just going to give the same result set again. So when facet.missing does not work with facet.mincount, it is a bit of a hassle for us. We will work on handling it in our program. Thank you for the clarification - Rahul

On Wed, Jun 5, 2013 at 12:32 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: that facet value and see all documents. I thought facet.missing=true was
: the answer.
...
: facquery.setFacetMinCount(1);

Hmm, yeah -- it looks like facet.missing doesn't take facet.mincount into consideration. I don't remember if that was intentional or not, but as a special-case one-off count it seems like a toss-up as to whether it would be more or less surprising to hide it if it's below the mincount. (it's very similar to doing a one-off facet.query for example, and those are always included in the response and don't consider the facet.mincount either)

In general, this seems like a low-impact thing though, correct? i mean: the main advantage of facet.mincount is to reduce what could be a very large amount of useless data from being streamed from the server to the client, particularly in the case of using facet.sort where you really need the constraints eliminated server-side in order to get the sort and limit applied correctly. but with the facet.missing value, it's just a single value per field that can easily be ignored by the client if it's not desired because of the mincount. or to put it another way: the amount of work needed to ignore this on the client is less than the amount of work to make it configurable to ignore it on the server.

-Hoss
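For reference, the facet.missing count comes back as a final unnamed entry after the named constraints, so it is easy to skip client-side; with wt=json and hypothetical counts it would look something like:

"facet_fields": { "category": [ "books", 10, "music", 3, null, 7 ] }

where the trailing null/7 pair is the missing-value count.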
Re: copyField generates multiple values encountered for non multiValued field
OK, I have two fields defined as follows:

<field name="name" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="name2" type="string_ci" indexed="true" stored="true" multiValued="false"/>

and this copyField directive:

<copyField source="name" dest="name2"/>

I updated the index using SolrJ and got the exact same error message that is in the subject. However, while waiting for feedback I built a workaround at the application level, and now, reconstructing the original state to be able to answer you, I see different behaviour. What happens now is that the field name2 is populated with multiple values although it is not defined as multiValued (see above). Although this is strange, it is consistent with the earlier problem in that copyField does not seem to overwrite the existing field values. I may be using it incorrectly (it's the first time I am using copyField) but the docs in the wiki did not say anything about an overwrite option. Cheers, Robert
Re: java.lang.NumberFormatException when adding latitude,longitude using DIH
Thanks a lot for your response Hoss. I thought about using a ScriptTransformer too, but just thought of checking if there is any other way to do it. Btw, for some reason the values are getting overridden even though it's a multivalued field. Not sure where I am going wrong!

For latlong values 33.7209548950195,34.474838 -117.176193237305,-117.573463 the below value is getting indexed:

<arr name="geo"><str>34.474838,-117.573463</str></arr>

Script transformer:
Re: Two instances of solr - the same datadir?
So here it is, for the record, how I am solving it right now:

Write-master is started with:
-Dmontysolr.warming.enabled=false -Dmontysolr.write.master=true -Dmontysolr.read.master=http://localhost:5005

Read-master is started with:
-Dmontysolr.warming.enabled=true -Dmontysolr.write.master=false

solrconfig.xml changes:

1. all index-changing components have this bit, enable="${montysolr.master:true}" - i.e.

<updateHandler class="solr.DirectUpdateHandler2" enable="${montysolr.master:true}">

2. for cache warming de/activation:

<listener event="newSearcher" class="solr.QuerySenderListener" enable="${montysolr.enable.warming:true}">...

3. to trigger refresh of the read-only master (from the write-master):

<listener event="postCommit" class="solr.RunExecutableListener" enable="${montysolr.master:true}">
  <str name="exe">curl</str>
  <str name="dir">.</str>
  <bool name="wait">false</bool>
  <arr name="args"><str>${montysolr.read.master:http://localhost}/solr/admin/cores?wt=json&amp;action=RELOAD&amp;core=collection1</str></arr>
</listener>

This works, I still don't like the reload of the whole core, but it seems like the easiest thing to do now.

-- roman
Re: Create index on few unrelated table in Solr
Okay, I'm so sorry. I won't post the same question in a separate topic next time.
RE: Solr instance state is down in cloud mode
Are you using IE? If so, you might want to try using Firefox.

-Original Message- From: sathish_ix [mailto:skandhasw...@inautix.co.in] Sent: Wednesday, June 05, 2013 6:16 AM To: solr-user@lucene.apache.org Subject: Solr instance state is down in cloud mode

Hi, When I start a core in solr-cloud I'm getting the below message in the log. I have set up ZooKeeper separately and uploaded the config files. When I start the Solr instance in cloud mode, the state is down.

INFO: Update state numShards=null message={ operation:state, numShards:null, shard:shard1, roles:null, state:down, core:core1, collection:core1, node_name:x:9980_solr, base_url:http://x:9980/solr}
Jun 5, 2013 6:10:48 AM org.apache.solr.common.cloud.ZkStateReader$2 process
INFO: A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 1)

When I hit the URL, I get the left pane of the Solr admin and the right side keeps on loading. Any help? Thanks, Sathish
Re: Phrase matching with set union as opposed to set intersection on query terms
There is also http://wiki.apache.org/solr/SolrRelevancyCookbook with nice examples.
Re: java.lang.NumberFormatException when adding latitude,longitude using DIH
That was a very silly mistake. I forgot to add the values to an array before putting them inside the row. The below code works. Thanks a lot...
Re: Phrase matching with set union as opposed to set intersection on query terms
On Wed, Jun 5, 2013 at 9:04 PM, Eustache Felenc eustache.fel...@idilia.com wrote: There is also http://wiki.apache.org/solr/SolrRelevancyCookbook with nice examples. Thank you. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Pivot Facets refining datetime, bleh
This may be more suitable on the dev-list, but distributed pivot facets is a very powerful feature. The Jira issue for this is SOLR-2894 ( https://issues.apache.org/jira/browse/SOLR-2894). I have done some testing of the last patch for this issue, and it is as Andrew says: Everything but datetime fields works just fine. There are no error messages for datetime fields when used in a SolrCloud setup, the expected values are just not there. Best, Stein J. Gran On Thu, May 30, 2013 at 5:49 PM, Andrew Muldowney andrew.muldowne...@gmail.com wrote: I've been trying to get into how distributed field facets do their work but I haven't been able to uncover how they deal with this issue. Currently distrib pivot facets does a getTermCounts(first_field) to populate a list at the level its working on. When putting together the data structure we set up a BytesRef, fill it in with the value using the FieldType.ReadableToIndexed call and then add the FieldType.ToObject of that bytesRef and associated field. --From getTermCounts comes fieldValue-- termval = new BytesRef(); ftype.readableToIndexed(fieldValue, termval); pivot.add( value, ftype.toObject(sfield, termval) ); This works great for everything but datetime, as datetime's .ToObject turns it into a human readable string that is unconvertable -at least in my investigation. I've tried to use the FieldType.ToInternal but that also fails on the human readable datetime format. My original idea was to skip the aformentioned block of code and just straight add the fieldValue to the data structure. This caused some pivot facet tests to return wonky results, I'm not sure if I should go down the path of trying to figure out those problems or if there is a different approach I should be taking. Any general guidance on how distributed field facets deals with this would be much appreciated.
Re: data-import problem
Thanks so far. This change makes Solr work over the title entries too, yay! Unfortunately they don't get processed (skipped rows). In my log it says "missing required field id" for every entry. I checked my schema.xml; in there, id is not set as a required field. Removing the uniqueKey property also leads to no improvement. Any further ideas?
Re: data-import problem
On Jun 5, 2013, at 20:39, Stavros Delisavas stav...@delisavas.de wrote: Thanks so far. This change makes Solr work over the title entries too, yay! Unfortunately they don't get processed (skipped rows). In my log it says "missing required field id" for every entry. I checked my schema.xml. In there id is not set as a required field. Removing the uniqueKey property also leads to no improvement. Any further ideas?

You need a field to hold a unique identifier for the document, and your data-import setup must ensure that that specific field gets a unique identifier. Unique here means unique across all documents, no matter where they come from.
Re: data-import problem
On 6 June 2013 00:09, Stavros Delisavas stav...@delisavas.de wrote: Thanks so far. This change makes Solr work over the title entries too, yay! Unfortunately they don't get processed (skipped rows). In my log it says "missing required field id" for every entry. I checked my schema.xml. In there id is not set as a required field. Removing the uniqueKey property also leads to no improvement. [...]

There are several things wrong with your problem statement. You say that you have two tables, but both SELECTs seem to use the same table. I am going to assume that you really have two different tables.

Unless you have changed the default schema.xml, id should be defined as the uniqueKey for the document. You probably do not want to remove that, and even if you just remove the uniqueKey property, the field id remains defined as a required field. The issue is with your SELECT for the second entity:

<entity name="title" query="SELECT id AS titleid, title FROM name"/>

This renames id to titleid, and hence the required field id in schema.xml is missing. You will need something like:

<document>
  <entity name="name" query="SELECT id, name FROM name1"/>
  <entity name="title" query="SELECT id, title FROM name2"/>
</document>

However, you will need to ensure that the ids are unique in the two tables, else entries from the second entity will overwrite matching ids from the first. Also, do you have field definitions within the entities? Please share the complete schema.xml and the DIH configuration file with us, rather than snippets: Use pastebin.com if they are large. Regards, Gora
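One common way to guarantee uniqueness across the two tables (a sketch, assuming MySQL and the table names above) is to build a prefixed synthetic id directly in the SELECT:

<document>
  <entity name="name" query="SELECT CONCAT('name-', id) AS id, name FROM name1"/>
  <entity name="title" query="SELECT CONCAT('title-', id) AS id, title FROM name2"/>
</document>

Rows from the two tables then get ids like name-1 and title-1, which can never collide on the uniqueKey.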
Re: copyField generates multiple values encountered for non multiValued field
Look in the Solr log - the error message should tell you what the multiple values are. For example,

95484 [qtp2998209-11] ERROR org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: ERROR: [doc=doc-1] multiple values encountered for non multiValued field content_s: [def, abc]

One of the values should be the value of the field that is the source of the copyField. Maybe the other value will give you a clue as to where it came from. Check your SolrJ code - maybe you actually do try to initialize a value in the field that is the copyField target.

-- Jack Krupansky
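A minimal SolrJ sketch of the pitfall Jack describes, using the name/name2 fields defined earlier in the thread (the id, value, and server URL are hypothetical):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CopyFieldPitfall {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("name", "Some Name");      // copyField copies this into name2
        // doc.addField("name2", "Some Name");  // if the client also fills the copyField
        //                                      // target, name2 receives two values and the
        //                                      // non-multiValued error appears
        server.add(doc);
        server.commit();
    }
}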
Re: No files added to classloader from lib
Good call Jack. I totally missed that. I am curious how the DataImportHandler worked before, if I made a mistake in the specification and it did not get the jar. Anyway, it works now. Thanks again. O.O.

apache-solr-dataimporthandler-.*\.jar - note that the apache- prefix has been removed from Solr jar files.
-- Jack Krupansky
Re: data-import problem
Thanks for the hints. I am not sure how to solve this issue. I previously made a typo; there are definitely two different tables. Here is my real configuration: http://pastebin.com/JUDzaMk0

For testing purposes I added LIMIT 10 to the SQL statements because my tables are very large and tests would take too long (about 5 GB, 6.5 million rows). I included my whole data-config and what I have changed from the default schema.xml.

I don't know how to solve the "all ids have to be unique" problem. I can not believe that Solr does not offer any solution at all to handle multiple data sources with their own individual ids. Maybe it's possible to have Solr create its own ids while importing the data? Actually there is no direct relation between my name table and my title table. All I want is to be able to do fast text search in those two tables in order to find the ids belonging to these entries. Let me know if you need more information. Thank you!
Entire query is stopwords
Hi, I am using the standard edismax parser and my example query is as follows: {!edismax qf='object_description ' rows=10 start=0 mm=-40% v='object'} In this case, 'object' happens to be a stopword in the StopWordsFilter in my datatype 'object_description'. Now, since 'object' is not indexed at all, the query does not return any results. In an ideal case, I would want documents containing the term 'object' to be returned. What is the best practice to achieve this? Index stop-words and re-query with 'stopwords=false'. Or can this be done without re-querying? Thanks, Vardhan
Re: Solr 4.3 with Internationalization.
Check out this http://stackoverflow.com/questions/5549880/using-solr-for-indexing-multiple-languages http://wiki.apache.org/solr/LanguageAnalysis#French French stop words file (sample): http://trac.foswiki.org/browser/trunk/SolrPlugin/solr/multicore/conf/stopwords-fr.txt Solr includes three stemmers for French: one via solr.SnowballPorterFilterFactory, an alternative stemmer (since Solr 3.1) via solr.FrenchLightStemFilterFactory, and an even less aggressive approach (also since Solr 3.1) via solr.FrenchMinimalStemFilterFactory. Solr can also remove elisions via solr.ElisionFilterFactory, and Lucene includes an example stopword list.

  ...
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ElisionFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  ...
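Putting those pieces together, a full fieldType might look like the following; this is a sketch, assuming a stopwords-fr.txt like the one linked above sits in the core's conf directory:

  <fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <!-- strip French elisions such as l', d', j' before lowercasing -->
      <filter class="solr.ElisionFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords-fr.txt"/>
      <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    </analyzer>
  </fieldType>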
Re: problem with zkcli.sh linkconfig
On 6/5/2013 10:05 AM, Mark Miller wrote: Sounds like a bug - we probably don't have a test that updates a link - if you can make a JIRA issue, I'll be happy to look into it soon. I will go ahead and create an issue so that a test can be built, but I have some more info: It works perfectly when running the script from the 4.3.1 example, and from the 4.2.1 example. I am using slf4j 1.7.2 and log4j 1.2.17 in my production 4.2.1 lib/ext. That is the only difference I can think of at the moment. Thanks, Shawn
Solrj Stats encoding problem
Hi, I've tested a query using the Solr admin web interface and it works fine. But when I'm trying to execute the same search using SolrJ, it doesn't include Stats information. I've figured out that it's because my query is encoded. The original query is like q=eventTimestamp:[2013-06-01T12:00:00.000Z TO 2013-06-30T11:59:59.999Z]&stats=true&stats.field=numberOfBytes&stats.facet=eventType The query in Java is like q=eventTimestamp%3A%5B2013-06-01T12%3A00%3A00.000Z+TO+2013-06-30T11%3A59%3A59.999Z%5D%26stats%3Dtrue%26stats.field%3DnumberOfBytes%26stats.facet%3DeventType If I copy this query to the browser address bar, it doesn't work, but it does if I replace the encoded characters (':', '=', '&') with the original values. What should I do to make it work through Java? The code is like the following:

  SolrQuery solrQuery = new SolrQuery();
  solrQuery.setQuery(queryBuilder.toString());
  QueryResponse query = getSolrServer().query(solrQuery);
Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string
Please excuse my misunderstanding, but I always wonder why this index-time processing is usually suggested. From my POV this is a case for query-time processing, i.e. PrefixQuery aka the wildcard query Jason*. Ultra-fast term retrieval is also provided by the TermsComponent. On Wed, Jun 5, 2013 at 8:09 PM, Jack Krupansky j...@basetechnology.com wrote: ngrams? See: http://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/ngram/NGramFilterFactory.html -- Jack Krupansky -Original Message- From: Prathik Puthran Sent: Wednesday, June 05, 2013 11:59 AM To: solr-user@lucene.apache.org Subject: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string Hi, Is it possible to configure Solr to suggest the indexed string for all searches of a substring of the string? Thanks, Prathik -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
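To make Jack's ngram suggestion concrete, a fieldType sketch (the type name and gram sizes are illustrative, not from the thread): at index time the NGramFilter emits every substring of the term, so a plain term query on any substring, e.g. "ath" against "Prathik", matches, and the stored original string can be returned as the suggestion.

  <fieldType name="text_substring" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- emit every 2..25 character substring of the original term -->
      <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="25"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

The index-time approach trades index size for query speed, which is why it is usually suggested over wildcard queries.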
Re: Solrj Stats encoding problem
Sounds like the Solr Admin UI is too aggressively encoding the query part of the URL for display. Each query parameter value needs to be encoded, not the entire URL query string as a whole. -- Jack Krupansky -Original Message- From: ethereal Sent: Wednesday, June 05, 2013 4:11 PM To: solr-user@lucene.apache.org Subject: Solrj Stats encoding problem [...]
Re: Heap space problem with mlt query
To add some numbers to adityab's comment. Each entry in your filter cache will probably consist of maxDocs/8 bytes plus some overhead, or about 16G in total. This will only grow as you fire queries at Solr, so it's no surprise you're running out of memory as you process queries. Your documentCache is probably also a problem, although I'm extrapolating based on an 80G index with only 1M docs. The result cache is also very big, but it's usually much smaller. Still, I'd set it back to the defaults. Why did you change these from the defaults? The very first thing I'd do is change them back. Your autowarm counts are also a problem at 2,048. Again, take the filterCache. It's essentially a map where each entry's key is the fq clause and the value is the set of documents that match the query, often stored as a bit set (thus the maxDocs/8 above). Whenever a new searcher is opened in your setup, the most recent 2,048 fq clauses will be re-executed. That will really kill your searcher open times. Try something reasonable like 16-32. These are caches that are intended to age out the oldest entries, not hold all the entries you ever send at Solr. Best Erick On Wed, Jun 5, 2013 at 9:35 AM, adityab aditya_ba...@yahoo.com wrote: Did you try reducing the filter and query caches? They are fairly large too, unless you really need them to be cached for your use case. Do you have that many distinct filter queries hitting Solr for the size you have defined for filterCache? Are you doing any sorting? That will chew up a lot of memory because of Lucene's internal field cache.
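For reference, a solrconfig.xml sketch along the lines Erick describes, with default-ish sizes and small autowarm counts (the exact numbers are illustrative, not from the thread):

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="32"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>
  <!-- documentCache entries are never autowarmed; internal doc ids change between searchers -->
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>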
Re: Indexing Heavy dataset
Note that stored=true/false is irrelevant to the raw search time. What it _is_ relevant to is the time it takes to assemble the doc for return, if (and only if) you return that field. I claim your search time would be fast if you went ahead and stored the field, and specified an fl clause that did NOT contain the big field. Oh, and you'd have to have lazy field loading enabled too. FWIW, Erick On Wed, Jun 5, 2013 at 10:29 AM, Raheel Hasan raheelhasan@gmail.com wrote: Some values in the field are up to 1 MB as well. On Wed, Jun 5, 2013 at 7:27 PM, Raheel Hasan raheelhasan@gmail.com wrote: OK, thanks for the reply. The field has values of around 60 KB each. Furthermore, I have realized that the issue is with MySQL, as it's not processing this table when a WHERE clause is applied. Secondly, I have turned this field to stored=false and now the select is fast again. On Wed, Jun 5, 2013 at 6:56 PM, Shawn Heisey s...@elyograg.org wrote: On 6/5/2013 3:08 AM, Raheel Hasan wrote: Hi, I am trying to index a heavy dataset with 1 particular field really too heavy... However, as I start, I get a memory warning and rollback (OutOfMemoryError). So, I have learned that we can use the -Xmx1024m option with the java command to start Solr and allocate more memory to the heap. My question is, since this could also become insufficient later, is the issue related to caching? Here is my cache block in solrconfig:

  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

I am thinking that maybe I need to turn off the documentCache. Anyone got a better idea? Or perhaps there is another issue here? Exactly how big is this field? Do you need this giant field returned with your results, or is it just there for searching? Caches of size 512, especially with autowarm disabled, are probably not a major cause for concern, unless the big field is big enough so that 512 of them is really, really huge. If that's the case, I would reduce the size of your documentCache, not turn it off. The value of ramBufferSizeMB elsewhere in your config is more likely to affect how much RAM gets used during indexing. The default for this setting as of Solr 4.1.0 is 100. Most people can reduce this value. I'm writing a reply to another thread where you are participating, with info that will likely be useful for you too. Look for that. Thanks, Shawn -- Regards, Raheel Hasan -- Regards, Raheel Hasan
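A sketch of Erick's suggestion (the field name big_body is a placeholder, not from the thread): keep the big field stored, enable lazy field loading, and exclude it via fl so search stays fast while the field remains retrievable on demand.

  <!-- solrconfig.xml, inside the <query> section:
       only load stored fields that are actually requested -->
  <enableLazyFieldLoading>true</enableLazyFieldLoading>

A query that skips the big field entirely:

  /select?q=*:*&fl=id,title

Fetching the big field only for a single known document:

  /select?q=id:42&fl=id,big_body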
Re: Solrj Stats encoding problem
: I've tested a query using solr admin web interface and it works fine. : But when I'm trying to execute the same search using solrj, it doesn't : include Stats information. : I've figured out that it's because my query is encoded. I don't think you are understanding how to use SolrJ and the SolrQuery object : Original query is like q=eventTimestamp:[2013-06-01T12:00:00.000Z TO : 2013-06-30T11:59:59.999Z]&stats=true&stats.field=numberOfBytes&stats.facet=eventType : The query in java is like : q=eventTimestamp%3A%5B2013-06-01T12%3A00%3A00.000Z+TO+2013-06-30T11%3A59%3A59.999Z%5D%26stats%3Dtrue%26stats.field%3DnumberOfBytes%26stats.facet%3DeventType ... : SolrQuery solrQuery = new SolrQuery(); : solrQuery.setQuery(queryBuilder.toString()); : QueryResponse query = getSolrServer().query(solrQuery); it looks like you are passing the setQuery method an entire URL-encoded set of params from a request you made in your browser. the setQuery method is syntactic sugar for specifying just the q param containing the query string, and it should not already be escaped (ie: eventTimestamp:[2013-06-01T12:00:00.000Z TO 2013-06-30T11:59:59.999Z]). Other methods exist on the SolrQuery object to provide syntactic sugar for other things (ie: specifying facet fields, enabling highlighting, etc...) If you want to provide a list of params using explicit names (q, stats, stats.field, etc...) you can ignore the helper methods on SolrQuery and just directly use the low-level methods it inherits from ModifiableSolrParams, like setParam...

  SolrQuery query = new SolrQuery();
  query.setParam("q", "eventTimestamp:[2013-06-01T12:00:00.000Z TO 2013-06-30T11:59:59.999Z]");
  query.setParam("stats", "true");
  query.setParam("stats.field", "numberOfBytes", "eventType");
  QueryResponse response = getSolrServer().query(query);

-Hoss
Re: data-import problem
My usual admonishment is that Solr isn't a database, and when you try to use it like one you're just _asking_ for problems. That said... Consider two options:

1) use a different core for each table.
2) in schema.xml, remove the id field (required="true" _might_ be specified) and remove the uniqueKey definition.

You'll have to re-index of course. But do note that while Solr does not _require_ a uniqueKey definition, almost all Solr installations have one. Best Erick On Wed, Jun 5, 2013 at 3:19 PM, Stavros Delisavas stav...@delisavas.de wrote: [...]
Re: Entire query is stopwords
Your problem statement is fairly odd. You say you've defined object as a stopword, but then you want your query to return documents that contain object. By definition stopwords are something that is considered irrelevant for searching and are ignored. So why not just take object out of your stopwords file? Perhaps a separate stopwords file for that particular field? Or just not use stopwords at all for that field? Best Erick On Wed, Jun 5, 2013 at 3:36 PM, Vardhan Dharnidharka vardhan1...@hotmail.com wrote: [...]
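One way to realize the "no stopwords for that field" option without giving up stopwords elsewhere, as a sketch (field and type names are placeholders): copyField the text into a parallel field whose analyzer has no StopFilter, and add that field to qf so stopword-only queries still match.

  <!-- schema.xml: a parallel field type with no StopFilterFactory -->
  <fieldType name="text_nostop" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="object_description_nostop" type="text_nostop" indexed="true" stored="false"/>
  <copyField source="object_description" dest="object_description_nostop"/>

The query would then use qf='object_description object_description_nostop', so no re-querying is needed.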
search for docs where location not present
I have a location-type field in my schema where I store lat/lon of a document when this data is available. In around half of my documents this info is not available and I just don't store anything. I am trying to find the documents where the location is not set, but nothing is working. I tried q=location_field:* and get back no results. I tried q=-location_field:[* TO *] but got back an error. I even tried something like: q=*:*&fq={!geofilt sfield=location_field}&pt=34.02093,-118.210755&d=25000 (distance set to a very large number) but it returned documents even if they had no location_field set. Can anyone think of a way to do this? Thanks in advance!
Re: data-import problem
A Solr index does not need a unique key, but almost all indexes use one. http://wiki.apache.org/solr/UniqueKey Try the query below, passing id as id instead of titleid:

  <document>
    <entity name="title" query="SELECT id, title FROM name"></entity>
  </document>

A proper dataimport config will look like:

  <entity name="relationship_entity" query="select id,name,value from table">
    <field column="id" name="idSchemaFieldName"/>
    <field column="name" name="nameSchemaFieldName"/>
    <field column="value" name="valueSchemaFieldName"/>
  </entity>
Re: Solrj Stats encoding problem
On 6/5/2013 2:11 PM, ethereal wrote: [...] SolrQuery solrQuery = new SolrQuery(); solrQuery.setQuery(queryBuilder.toString()); QueryResponse query = getSolrServer().query(solrQuery); The only QueryBuilder objects I can find are in the Lucene API, so I have no idea what that part of your code is doing. Here's how I would duplicate the query you reference in SolrJ. The query string is broken apart so that the lines won't wrap awkwardly:

  String url = "http://localhost:8983/solr/collection1";
  SolrServer server = new HttpSolrServer(url);
  String qs = "eventTimestamp:"
      + "[2013-06-01T12:00:00.000Z TO 2013-06-30T11:59:59.999Z]";
  SolrQuery query = new SolrQuery();
  query.setQuery(qs);
  query.set("stats", true);
  query.set("stats.field", "numberOfBytes");
  query.set("stats.facet", "eventType");
  QueryResponse rsp = server.query(query);

Thanks, Shawn
Re: facet.missing=true returns null records with zero count also
: filter again with the same facet. Also, when a facet has only one value, it : doesn't make sense to show it to the user, since searching with that facet : is just going to give the same result set again. So when facet.missing does : not work with facet.mincount, it is a bit of a hassle for us Will work : on handling it in our program. Thank you for the clarification yeah .. i totally understand where you are coming from, i'm just not certain that it's clear cut that we should change the current behavior, since: 1) it's trivial to work around client side; 2) some other users might be depending on the current behavior and might think that conceptually it doesn't make sense for facet.missing to consider facet.mincount. i should have said before but: feel free to open an issue about this and propose a patch, i'm just not sure it's a slam dunk unless we make an easy way to configure it to continue working the current way as well. -Hoss
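The client-side workaround Hoss calls trivial could look like this SolrJ sketch (the field name "category" and the mincount value are illustrative): in SolrJ, the facet.missing bucket comes back as a facet count whose name is null, so the client can simply drop it when it falls below its own mincount.

  import org.apache.solr.client.solrj.response.FacetField;
  import org.apache.solr.client.solrj.response.QueryResponse;

  // a sketch: apply facet.mincount to the "missing" bucket on the client,
  // since facet.missing=true reports the bucket regardless of mincount
  static long missingCount(QueryResponse rsp, String field) {
      FacetField ff = rsp.getFacetField(field);
      if (ff == null) return 0;
      for (FacetField.Count c : ff.getValues()) {
          if (c.getName() == null) {   // null name marks the missing bucket
              return c.getCount();
          }
      }
      return 0;
  }

  // usage: only show a "(no value)" entry when it meets your own mincount
  //   long missing = missingCount(rsp, "category");
  //   if (missing >= 1) { /* render the missing bucket */ }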
Re: search for docs where location not present
select?q=-location_field:* worked for me
Re: search for docs where location not present
Either have your update client explicitly set a boolean field that indicates whether location is present, or use an update processor to set an explicit boolean field that means "no location present":

  <updateRequestProcessorChain name="location-present">
    <processor class="solr.CloneFieldUpdateProcessorFactory">
      <str name="source">location_field</str>
      <str name="dest">has_location_b</str>
    </processor>
    <processor class="solr.RegexReplaceProcessorFactory">
      <str name="fieldName">has_location_b</str>
      <str name="pattern">[^\s]+</str>
      <str name="replacement">true</str>
    </processor>
    <processor class="solr.DefaultValueUpdateProcessorFactory">
      <str name="fieldName">has_location_b</str>
      <bool name="value">false</bool>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

-- Jack Krupansky -Original Message- From: kevinlieb Sent: Wednesday, June 05, 2013 5:43 PM To: solr-user@lucene.apache.org Subject: search for docs where location not present [...]
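To use the chain it would also need to be wired into the update handler; a sketch, assuming Solr 4.x (where these processor factories and solr.UpdateRequestHandler exist) and the field names above:

  <!-- solrconfig.xml: route all updates through the chain -->
  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">location-present</str>
    </lst>
  </requestHandler>

After re-indexing, finding documents with no location becomes a plain boolean query:

  /select?q=has_location_b:false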
Re: copyField generates multiple values encountered for non multiValued field
: I updated the Index using SolrJ and got the exact same error message there aren't a lot of specifics provided in this thread, so this may not be applicable, but if you mean you are actually using the atomic updates feature to update an existing document, then the problem is that you still have the existing value in your name2 field, as well as another copy of the name field evaluated by copyField after the updates are applied... http://wiki.apache.org/solr/Atomic_Updates#Stored_Values -Hoss
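Sketched concretely, under the assumption that name is copyFielded into name2: an atomic update re-reads the document's stored values, so the stored name2 value plus the freshly re-copied name yields two values in a non-multiValued field. Keeping the copyField destination unstored avoids re-reading the old copy:

  <!-- schema.xml sketch: an unstored copyField destination is rebuilt
       from scratch on each atomic update instead of being duplicated -->
  <field name="name"  type="text_general" indexed="true" stored="true"/>
  <field name="name2" type="string"       indexed="true" stored="false"/>
  <copyField source="name" dest="name2"/>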
Re: Indexing Heavy dataset
: Furthermore, I have realized that the issue is with MySQL as it's not : processing this table when a WHERE clause is applied http://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_a_MySQL_database._My_table_is_huge_and_DataImportHandler_is_going_out_of_memory._Why_does_DataImportHandler_bring_everything_to_memory.3F -Hoss
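The fix that FAQ entry describes, sketched here (the connection details are placeholders): set batchSize="-1" on the JdbcDataSource, which DIH translates to Integer.MIN_VALUE so the MySQL driver streams rows instead of buffering the whole table in memory.

  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="user" password="password"
              batchSize="-1"/>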
Re: search for docs where location not present
Thanks for the replies. I found that -location_field:* returns documents that both have and don't have the field set. I should clarify that I am using Solr 3.4 and the location type is set to solr.LatLonType. Although I could add a boolean field that is true if location is set, I'd rather not have redundant data in the db (harkens back to my normalized-SQL days).