Default query operator OR won't work in some cases
Hi, I have some documents with the keyword egg, some with salad, and some with egg salad. When I search for egg salad, I expect to see the egg results plus the salad results, but I don't see them. The egg and salad queries individually work fine. I am using WhitespaceTokenizer. Not sure if I am missing something. Thanks, -Manasi -- View this message in context: http://lucene.472066.n3.nabble.com/Default-query-operator-OR-wont-work-in-some-cases-tp4086624.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: custom names for replicas in solrcloud
Is coreNodeName exposed via the Collections API?
Re: Default query operator OR won't work in some cases
Here are the keywords field values for 3 docs:

Doc 1: Simply Asia products,Simply Asia,Sesame Chicken Egg Drop Soup,Soy Ginger Shrimp and Noodle Salad,Sesame Teriyaki Noodle Bowl
Doc 2: Eggs,AllWhites,Better'n Eggs,Foods,AllWhites or Better'n Eggs
Doc 3: DOLE Salad Blend Salad Kit,Salad Kit,Salad,DOLE,produce

Here is my debug query output:

<str name="parsedquery">(+((DisjunctionMaxQuery((keywords:egg^2.0)~0.1) DisjunctionMaxQuery((keywords:salad^2.0)~0.1))~2) DisjunctionMaxQuery((keywords:"egg salad")~0.1))/no_coord</str>

Here is my fieldType definition for keywords:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
  </analyzer>
</fieldType>
Re: Default query operator OR won't work in some cases
I am not searching with a phrase query, so I am not sure why one shows up in the parsed query.

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">3</int>
  <lst name="params">
    <str name="debugQuery">true</str>
    <str name="indent">true</str>
    <str name="q">egg salad </str>
    <str name="_">1377569284170</str>
    <str name="wt">xml</str>
  </lst>
</lst>
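For context on the parsed query above: the ~2 after the first clause is the minimum-should-match (mm) value, which means both terms are currently required, and the trailing "egg salad" DisjunctionMaxQuery is the phrase boost that (e)dismax adds via the pf parameter (it boosts, it does not require a phrase match). If pure OR behavior is wanted, one option is to set mm explicitly in the handler defaults. A minimal sketch, assuming an edismax handler in solrconfig.xml (the qf value below is an assumption):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- mm=1: only one of the query terms needs to match (OR behavior) -->
    <str name="mm">1</str>
    <!-- assumed query fields; adjust to the actual handler config -->
    <str name="qf">keywords^2.0</str>
  </lst>
</requestHandler>
```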
custom names for replicas in solrcloud
Hi, I am using Solr 4.3 with 3 Solr hosts and an external ZooKeeper ensemble of 3 servers, and just 1 shard currently. When I create collections using the Collections API, it creates them with the names collection1_shard1_replica1, collection1_shard1_replica2, collection1_shard1_replica3. Is there any way to pass a custom name? Or can I have all the replicas with the same name? Any pointers will be much appreciated. Thanks, -Manasi
entity classification solr
I have the following situation when using Solr 4.3. My documents contain entities, for example "peanut butter", and I have a list of such entities. These are terms that go together and are not to be treated as two individual words. During indexing, I want Solr to realize this and treat "peanut butter" as a single entity. For example, if someone searches for peanut, then documents that have the word peanut should rank higher than documents that have the phrase peanut butter; however, if someone searches for peanut butter, then documents that have peanut butter should show up higher than ones that have just peanut. Is there a config setting somewhere such that the entity list can be specified in a file and Solr would handle this? Should I be using KeepWordFilterFactory for this? Any pointers will be much appreciated. Thanks, -Manasi
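KeepWordFilterFactory only whitelists single tokens, so it would not group words by itself. One possible approach (a sketch, not necessarily the only way): map each multi-word entity to a single token at index and query time with a synonym file. The file name entities.txt and the mapping below are assumptions:

```xml
<!-- added to the analyzer chain on both the index and query side -->
<filter class="solr.SynonymFilterFactory" synonyms="entities.txt" ignoreCase="true" expand="false"/>

<!-- entities.txt would contain one mapping per line, e.g.:
     peanut butter => peanut_butter
-->
```

With expand="false" the two-word form is replaced by the single token, so a search for peanut no longer matches the entity, while a search for peanut butter does.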
no servers hosting shard
I have set up SolrCloud, and when I try to access documents I get this error:

<lst name="error"><str name="msg">no servers hosting shard:</str><int name="code">503</int></lst>

However, if I add the shards=shard1 param, it works.
debian package for solr with jetty
Hi, I am trying to create a Debian package for Solr 4.3 (default installation with Jetty). Is there anything already available? Also, I need 3 different cores, so I plan to create a corresponding package for each of them that creates the Solr core using the admin/cores or Collections API. I also want to use a SolrCloud setup with an external ZooKeeper ensemble; what's the best way to create a Debian package for updating the ZooKeeper config files as well? Please suggest. Any pointers will be helpful. Thanks, -Manasi
Use same spell check dictionary across different collections
I have 2 collections, let's say coll1 and coll2. I configured solr.DirectSolrSpellChecker in coll1's solrconfig.xml and it works fine. Now I want to configure coll2's solrconfig.xml to use the SAME spell check dictionary index created above. (I do not want coll2 to prepare its own dictionary index, just to spell check against coll1's spell dictionary index.) Is it possible to do this? I tried with IndexBasedSpellChecker but could not get it working. Any suggestions? Thanks, -Manasi
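One thing that may be worth trying (a sketch, untested; the shared path below is hypothetical): point the IndexBasedSpellChecker in both collections at the same spellcheckIndexDir, let only coll1 build it, and disable building in coll2:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <!-- hypothetical shared location; coll1 builds here, coll2 only reads -->
    <str name="spellcheckIndexDir">/var/solr/shared/spellchecker</str>
    <str name="field">spell</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>
```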
spellcheck and search in a same solr request
Hey, is there a way to do spellcheck and search (using the suggestions returned from spellcheck) in a single Solr request? I am seeing that if my query is spelled correctly I get results, but if it is misspelled I just get suggestions. Any pointers will be very helpful. Thanks, -Manasi
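Solr does not automatically re-run the search with a suggestion, but spellcheck.collate at least returns a ready-to-use corrected query in the same response; the client then issues the second request itself. A sketch of the handler defaults, assuming a spellcheck search component named "spellcheck" is already configured:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <!-- return a single corrected query built from the best suggestions -->
    <str name="spellcheck.collate">true</str>
    <!-- try up to 5 collations and only return ones that would hit documents -->
    <str name="spellcheck.maxCollationTries">5</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```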
Exception when using File based and Index based SpellChecker
I am trying to use the file-based and index-based spell checkers together and am getting the exception "All checkers need to use the same StringDistance". They work fine individually, as expected, but not together. Any pointers? -Manasi
Spellcheck questions
Exploring the various spellcheckers in Solr, I have a few questions:
1. Which algorithm is used for generating suggestions when using IndexBasedSpellChecker? I know it's Levenshtein (with edit distance=2, the default) in DirectSolrSpellChecker.
2. If I have 2 indices, can I set up multiple IndexBasedSpellCheckers pointing to different spellcheck dictionaries, to generate suggestions from both?
3. Can I use IndexBasedSpellChecker and FileBasedSpellChecker together? I tried doing so and ran into the exception "All checkers need to use the same StringDistance".
Any help will be much appreciated. Thanks, -Manasi
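Regarding #3, the exception suggests the two checkers ended up with different StringDistance implementations. One thing to try (a sketch, untested; names and paths are assumptions) is to set the same distanceMeasure explicitly on both spellchecker entries:

```xml
<lst name="spellchecker">
  <str name="name">index_checker</str>
  <str name="classname">solr.IndexBasedSpellChecker</str>
  <str name="field">spell</str>
  <!-- same StringDistance class on both checkers -->
  <str name="distanceMeasure">org.apache.lucene.search.spell.LevensteinDistance</str>
</lst>
<lst name="spellchecker">
  <str name="name">file_checker</str>
  <str name="classname">solr.FileBasedSpellChecker</str>
  <str name="sourceLocation">spellings.txt</str>
  <str name="characterEncoding">UTF-8</str>
  <str name="distanceMeasure">org.apache.lucene.search.spell.LevensteinDistance</str>
</lst>
```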
Switching to using SolrCloud with tomcat7 and embedded zookeeper
I am running Solr 4.3 with Tomcat 7 (non-SolrCloud) and have 4 Solr cores running. To switch to SolrCloud with Tomcat 7 and embedded ZooKeeper, I updated JAVA_OPTS in tomcat7/bin/setenv.sh to the following:

JAVA_OPTS="-Djava.awt.headless=true -Xms2048m -Xmx4096m -XX:+UseConcMarkSweepGC -Dbootstrap_confdir=/path/solr/collection1/conf/ -Dcollection.configName=collection1_conf -DnumShards=1 -DzkRun"

Now all my cores (collections) are set to the default collection1 schema. Not sure why; solr.xml is set to the correct instanceDir settings. Any pointers? Thanks, -Manasi
Re: Switching to using SolrCloud with tomcat7 and embedded zookeeper
Originally I was running a single Solr 4.3 instance with 4 cores, and now, starting to learn about SolrCloud, I thought I could set the number of shards to 1 (since it's a single instance) and the same 4 cores could be converted to 4 collections on the same single shard / same single instance. How do I define a distinct -Dcollection.configName for each core as part of setenv.sh? Should I point my -Dbootstrap_confdir to the parent directory instead? -Manasi

Daniel Collins wrote
> You've specified bootstrap_confdir and the same collection.configName on all your cores, so as each of them starts, each will be uploading its own configuration to the collection1_conf area of ZK, and they will all be overwriting each other. Are your 4 cores replicas of the same collection, or are they distinct collections? If the latter, then why put them all in SolrCloud (there really isn't any benefit)? But assuming you want to do it for expansion reasons (to add replicas later on), then each one will need a distinct collection.configName parameter, so that ZK knows to keep the configs separate.
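Rather than bootstrapping via system properties, the config sets can also be uploaded to ZooKeeper explicitly, one confname per collection, using the zkcli.sh script that ships with Solr. A sketch (hosts, paths, and core names are assumptions):

```shell
# upload a separate config set for each collection
./zkcli.sh -cmd upconfig -zkhost zk1:2181,zk2:2181,zk3:2181 \
    -confdir /path/solr/core1/conf -confname core1_conf
./zkcli.sh -cmd upconfig -zkhost zk1:2181,zk2:2181,zk3:2181 \
    -confdir /path/solr/core2/conf -confname core2_conf
# then reference the matching collection.configName when each collection is created
```

This avoids every core re-bootstrapping its config on startup and overwriting the others.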
select in clause in solr
I am using Solr 4.3 and have 2 collections, coll1 and coll2. After searching in coll1, I get field1 values as a comma-separated list of strings: val1, val2, val3, ... valN. How can I use that list to match field2 in coll2 against those values, separated by an OR clause? So I want to return all documents in coll2 with field2=val1 or field2=val2 or field2=val3 ... or field2=valN. In short, I am looking for a SELECT ... IN type clause in Solr. Any pointers will be much appreciated. -Manasi
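There is no IN clause as such, so one straightforward workaround is to build a boolean OR query from the first result set on the client side (the {!join} parser may also apply when both cores live on the same node, but a join does not return fields from both). A minimal sketch; the field and value names are hypothetical:

```shell
# comma-separated values returned from the coll1 query (stand-in data)
vals="val1, val2, val3"
# turn the list into an OR clause for coll2
q="field2:($(printf '%s' "$vals" | sed 's/ *, */ OR /g'))"
echo "$q"   # prints field2:(val1 OR val2 OR val3)
# then URL-encode $q and send it as the q parameter to /solr/coll2/select
```

Note that very long value lists may hit the default maxBooleanClauses limit (1024), which can be raised in solrconfig.xml.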
Normalizing/Returning solr scores between 0 to 1
Hi, we need normalized scores ranging between 0 and 1 rather than a free range. I read about this at http://wiki.apache.org/lucene-java/ScoresAsPercentages and it seems that's not something that is recommended. However, is there still a way to set some config in solrconfig to make sure scores are always between 0 and 1? Or will I have to implement that logic in my code after I get the results from Solr? Any pointers will be much appreciated. Thanks, -M
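There is no solrconfig setting for this; as the wiki page explains, normalized scores are discouraged because scores are only comparable within a single result list. If relative ranking within one response is all that's needed, dividing each score by the response's maxScore is a simple (if crude) client-side step. A sketch, with the scores hard-coded as stand-ins for values extracted from the Solr response:

```shell
# stand-in values; in practice parse score and maxScore from the response
maxScore=4.2
score=2.1
norm=$(awk -v s="$score" -v m="$maxScore" 'BEGIN { printf "%.4f", s / m }')
echo "$norm"   # prints 0.5000
```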
Re: update solr.xml dynamically to add new cores
Great, thanks a lot!
Re: update solr.xml dynamically to add new cores
Thanks Michael, both of those reasons make sense. Currently I am not planning on using SolrCloud, so as you suggested I can use the http://wiki.apache.org/solr/CoreAdmin API. While doing that, did you mean running a curl command similar to this,

http://localhost:8983/solr/admin/cores?action=CREATE&name=coreX&instanceDir=path_to_instance_directory&config=config_file_name.xml&schema=schema_file_name.xml&dataDir=data

as part of the 'postinst' script? Or running it manually on the host after the index package is installed? (I would love to do it as part of the package installation.) Also, there will be two cases here: if I am installing a new index package, CREATE will work; however, if I am updating a package with some tweaks to the configs and schema, then I need to check STATUS to see if the core is available and, if yes, use RELOAD, else CREATE. Does this make sense?

Michael Della Bitta-2 wrote
> Hi, I wouldn't edit solr.xml directly for two reasons. One being that an already running Solr installation won't pick up changes to that file, and might actually overwrite the changes that you make to it. And two, it's going away in a future release of Solr. Instead, I'd make the package that installs the Solr webapp and brings it up as you described, and have your independent index packages use either the CoreAdmin API or the Collections API to create the indexes, depending on whether you're using SolrCloud or not:
> http://wiki.apache.org/solr/CoreAdmin
> https://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API
>
> Michael Della Bitta
> Applications Developer
> o: +1 646 532 3062 | c: +1 917 477 7906
> appinions inc. "The Science of Influence Marketing"
> 18 East 41st Street, New York, NY 10017
> t: @appinions <https://twitter.com/Appinions> | g+: plus.google.com/appinions
> w: appinions.com <http://www.appinions.com/>
>
> On Wed, Jun 19, 2013 at 8:27 PM, smanad <smanad@> wrote:
> > Hi, Is there a way to edit solr.xml as a part of debian package installation to add new cores?
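The CREATE-or-RELOAD logic above could be sketched in a postinst step roughly like this (host, core name, and the STATUS-response check are assumptions, not a tested recipe):

```shell
#!/bin/sh
# sketch of a postinst step: reload the core if it exists, otherwise create it
SOLR=http://localhost:8983/solr
CORE=coreX
if curl -s "$SOLR/admin/cores?action=STATUS&core=$CORE" \
    | grep -q "<str name=\"name\">$CORE</str>"; then
  curl -s "$SOLR/admin/cores?action=RELOAD&core=$CORE"
else
  curl -s "$SOLR/admin/cores?action=CREATE&name=$CORE&instanceDir=$CORE"
fi
```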
Re: Partial update using solr 4.3 with csv input
Thanks for confirming. So if my input is a CSV file, I will need a script that reads the delta changes one by one, converts them to JSON, and then calls the 'update' handler with that piece of JSON data. Makes sense?

Jack Krupansky-2 wrote
> Correct, no atomic update for CSV format. There just isn't any place to put the atomic update options in such a simple text format.
> -- Jack Krupansky
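The conversion can be tiny. A sketch, assuming a two-column CSV of id,price deltas (the field names are hypothetical), producing the atomic-update JSON that the /update handler accepts:

```shell
# convert one CSV delta row into an atomic-update JSON document
echo 'doc1,19.99' | awk -F, '{printf "[{\"id\":\"%s\",\"price\":{\"set\":%s}}]\n", $1, $2}'
# prints [{"id":"doc1","price":{"set":19.99}}]
# then POST it:
#   curl -H 'Content-Type: application/json' --data-binary @deltas.json \
#     'http://localhost:8983/solr/update?commit=true'
```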
update solr.xml dynamically to add new cores
Hi, is there a way to edit solr.xml as part of a Debian package installation to add new cores? In my use case, there are 4 Solr indexes, and they are managed/configured by different teams. The way I am thinking the packages will work is described below:
1. There will be a solr-base Debian package which comes with the Solr installation and Tomcat setup (I am planning to use Solr 4.3).
2. There will be individual index Debian packages, like solr-index1, solr-index2, which will depend on solr-base. Each package's DEBIAN postinst script will have logic to edit solr.xml to add the new index (index1, index2, etc.).
Does this sound good? Or is there a better/different way to do this? Any pointers will be much appreciated. Thanks, -M
Partial update using solr 4.3 with csv input
I was going through this link http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/ and one of the comments asks about support for CSV. Since the comment is almost a year old, I am just wondering whether it is still true that partial updates are possible only with XML and JSON input? Thanks, -M
Need help with search in multiple indexes
Hi, I am thinking of using Solr to implement search on our site. Here is my use case:
1. We will have 4-5 indexes based on different data types/structures, and data will be indexed into them by several processes (cron, on demand, through message-queue applications, etc.).
2. A single web service needs to search across all these indexes and return results.
I am thinking of using Solr 4.2.1, or maybe 4.3, with a single-instance multicore setup. I read about distributed search, and I believe I should be able to search across multiple indices using the shards parameter; however, in my case all shards will be on the same host/port but with different core names. Is my understanding correct? Or is there a better alternative to this approach? Please suggest. Thanks, -Manasi
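Distributed search does accept cores on the same host; a sketch of such a request (host, port, and core names are assumptions):

```shell
curl 'http://localhost:8983/solr/core1/select?q=test&shards=localhost:8983/solr/core1,localhost:8983/solr/core2,localhost:8983/solr/core3'
```

One caveat: distributed search assumes the cores share a compatible schema, at minimum the same uniqueKey field and compatible fields being returned, so heterogeneous schemas complicate this approach.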
Re: Need help with search in multiple indexes
Thanks for the reply, Michael. In some cases the schemas are similar, but not in all of them, so let's go with the assumption that the schemas are NOT similar. I am not quite sure what you mean by "you're probably stuck coordinating the results externally". Do you mean searching each index and then somehow merging the results manually? Will I still be able to use the shards parameter, or not? Also, I was planning to use the PHP SolrClient library. Do you see any downside?
Re: Need help with search in multiple indexes
Is this a limitation of Solr/Lucene? Should I be considering another option like Elasticsearch (which is also based on Lucene)? I am sure searching in multiple indexes is a fairly common problem. Also, I was reading this post http://stackoverflow.com/questions/2139030/search-multiple-solr-cores-and-return-one-result-set and in one of the comments it says:

"So if I have Core0 with fields documentId,fieldA,fieldB and Core1 with fields documentId,fieldC,fieldD. Then I create another core, let's say Core3, with fields documentId,fieldA,fieldB,fieldC,fieldD. I will never be importing data into this core? And then create a query handler that includes the shard parameter. So when I query Core3, it will never really contain indexed data, but because of the shard searching it will fetch the results from the other two cores and present them on the 3rd core? Thanks for the help!"

Is that what I should be doing? So all the indexing still happens in separate cores, but searching happens through one single core?
Re: Need help with search in multiple indexes
In my case, different teams will be updating the indexes at different intervals, so having separate cores gives more control. However, I could still update (add/edit/delete) data with conditions like checking for a doc type. It's just that using shards sounds much cleaner and more readable; I am not yet sure whether there might be any performance issues.
Re: Scheduling DataImports
Thanks for the reply. Regarding the second question: actually, that's what I am looking for. My use case is: my DIH runs for 2 HTTP data sources, api1 and api2, with different TTLs returned. I was thinking of saving this in a file, something like:

url:api1, timestamp:100, expires: 60
url:api2, timestamp:101, expires: 30

Then a cron job will run every minute to see which entries are expiring in the next 30 secs; entry #2 will be expiring, so it will re-index that entry by running the DIH curl command for that entity. Is there a better way of scheduling DIH imports automatically? I read about NRT; is that related to this problem at all? Thanks, -M
Scheduling DataImports
Hi, I am new to Solr and recently started exploring it for the search/sort needs of our webapp. I have a couple of questions, as below (I am using Solr 4.2.1 with the default core named collection1):
1. We have a use case where we would like to index data every 10 mins (avg). What's the best way to schedule a data import every 10 mins or so? A cron job?
2. Also, we are indexing data returned from an API which returns different cache TTLs. How can I re-index after a TTL expires? Some process which polls for the expiring-soon entries and issues a data-import command?
Any pointers will be much appreciated. Thanks, -M
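For #1, a cron job is the usual approach; a sketch of a crontab entry (the host and core name are assumptions) that triggers a DataImportHandler delta-import every 10 minutes:

```shell
# crontab entry: DIH delta-import every 10 minutes
*/10 * * * * curl -s 'http://localhost:8983/solr/collection1/dataimport?command=delta-import&clean=false&commit=true' > /dev/null
```

A full-import with clean=true would rebuild the whole index instead; delta-import only picks up changes, which fits a 10-minute cadence better.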