DIH deleting documents
I am adding documents with the Data Import Handler from a MySQL database. I create a unique id for each document by concatenating a couple of fields in the database. Every id is unique. After the import, over half the documents which were imported are deleted again, leaving me with less than half the documents in the database ending up in the Solr index. Is there a way to get a list of the deleted documents, so that I can start troubleshooting what went wrong? thanks, Csaba -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-deleting-documents-tp4041809.html Sent from the Solr - User mailing list archive at Nabble.com.
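One common way to build such a concatenated id in DIH is a TemplateTransformer on the entity. A minimal sketch of what that looks like in data-config.xml (the table, column, and field names here are hypothetical, not taken from the original post):

```xml
<!-- data-config.xml sketch: build the uniqueKey by concatenating two columns.
     "item", "site_id", and "doc_id" are hypothetical names. -->
<entity name="item" transformer="TemplateTransformer"
        query="SELECT site_id, doc_id, title FROM items">
  <!-- id = site_id + "-" + doc_id; must be unique across ALL rows -->
  <field column="id" template="${item.site_id}-${item.doc_id}"/>
  <field column="title" name="title"/>
</entity>
```

If two rows ever produce the same template value, the later row silently replaces the earlier document, which shows up as "deleted" documents in the import statistics.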
Re: Slaves always replicate entire index Index versions
A few others have posted about this too, apparently, and SOLR-4413 is the root problem. Basically what I am seeing is that if your index directory is not index/ but rather index.timestamp set in index.properties, a new index will be downloaded all the time, because the download expects your index to be in solr_data_dir/index. Sounds like a quick solution might be to rename your index directory to just index and see if the problem goes away. To confirm, look at line 728 in the SnapPuller.java file (in downloadIndexFiles). I am hoping that the patch and a more unified getIndexDir can be added to the next release of Solr, as this is a fairly significant bug to me. Cheers Amit On Thu, Feb 21, 2013 at 12:56 AM, Amit Nithian anith...@gmail.com wrote: So the diff in generation numbers is due to the commits I believe Solr does when it has the new index files, but the fact that it's downloading a new index each time is baffling, and I just noticed that too (hit the replicate button and noticed a full index download). I'm going to pop into the source and see what's going on, unless there's a known bug filed about this? On Tue, Feb 19, 2013 at 1:48 AM, Raúl Grande Durán raulgrand...@hotmail.com wrote: Hello. We have recently updated our Solr from 3.5 to 4.1 and everything is running perfectly except the replication between nodes. We have a master-repeater-2slaves architecture and we have seen some things that weren't happening before: when a slave (repeater or slaves) starts to replicate, it needs to download the entire index, even when only small changes have been made to the index at the master. This takes a long time since our index is more than 20 GB. After a replication cycle we have different index generations in master, repeater and slaves. For example: Master: gen. 64590, Repeater: gen. 64591, Both slaves: gen. 64592.
My replicationHandler configuration is like this:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="enable">${enable.master:false}</str>
      <str name="replicateAfter">commit</str>
      <str name="replicateAfter">startup</str>
      <str name="confFiles">schema.xml,stopwords.txt</str>
    </lst>
    <lst name="slave">
      <str name="enable">${enable.slave:false}</str>
      <str name="masterUrl">${solr.master.url:http://localhost/solr}</str>
      <str name="pollInterval">00:03:00</str>
    </lst>
  </requestHandler>

Our problems are very similar to those explained here: http://lucene.472066.n3.nabble.com/Problem-with-replication-td2294313.html Any ideas? Thanks
Re: Slaves always replicate entire index Index versions
Hi Amit, I have come across some JIRAs that may be useful in this issue: https://issues.apache.org/jira/browse/SOLR-4471 https://issues.apache.org/jira/browse/SOLR-4354 https://issues.apache.org/jira/browse/SOLR-4303 https://issues.apache.org/jira/browse/SOLR-4413 https://issues.apache.org/jira/browse/SOLR-2326 Please let us know if you find any solution. Regards.
Re: Slaves always replicate entire index Index versions
Thanks for the links... I have updated SOLR-4471 with a proposed solution that I hope can be incorporated or amended so we can get a clean fix into the next version; our operations and network staff will be happier with not having gigs of data flying around the network :-) On Thu, Feb 21, 2013 at 1:24 AM, raulgrande83 raulgrand...@hotmail.com wrote: Hi Amit, I have come across some JIRAs that may be useful in this issue. Please let us know if you find any solution. Regards.
Re: Solr UIMA
Hi Bart, I think the only way you can do that is by reindexing, or maybe by just doing a dummy atomic update [1] on each of the documents that weren't tagged by UIMA before (e.g. adding or changing a field of type 'ignored' or something like that). Regards, Tommaso [1] : http://wiki.apache.org/solr/Atomic_Updates 2013/2/21 jazzsalsa jazzsa...@me.com Reposted because it did not arrive at the list (I didn't see it) On Feb 20, 2013, at 12:42 PM, jazz jazzsa...@me.com wrote: Hi, I managed to get Solr and UIMA to work together. When I send a document to Solr it annotates the field contents and adds the result of the UIMA annotations to e.g. a field location. My question is: how do I annotate the contents of an already existing Solr database without triggering an /update? My UIMA processor defaults to an /update command. I was thinking about exporting the contents and re-importing them, but that seems too complex using the DIH. Is there a smarter way? Regards Bart
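A dummy atomic update of that kind is just a normal /update message that sets one throwaway field. A minimal XML sketch, assuming all the document's other fields are stored (atomic updates rebuild the document from stored values); the id value and the dummy_field name are hypothetical:

```xml
<!-- POST to /update: re-runs the whole update chain (and thus the UIMA
     processor) for document "doc-42" without changing its real content.
     "dummy_field" is a hypothetical field declared with the "ignored" type. -->
<add>
  <doc>
    <field name="id">doc-42</field>
    <field name="dummy_field" update="set">touch</field>
  </doc>
</add>
```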
Re: Slaves always replicate entire index Index versions
Thanks for the patch, we'll try to install these fixes and post whether replication works or not. I renamed the 'index.timestamp' folders to just 'index' but it didn't work. These lines appeared in the log:

  INFO: Master's generation: 64594
  21-feb-2013 10:42:00 org.apache.solr.handler.SnapPuller fetchLatestIndex
  INFO: Slave's generation: 64593
  21-feb-2013 10:42:00 org.apache.solr.handler.SnapPuller fetchLatestIndex
  INFO: Starting replication process
  21-feb-2013 10:42:00 org.apache.solr.handler.SnapPuller fetchFileList
  SEVERE: No files to download for index generation: 64594
RE: If we Open Source our platform, would it be interesting to you?
Hi Marcelo, Looked through your site, and the framework looks very powerful as an aggregator. We do a lot of data aggregation from many different sources in many different formats (XML, JSON, text, CSV, etc.) using an RDBMS as the main repository for eventual Solr indexing. A 'one-stop-shop' for all this would be very appealing. Have you looked at products like Talend or Jitterbit? These offer transformation from almost anything to almost anything using graphical interfaces (Jitterbit is better) and a PHP-like coding format for trickier work. If you (or somebody) could add a graphical interface, the world would beat a path to your door! Regards, DQ -Original Message- From: Marcelo Elias Del Valle [mailto:marc...@s1mbi0se.com.br] Sent: 20 February 2013 18:18 To: solr-user@lucene.apache.org Subject: If we Open Source our platform, would it be interesting to you? Hello All, I'm sending this email because I think it may be interesting to Solr users, as this project makes heavy use of the Solr platform. We are strongly considering opening the source of our DMP (Data Management Platform), if it proves to be technically interesting to other developers / companies. More details: http://www.s1mbi0se.com/s1mbi0se_DMP.html All comments, questions and criticism happening at HN: http://news.ycombinator.com/item?id=5251780 Please feel free to send questions, comments and criticism... We will try to reply to them all. Regards, Marcelo
How to retrieve all terms with their frequency in that website.
I have indexed data from 10 websites in Solr. Now I want to dump the data of each website in the following format: [Term, Frequency of the term in that website, IDF] Can I do this with the Solr admin, or do I need to write a script for that?
Re: How to retrieve all terms with their frequency in that website.
Hi, look at the Luke page in the Solr admin: /admin/luke?show=index. That page shows the top terms, so I suppose it is possible to get the frequency of all terms. On 21/02/2013 12:58, search engn dev wrote: I have indexed data from 10 websites in Solr. Now I want to dump the data of each website in the following format: [Term, Frequency, IDF].
Re: How to retrieve all terms with their frequency in that website.
I guess the Term Vector Component might satisfy all or most of what you're trying to do: http://wiki.apache.org/solr/TermVectorComponent On 21.02.2013 12:58, search engn dev wrote: I have indexed data from 10 websites in Solr. Now I want to dump the data of each website in the following format: [Term, Frequency, IDF].
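Enabling the component is a small solrconfig.xml change. A sketch, assuming the field in question is indexed with termVectors="true" (the handler name /tvrh is just a convention):

```xml
<!-- solrconfig.xml sketch: expose per-document term vectors (tf, df, tf-idf).
     Query with e.g. /tvrh?q=*:*&tv.tf=true&tv.df=true&tv.tf_idf=true -->
<searchComponent name="tvComponent" class="solr.TermVectorComponent"/>

<requestHandler name="/tvrh" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>
```

A script would still be needed to aggregate the per-document vectors into per-website totals.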
SolrCloud vs. distributed suggester
In the pre-cloud version of Solr it was necessary to pass the shards and shards.qt parameters in order to make the /suggest handler work standalone. How should it work in SolrCloud? SpellCheckComponent skips the distributed stage of processing, and thus I get suggestions only when I force distrib=false mode. Setting the parameters like in previous releases doesn't work either. The only way that has worked so far is forcing a 'query' component on the /suggest handler. Is there any other (better) way? Thanks, Alexey
multiple facet.prefix for the same facet.field VS multiple facet.query
There have been requests for supporting multiple facet.prefix values for the same facet.field. There is an open JIRA with a patch: https://issues.apache.org/jira/browse/SOLR-1351 Wouldn't using multiple facet.query parameters achieve the same result? I mean something like: facet.query=lastName:A*&facet.query=lastName:B*&facet.query=lastName:C* Bill
Re: multiple facet.prefix for the same facet.field VS multiple facet.query
Never mind. I just realized the difference between the two. Sorry for the noise. Bill On Thu, Feb 21, 2013 at 8:42 AM, Bill Au bill.w...@gmail.com wrote: There have been requests for supporting multiple facet.prefix values for the same facet.field. Wouldn't using multiple facet.query parameters achieve the same result?
Re: DIH deleting documents
Thanks Gora, Sorry I might not have been sufficiently clear. I start with an empty index, then add documents. 9000 are added and 6000 immediately deleted again, leaving 3000. I assume this can only happen with duplicate IDs, but that should not be possible! So I wanted to get a list of deleted documents so that I could try and figure out why they were deleted immediately. thanks, Csaba
Re: SolrCloud vs. distributed suggester
It's not really any different in SolrCloud than pre-cloud - distributed search is still the same code done the same way, by and large. shards.qt should be just as valid an option as forcing a query component. - Mark On Feb 21, 2013, at 7:56 AM, AlexeyK lex.kudi...@gmail.com wrote: In the pre-cloud version of Solr it was necessary to pass the shards and shards.qt parameters in order to make the /suggest handler work standalone. How should it work in SolrCloud?
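One way to bake that in is to set shards.qt as a default on the handler itself, so distributed sub-requests go back to /suggest rather than the default /select. A sketch, assuming a spellcheck search component named "suggest" is defined elsewhere in the same solrconfig.xml:

```xml
<!-- solrconfig.xml sketch: a /suggest handler whose sub-requests also
     hit /suggest. "suggest" is an assumed spellcheck component name. -->
<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="shards.qt">/suggest</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```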
How to change the index dir in Solr 4.1
I have 5 shards on one machine using the new one-collection-multiple-cores method. I am trying to change the index directory, but if I hard-code that in solrconfig.xml, the index dir does not change for the other cores, and each core tries to fight over it and ends up in a deadlock. Is there any way to suffix the index directory with the shard replica name so that each shard has a different index directory?
Re: SOLR4 SAN vs Local Disk?
Thanks Shawn for the input, I could actually get RAID10s.
Solr splitting my words
Let me start out by saying that I am just learning Solr now. Solr is splitting a word and I am not sure why. The word is mcmurdo. If I do a search for McMurdo it picks it up. If I do a search for just murdo it will also pick it up. If I search for mcmurdo, I get nothing. The data in the name field that is getting copied to the name_search field is womens-mcmurdo-ii-boots (without the quotes); this is what we are feeding into Solr. The data is coming from a field called name_search which is copied from a field called name. Below is the description for name_search in the schema browser.

Field Type: TEXT
Properties: Indexed, Tokenized, Omit Norms
Schema: Indexed, Tokenized, Omit Norms
Index: (unstored field)
Copied From: NAME
Position Increment Gap: 100

Index Analyzer: org.apache.solr.analysis.TokenizerChain
  Tokenizer: org.apache.solr.analysis.WhitespaceTokenizerFactory
  Filters:
    org.apache.solr.analysis.StopFilterFactory {words: stopwords.txt, ignoreCase: true, enablePositionIncrements: true}
    org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange: 1, generateNumberParts: 1, catenateWords: 1, generateWordParts: 1, catenateAll: 0, catenateNumbers: 1}
    org.apache.solr.analysis.SynonymFilterFactory {synonyms: index_synonyms.txt, expand: false, ignoreCase: true}
    org.apache.solr.analysis.LowerCaseFilterFactory {}
    org.apache.solr.analysis.EnglishPorterFilterFactory {protected: protwords.txt}
    org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}

Query Analyzer: org.apache.solr.analysis.TokenizerChain
  Tokenizer: org.apache.solr.analysis.WhitespaceTokenizerFactory
  Filters:
    org.apache.solr.analysis.SynonymFilterFactory {synonyms: synonyms.txt, expand: true, ignoreCase: true}
    org.apache.solr.analysis.StopFilterFactory {words: stopwords.txt, ignoreCase: true}
    org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange: 1, generateNumberParts: 1, catenateWords: 0, generateWordParts: 1, catenateAll: 0, catenateNumbers: 0}
    org.apache.solr.analysis.LowerCaseFilterFactory {}
    org.apache.solr.analysis.EnglishPorterFilterFactory {protected: protwords.txt}
    org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}

Any help would be greatly appreciated.
Re: Solr splitting my words
Feed your data into the Analysis form to see the transformations taking place. Navigate to the Solr admin console, select your collection name on the left (e.g. collection1). Click on the Analysis link. I suspect it's the WordDelimiterFilterFactory that is not doing what you expect, which you can fine-tune with the various attributes on that factory. Cheers, Tim On Thu, Feb 21, 2013 at 8:47 AM, scallawa dami...@altrec.com wrote: Let me start out by saying that I am just learning Solr now. Solr is splitting a word and I am not sure why.
Re: Solr splitting my words
The word splitting is caused by splitOnCaseChange: 1. Change that 1 to 0 and completely reindex your data. -- Jack Krupansky -Original Message- From: scallawa Sent: Thursday, February 21, 2013 7:47 AM To: solr-user@lucene.apache.org Subject: Solr splitting my words Let me start out by saying that I am just learning Solr now. Solr is splitting a word and I am not sure why.
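In schema.xml terms, that change is a single attribute on the WordDelimiterFilterFactory, applied in both the index and query analyzers. A sketch of the relevant filter line, keeping the other attributes as shown in the original schema dump:

```xml
<!-- schema.xml sketch: stop splitting tokens like "McMurdo" on case changes.
     Make the same change in the query analyzer, then fully reindex. -->
<filter class="solr.WordDelimiterFilterFactory"
        splitOnCaseChange="0"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1" catenateAll="0"/>
```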
Re: SolrCloud as my primary data store
With Solr's atomic updates, optimistic locking, update log, openSearcher=false on commits, etc. you can definitely do this. Biggest question in my mind is whether you're willing to accept Solr's emphasis on consistency vs. write-availability? With a db like Cassandra, you can achieve better write-availability by giving up a little on the consistency side. With Solr, you don't have that choice - writes must succeed on the shard leader and replicas. With the tlog, Solr still does pretty good here. The other concern is how frequently (and how many) are you updating data in existing docs? Solr has to delete and re-index the entire doc after updating a single field. We abuse Solr with millions of atomic updates daily but it's not anywhere near as fast as you get with database updates. Lastly, have you seen Yonik's slides from Apache Eurocon - great read if not: http://vimeopro.com/user11514798/apache-lucene-eurocon-2012/video/55387447 Cheers, Tim On Wed, Feb 20, 2013 at 10:02 PM, jimtronic jimtro...@gmail.com wrote: Now that I've been running Solr Cloud for a couple months and gotten comfortable with it, I think it's time to revisit this subject. When I search for the topic of using Solr as a primary db online, I get lots of discussions from 2-3 years ago and usually they point out a lot of hurdles that have now largely been eliminated with the release of Solr Cloud. I've stopped using the standard method of writing to my db and pushing out periodically to solr. Instead, I'm writing simultaneously to solr and the db with less frequent syncs from the database just to be safe. I find this to be much faster and easier than doing delta imports via the DIH handler. In fact, it's gone so smoothly, I'm really wondering why I need to keep writing it to the db at all. I've always got several nodes running and launching new ones takes only minutes to be fully operational. I'm taking frequent snapshots and my test restores have been painless and quick. 
So, if I'm looking at other NoSQL solutions like MongoDB or Cassandra, why wouldn't I just use Solr? It's distributed, fast, and stable. It has a great http api and it's nearly schema-less using dynamic fields. And, most importantly, it offers the most powerful query language available. I'd really like to hear from someone who has made the leap. Cheers, Jim
Re: Is there a way in which I can make the spell suggestion dictionary build on specific fields
Yes, each spellchecker (or dictionary) in your spellcheck search component has a field parameter to specify the field to be used to generate the dictionary index for that spellchecker: <str name="field">spell</str> See the Solr example solrconfig.xml and search for <lst name="spellchecker">. Also see: http://wiki.apache.org/solr/SpellCheckComponent -- Jack Krupansky -Original Message- From: Rohan Thakur Sent: Thursday, February 21, 2013 2:34 AM To: solr-user@lucene.apache.org Subject: Is there a way in which I can make the spell suggestion dictionary build on specific fields hi all I wanted to know: is there a way in which I can select which indexed field I want to build the spell suggestion dictionary on? thanks regards Rohan
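Put together, a dictionary bound to a specific field looks roughly like this in solrconfig.xml (the component name and the "spell" field are illustrative):

```xml
<!-- solrconfig.xml sketch: an index-based spellchecker built only from the
     "spell" field. "default" is the dictionary name referenced by
     spellcheck.dictionary at query time. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```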
Re: Document update question
Hi Jack, There was a bug for this fixed in 4.1 - which version are you on? I remember this b/c I was on 4.0 and had to upgrade for this exact reason. https://issues.apache.org/jira/browse/SOLR-4134 Tim On Wed, Feb 20, 2013 at 9:16 PM, Jack Park jackp...@topicquests.org wrote: From what I can read about partial updates, it will only work for singleton fields, where you can set them to something else, or multi-valued fields, where you can add something. I am testing on 4.1. I ran some tests to prove to myself that you cannot do anything else to a multi-valued field, like remove a value and do a partial update on the whole list. It flattens the result to a comma-delimited string when I remove a value: from details: ["here there", "Hello there", "Oh Fudge"] to details: [["here there", "Oh Fudge"]]. Does this mean that I must remove the entire document and re-index it? Many thanks in advance Jack
Re: Is there a way in which I can make the spell suggestion dictionary build on specific fields
AnalyzingSuggester might also be worth having a look at (requires some Googling and SO reading to get it right for now). Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, Feb 21, 2013 at 11:11 AM, Jack Krupansky j...@basetechnology.com wrote: Yes, each spellchecker (or dictionary) in your spellcheck search component has a field parameter to specify the field to be used to generate the dictionary index for that spellchecker.
Re: Document update question
I am using 4.1. I was not aware of that link. In the absence of being able to do partial updates to multi-valued fields, I just punted to delete and reindex. I'd like to see otherwise. Many thanks Jack On Thu, Feb 21, 2013 at 8:13 AM, Timothy Potter thelabd...@gmail.com wrote: Hi Jack, There was a bug for this fixed in 4.1 - which version are you on? https://issues.apache.org/jira/browse/SOLR-4134
Re: How to change the index dir in Solr 4.1
Have you tried leaving:

  <dataDir>${solr.data.dir:}</dataDir>

in solrconfig.xml and then setting the data dir for each core in solr.xml, i.e.

  <core schema="schema.xml" loadOnStartup="true" instanceDir="someCore/"
        transient="false" name="justSomeCore" config="solrconfig.xml"
        dataDir="PATH_TO_DATA_DIR"/>

On Thu, Feb 21, 2013 at 7:13 AM, chamara chama...@gmail.com wrote: I am having 5 shards in one machine using the new one collection multiple cores method. I am trying to change the index directory, but if i hard code that in the SolrConfig.xml, the index dir does not change for other cores.
Re: Document update question
Weird - the only difference I see is that we use XML vs. JSON, but otherwise, doing the following works for us:

  <field update="set" name="someMultiValuedField">VALU1</field>
  <field update="set" name="someMultiValuedField">VALU2</field>

Result would be:

  <arr name="someMultiValuedField">
    <str>VALU1</str>
    <str>VALU2</str>
  </arr>

On Thu, Feb 21, 2013 at 9:44 AM, Jack Park jackp...@topicquests.org wrote: I am using 4.1. I was not aware of that link. In the absence of being able to do partial updates to multi-valued fields, I just punted to delete and reindex.
Re: Document update question
Interesting you should say that. Here is my solrj code:

  public Solr3Client(String solrURL) throws Exception {
    server = new HttpSolrServer(solrURL);
    // server.setParser(new XMLResponseParser());
  }

I cannot recall why I commented out the setParser line; something about someone saying in another thread that it's not important. I suppose I should revisit my unit tests with that line uncommented. Or did I miss something? The JSON results I pasted earlier were from reading the document online in the admin query panel. Many thanks Jack On Thu, Feb 21, 2013 at 8:52 AM, Timothy Potter thelabd...@gmail.com wrote: Weird - the only difference I see is that we use XML vs. JSON, but otherwise, doing the following works for us.
Re: How to change the index dir in Solr 4.1
Yes, that is what I am doing now. I thought this solution was not elegant for a deployment. Is there any other way to do this from solrconfig.xml? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-change-the-index-dir-in-Solr-4-1-tp4041891p4041950.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to change the index dir in Solr 4.1
How about passing -Dsolr.data.dir=/ur/data/dir on the command line to java when you start the Solr service? On Thu, Feb 21, 2013 at 9:05 AM, chamara chama...@gmail.com wrote: Yes, that is what I am doing now. I thought this solution was not elegant for a deployment. Is there any other way to do this from solrconfig.xml? -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-change-the-index-dir-in-Solr-4-1-tp4041891p4041950.html Sent from the Solr - User mailing list archive at Nabble.com.
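For reference, the system-property approach works together with solrconfig.xml: stock Solr 4.x configs parameterize the data directory, so the -D flag on the command line overrides whatever default is configured. A sketch of the relevant solrconfig.xml line (as shipped in the 4.x examples):

```xml
<!-- solrconfig.xml: use the solr.data.dir system property when set,
     otherwise fall back to the default data dir under the core -->
<dataDir>${solr.data.dir:}</dataDir>
```

So rather than hard-coding a path, you keep this property reference in solrconfig.xml and choose the actual directory at startup per deployment.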
synonym replacement in AnalyzingSuggester?
I'm using the new AnalyzingSuggester (my code is available on http://pastebin.com/tN9yXHB0) and I got the synonyms whisky,whiskey (they are bi-directional) So whether the user searches for whiskey or whisky, I want to retrieve all documents that have any of them. However, for autosuggest, I would like to prefer (better said: only show!) whisky e.g. I got the document Whiskey Bottle but autosuggest for whi should return Whisky Bottle The only way I'd think of is replacing Whiskey with Whisky on feeding, but that would also mean an additional field in solr (since I do want to keep Whiskey in the original field) Is there any way to do some kind of synonym replacement on-the-fly for these suggestions? Has anyone ever done that or has an idea how to do that? Cheers. Sebastian
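One way to get this effect without an extra stored field (a sketch, not tested against AnalyzingSuggester specifically): make the synonym rule one-directional in the analyzer used for building suggestions, so "whiskey" is normalized to "whisky" at suggest-build time, while the search field keeps the bi-directional mapping. The field type name and synonyms file name below are assumptions:

```xml
<!-- fieldType used only by the suggester's index analyzer -->
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- synonyms-suggest.txt contains the one-way rule:
         whiskey => whisky
         so only the preferred spelling survives for suggestions -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms-suggest.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>
```

The search field's synonyms.txt keeps `whisky,whiskey` so retrieval still matches both spellings; only the suggester sees the replacement.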
Re: Threads running while querrying
I get 2-second response time on average. Any config / hardware change suggestions for my use case - low qps rate? I would say more shards on the same node, but there would be the cache-diminution disadvantage.

On Wednesday, February 20, 2013, Walter Underwood wrote: In production, you should have requests arriving at Solr simultaneously. Those simultaneous requests will be processed in parallel. For each query, there are many ways to improve response time. It depends on the query and the schema. What query response time are you seeing? wunder

On Feb 20, 2013, at 7:39 AM, Manuel Le Normand wrote: Thanks for the reply Erick! To make sure I understand: each query request runs on a single thread of the shard. My searcher thread is CPU-bound. Does it mean my only possibility to shorten my query time, assuming a low qps rate, is to split my collection into many shards on different nodes? (And that multiple CPU cores are good only for a high qps rate?) Thanks in advance

On Wednesday, February 20, 2013, Erick Erickson wrote: Well, it matters because your single-threaded client is firing one request, waiting for the response, then firing another. There's no opportunity for Solr to use more than one thread for queries if there's only a single thread on a single client ever making requests. Or I misunderstand what you've set up completely. Best, Erick

On Wed, Feb 20, 2013 at 8:37 AM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: Yes, I made a one-threaded script which sends a query by a POST request to the shard's URL, gets back the response and posts the next query. How can it matter? Manuel

On Wednesday, February 20, 2013, Erick Erickson wrote: Silly question perhaps, but are you feeding queries at Solr with a single thread? Because Solr uses multiple threads to search AFAIK. Best, Erick

On Wed, Feb 20, 2013 at 4:01 AM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: More to it, I do see 75 more threads under the process of tomcat6, but only a single one is working while querying.

On Wednesday, February 20, 2013, Manuel Le Normand wrote: Hello, I created a single collection on a Linux server with 8M docs. Solr 4.1. While making performance tests, I see that my quad-core server makes full use of a single core while the 3 others are idle. Is there a possibility of making a single-sharded collection available for multi-threaded query? P.S.: I'm not indexing while querying.
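To illustrate Erick's point about the single-threaded client: issuing queries from a thread pool lets Solr use multiple cores even at a low per-client rate. A minimal sketch - the `search` function is a hypothetical stand-in for the actual HTTP request to the shard:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelQueryClient {
    // Submit all queries to a thread pool instead of sending them
    // one-by-one; each worker thread would hold its own in-flight
    // request to Solr. 'search' is a placeholder for that call.
    public static List<String> runQueries(List<String> queries,
                                          Function<String, String> search,
                                          int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String q : queries) {
                futures.add(pool.submit(() -> search.apply(q)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // preserves submission order
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With a benchmark script built this way, several requests are in flight at once, which is what lets the other cores do work.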
Re: Index optimize takes more than 40 minutes for 18M documents
That seems fairly fast. We index about 3 million documents in about half that time. We are probably limited by the time it takes to get the data from MySQL. Don't optimize. Solr automatically merges index segments as needed. Optimize forces a full merge. You'll probably never notice the difference, either in disk space or speed. It might make sense to force merge (optimize) if you reindex everything once per day and have no updates in between. But even then it may be a waste of time. You need lots of free disk space for merging, whether a forced merge or automatic. Free space equal to the size of the index is usually enough, but worst case can need double the size of the index. wunder On Feb 21, 2013, at 9:20 AM, Yandong Yao wrote: Hi Guys, I am using Solr 4.1 and have indexed 18M documents using solrj ConcurrentUpdateSolrServer (each document contains 5 fields, and average length is less than 1k). 1) It takes 70 minutes to index those documents without optimize on my mac 10.8, how is the performance, slow, fast or common? 2) It takes about 40 minutes to optimize those documents, following is top output, and there are lots of FAULTS, what does this means? Processes: 118 total, 2 running, 8 stuck, 108 sleeping, 719 threads 00:56:52 Load Avg: 1.48, 1.56, 1.73 CPU usage: 6.63% user, 6.40% sys, 86.95% idle SharedLibs: 31M resident, 0B data, 6712K linkedit. MemRegions: 34734 total, 5801M resident, 39M private, 638M shared. PhysMem: 982M wired, 3600M active, 3567M inactive, 8150M used, 38M free. VM: 254G vsize, 1285M framework vsize, 1469887(368) pageins, 1095550(0) pageouts. Networks: packets: 14842595/9661M in, 14777685/9395M out. Disks: 820048/43G read, 523814/53G written. 
PID COMMAND %CPU TIME #TH #WQ #POR #MRE RPRVT RSHRD RSIZE VPRVT VSIZE PGRP PPID STATE UID FAULTS COW MSGSENT MSGRECV SYSBSD SYSMACH 4585 java 11.7 02:52:01 32 1483 342 3866M+ 6724K 3856M+ 4246M 6908M 4580 4580 sleepin 501 1490340+ 402 3000781+ 231785+ 15044055+ 10033109+

3) If I don't run optimize, what is the impact? Bigger disk size or slower query performance?

Following is my index config in solrconfig.xml:

<ramBufferSizeMB>100</ramBufferSizeMB>
<mergeFactor>10</mergeFactor>
<autoCommit>
  <maxDocs>100000</maxDocs><!-- 100K docs -->
  <maxTime>300000</maxTime><!-- 5 minutes -->
  <openSearcher>false</openSearcher>
</autoCommit>

Thanks very much in advance! Regards, Yandong
Re: DIH deleting documents
On 21 February 2013 19:30, cveres csabave...@me.com wrote: Thanks Gora, Sorry I might not have been sufficiently clear. I start with an empty index, then add documents. 9000 are added and 6000 immediately deleted again, leaving 3000. I assume this can only happen with duplicate IDs, but that should not be possible! So I wanted to get a list of deleted documents so that I could try and figure out why they were deleted immediately. [...]

What do you mean by "9000 are added and 6000 immediately deleted again"? How are you getting the number added, and the number deleted? How many documents does DIH report on the final screen after the full-import completes? From what you describe, it is most likely duplicate IDs. Could you do a SELECT from the database outside of Solr, create the IDs as you do with DIH, and see what is going wrong there? Regards, Gora
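A quick way to test the duplicate-ID theory along the lines Gora suggests is to run the same concatenation directly in MySQL and count collisions. A sketch with hypothetical table and column names - substitute the real ones used to build the Solr id:

```sql
-- Group by the generated Solr id and list any value produced by more
-- than one row; those rows overwrite each other during the import,
-- which shows up in Solr as adds followed by deletes.
SELECT CONCAT(field_a, '-', field_b) AS solr_id, COUNT(*) AS n
FROM source_table
GROUP BY solr_id
HAVING n > 1
ORDER BY n DESC;
```

If this returns rows, the sum of the extra copies should roughly match the number of deleted documents reported after the import.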
Re: Solr splitting my words
I tried playing with the analyzer before posting and wasn't sure how to interpret it.

Field type: text
Field value (index): womens-mcmurdo-ii-boots (this is based on the info that is in the field)
Field value (query): mcmurdo

Results: I only got one match in the index analyzer.

org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1, generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=0, catenateNumbers=1}

position:   1       2        3      4      4
text:       womens  mcmurdo  ii     boots  womensmcmurdoiiboots
type:       word    word     word   word   word
start,end:  0,6     7,14     15,17  18,23  0,23
payload:

Jack, the field that I am expecting to be indexed is not sending the data in caps, which is why I am puzzled. I am wondering if the indexed data is not coming from the field I expect. I will try your change in dev once I get data generated there. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-splitting-my-words-tp4041913p4041963.html Sent from the Solr - User mailing list archive at Nabble.com.
splitting big, existing index into shards
Hi I have built a 300GB index using lucene 4.1 and now it is too big to do queries efficiently. I wonder if it is possible to split it into shards, then use SolrCloud configuration? I have looked around the forum but was unable to find any tips on this. Any help please? Many thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/splitting-big-existing-index-into-shards-tp4041964.html Sent from the Solr - User mailing list archive at Nabble.com.
Matching an exact word
I'm trying to match the word "created". Given that it is surrounded by quotes, I would expect an exact match to occur, but instead the entire stemming results show for words such as create, creates, created, etc.

q="created"&wt=xml&rows=1000&qf=text&defType=edismax

If I copy the text field to a new one that does not stem words, text_exact for example, I get the expected results:

q="created"&wt=xml&rows=1000&qf=text_exact&defType=edismax

I would like the decision whether to match exact or not to be determined by the quotes rather than the qf parameter (eg, not have to use it at all). What topic do I need to look into more to understand this? Thanks in advance!
Re: splitting big, existing index into shards
You can split an index using the MultiPassIndexSplitter, which is in Lucene contrib. However, it won't use the same algorithm for assigning documents to shards, which means the indexes won't work with a SolrCloud setup. A splitter that uses the same split technique but uses the shard assignment algorithm from SolrCloud could be a useful thing. But I have to say, I suspect it will be quicker/easier to just re-index. Make sure you choose the right number of shards: with SolrCloud as it is, you cannot change the number of shards without reindexing. This may change soon with newer releases of Solr though. Upayavira

On Thu, Feb 21, 2013, at 06:09 PM, zqzuk wrote: Hi, I have built a 300GB index using Lucene 4.1 and now it is too big to do queries efficiently. I wonder if it is possible to split it into shards, then use a SolrCloud configuration? I have looked around the forum but was unable to find any tips on this. Any help please? Many thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/splitting-big-existing-index-into-shards-tp4041964.html Sent from the Solr - User mailing list archive at Nabble.com.
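For completeness, the splitter Upayavira mentions is a command-line tool. A sketch of an invocation, assuming Lucene 4.1 - the jar names and paths are assumptions, so check the exact artifacts in your distribution:

```
java -cp lucene-core-4.1.0.jar:lucene-misc-4.1.0.jar \
     org.apache.lucene.index.MultiPassIndexSplitter \
     -out /path/to/split-output -num 4 /path/to/existing/index
```

This writes four sub-indexes under the output directory, but as noted above the document-to-part assignment will not match SolrCloud's hash-based routing, so the result is only useful for plain distributed search, not as drop-in SolrCloud shards.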
Re: Matching an exact word
Solr will only match on the terms as they are in the index. If it is stemmed in the index, it will match that. If it isn't, it'll match that. All term matches are (by default at least) exact matches. Only with stemming you are doing an exact match against the stemmed term. Therefore, there really is no way to do what you are looking for within Solr. I'd suggest you'll need to do some parsing at your side and, if you find quotes, do the query against a different field. Upayavira

On Thu, Feb 21, 2013, at 06:17 PM, Van Tassell, Kristian wrote: I'm trying to match the word "created". Given that it is surrounded by quotes, I would expect an exact match to occur, but instead the entire stemming results show for words such as create, creates, created, etc. q="created"&wt=xml&rows=1000&qf=text&defType=edismax If I copy the text field to a new one that does not stem words, text_exact for example, I get the expected results: q="created"&wt=xml&rows=1000&qf=text_exact&defType=edismax I would like the decision whether to match exact or not to be determined by the quotes rather than the qf parameter (eg, not have to use it at all). What topic do I need to look into more to understand this? Thanks in advance!
Re: Solr UIMA
: Subject: Solr UIMA : References: 5123b218.7050...@juntadeandalucia.es : In-reply-to: 5123b218.7050...@juntadeandalucia.es https://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. -Hoss
Re: DIH deleting documents
Hi Csaba. Would you mind posting your DIHconfig/data-config.xml and the command you use for the import? Thanks. Arcadius. On 21 February 2013 17:55, Gora Mohanty g...@mimirtech.com wrote: On 21 February 2013 19:30, cveres csabave...@me.com wrote: Thanks Gora, Sorry I might not have been sufficiently clear. I start with an empty index, then add documents. 9000 are added and 6000 immediately deleted again, leaving 3000. I assume this can only happen with duplicate IDs, but that should not be possible! So I wanted to get a list of deleted documents so that I could try and figure out why they were deleted immediately. [...] What do you mean by 9000 are added and 6000 immediately deleted again? How are you getting the number added, and the number deleted? How many documents does DIH report on the final screen after the full-import completes? From what you describe, it is most likely duplicate IDs. Could you do a SELECT from the database outside of Solr, create the IDs as you do with DIH, and see what is going wrong there? Regards, Gora
Re: get content is put in the index queue but is not committed
: Anybody know how-to get content is put in the index queue but is not : committed? i'm guessing you are refering to uncommited documents in the transaction log? Take a look at the UpdateLog class, and how it's used by the RealTimeGetComponent. If you provide more details as to what you end goal is, we might be able to provide more specific (or alternative) suggestions on how to achieve your goal... https://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss
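If the end goal is simply to fetch a document that has been indexed but not yet committed, the real-time get handler already reads from the transaction log, so no custom UpdateLog code is needed. A usage sketch - host, core name, and id are assumptions:

```
curl 'http://localhost:8983/solr/collection1/get?id=mydoc&wt=json'
```

The /get handler (solr.RealTimeGetComponent behind it) returns the latest version of the document even before a commit makes it visible to normal searches.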
Re: Is their a way to remove the unwanted characters from solr index
: I have a field in which I have strings with unwanted character like : \n\r\n\n these kind, I wanted to know is their any why I can remove : these...actually I had data stored in html format in the sql database : column which I had to index in solr...using HTML stripe I had removed the : HTML tags but leaving these unwanted characters in between any one knows : how to remove them. https://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/RegexReplaceProcessorFactory.html See the parent class for an in depth description of how to configure which fields it will be applied to... https://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html -Hoss
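As a concrete illustration of the processor Hoss points to (a sketch - the chain name, field name, and pattern are assumptions to adapt), an update chain that collapses the leftover newlines and carriage returns into single spaces before indexing:

```xml
<updateRequestProcessorChain name="strip-whitespace">
  <processor class="solr.RegexReplaceProcessorFactory">
    <!-- apply only to the field that held the HTML -->
    <str name="fieldName">content</str>
    <!-- collapse runs of \r \n \t and spaces into one space -->
    <str name="pattern">[\r\n\t ]+</str>
    <str name="replacement"> </str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain then has to be referenced from the update handler (via the update.chain parameter or the handler's defaults) for it to run.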
Re: Slaves always replicate entire index Index versions
Sounds good I am trying the combination of my patch and 4413 now to see how it works and will have to see if I can put unit tests around them as some of what I thought may not be true with respect to the commit generation numbers. For your issue above in your last post, is it possible that there was a commit on the master in that slight window after solr checks for the latest generation of the master but before it downloads the actual files? How frequent are the commits on your master? On Thu, Feb 21, 2013 at 2:00 AM, raulgrande83 raulgrand...@hotmail.comwrote: Thanks for the patch, we'll try to install these fixes and post if replication works or not. I renamed 'index.timestamp' folders to just 'index' but it didn't work. These lines appeared in the log: INFO: Master's generation: 64594 21-feb-2013 10:42:00 org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Slave's generation: 64593 21-feb-2013 10:42:00 org.apache.solr.handler.SnapPuller fetchLatestIndex INFO: Starting replication process 21-feb-2013 10:42:00 org.apache.solr.handler.SnapPuller fetchFileList SEVERE: No files to download for index generation: 64594 -- View this message in context: http://lucene.472066.n3.nabble.com/Slaves-always-replicate-entire-index-Index-versions-tp4041256p4041827.html Sent from the Solr - User mailing list archive at Nabble.com.
can i install new SOLR 4.1 as slaver(3.3 Master)
Hi, our SOLR master version is 3.3. Can I install a new box with SOLR 4.1 as a slave, and replicate from the master's data? thanks -- View this message in context: http://lucene.472066.n3.nabble.com/can-i-install-new-SOLR-4-1-as-slaver-3-3-Master-tp4041976.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Is it possible to manually select a shard leader in a running SolrCloud?
Thanks Mark,

The real driver for me wanting to promote a different leader is when I create a new Collection via the Collections API across a multi-server SolrCloud, the leader of each shard is always the same host, so you're right that I'm tackling the wrong problem with this request, although it would fix it for me. If I create the cores manually via the cores API, one-by-one, I am able to get what I expect, but when running this Collections API call on a 3-instance SOLR 4.1, 3-shard setup, 1 server becomes the leader of all 3 shards, meaning it will get all the writes for everything (correct me if I am wrong). If so, this will not scale well with all writes to one node (or correct me if I am wrong)?

curl -v 'http://HOST:8983/solr/admin/collections?action=CREATE&name=test&numShards=3&replicationFactor=1&maxShardsPerNode=2'

Currently on my 3-instance SOLR 4.1 setup, the above call creates the following: - ServerA is the leader of all 3 shards (the problem I want to address). - ServerB + ServerC are automagically replicas of the 3 leader shards on ServerA. So again, my issue is one server gets all the writes. Does anyone else encounter this? If so, I should spawn a separate thread on my specific issue. Cheers, Tim

-Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, February 19, 2013 8:44 PM To: solr-user@lucene.apache.org Subject: Re: Is it possible to manually select a shard leader in a running SolrCloud?

You can't easily do it the way it's implemented in ZooKeeper. We would probably internally have to do the same thing - elect a new leader and drop him until the one we wanted came up. The main thing doing it internally would gain is that you could skip the elected guy from becoming the actual leader and just move on to the next candidate. Still some tricky corner cases to deal with and such as well. I think for most things you would use this to solve, there is probably an alternate thing that should be addressed.
- Mark

On Mon, Feb 18, 2013 at 4:15 PM, Vaillancourt, Tim tvaillanco...@ea.com wrote: Hey all, I feel having to unload the leader core to force an election is hacky, and as far as I know would still leave which node becomes the Leader to chance, i.e. I cannot guarantee NodeX becomes Leader 100% in all cases. Also, this imposes additional load temporarily. Is there a way to force the winner of the Election, and if not, is there a known feature-request for this? Cheers, Tim Vaillancourt

-Original Message- From: Joseph Dale [mailto:joey.d...@gmail.com] Sent: Sunday, February 03, 2013 7:42 AM To: solr-user@lucene.apache.org Subject: Re: Is it possible to manually select a shard leader in a running SolrCloud?

With SolrCloud all cores are collections. The collections API is just a wrapper to call the core API a million times with one command:

/solr/admin/cores?action=CREATE&name=core1&collection=core1&shard=1

Basically you're creating the shard again, after leader props have gone out. Solr will check ZK and find a core meeting that description, then simply get a copy of the index from the leader of that shard.

On Feb 3, 2013, at 10:37 AM, Brett Hoerner br...@bretthoerner.com wrote: What is the inverse I'd use to re-create/load a core on another machine but make sure it's also known to SolrCloud/as a shard?

On Sat, Feb 2, 2013 at 4:01 PM, Joseph Dale joey.d...@gmail.com wrote: To be more clear, let's say bob is the leader of core1. On bob do a /admin/cores?action=unload&name=core1. This removes the core/shard from bob, giving the other servers a chance to grab leader props. -Joey

On Feb 2, 2013, at 11:27 AM, Brett Hoerner br...@bretthoerner.com wrote: Hi, I have a 5 server cluster running 1 collection with 20 shards, replication factor of 2. Earlier this week I had to do a rolling restart across the cluster; this worked great and the cluster stayed up the whole time. The problem is that the last node I restarted is now the leader of 0 shards, and is just holding replicas.
I've noticed this node has abnormally high load average, while the other nodes (who have the same number of shards, but more leaders on average) are fine. First, I'm wondering if that load could be related to being a 5x replica and 0x leader? Second, I was wondering if I could somehow flag single shards to re-elect a leader (or force a leader) so that I could more evenly distribute how many leader shards each physical server has running? Thanks. -- - Mark
Re: Combining Solr score with customized user ratings for a document
: With this approach now I can boost (i.e. multiply Solr's score by a factor)
: the results of any query by doing something like this:
: http://localhost:8080/solr/Prueba/select_test?q={!boost b=rating(usuario1)}text:grapa&fl=score
:
: Where 'rating' is the name of my function.
:
: Unfortunately, I still can't see which differences there are between doing this
: or making the product of both scores the value for the query's sort
: parameter... :(

I'm not sure I understand your question. With the example query above, your score -- both returned, and used for sorting by score -- is the mathematical result of multiplying your function by the relevancy score of text:grapa.

Perhaps what you are referring to is the idea that if you want the score to remain purely about relevancy, you can still optionally sort on the results of this function, by using the function solely in your sort -- the only thing that tends to confuse people here is how you refer back to the original query in that sort-by-function command...

http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201206.mbox/%3Calpine.DEB.2.00.1206111242260.17925@bester%3E

or in your case, something like this would return both the raw score and your custom rating, but it would sort on the product of those two values...

?q=text:grapa&fl=id,score,rating(usuario1)&sort=product(rating(usuario1),query($q))

: Which is the best place to do it? I think I would query the DB/cache just
: when the custom ValueSource is created in the ValueSourceParser's parse

That might make sense, but be careful where you put this cached data -- if it's part of the ValueSource, then whenever that ValueSource is used in a FunctionQuery (ie: {!boost b=rating(usuario1)}text:grapa) it will be part of the cache key for the queryResultCache or filterCache -- so having large data structures in your ValueSource could eat up a lot of RAM.

Take a look at the docs for the differences between the ValueSource class and the FunctionValues class.

-Hoss
Re: If we Open Source our platform, would it be interesting to you?
Hello David, First of all, thanks for answering!

2013/2/21 David Quarterman da...@corexe.com: Looked through your site and the framework looks very powerful as an aggregator. We do a lot of data aggregation from many different sources in many different formats (XML, JSON, text, CSV, etc) using an RDBMS as the main repository for eventual SOLR indexing. A 'one-stop-shop' for all this would be very appealing.

Actually, just to clarify, it uses Cassandra as the repository, not an RDBMS. We want to use it for large scale, so you could import entire company databases into the repo and relate the data from one another. However, if I understood you right, you got the idea: an intermediate repo before indexing, so you could postpone decisions about what to index and how...

Have you looked at products like Talend and Jitterbit? These offer transformation from almost anything to almost anything using graphical interfaces (Jitterbit is better) and a PHP-like coding format for trickier work. If you (or somebody) could add a graphical interface, the world would beat a path to your door!

This is very interesting, actually! We considered using Talend when we started our business, but we decided to go ahead with the development of a new product. The reason was: Talend is great, but it limits a good programmer if he is more agile coding than using graphical interfaces. Having user interfaces as a possibility is nice, but as something you HAVE TO use it's awful. Besides, it has a learning curve and seems to run better when you hire their own platform, and we wanted to choose the fine grain of our platform. However, your question made me think a lot about it. Do you think integrating with Jitterbit or Talend could be interesting? Or did you mean developing a new user interface?
Anyway, I will consider this possibility, but if you could explain better why you think one or other could be such a good idea would help us a lot. Would you be interested in using such a tool yourself? Best regards, Marcelo.
Re: DIH deleting documents
Hi Gora and Arcadius, Thanks for your help. I'll try and answer both your questions here.

I am interested in three database tables. Book contains information about books, page has the content of each book page by page, and chapter contains the title of each chapter in every book, and the page on which the chapter begins. It is a bit of a mess because I need the contents of each chapter in every book, but I have to infer which pages each chapter contains by its page number. So there is quite a complex query. There are 8764 rows in the chapter table .. so 8764 unique chapter headings .. and 6870 books.

When I import, I get Num Docs: 2784, Max Doc: 9488, Deleted Docs: 6704.

Here is the config file (the relevant part):

<entity name="book_chapter" pk="ID" rootEntity="false"
        query="select id as b_id,title,type_id from book">
  <entity name="chapter"
          query="SELECT CONCAT(CAST('${book_chapter.title}' AS CHAR),'-',CAST(chapter AS CHAR)) as solr_id, book_id, 'chapter' as entityType, GROUP_CONCAT(content_raw)
                 from (select id as page_id, book_id, page_no, content_raw,
                       (select title from chapter ch
                        where (ch.begin_page_no &lt; p.page_no OR ch.begin_page_no = p.page_no)
                          and ch.book_id = p.book_id and ch.parent_id = 0
                        order by begin_page_no desc LIMIT 1) as chapter
                       from page p where book_id = '${book_chapter.b_id}') a
                 group by chapter">
    <field column="solr_id" name="id" />
    <field column="title" name="title"/>
    <field column="GROUP_CONCAT(content_raw)" name="pageText"/>
    <field column="entityType" name="entityType"/>
    <entity name="book-type2" query="select name,id from book_type where id='${book_chapter.type_id}'">
      <field column="name" name="contentType"/>
    </entity>
  </entity>
</entity>

thanks, Csaba -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-deleting-documents-tp4041811p4041996.html Sent from the Solr - User mailing list archive at Nabble.com.
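Given that config, one thing worth checking is whether the generated solr_id collides across books: the id is built from the book title plus the inferred chapter title, and neither is guaranteed unique across 6870 books. A sketch of a check to run directly in MySQL - it approximates the DIH id construction, so treat column and join details as assumptions to adjust:

```sql
-- Count how many (book, chapter) combinations map to the same
-- generated id. Any row returned here becomes a duplicate Solr id,
-- and the later import overwrites (deletes) the earlier document.
SELECT CONCAT(CAST(b.title AS CHAR), '-', CAST(c.title AS CHAR)) AS solr_id,
       COUNT(*) AS n
FROM chapter c
JOIN book b ON b.id = c.book_id
WHERE c.parent_id = 0
GROUP BY solr_id
HAVING n > 1
ORDER BY n DESC;
```

If the duplicate counts here sum to roughly the 6704 deleted docs, the fix is to make the id genuinely unique, e.g. by including book_id in the CONCAT instead of the (non-unique) book title.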
Re: If we Open Source our platform, would it be interesting to you?
Marcelo In some sense, it sounds like you are aiming at building a topic map of all your resources. Jack On Thu, Feb 21, 2013 at 11:54 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: Hello David, First of all, thanks for answering! 2013/2/21 David Quarterman da...@corexe.com Looked through your site and the framework looks very powerful as an aggregator. We do a lot of data aggregation from many different sources in many different formats (XML, JSON, text, CSV, etc) using RDBMS as the main repository for eventual SOLR indexing. A 'one-stop-shop' for all this would be very appealing. Actually, just to clarify, it uses Cassandra as repository, not an RDMS. We want to use it for large scale, so you could import entire company databases into the repo and relate the data from one another. However, If I understood you right, you got the idea, an intermediate repo before indexing, so you could postpone decisions about what to index and how... Have you looked at products like Talend Jitterbit? These offer transformation from almost anything to almost anything using graphical interfaces (Jitterbit is better) and a PHP-like coding format for trickier work. If you (or somebody) could add a graphical interface, the world would beat a path to your door! This is very interesting, actually! We considered using Talend when we started our business, but we decided to go ahead with the development of a new product. The reason was: Talend is great, but it limits a good programmer, if he is more agile coding than using graphical interfaces. Have user interfaces as a possibility is nice, but as something you HAVE TO use is awful. Besides, it has a learning curve and seems to run better and you hire their own platform, and we wanted to choose the fine grain of our platform. However, your question made me think a lot about it. Do you think integrating to jitterbit or talend could be interesting? Or did you mean developing a new user interface? 
The bad thing I see in integrating with a talend like program is that you start to be dependent on the graphical interface, I feel it's hard to use my own java code... I might be wrong. Anyway, I will consider this possibility, but if you could explain better why you think one or other could be such a good idea would help us a lot. Would you be interested in using such a tool yourself? Best regards, Marcelo.
RE: Matching an exact word
Thank you. So essentially I need to write a custom query parser (extending upon something like the QParser)?

-Original Message- From: Upayavira [mailto:u...@odoko.co.uk] Sent: Thursday, February 21, 2013 12:22 PM To: solr-user@lucene.apache.org Subject: Re: Matching an exact word

Solr will only match on the terms as they are in the index. If it is stemmed in the index, it will match that. If it isn't, it'll match that. All term matches are (by default at least) exact matches. Only with stemming you are doing an exact match against the stemmed term. Therefore, there really is no way to do what you are looking for within Solr. I'd suggest you'll need to do some parsing at your side and, if you find quotes, do the query against a different field. Upayavira

On Thu, Feb 21, 2013, at 06:17 PM, Van Tassell, Kristian wrote: I'm trying to match the word "created". Given that it is surrounded by quotes, I would expect an exact match to occur, but instead the entire stemming results show for words such as create, creates, created, etc. q="created"&wt=xml&rows=1000&qf=text&defType=edismax If I copy the text field to a new one that does not stem words, text_exact for example, I get the expected results: q="created"&wt=xml&rows=1000&qf=text_exact&defType=edismax I would like the decision whether to match exact or not to be determined by the quotes rather than the qf parameter (eg, not have to use it at all). What topic do I need to look into more to understand this? Thanks in advance!
Re: Matching an exact word
You could also do this outside Solr, in your client. If your query is surrounded by quotes, then strip away the quotes and make q=text_exact_field:your_unquoted_query. Probably better to do outside Solr in general, keeping in mind the upgrade path. -sujit On Feb 21, 2013, at 12:20 PM, Van Tassell, Kristian wrote: Thank you. So essentially I need to write a custom query parser (extending upon something like QParser)?
Re: Matching an exact word
And keep in mind you do need quotes around your search term if it consists of multiple words - q=text_exact_field:"your unquoted query" - otherwise Solr will interpret two words as: exact_field:two defaultfield:words (Maybe not directly applicable to your problem, Kristian, but I just want to mention that there are a few StemFilters available; maybe another one acts differently!) On 21 February 2013 21:52, SUJIT PAL sujit@comcast.net wrote: You could also do this outside Solr, in your client. If your query is surrounded by quotes, then strip away the quotes and make q=text_exact_field:your_unquoted_query. Probably better to do outside Solr in general, keeping in mind the upgrade path. -sujit
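The client-side routing that Upayavira and Sujit describe can be sketched as a small helper. This is a hedged illustration: the field names text and text_exact come from the thread, while the function name and quoting rules are my own.

```python
def route_query(user_query, stemmed_field="text", exact_field="text_exact"):
    """If the user wrapped the whole query in double quotes, strip them
    and search the unstemmed field; otherwise search the stemmed field."""
    q = user_query.strip()
    if len(q) >= 2 and q.startswith('"') and q.endswith('"'):
        inner = q[1:-1]
        if " " in inner:
            # Re-quote multi-word phrases so Solr keeps them together,
            # per the caveat about multi-word search terms.
            return '%s:"%s"' % (exact_field, inner)
        return "%s:%s" % (exact_field, inner)
    return "%s:%s" % (stemmed_field, q)
```

The resulting string would be sent as the q parameter (or against a qf set to the exact field), keeping all the quote handling outside Solr.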
RE: Is it possible to manually select a shard leader in a running SolrCloud?
Correction, I used this curl: curl -v 'http://HOST:8983/solr/admin/collections?action=CREATE&name=test&numShards=3&replicationFactor=2&maxShardsPerNode=2' So 3 instances, 3 shards, 2 replicas per shard. ServerA becomes leader of all 3 shards in 4.1 with this call. Tim Vaillancourt -Original Message- From: Vaillancourt, Tim [mailto:tvaillanco...@ea.com] Sent: Thursday, February 21, 2013 11:27 AM To: solr-user@lucene.apache.org; markrmil...@gmail.com Subject: RE: Is it possible to manually select a shard leader in a running SolrCloud? Thanks Mark, The real driver for me wanting to promote a different leader is that when I create a new Collection via the Collections API across a multi-server SolrCloud, the leader of each shard is always the same host, so you're right that I'm tackling the wrong problem with this request, although it would fix it for me. If I create the cores manually via the cores API, one by one, I am able to get what I expect, but when running this Collections API call on a 3-instance Solr 4.1, 3-shard setup, 1 server becomes the leader of all 3 shards, meaning it will get all the writes for everything (correct me if I am wrong). If so, this will not scale well with all writes going to one node (again, correct me if I am wrong)? curl -v 'http://HOST:8983/solr/admin/collections?action=CREATE&name=test&numShards=3&replicationFactor=1&maxShardsPerNode=2' Currently on my 3-instance Solr 4.1 setup, the above call creates the following: - ServerA is the leader of all 3 shards (the problem I want to address). - ServerB + ServerC are automagically replicas of the 3 leader shards on ServerA. So again, my issue is that one server gets all the writes. Does anyone else encounter this? If so, I should spawn a separate thread on my specific issue. Cheers, Tim -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Tuesday, February 19, 2013 8:44 PM To: solr-user@lucene.apache.org Subject: Re: Is it possible to manually select a shard leader in a running SolrCloud?
You can't easily do it the way it's implemented in ZooKeeper. We would probably internally have to do the same thing - elect a new leader and drop him until the one we wanted came up. The main thing doing it internally would gain is that you could skip the elected guy from becoming the actual leader and just move on to the next candidate. Still some tricky corner cases to deal with as well. I think for most things you would use this to solve, there is probably an alternate thing that should be addressed. - Mark On Mon, Feb 18, 2013 at 4:15 PM, Vaillancourt, Tim tvaillanco...@ea.com wrote: Hey all, I feel having to unload the leader core to force an election is hacky, and as far as I know would still leave which node becomes the Leader to chance, i.e. I cannot guarantee NodeX becomes Leader 100% in all cases. Also, this imposes additional load temporarily. Is there a way to force the winner of the election, and if not, is there a known feature request for this? Cheers, Tim Vaillancourt -Original Message- From: Joseph Dale [mailto:joey.d...@gmail.com] Sent: Sunday, February 03, 2013 7:42 AM To: solr-user@lucene.apache.org Subject: Re: Is it possible to manually select a shard leader in a running SolrCloud? With SolrCloud all cores are collections. The collections API is just a wrapper to call the core API a million times with one command. to /solr/admin/cores?action=CREATE&name=core1&collection=core1&shard=1 Basically you're creating the shard again, after leader props have gone out. Solr will check ZK and find a core meeting that description, then simply get a copy of the index from the leader of that shard. On Feb 3, 2013, at 10:37 AM, Brett Hoerner br...@bretthoerner.com wrote: What is the inverse I'd use to re-create/load a core on another machine but make sure it's also known to SolrCloud/as a shard? On Sat, Feb 2, 2013 at 4:01 PM, Joseph Dale joey.d...@gmail.com wrote: To be more clear, let's say bob is the leader of core 1.
On bob do a /admin/cores?action=unload&name=core1. This removes the core/shard from bob, giving the other servers a chance to grab leader props. -Joey On Feb 2, 2013, at 11:27 AM, Brett Hoerner br...@bretthoerner.com wrote: Hi, I have a 5-server cluster running 1 collection with 20 shards, replication factor of 2. Earlier this week I had to do a rolling restart across the cluster; this worked great and the cluster stayed up the whole time. The problem is that the last node I restarted is now the leader of 0 shards, and is just holding replicas. I've noticed this node has abnormally high load average, while the other nodes (which have the same number of shards, but more leaders on average) are fine. First, I'm wondering if that load could be related to being a 5x replica and 0x leader? Second, I was wondering if I could somehow flag single shards to re-elect a leader (or force a
Re: Is it possible to manually select a shard leader in a running SolrCloud?
Which of your three hosts did you point this request at? Upayavira On Thu, Feb 21, 2013, at 09:13 PM, Vaillancourt, Tim wrote: Correction, I used this curl: curl -v 'http://HOST:8983/solr/admin/collections?action=CREATE&name=test&numShards=3&replicationFactor=2&maxShardsPerNode=2' So 3 instances, 3 shards, 2 replicas per shard. ServerA becomes leader of all 3 shards in 4.1 with this call. Tim Vaillancourt
Re: can I install new Solr 4.1 as slave (3.3 master)
I cannot give a definitive answer, but I suspect there could be problems, as the index formats in 3.3 and 4.1 are slightly different. Why don't you upgrade to 4.1? The only things you need to do are: 1. install Solr 4.1 2. copy all related config files from 3.3, and back up the index data folder 3. shut down Solr 3.3 4. start Solr 4.1 with solr.data.dir pointing to the old dir On Thu, Feb 21, 2013 at 10:54 AM, michaelweica m...@hipdigital.com wrote: Hi, our SOLR master version is 3.3. Can I install a new box with Solr 4.1 as a slave, and replicate from the master data? thanks -- View this message in context: http://lucene.472066.n3.nabble.com/can-i-install-new-SOLR-4-1-as-slaver-3-3-Master-tp4041976.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: can I install new Solr 4.1 as slave (3.3 master)
Thanks. We have 1 master and 5 slave servers, and we use the slaves as production servers. We just update the master index when we have new content. Our index file is now almost 88G, and the server has just 1 core and 8G RAM (JVM: -Xmx60964M -Xms1024M), so it easily runs out of memory. So I plan to deploy a new server with Solr 4.1; it would be easy to keep the master updated and just replicate to the new Solr. And I don't know whether updating new content in the index on Solr 4.1 is the same process as on Solr 3.3. Hope we can find the best solution for this. -- View this message in context: http://lucene.472066.n3.nabble.com/can-i-install-new-SOLR-4-1-as-slaver-3-3-Master-tp4041976p4042037.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How can i instruct the Solr/ Solr Cell to output the original HTML document which was fed to it.?
: Hi everyone, i am new to solr technology and not getting a way to get back : the original HTML document with Hits highlighted into it. what : configuration and where i can do to instruct SolrCell/ Tika so that it does : not strips down the tags of HTML document in the content field. I _think_ what you want is simply to ensure that you have a content field in your schema which is stored=true (and indexed=true if you want to search on it directly) ... and then ExtractingRequestHandler will put the entire XHTML it generates from the documents you index into that field. http://wiki.apache.org/solr/ExtractingRequestHandler If that isn't what you had in mind, then you need to provide us with more details about what you've tried, what results you get, and how exactly those results differ from what you want to get. -Hoss
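A minimal schema.xml sketch of the field Hoss describes (the field and type names here are illustrative, not from the original message; only stored="true" is the point):

```
<field name="content" type="text_general" indexed="true" stored="true"/>
```

With stored="true", whatever ExtractingRequestHandler writes into the content field is kept verbatim and can be returned in search results.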
Re: Document update question
On 2/21/2013 10:00 AM, Jack Park wrote: Interesting you should say that. Here is my solrj code:

public Solr3Client(String solrURL) throws Exception {
    server = new HttpSolrServer(solrURL);
    // server.setParser(new XMLResponseParser());
}

I cannot recall why I commented out the setParser line; something about someone saying in another thread that it's not important. I suppose I should revisit my unit tests with that line uncommented. Or did I miss something? The JSON results I painted earlier were from reading the document online in the admin query panel. Jack, SolrJ defaults to the javabin response parser, which offers maximum efficiency in the communication. Between version 1.4.1 and 3.1.0, the javabin version changed and became incompatible with the old one. The XML parser is a little bit less efficient than javabin, but is the only way to get Solr/SolrJ to talk when one side is using a different javabin version than the other side. If you are not mixing 1.x with later versions, you do not need to worry about changing the response parser. Thanks, Shawn
Re: Solr splitting my words
The issue may simply be that your indexed data has the mixed case and your query has only lower case. So the suggested change won't affect the query itself, but will cause the indexed data to be indexed differently. -- Jack Krupansky -Original Message- From: scallawa Sent: Thursday, February 21, 2013 9:59 AM To: solr-user@lucene.apache.org Subject: Re: Solr splitting my words I tried playing with the analyzer before posting and wasn't sure how to interpret it. Field type: text Field value index: womens-mcmurdo-ii-boots (this is based on the info that is in the field) Field value query: mcmurdo Results: I only got one match. In the index analyzer, org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=1, generateNumberParts=1, catenateWords=1, generateWordParts=1, catenateAll=0, catenateNumbers=1} produced:

term text             position  start,end
womens                1         0,6
mcmurdo               2         7,14
ii                    3         15,17
boots                 4         18,23
womensmcmurdoiiboots  4         0,23

All terms have term type "word" and no payload. Jack, the field that I am expecting to be indexed is not sending the data in caps, which is why I am puzzled. I am wondering if the indexed data is not coming from the field I expect. I will try your change in dev once I get data generated there. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-splitting-my-words-tp4041913p4041963.html Sent from the Solr - User mailing list archive at Nabble.com.
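The analyzer output above can be approximated with a rough sketch. This is not the real WordDelimiterFilterFactory, just an illustration of the generateWordParts=1 / catenateWords=1 behavior for a purely hyphenated, all-lowercase token (case changes, numbers, and token positions are omitted):

```python
def word_delimiter(token, generate_word_parts=True, catenate_words=True):
    """Rough sketch of WordDelimiterFilterFactory for a hyphenated token:
    generateWordParts emits each sub-word, catenateWords also emits the
    concatenation of all sub-words."""
    parts = [p for p in token.split("-") if p]
    tokens = []
    if generate_word_parts:
        tokens.extend(parts)
    if catenate_words and len(parts) > 1:
        tokens.append("".join(parts))
    return tokens
```

Run on "womens-mcmurdo-ii-boots" this yields the same five terms shown in the analysis output, which is why a query for mcmurdo can match the original hyphenated value.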
Re: Index optimize takes more than 40 minutes for 18M documents
Thanks Walter for the info, we will disable optimize then and do more testing. Regards, Yandong 2013/2/22 Walter Underwood wun...@wunderwood.org That seems fairly fast. We index about 3 million documents in about half that time. We are probably limited by the time it takes to get the data from MySQL. Don't optimize. Solr automatically merges index segments as needed. Optimize forces a full merge. You'll probably never notice the difference, either in disk space or speed. It might make sense to force merge (optimize) if you reindex everything once per day and have no updates in between. But even then it may be a waste of time. You need lots of free disk space for merging, whether a forced merge or automatic. Free space equal to the size of the index is usually enough, but the worst case can need double the size of the index. wunder On Feb 21, 2013, at 9:20 AM, Yandong Yao wrote: Hi Guys, I am using Solr 4.1 and have indexed 18M documents using the solrj ConcurrentUpdateSolrServer (each document contains 5 fields, and average length is less than 1k). 1) It takes 70 minutes to index those documents without optimize on my Mac 10.8; how is that performance - slow, fast or common? 2) It takes about 40 minutes to optimize those documents; following is the top output, and there are lots of FAULTS - what does this mean? Processes: 118 total, 2 running, 8 stuck, 108 sleeping, 719 threads 00:56:52 Load Avg: 1.48, 1.56, 1.73 CPU usage: 6.63% user, 6.40% sys, 86.95% idle SharedLibs: 31M resident, 0B data, 6712K linkedit. MemRegions: 34734 total, 5801M resident, 39M private, 638M shared. PhysMem: 982M wired, 3600M active, 3567M inactive, 8150M used, 38M free. VM: 254G vsize, 1285M framework vsize, 1469887(368) pageins, 1095550(0) pageouts. Networks: packets: 14842595/9661M in, 14777685/9395M out. Disks: 820048/43G read, 523814/53G written.
PID COMMAND %CPU TIME #TH #WQ #POR #MRE RPRVT RSHRD RSIZE VPRVT VSIZE PGRP PPID STATE UID FAULTS COW MSGSENT MSGRECV SYSBSD SYSMACH 4585 java 11.7 02:52:01 32 1483 342 3866M+ 6724K 3856M+ 4246M 6908M 4580 4580 sleepin 501 1490340+ 402 3000781+ 231785+ 15044055+ 10033109+ 3) If I don't run optimize, what is the impact? Bigger disk size or slower query performance? Following is my index config in solrconfig.xml:

<ramBufferSizeMB>100</ramBufferSizeMB>
<mergeFactor>10</mergeFactor>
<autoCommit>
  <maxDocs>100000</maxDocs> <!-- 100K docs -->
  <maxTime>300000</maxTime> <!-- 5 minutes -->
  <openSearcher>false</openSearcher>
</autoCommit>

Thanks very much in advance! Regards, Yandong
RE: Is it possible to manually select a shard leader in a running SolrCloud?
I sent this request to ServerA in this case, which became the leader of all shards. As far as I know you're supposed to issue this call to just one server, as it issues the calls to the other leaders/replicas in the background, right? I am expecting the single Collections API call to spread the leaders evenly across Solr instances. Hopefully I am just doing/expecting something wrong :). Tim Vaillancourt -Original Message- From: Upayavira [mailto:u...@odoko.co.uk] Sent: Thursday, February 21, 2013 1:44 PM To: solr-user@lucene.apache.org Subject: Re: Is it possible to manually select a shard leader in a running SolrCloud? Which of your three hosts did you point this request at? Upayavira
Re: Is it possible to manually select a shard leader in a running SolrCloud?
The leader doesn't really do a lot more work than any of the replicas, so I don't think it's likely that important. If someone starts running into problems, that's usually when we start looking for solutions. - Mark On Feb 21, 2013, at 10:20 PM, Vaillancourt, Tim tvaillanco...@ea.com wrote: I sent this request to ServerA in this case, which became the leader of all shards. As far as I know you're supposed to issue this call to just one server, as it issues the calls to the other leaders/replicas in the background, right? I am expecting the single Collections API call to spread the leaders evenly across Solr instances. Hopefully I am just doing/expecting something wrong :). Tim Vaillancourt
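Tim's observation that creating cores one by one via the cores API gives the expected leader spread can be sketched as a simple round-robin plan. This is illustrative only: the node names are placeholders, and "first-created core becomes leader" is the behavior Tim reports, not a documented guarantee.

```python
def plan_shards(nodes, num_shards):
    """Round-robin plan: create each shard's first core on a different
    node, one at a time via the cores API, so the likely leaders are
    spread across the cluster rather than piling up on one server."""
    return [("shard%d" % s, nodes[(s - 1) % len(nodes)])
            for s in range(1, num_shards + 1)]
```

Each (shard, node) pair would then be turned into a cores API CREATE call against that node before adding the replicas.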
How do I create two collections on the same cluster?
I am using Solr 4.1. I created collection1 consisting of 2 leaders and 2 replicas (2 shards) at boot time. After the cluster is up, I am trying to create collection2 with 2 leaders and 2 replicas, just like collection1. I am using the following Collections API call for that: http://localhost:7575/solr/admin/collections?action=CREATE&name=collection2&numShards=2&replicationFactor=2&collection.configName=myconf&createNodeSet=localhost:8983_solr,localhost:7574_solr,localhost:7575_solr,localhost:7576_solr Yes, collection2 does get created. But I see a problem - the createNodeSet parameter is not being honored. Not all 4 nodes are being used to create collection2, only 3 are. Is this a bug, or do I not understand how this parameter should be used? What is the best way to create collection2? Can I specify both collections in solr.xml in the solr home dir on all nodes and launch them? Do I have to get the configs for collection2 uploaded to ZooKeeper before I launch the nodes? Thanks in advance. -Shankar -- Regards, *Shankar Sundararaju *Sr. Software Architect ebrary, a ProQuest company 410 Cambridge Avenue, Palo Alto, CA 94306 USA shan...@ebrary.com | www.ebrary.com | 650-475-8776 (w) | 408-426-3057 (c)
Re: How do I create two collections on the same cluster?
On 2/21/2013 9:50 PM, Shankar Sundararaju wrote: I am using Solr 4.1. I created collection1, consisting of 2 leaders and 2 replicas (2 shards), at boot time. After the cluster is up, I am trying to create collection2 with 2 leaders and 2 replicas, just like collection1. I am using the following Collections API call for that: http://localhost:7575/solr/admin/collections?action=CREATE&name=collection2&numShards=2&replicationFactor=2&collection.configName=myconf&createNodeSet=localhost:8983_solr,localhost:7574_solr,localhost:7575_solr,localhost:7576_solr Yes, collection2 does get created. But I see a problem - the createNodeSet parameter is not being honored. Not all 4 nodes are being used to create collection2; only 3 are being used. Is this a bug, or do I not understand how this parameter should be used? What is the best way to create collection2? Can I specify both collections in solr.xml in the Solr home dir on all nodes and launch them? Do I have to get the configs for collection2 uploaded to ZooKeeper before I launch the nodes? Is your cluster comprised of only those four Solr nodes, or do you have others? If it's just those four, you should not need to tell it which ones to use; it should use all of them. You could try adding maxShardsPerNode=1 just to be sure that it won't try to put more than one shard on any one node. I did find an email thread saying that hostnames won't work in createNodeSet with Solr 4.1, because 4.1 defaults to IP addresses when each node registers with ZooKeeper. Check your SolrCloud graph in the admin UI. If you see IP addresses there, you will probably have to use IP addresses in the createNodeSet parameter. You can force hostnames by including host=myhostname in the cores parameter of solr.xml and restarting Solr on that node. I'm relatively new to SolrCloud, but I'm learning. Thanks, Shawn
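Putting Shawn's two suggestions together - adding an explicit maxShardsPerNode and using IP addresses rather than hostnames in createNodeSet - the Collections API call might look like the sketch below. The IPs, ports, and host are placeholders for your own cluster, not values from the thread:

```python
from urllib.parse import urlencode

# Collections API CREATE call with the two suggestions above:
# an explicit maxShardsPerNode, and IP-based createNodeSet entries
# (use whatever addresses appear in your SolrCloud admin graph).
params = {
    "action": "CREATE",
    "name": "collection2",
    "numShards": "2",
    "replicationFactor": "2",
    "maxShardsPerNode": "1",
    "collection.configName": "myconf",
    "createNodeSet": ",".join([
        "192.168.1.10:8983_solr",
        "192.168.1.10:7574_solr",
        "192.168.1.10:7575_solr",
        "192.168.1.10:7576_solr",
    ]),
}
url = "http://localhost:7575/solr/admin/collections?" + urlencode(params)
print(url)
```

With maxShardsPerNode=1, Solr should refuse to place a second shard replica on any node, which makes a silently skipped node show up as an explicit error instead.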
solr 4 fragmentsBuilder and highlightMultiTerm
How do I configure solrconfig.xml to enable a fragmentsBuilder and highlightMultiTerm on 4.0 and 4.1? I read the document on the wiki:

<fragmentsBuilder name="colored" class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder">
  <lst name="defaults">
    <str name="hl.tag.pre"></str>
    <str name="hl.tag.post"></str>
  </lst>
</fragmentsBuilder>

but I don't know where this snippet should be placed, or how to call it via the URL. thanks -- View this message in context: http://lucene.472066.n3.nabble.com/solr-4-fragmentsBuilder-and-highlightMultiTerm-tp4042128.html Sent from the Solr - User mailing list archive at Nabble.com.
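For the URL side of the question: a named fragmentsBuilder registered in the highlighting section of solrconfig.xml is selected per request with hl.fragmentsBuilder, and multi-term highlighting is toggled with hl.highlightMultiTerm. A sketch of building such a query URL - the host, core name, and field names are placeholders, and note that fragments builders belong to the FastVectorHighlighter, which needs the field indexed with termVectors, termPositions, and termOffsets:

```python
from urllib.parse import urlencode

# Query-time parameters selecting the "colored" fragments builder and
# enabling multi-term highlighting. Host, core, and fields are placeholders.
params = {
    "q": "title:solr",
    "hl": "true",
    "hl.fl": "title",
    "hl.useFastVectorHighlighter": "true",
    "hl.fragmentsBuilder": "colored",
    "hl.highlightMultiTerm": "true",
}
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
print(url)
```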
Re: DIH deleting documents
I should also add that some of the books don't have chapters, so the query won't succeed for those books. But in that case I expected that the document wouldn't be added at all, rather than first added and then deleted (which I now suspect is the case). It would be very helpful if I could see a list of the deleted documents! I was trying to look in the terminal window (Jetty), but that did not help. I don't know where else Solr might put logs; I looked in /var/log but did not find anything useful-looking. -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-deleting-documents-tp4041811p4042149.html Sent from the Solr - User mailing list archive at Nabble.com.
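One thing worth double-checking, even though the first post says every id is unique: if two database rows ever produce the same concatenated uniqueKey, Solr overwrites the earlier document rather than keeping both, so the index ends up smaller than the number of rows imported without any visible "delete". This is only a hypothesis, not something established in the thread; a toy sketch of the overwrite semantics:

```python
# Toy illustration (not Solr itself): documents sharing a uniqueKey
# overwrite one another, so the index ends up smaller than the input.
rows = [
    {"book": "b1", "chapter": "c1"},
    {"book": "b1", "chapter": "c1"},  # duplicate concatenated id
    {"book": "b2", "chapter": "c1"},
]
index = {}
for row in rows:
    doc_id = row["book"] + "_" + row["chapter"]  # concatenated uniqueKey
    index[doc_id] = row  # same id replaces the earlier document
print(len(rows), len(index))  # 3 rows in, 2 documents survive
```

A GROUP BY / HAVING COUNT(*) > 1 query on the concatenated fields in MySQL would confirm or rule this out quickly.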
Solr as local service for .NET desktop app
I need some advanced search features for a desktop application. The application is a .NET (C#) application, so I can't use Lucene directly, and as I'm not sure about the future of Lucene.NET, I am considering using Solr (with SolrNET). As I need a cache for the desktop app anyway, it seems to be a good opportunity to solve two problems at once. Also, we will use Solr on the server, so we need to build know-how anyway. I am currently making my way through the Solr 3 book (from Packt), so I'm a newbie to Solr. Does anyone have experience with this? Are there some pitfalls I should be aware of? Deployment will be a challenge, I guess. How about configuration? Can I run Solr on a client's machine with reasonable default settings? Many thanks, Jan
Re: get content is put in the index queue but is not committed
Thanks Chris. I'm going to look at both the UpdateLog and RealTimeGetComponent classes, but I'm not sure I can use them because I'm working with Apache Solr version 1.4.1 (I know it is old). Anyway, I'll describe my problem. I am developing a custom class extending UpdateRequestProcessorFactory. This class must save to a database all modifications made on the Solr server (add, update, and delete actions), but the save to the database must always happen when the commit event occurs. My problem is that clients of the Solr server do explicit commits, so I receive the update event first and the commit event afterwards, and at commit time I need to recover the docs from the earlier update events; I wanted to know whether that is possible. Failing that, I will go another way and use a status field in the database: the status field lets me save docs to the database at the update event, and my other process does not use them until I change the value of the status field on the commit event. Thanks very much, I am learning a lot about Solr on this list. On 21/02/2013 19:34, Chris Hostetter wrote: : Anybody know how-to get content is put in the index queue but is not : committed? i'm guessing you are referring to uncommitted documents in the transaction log? Take a look at the UpdateLog class, and how it's used by the RealTimeGetComponent. If you provide more details as to what your end goal is, we might be able to provide more specific (or alternative) suggestions on how to achieve your goal... https://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss
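The pattern being described - hold update and delete events in a buffer, and flush them to the database only when the commit event arrives - can be sketched independently of Solr. This is purely illustrative (a real implementation would be a Java UpdateRequestProcessor overriding processAdd, processDelete, and processCommit), with all names invented for the sketch:

```python
# Illustrative sketch only: a real Solr implementation would extend
# UpdateRequestProcessor in Java. This shows the buffer-until-commit idea.
class BufferingProcessor:
    def __init__(self, database):
        self.database = database   # stand-in for the real DB layer
        self.pending = []          # operations seen since the last commit

    def process_add(self, doc):
        self.pending.append(("add", doc))

    def process_delete(self, doc_id):
        self.pending.append(("delete", doc_id))

    def process_commit(self):
        # Write everything to the database only once commit arrives.
        for op, payload in self.pending:
            self.database.append((op, payload))
        self.pending = []

db = []
proc = BufferingProcessor(db)
proc.process_add({"id": "1"})
proc.process_delete("2")
print(len(db))   # nothing written before the commit
proc.process_commit()
print(len(db))   # both buffered operations flushed at commit
```

This avoids the status-field workaround entirely, at the cost of keeping the pending operations in memory between commits.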