using edismax without velocity
I am using Solr 3.6 and trying to use the edismax handler. The config has a /browse requestHandler, but it doesn't work because of a missing class definition (VelocityResponseWriter) error:

  <queryResponseWriter name="velocity" class="solr.VelocityResponseWriter" startup="lazy"/>

I have copied the jars to solr/lib following the steps here, but no luck: http://wiki.apache.org/solr/VelocityResponseWriter#Using_the_VelocityResponseWriter_in_Solr_Core

I just want to search on multiple fields with different boosts. *Can I use edismax with the /select requestHandler?* If I write a query like the one below, does it search in both the fields name and description? Does the query below solve my purpose?

http://localhost:8080/solr/select/?q=(coldfusion^2 cache^1)&defType=edismax&qf=name^2 description^1&fq=author:[* TO *] AND -author:chinmoyp&start=0&rows=10&fl=author,score,id

-- View this message in context: http://lucene.472066.n3.nabble.com/using-edismax-without-velocity-tp4054190.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr restart is taking more than 1 hour
View this message in context: http://lucene.472066.n3.nabble.com/Solr-restart-is-taking-more-than-1-hour-tp4054165p4054189.html Sent from the Solr - User mailing list archive at Nabble.com.

Nabble says that the original message hasn't made it to the mailing list yet, which explains why I only saw the reply come in. Good thing Nabble sent along the URL so I could see the original question.

This is almost guaranteed to be caused by a huge updateLog - the tlog directory added in version 4.0. On Solr restart, all of the tlog data that exists is replayed to ensure the index is fully up to date. When the tlog is huge, it takes a very long time.

A huge tlog is normally caused by one of two things: 1) only using soft commits and never hard committing; 2) doing a very large import with the dataimport handler and not committing until the end.

The solution is to do hard commits on intervals that are short (but not super short) with openSearcher set to false. A hard commit starts a new tlog and flushes index data to disk. With openSearcher set to false, the hard commit will not change document visibility - deleted documents are still searchable, and new documents are not yet searchable. You can still make new content searchable with a commit (hard or soft) that has openSearcher set to true.

By starting a new tlog on a regular basis, it will never get very big. Solr trims old tlogs, only keeping a few of them around. If you have only a few tlogs and they are small, it won't take very long to replay them on startup.

The easiest way to do this hard commit is to have Solr do it for you automatically with the autoCommit feature:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>25000</maxDocs>
      <maxTime>30</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>
    <updateLog />
  </updateHandler>

I've typed this often enough that I really need to just put it on the wiki - when the question comes up, link the article. :)

Thanks,
Shawn
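For reference, the same searcher-preserving hard commit can also be triggered explicitly against the update handler, using openSearcher as an attribute of the commit command (host, port and core name here are placeholders, not from the thread):

```
curl http://localhost:8983/solr/mycore/update -H 'Content-Type: text/xml' \
     -d '<commit openSearcher="false"/>'
```

This flushes to disk and rolls the tlog exactly like the autoCommit above, without changing which documents are visible to searches.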
Does solr cloud support rename or swap function for collection?
Hi, We are using Solr 4.1 and we created a collection named my_data with 20 shards. Our index files are generated using the Lucene API every hour and loaded into SolrCloud using the core admin API. My problem is: for the data generated every hour, I need to create a new collection name like my_data_001 and load the index files under that collection name. Then my_data becomes useless and my_data_001 holds the latest data. In order to keep the query URL unchanged, I need to rename my_data_001 to my_data, but I can't see any collection API to do the rename or swap like the core admin API supports. How can I do this? thanks, Brad -- View this message in context: http://lucene.472066.n3.nabble.com/Does-solr-cloud-support-rename-or-swap-function-for-collection-tp4054193.html Sent from the Solr - User mailing list archive at Nabble.com.
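One approach worth checking (it may require upgrading past 4.1, since collection aliasing was added to the Collections API around Solr 4.2): keep the versioned collection names and point a stable alias at whichever collection is current. A sketch, with host and collection names taken from the question:

```
http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=my_data&collections=my_data_001
```

Queries against my_data then hit my_data_001; when my_data_002 is ready, re-issuing CREATEALIAS repoints the alias without the query URL ever changing.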
how to skip test while building
Hi All, I am new to Solr. I am using Solr 3.4. I want to build Solr without compiling the Lucene test files, and I want to skip running the tests. Can anyone please tell me where to make the necessary changes? Thanks, Pom
Re: using edismax without velocity
Definitely in the 4.x release. Did you try it and find a problem?
Re: using edismax without velocity
Yes, qf will search in both fields and boost accordingly. If the only reason to try velocity or even /browse was because you wanted edismax, don't bother. You can just add defType to the /select request handler in solrconfig, so that you don't need to add it to every request. Same for qf, if it has a common value. And you can even copy /select and create one or more new request handlers with new paths, like /my-select, if you have more than one common combination of parameter settings that you want to avoid setting on every incoming query request.

-- Jack Krupansky

-----Original Message----- From: amit Sent: Saturday, April 06, 2013 3:15 AM To: solr-user@lucene.apache.org Subject: using edismax without velocity
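A sketch of what that solrconfig.xml change might look like, using the field names and boosts from the original question (an illustration, not the poster's actual config):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- every request to /select now uses edismax by default -->
    <str name="defType">edismax</str>
    <!-- default fields and boosts; override per-request if needed -->
    <str name="qf">name^2 description^1</str>
  </lst>
</requestHandler>
```

With this in place, the query URL only needs q (plus any fq, rows, etc.), since defType and qf are supplied by the handler defaults.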
Boost parameter with query function - how to pass in complex params?
See example below.

1. Search for SUVs and boost Honda models:
   q=suv&boost=query({!v='honda'},1)

2. Search for SUVs and boost Honda OR Toyota models:
   a) Using OR in the query does NOT work:
      q=suv&boost=query({!v='honda or toyota'},1)
   b) Using two query functions and summing the boosts DOES work:
      q=suv&boost=sum(query({!v='honda'},1),query({!v='toyota'},1))

Any thoughts?
Use BM25Similarity for title field and default for others
We want the effect of the field length to have a lesser influence on the score for the title field (we don't want to completely disable it), so that we get the following behavior: docs with more hits in the title rank higher; docs with shorter titles rank higher if the hits are equal. The DefaultSimilarity wasn't always giving us this (shorter titles were preferred over longer titles with more hits). Note: we use edismax and search across title and other fields (like body).

In order to solve this we use BM25Similarity with a small value of b for the title field. We ended up using the SchemaSimilarityFactory as the global similarity in order to use BM25Similarity for the title field. This gave us the results we are looking for with respect to the title field.

We also have keyword, tag and other metadata fields, and we want them to be mostly treated as filters and not influence the score at all. Because of the use of the SchemaSimilarityFactory, even though we get DefaultSimilarity for non-title fields, it is not the same as DefaultSimilarityFactory, and so we have situations where the metadata fields dominate the score (because PerFieldSimilarityWrapper uses a queryNorm of 1.0).

We are thinking that we have the following options to fix this issue:
a) Use BM25Similarity for all fields and adjust the k1, b values as appropriate.
b) Send the metadata field clauses as part of fq instead of q (but we might have a lot of dynamically generated clauses, and fq may not be best suited for these since we don't want them cached, as they could vary from request to request).
c) Associate a boost of zero with the metadata fields in the query.
d) Extend the SchemaSimilarityFactory and write custom code (at this point, I am not sure what the custom class should do).

Are these correct? Do we have any other options? Any advice on which option is better? I appreciate any input on this.
-- View this message in context: http://lucene.472066.n3.nabble.com/Use-BM25Similarity-for-title-field-and-default-for-others-tp4054159.html Sent from the Solr - User mailing list archive at Nabble.com.
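For readers following along, the per-field similarity setup described above looks roughly like this in schema.xml on Solr 4.x (the fieldType name, analyzer chain, and the small b value are illustrative, not the poster's actual config):

```xml
<!-- global similarity: delegates to per-fieldType <similarity> elements -->
<similarity class="solr.SchemaSimilarityFactory"/>

<fieldType name="text_title" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- BM25 with a small b: field length has a reduced (but nonzero) effect -->
  <similarity class="solr.BM25SimilarityFactory">
    <float name="k1">1.2</float>
    <float name="b">0.1</float>
  </similarity>
</fieldType>
```

Fields without an explicit <similarity> fall back to the factory's default, which, as the poster notes, is not identical in behavior to running with DefaultSimilarityFactory globally.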
Solr restart is taking more than 1 hour
Hi, We have 2 cores, one shard, with Solr 4.1. After some configuration changes, when we try to reload the core / restart the Solr instance, it takes more than one hour. The log says it is opening a searcher (maxWarmingSearchers is 2). Can anyone help us resolve this? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-restart-is-taking-more-than-1-hour-tp4054165.html Sent from the Solr - User mailing list archive at Nabble.com.
How to generate multiple tokens on same position through TokenFilter
Hi All,

Objective: I want to create a filter to generate multiple tokens (mentioned below) from the input stream, and I want to put all generated tokens at the same position, i.e. 1. Although there is already a tokenizer (PathHierarchyTokenizerFactory) for a similar purpose, I also want my tokens to be stemmed, so to achieve my objective I created a filter; please look at the source code below (I am not a Java expert, so the code may not be optimized):

// File: ExtendedNameFilter.java
// Purpose: To combine multiple tokens such that "apache solr foundation" generates
// the tokens apachsolrfoundat, solrfoundat, foundat

package org.apache.lucene.analysis;

import java.io.IOException;
import java.util.LinkedList;
import java.util.ArrayList;
import java.util.Iterator;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.CharacterUtils;
import org.apache.lucene.util.Version;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

public final class ExtendedNameFilter extends TokenFilter {

  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private PositionIncrementAttribute posIncAttr;
  private OffsetAttribute setOffsetAttr;
  private final int extendedWordCount;

  public ExtendedNameFilter(Version matchVersion, TokenStream in, int extendedWordCount) {
    super(in);
    CharacterUtils.getInstance(matchVersion);
    this.extendedWordCount = extendedWordCount;
    this.posIncAttr = addAttribute(PositionIncrementAttribute.class);
    this.setOffsetAttr = addAttribute(OffsetAttribute.class);
  }

  LinkedList<String> list = new LinkedList<String>();
  ArrayList<Integer> startOffsetList = new ArrayList<Integer>();
  int endOffset = 0;
  int count = 0;

  @Override
  public final boolean incrementToken() throws IOException {
    Iterator<String> iterator;
    int len = 0;
    while (input.incrementToken()) {
      list.add(termAtt.toString());
      startOffsetList.add(setOffsetAttr.startOffset());
      endOffset = setOffsetAttr.endOffset();
    }
    iterator = list.iterator();
    len = list.size();
    if (len > 0 && (extendedWordCount <= 0 || count < extendedWordCount)) {
      generateToken(iterator);
      return true;
    } else {
      return false;
    }
  }

  public void generateToken(Iterator<String> iterator) {
    termAtt.setEmpty();
    while (iterator.hasNext()) {
      termAtt.append((CharSequence) iterator.next());
    }
    list.removeFirst();
    if (count == 0) {
      posIncAttr.setPositionIncrement(1);
    } else {
      posIncAttr.setPositionIncrement(0);
    }
    setOffsetAttr.setOffset(startOffsetList.get(count), endOffset);
    count++;
  }
}
// Code Ends

On the analysis page of Solr it worked fine; I've shared screenshots of the analysis page on Google, anyone can see them by clicking the links below:
https://docs.google.com/file/d/0BxNUkIJt2ma3TUN0YUF1dW1Pc2s/edit?usp=sharing
https://docs.google.com/file/d/0BxNUkIJt2ma3SEE2SDBLTkpETE0/edit?usp=sharing

But while indexing documents Solr gives the following exception:

Apr 6, 2013 12:05:45 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.IllegalArgumentException: first position increment must be > 0 (got 0)
at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:125)
at org.apache.lucene.index.DocFieldProcessor.processDocument(DocFieldProcessor.java:254)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:256)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:376)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:206)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:477) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:346) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246) at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at
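One possible cause worth checking (an assumption on the editor's part, not something confirmed in the thread): the filter's list, startOffsetList, endOffset and count fields are never cleared, so when the analyzer reuses the TokenStream for the next document, count is already greater than zero and the first emitted token gets a position increment of 0 - exactly what the exception complains about. A hedged sketch of a reset() override against the Lucene 4.x TokenFilter API (a fragment to add to the class above, not standalone code):

```
@Override
public void reset() throws IOException {
    super.reset();
    // clear per-document state so the first token of each new
    // stream is again emitted with positionIncrement 1
    list.clear();
    startOffsetList.clear();
    endOffset = 0;
    count = 0;
}
```

The analysis page creates a fresh stream per run, which would explain why the problem only appears during real indexing, where streams are reused.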
Re: Please add me: FuadEfendi
Yeah, unfortunately we had to lock it down because of spam, but we (well, Steve seems to be on it faster than I am) are adding people back as fast as we get requests... On Fri, Apr 5, 2013 at 3:34 PM, Fuad Efendi fuad.efe...@tokenizer.ca wrote: Hi, A few months ago I was able to modify the Wiki; I can't do it now, probably because of http://wiki.apache.org/solr/ContributorsGroup Please add me: FuadEfendi Thanks! -- Fuad Efendi, PhD, CEO C: (416)993-2060 F: (416)800-6479 Tokenizer Inc., Canada http://www.tokenizer.ca
Re: Boost parameter with query function - how to pass in complex params?
On Sat, Apr 6, 2013 at 9:42 AM, dc tech dctech1...@gmail.com wrote: See example below. 1. Search for SUVs and boost Honda models: q=suv&boost=query({!v='honda'},1) 2. Search for SUVs and boost Honda OR Toyota models: a) Using OR in the query does NOT work: q=suv&boost=query({!v='honda or toyota'},1)

The "or" needs to be uppercase OR. It might also be easier to compose and read like this:

q=suv&boost=query($boostQ)&boostQ=honda OR toyota

Of course, something simpler like this might also serve your primary goal:

q=+suv (honda OR toyota)^10

-Yonik http://lucidworks.com
Re: how to skip test while building
I don't know a good way to skip compiling the tests, but there isn't any harm in compiling them... Changing to the solr directory and just issuing "ant example dist" builds pretty much everything. You don't execute tests unless you specify "ant test". "ant -p" shows you all the targets. Note that you have different targets depending on whether you're executing it in solr_home, solr_home/solr or solr_home/lucene. Since you mention Solr, you probably want to work in solr_home/solr to start. Best, Erick On Sat, Apr 6, 2013 at 5:36 AM, parnab kumar parnab.2...@gmail.com wrote:
Re: using edismax without velocity
In fact, just remove or comment out these lines from the /browse handler and you won't be using velocity; it might make a good place to start:

  <!-- VelocityResponseWriter settings -->
  <str name="wt">velocity</str>
  <str name="v.template">browse</str>
  <str name="v.layout">layout</str>
  <str name="title">Solritas</str>

Best, Erick

On Sat, Apr 6, 2013 at 6:55 AM, Jack Krupansky j...@basetechnology.com wrote:
Re: Need Help for schema definition
Hello, Is somebody kind enough to help me, at least by giving some direction for my research? Regards

On 05/04/2013 15:59, contact_pub...@mail-impact.com wrote:

Hi all, well I'm a total newbie with Solr, and I need some help. OK, a raw definition of my needs: I have a product database, with ordinary fields to describe a product: name, reference, description, large description, product specifications, categories, etc...

The needs:
1 - Being able to search through product name, description, specification, reference.
2 - Being able to quickly find all products from a category. For now it gives me too many results.
3 - Being able to find, in a result set, all the facets corresponding to the product specifications (e.g.: number of products in wood, number of products having a diameter of 20cm or in a range). I am looking for an automatic process that tells me the 5 most present specifications in the result set and the number of products for each.
4 - Last but not least: I have a particular type of product (spare parts) for which I need to be able to:
- find them by brand
- find them by name
- find them by reference
- compatible model: the compatible model is in the description field and needs to be treated with regular expressions to make a list of the different compatible models (original text e.g.: "spare part for Pompe HG large model, model HGS v5")

I used the DataImportHandler to retrieve the data, and it seems to be good for the first range of products; however I need to adjust the tokenizer and filters because they are too strict for now. For the second set of data, I created a second entity in the data-import-config.xml adding the data to the same fields, but it doesn't fit my needs, as the results are mixed and I can't select a specific entity to search within.

Thanks in advance for your help
David
Sharing index amongst multiple nodes
Hi. What are the thoughts on having multiple Solr instances, i.e. multiple Solr war files, sharing the same index (i.e. sharing the same solr_home), where only one Solr instance is used for writing and the others for reading? Is this possible? Is it beneficial - is it more performant than having just one Solr instance? How does it affect auto-commits, i.e. how would the read nodes know the index has changed, re-populate caches, etc.? Solr 3.6.1. Thanks.
Re: Need Help for schema definition
On 6 April 2013 20:58, contact_pub...@mail-impact.com contact_pub...@mail-impact.com wrote: Hello, Is somebody kind enough to help me, at least by giving some direction for my research. Your questions are too broad, and lack sufficient detail for someone to be able to help you without asking more questions about each area. It would help if you provided more details, e.g., what are the relationships between various entities, and what the various fields mean. Ideally, you would tell us what you have tried, and what is not working for you. Please provide details about the schema, and what queries you are making, and what the expected results should be. On the face of it 1-3 should be straightforward with Solr, but I am unable to make sense out of 4. Regards, Gora
Re: Need Help for schema definition
Thanks for your reply.

1 - I have been able to enter the data into the database and query it correctly. For now the results remain too strict.
2 - OK, but not strict enough.
3 - Here is my set of data. From an SGBD I have the following schema: feature name = Diameter, feature value = 20; feature name = color, feature value = blue; etc... There are a number of feature names with different feature values. While importing the data into Solr, I made a multivalued field "features" holding "feature name : feature value" (I suppose this is not the right way to proceed). There are around 200 feature names and they change/are added often. And I wish to be able to request the 5 most popular feature names with all their different feature values, and for each feature value the number of matches (facets):

request = cat:swimming pool, facets=true, fq=feature??

expected result = 40 products found, list of products, facets:
- width: 6m (10), 9m (15), 12m (24), 50m (1)
- height: ...
- with liner
- color

where the number in parentheses is the number of products.

Regards

On 06/04/2013 18:26, Gora Mohanty wrote:
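A common pattern for this kind of open-ended feature set (a sketch only; the feature_* naming is an editorial assumption, not from the thread) is to index each feature name into its own dynamic field and facet on those fields, rather than packing "name : value" pairs into one multivalued field:

```
schema.xml:
  <dynamicField name="feature_*" type="string" indexed="true" stored="true" multiValued="true"/>

query:
  q=cat:"swimming pool"&facet=true&facet.field=feature_width&facet.field=feature_height&facet.mincount=1
```

Each facet.field then returns the value counts (6m (10), 9m (15), ...) directly; choosing the "5 most popular" feature names still requires client-side logic or a separate field that records which feature names a product has.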
Re: Solr metrics in Codahale metrics and Graphite?
Wow, that really doesn't help at all, since these seem to only be reported in the stats page. I don't need another non-standard app-specific set of metrics, especially one that needs polling. I need metrics delivered to the common system that we use for all our servers. This is also why SPM is not useful for us, sorry Otis. Also, there is no time period on these stats. How do you graph the 95th percentile? I know there was a lot of work on these, but they seem really useless to me. I'm picky about metrics, working at Netflix does that to you. wunder On Apr 3, 2013, at 4:01 PM, Walter Underwood wrote: In the Jira, but not in the docs. It would be nice to have VM stats like GC, too, so we can have common monitoring and alerting on all our services. wunder On Apr 3, 2013, at 3:31 PM, Otis Gospodnetic wrote: It's there! :) http://search-lucene.com/?q=percentilefc_project=Solrfc_type=issue Otis -- Solr ElasticSearch Support http://sematext.com/ On Wed, Apr 3, 2013 at 6:29 PM, Walter Underwood wun...@wunderwood.org wrote: That sounds great. I'll check out the bug, I didn't see anything in the docs about this. And if I can't find it with a search engine, it probably isn't there. --wunder On Apr 3, 2013, at 6:39 AM, Shawn Heisey wrote: On 3/29/2013 12:07 PM, Walter Underwood wrote: What are folks using for this? I don't know that this really answers your question, but Solr 4.1 and later includes a big chunk of codahale metrics internally for request handler statistics - see SOLR-1972. First we tried including the jar and using the API, but that created thread leak problems, so the source code was added. Thanks, Shawn
Re: Sharing index amongst multiple nodes
I don't understand why this would be more performant... it seems like it'd be more memory and resource intensive, as you'd have multiple class loaders and multiple cache spaces for no good reason. Just have a single core with sufficiently large caches to handle your response needs. If you want to load balance reads, consider having multiple physical nodes with master/slaves or SolrCloud. On Sat, Apr 6, 2013 at 9:21 AM, Daire Mac Mathúna daire...@gmail.com wrote:
Re: how to skip test while building
If you generate the Maven POM files, you can do this, I think, by running "mvn <whatever goes here> -DskipTests=true". On Sat, Apr 6, 2013 at 7:25 AM, Erick Erickson erickerick...@gmail.com wrote:
Empty term vector component result with Solr 4.2
Hi all, I'm doing a test migration from Solr 3.6.2 to Solr 4.2 and cannot make the term vector component work. I was not able to find changes to the TVC configuration in the docs, so I used the same approach as on my 3.6 server, where it works fine. The relevant field in schema.xml is configured like this:

  <field indexed="true" name="content" stored="true" termOffsets="true" termPositions="true" termVectors="true" type="text_general_all"/>

In the solrconfig.xml I have (I also tried the configuration from the example/solrconfig.xml bundled with Solr 4.2, but the result is the same):

  <searchComponent name="tvComponent" class="org.apache.solr.handler.component.TermVectorComponent"/>
  <requestHandler name="tvrh" class="org.apache.solr.handler.component.SearchHandler">
    <lst name="defaults">
      <bool name="tv">true</bool>
    </lst>
    <arr name="last-component">
      <str>tvComponent</str>
    </arr>
  </requestHandler>

But when I perform a request to the server like this:

http://test.farm:8080/solr/TestCorpus/select/?q=content%3A*&start=0&rows=1&indent=on&qt=tvrh&tv=true&tv.tf=true&tv.df=true&tv.positions&tv.offsets=true

I get a result which does not contain any TVC fields:

  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">49</int>
    </lst>
    <result name="response" numFound="3877" start="0">
      <doc>...</doc>
    </result>
  </response>

On the old 3.6 server, with the same request, I get:

  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">18</int>
    </lst>
    <result name="response" numFound="80698" start="0">...</result>
    <lst name="termVectors">...</lst>
  </response>

Could you please help me find out what is wrong? Regards, Yakov
Re: Sharing index amongst multiple nodes
Hi Daire Mac Mathúna; If there is a way of copying one Solr instance's indexes into another Solr instance, that may also solve the problem: somebody generates the indexes, and some of the other instances get a copy of them. During the synchronization process you could eliminate some of the indexes at the reader instance, so you can filter something to make it unsearchable. *This may not be an efficient or good approach, and maybe it is already solved with built-in functionality somehow*, but I think somebody may need that mechanism. 2013/4/6 Amit Nithian anith...@gmail.com
Re: Sharing index amongst multiple nodes
This is precisely how Solr replication works. It copies the indexes, then does a commit.

wunder

On Apr 6, 2013, at 2:40 PM, Furkan KAMACI wrote:

--
Walter Underwood
wun...@wunderwood.org
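For anyone looking for the concrete setup on Solr 3.6, the master/slave replication Walter refers to is configured through the ReplicationHandler in solrconfig.xml. A minimal sketch (host name and poll interval are placeholders):

```xml
<!-- on the master (the writing instance) -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- slaves fetch a new index version after each commit on the master -->
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- on each slave (the reading instances) -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

The slave's commit after each fetch is what opens a new searcher and warms the caches, which answers the original question of how read nodes learn the index has changed.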
Pointing to Hbase for Docuements or Directly Saving Documents at Hbase
Hi; First of all, I should mention that I am new to Solr and doing research about it. What I am trying to do: I will crawl some websites with Nutch and then index them with Solr (Nutch 2.1, Solr/SolrCloud 4.2). I wonder about something. I have a cloud of machines that crawls websites and stores the documents. Then I send the documents into SolrCloud. Solr indexes the documents, generates indexes, and saves them. I know from information retrieval theory that it *may* not be efficient to store indexes in a NoSQL database (they are something like linked lists, and if you store them in such a database you *may* end up with a sparse representation - by the way, there may be some solutions for this; if you can explain them, you are welcome to). However, Solr also stores some documents (e.g. for highlighting), so some of my documents will be stored twice. Considering that I will have many documents, those doubled documents may cause a problem for me. So is there any way to not store the documents in Solr and instead point to them in HBase (where I save my crawled documents), or to store them directly in HBase (is that efficient or not)?
Re: Sharing index amongst multiple nodes
Hi Walter; I am new to Solr and digging into the code to understand it. I think that when the indexer copies indexes, they are unsearchable before the commit. Where exactly in the code does that commit occur, and can I roll it back because I don't want those indexes? (The reason could be anything; maybe I will decline some indexes - index filtering - because of the documents they point to.) Is that possible?

2013/4/7 Walter Underwood wun...@wunderwood.org: This is precisely how Solr replication works. It copies the indexes, then does a commit. wunder
Re: Sharing index amongst multiple nodes
Indexing happens on one Solr server. After a commit, the documents are searchable. In Solr 4, there is a soft commit, which makes the documents searchable but does not create on-disk indexes. Solr replication copies the committed indexes to another Solr server. SolrCloud uses a transaction log to make documents available before a hard commit. Solr does not have rollback. A commit succeeds or fails; after it succeeds, there is no going back. wunder

On Apr 6, 2013, at 3:08 PM, Furkan KAMACI wrote: Hi Walter; I am new to Solr and digging into the code to understand it. I think that when the indexer copies indexes, they are unsearchable before the commit. Where exactly in the code does that commit occur, and can I roll it back because I don't want those indexes? Is that possible?

-- Walter Underwood wun...@wunderwood.org
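The commit variants Walter distinguishes map directly onto parameters of the update endpoint. A hedged sketch (the parameter names `commit`, `softCommit`, and `openSearcher` are real Solr 4 options; the host and core name are made up):

```python
from urllib.parse import urlencode

def update_url(base, **params):
    """Build an /update request URL carrying commit-related parameters."""
    return f"{base}/update?{urlencode(params)}"

base = "http://localhost:8983/solr/collection1"

# Hard commit: flushes segments to disk and (by default) opens a new searcher.
hard = update_url(base, commit="true")
# Hard commit without a visibility change: starts a new tlog and fsyncs,
# but keeps serving searches from the old searcher.
hard_quiet = update_url(base, commit="true", openSearcher="false")
# Soft commit: documents become searchable, but nothing is fsynced to disk.
soft = update_url(base, softCommit="true")
```

In practice these are usually configured as autoCommit/autoSoftCommit in solrconfig.xml rather than sent per request.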
Re: Sharing index amongst multiple nodes
Hi Walter; Thanks for your explanation. You said "Indexing happens on one Solr server." Is that true even for SolrCloud?

2013/4/7 Walter Underwood wun...@wunderwood.org: Indexing happens on one Solr server. After a commit, the documents are searchable. In Solr 4, there is a soft commit, which makes the documents searchable but does not create on-disk indexes. Solr replication copies the committed indexes to another Solr server. SolrCloud uses a transaction log to make documents available before a hard commit. Solr does not have rollback. A commit succeeds or fails; after it succeeds, there is no going back. wunder

-- Walter Underwood wun...@wunderwood.org
Re: Sharing index amongst multiple nodes
In SolrCloud, a document is indexed on the shard leader. The replicas in that shard get the document and add it to their indexes. There is some indexing that happens on the replicas, but that is managed by Solr. wunder

On Apr 6, 2013, at 3:58 PM, Furkan KAMACI wrote: Hi Walter; Thanks for your explanation. You said "Indexing happens on one Solr server." Is that true even for SolrCloud?

-- Walter Underwood wun...@wunderwood.org
Re: Sharing index amongst multiple nodes
My last questions: 1) If I send a document to a replica, does it pass the document to the shard leader? And do you mean that even if I send a document to the shard leader, it can pass that document to one of the replicas to be indexed? 2) Is it possible to copy a shard into another shard, or to merge shards? By the way, thanks for your explanations.

2013/4/7 Walter Underwood wun...@wunderwood.org: In SolrCloud, a document is indexed on the shard leader. The replicas in that shard get the document and add it to their indexes. There is some indexing that happens on the replicas, but that is managed by Solr. wunder
Re: Does solr cloud support rename or swap function for collection?
4.2 and 4.2.1 have collection aliasing (similar to what we had with SolrCore aliasing at one point). You can use it to keep one URL and swap the collection it searches behind the scenes. - Mark

On Apr 6, 2013, at 6:28 AM, bradhill99 bradhil...@yahoo.com wrote: Hi, We are using Solr 4.1 and we created a collection named my_data with 20 shards. Our index files are generated with the Lucene API every hour and loaded into the SolrCloud using the core admin API. My problem is that for the data generated each hour, I need to create a new collection name like my_data_001 and load the index files under that collection name. Then my_data becomes useless and my_data_001 is the latest data. In order to keep the query URL unchanged, I need to rename my_data_001 to my_data, but I can't see any collection API to do the rename or swap the way the core admin API supports them. How can I do this? thanks, Brad -- View this message in context: http://lucene.472066.n3.nabble.com/Does-solr-cloud-support-rename-or-swap-function-for-collection-tp4054193.html Sent from the Solr - User mailing list archive at Nabble.com.
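The aliasing Mark mentions goes through the Collections API. A sketch of the hourly swap for Brad's scenario (CREATEALIAS is the real action name in 4.2; re-issuing it repoints an existing alias; the host is made up and the collection names are from the question):

```python
from urllib.parse import urlencode

def createalias_url(base, alias, *collections):
    """Build a Collections API CREATEALIAS request: the alias is an extra
    name that query URLs can use, pointing at one or more real collections."""
    params = {"action": "CREATEALIAS", "name": alias,
              "collections": ",".join(collections)}
    return f"{base}/admin/collections?{urlencode(params)}"

base = "http://localhost:8983/solr"
# Hour 1: queries against 'my_data' hit the freshly loaded my_data_001.
hour1 = createalias_url(base, "my_data", "my_data_001")
# Hour 2: re-issue CREATEALIAS to repoint the same alias at my_data_002;
# the query URL .../my_data/select never changes.
hour2 = createalias_url(base, "my_data", "my_data_002")
```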
Re: Sharing index amongst multiple nodes
A document sent to any SolrCloud node will be forwarded to the right place. Shard merging and splitting are not supported right now; there is work in progress on shard splitting: https://issues.apache.org/jira/browse/SOLR-3755 wunder

On Apr 6, 2013, at 4:15 PM, Furkan KAMACI wrote: My last questions: 1) If I send a document to a replica, does it pass the document to the shard leader? And if I send a document to the shard leader, can it pass that document to one of the replicas to be indexed? 2) Is it possible to copy a shard into another shard, or to merge shards? By the way, thanks for your explanations.

-- Walter Underwood wun...@wunderwood.org
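The "sent to the right place" routing works by hashing the document id onto a shard. SolrCloud's actual router uses MurmurHash3 against per-shard hash ranges; the principle can be sketched with a simpler stand-in hash (MD5 here, purely illustrative):

```python
import hashlib

def pick_shard(doc_id: str, num_shards: int) -> int:
    """Illustrative routing: deterministically map a document id to a shard.
    (SolrCloud really uses MurmurHash3 over hash ranges stored in ZooKeeper;
    this MD5-and-modulo version only demonstrates the idea.)"""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards

# Any node can receive the document: it computes the target shard from the
# id, forwards the document to that shard's leader, and the leader then
# forwards it to the shard's replicas.
shard = pick_shard("doc-42", 20)
```

Because the hash is deterministic, every node independently agrees on where a given id lives, both for indexing and for real-time get.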
Re: It seems a issue of deal with chinese synonym for solr
Hi Kuro Kurosaka, Thanks for your attention. Yes, a Chinese query is needed to reproduce this problem, because English words are separated by spaces. If I search for 北京市 动物园, with a space inserted in the query, the query is parsed to +(北京市 北京) +动物园, which is expected. So the Chinese query also works, but only if I insert spaces to separate the words. The query parser I used is ik-analyzer: http://code.google.com/p/ik-analyzer/ Thanks, Wei Li

-- Original -- From: Kuro Kurosaka kuro...@sonic.net; Date: Thu, Apr 4, 2013 02:53 AM To: solr-user solr-user@lucene.apache.org; Cc: 李威 li...@antvision.cn; 罗佳 luo...@antvision.cn; 李景泽 lijin...@antvision.cn; Subject: Re: It seems a issue of deal with chinese synonym for solr

On 3/11/13 6:15 PM, 李威 wrote: In org.apache.solr.parser.SolrQueryParserBase there is a function: protected Query newFieldQuery(Analyzer analyzer, String field, String queryText, boolean quoted) throws SyntaxError. The code below can't process Chinese correctly: BooleanClause.Occur occur = positionCount > 1 && operator == AND_OPERATOR ? BooleanClause.Occur.MUST : BooleanClause.Occur.SHOULD; For example, "北京市" and "北京" are synonyms. If I search 北京市动物园, the expected parse result is +(北京市 北京) +动物园, but it is actually parsed to +北京市 +北京 +动物园. The code can process English, because English words are separated by spaces and each has only one position.

An interesting feature of this example is that the difference between the two synonyms is the omission of one token, 市 (city). Doesn't the same problem happen if we define "London City" and "London" as synonyms and execute a query like "London City Zoo"? Must a Chinese analyzer be used to reproduce this problem? I tried to test this but I couldn't. The result of query string expansion using Solr 4.2's query interface with debug output shows: <str name="parsedquery">MultiPhraseQuery(text:"(london london) city zoo")</str> I see no plus (+). What query parser did you use? -- Kuro Kurosaka
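The disagreement above comes down to token positions: a synonym emitted at the same position as its original becomes an OR inside one clause, while a synonym emitted at a new position becomes an extra mandatory clause. A toy illustration of that bookkeeping (this is not the Lucene analysis API, just the position logic):

```python
def build_clauses(tokens):
    """Group (position, term) pairs the way a query parser would: terms
    sharing a position form one OR-group; each distinct position becomes
    a separate mandatory (+) clause."""
    positions = {}
    for pos, term in tokens:
        positions.setdefault(pos, []).append(term)
    return [f"+({' '.join(terms)})" if len(terms) > 1 else f"+{terms[0]}"
            for _, terms in sorted(positions.items())]

# Synonym at the SAME position as the original -> the expected +(A B) +C.
good = build_clauses([(0, "北京市"), (0, "北京"), (1, "动物园")])
# Synonym emitted at a NEW position -> the broken +A +B +C from the report.
bad = build_clauses([(0, "北京市"), (1, "北京"), (2, "动物园")])
```

So whether the bug reproduces depends on whether the analyzer's synonym filter assigns the expanded term a position increment of 0 (overlapping) or 1 (appended).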
Re: Pointing to Hbase for Documents or Directly Saving Documents at Hbase
Solr would not be storing the original source form of the documents in any case. Whether you use Tika or SolrCell, only the text stream of the content and the metadata ever get indexed or stored in Solr. Solr completely decouples indexing and storing of data values. If you don't want to store the text stream in Solr, then don't. If you want to store the original blob of the source documents in some other data store, that's your choice. You can store the original URL, or a document ID or URL for some alternate document store. That's your choice to make; Solr in no way forces you one way or the other. And whether that URL or document ID refers to HBase or a web site doesn't matter to Solr either. Whether you could more efficiently store the original document bytes in Lucene/Solr DocValues vs. HBase is a separate matter - I don't know one way or the other whether DocValues would help. Or a Solr BinaryField might be suitable for storing the original bytes of a document (without indexing those bytes). In other words, maybe you could just use two separate Solr servers: one for the text index and metadata store, and the other for raw storage of the original document bytes. -- Jack Krupansky

-----Original Message----- From: Furkan KAMACI Sent: Saturday, April 06, 2013 6:01 PM To: solr-user@lucene.apache.org Subject: Pointing to Hbase for Documents or Directly Saving Documents at Hbase
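Jack's pointer-field suggestion can be sketched as: keep only the searchable text (indexed, not stored) plus a row-key field in Solr, and resolve search hits against HBase for the full content. A hedged illustration with an in-memory dict standing in for the HBase table (the field names `hbase_row_key` and `text` are made up, not a standard schema):

```python
# Stand-in for the HBase table keyed by row key; in a real system this
# would be an HBase client call, not a dict lookup.
content_store = {"row-001": "<html>full crawled page bytes...</html>"}

def make_solr_doc(doc_id, row_key, text):
    """Build the Solr document: 'text' would be indexed but NOT stored
    (stored=\"false\" in schema.xml), so only the id and the HBase pointer
    consume stored space in the index."""
    return {"id": doc_id, "hbase_row_key": row_key, "text": text}

def resolve_hit(solr_hit):
    """After a search, fetch the original document from the external store."""
    return content_store[solr_hit["hbase_row_key"]]

doc = make_solr_doc("doc-1", "row-001", "full crawled page text")
original = resolve_hit(doc)
```

The trade-off is that features needing stored text in Solr itself, such as highlighting, no longer work out of the box and would have to run against the fetched content.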