Re: Problem with XML encoding UTF-8
Hi,

Attachments may not work on the mailing lists. Paste the code into the email or provide a link. Could it be that your Python code is not handling UTF-8 strings correctly? Can you paste some relevant lines from the Solr log? If you start Solr with Jetty, you can use "java -jar start.jar" and get the log right in your console. The equivalent for Tomcat is "bin/catalina.sh run".

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 23. feb. 2011, at 13.29, jayronsoares wrote:

Hi Jan, I appreciate your attention. I've tried to answer your questions to the best of my knowledge.

2011/2/22 Jan Høydahl / Cominvent:

a) What version of Solr? -- Solr version 1.4
b) Are you trying to feed XML or PDF? -- XML via solrpy
c) What request handler are you feeding to? /update or /update/extract? -- I don't know; see the example attached
d) Can you copy/paste some more lines from the error log? -- I'm attaching one example, so you can test it for yourself.

Thanks for your help. Cheers, jayron

On 21. feb. 2011, at 15.02, jayronsoares wrote:

Hi, I'm using solrpy to store PDF files; however, when I run the script, it shows me this issue: "An invalid XML character (Unicode: 0xc) was found in the element content of the document." Could someone give me some help? Cheers, jayron
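The "invalid XML character (Unicode: 0xc)" error above typically comes from control characters that XML 1.0 forbids, leaking out of extracted PDF text. A minimal client-side sketch of stripping such characters before handing text to solrpy (the function name and regex are my own, not part of solrpy):

```python
import re

# XML 1.0 allows tab, LF, CR, and U+0020..U+D7FF, U+E000..U+FFFD.
# Everything else (e.g. the form feed 0x0C from the error message)
# must be removed before the document is serialized to XML.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD]"
)

def strip_invalid_xml_chars(text):
    """Drop characters that are illegal in XML 1.0 element content."""
    return _INVALID_XML_CHARS.sub("", text)
```

Running extracted text through such a filter before adding fields to the document should avoid the parser error on the Solr side.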
Problem in full query searching
Hi sir,

My problem is that when I search for the string "software engineering institute", the documents that match the complete text are not returned first. There are documents which contain the complete phrase, but they do not appear near the top of the results. I want the results ordered so that complete-string matches come first, then two-word matches, and finally any-word matches. I am using the dismax request handler. I also read about term proximity, but it is not working for me either. I have sorted the results by score descending.

After analyzing, I observed that documents which do not contain the complete text, but have more occurrences of three, two, or one of the words in their body text, get a higher score because of this. Is there any way to give a higher score to documents that match the complete text, instead of documents with more occurrences of any single word? Please advise.

--
Thanks and Regards
Bagesh Sharma
Re: UpdateProcessor and copyField
Hi,

I'd also like a more powerful/generic copyField. Today copyField always copies after the UpdateChain and before analysis. Refactoring it as an UpdateProcessor (using SOLR-2370 to include it as part of the default chain) would let us specify "before UpdateChain" in addition. But how could we get it to copy after analysis? Imagine these lines in schema.xml:

<copyField source="my_raw_keywords" dest="keywords" when="preUpdate" append="true"/>
<copyField source="my_raw_keywords2" dest="keywords" when="preUpdate" append="true"/>
<copyField source="keywords" dest="keywords_facet"/> <!-- default when="preAnalysis" -->
<copyField source="keywords" dest="keywords_stemmed"/>
<copyField source="keywords_stemmed" dest="all_stemmed" when="postAnalysis" append="true"/>

This would read in two source fields and merge them into the keywords field before the UpdateChain is run. The UpdateChain may do various magic with the field, and then, before analysis, it is copied to two fields: one for faceting and a stemmed version. After analysis we copy the stemmed field to another stemmed field (which must have the same field class and multiValued setting, of course). The postAnalysis copying would also allow for some advanced hacking by copying the results of different fieldTypes into one field, enabling the use case of lemmatization by expansion on the index side, and thus querying multiple languages in one and the same field.

From my understanding, the RunUpdateProcessor is one monolithic beast passing the doc along for analysis and indexing. Would it be possible to split it in two: an AnalysisUpdateProcessor and an IndexUpdateProcessor?

Chris, for the custom field manipulations in custom UpdateChains, it makes sense to have a FieldManipulator UpdateProcessor which can be inserted wherever you like, depending on the use case. I believe this can/should exist independently of a refactoring of copyField.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 24. feb.
2011, at 03.16, Chris Hostetter wrote:

: Maybe copy fields should be refactored to happen in a new, core,
: update processor, so there is nothing special/awkward about them? It
: seems they fit as part of what an update processor is all about,
: augmenting/modifying incoming documents.
:
: Seems reasonable.
:
: By default, the copyFields could be read from the schema for back
: compat (and the fact that copyField does feel more natural in the
: schema)

As someone who has written special-case UpdateProcessors that clone field values, I agree that it would be handy to have a new generic CopyFieldUpdateProcessor, but I'm not really on board with the idea of it reading <copyField ... /> declarations by default. The ideas really serve different purposes:

* As an UpdateProcessor, it's something that can be adjusted/configured/overridden on a use-case basis -- some request handlers could be configured to use a processor chain that includes the CopyFieldUpdateProcessor, and some could be configured not to.

* Schema copyField declarations are things that happen to *every* document, regardless of where it comes from.

The use cases would be very different: consider a schema with many different fields specific to certain types of documents, as well as a few required fields that every type of document must have: title, description, body, and maintext fields. It might make sense to use different processor chains along with a CopyFieldUpdateProcessor to clone some other fields (say: a dust_jacked_text field for books, and a plot_summary field for movies) into the description field when those docs are indexed -- but if you absolutely, positively *always* wanted the contents of title, description, and body to be copied into the maintext field, that would make more sense as a schema.xml declaration.
Likewise: it would be handy to have an UpdateProcessor that rejected documents missing some fields -- but that would not be a true substitute for using required="true" on a field in schema.xml. A single index may have multiple valid processor chains for different indexing situations -- but rules declared in schema.xml are absolute and cannot be circumvented. -Hoss
Re: disable replication in a persistent way
I think all of this should be adapted for SolrCloud. ZK should be the one knowing which is master and which is slave. ZK should know whether replication on a slave is disabled or not. To disable replication, it should be enough to set a new value in ZK, and the node will be notified and change behaviour at the next poll. Thus, in a ZK environment we will not need the replicationHandler section of solrconfig.xml at all, as it should be stored in distinct ZK nodes, no? We somehow have to refactor this to work seamlessly with and without ZK.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 24. feb. 2011, at 05.10, Otis Gospodnetic wrote:

Hi,

----- Original Message -----
From: Ahmet Arslan <iori...@yahoo.com>
Subject: disable replication in a persistent way

> Hello, solr/replication?command=disablepoll disables replication on
> slave(s). However, it is not persistent. After a Solr/Tomcat restart,
> slave(s) will continue polling. Is there a built-in way to disable
> replication on the slave side in a persistent manner?

Not that I know of. Hoss or somebody else will correct me if I'm wrong :)

> Currently I am using system property substitution along with the
> solrcore.properties file to simulate this:
>
>   <lst name="slave">
>     <str name="enable">${enable.slave:false}</str>
>   </lst>
>
>   # solrcore.properties in slave
>   enable.master=true
>
> And I modify solrcore.properties with a custom Solr request handler after
> the disablepoll command, to make it persistent. It seems that there is no
> existing mechanism to write the solrcore.properties file, am I correct?

What about modifying the existing classes (the one/ones that handle the disablepoll command) to take another param: persist=true|false? Would that be better than a custom Solr request handler that requires a separate call?

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
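For reference, the non-persistent command discussed above is just an HTTP GET against the replication handler; a small sketch of building the URL (the host and core path are hypothetical):

```python
def replication_command_url(solr_base, command):
    """Build the URL for a ReplicationHandler command, e.g. 'disablepoll'."""
    return "%s/replication?command=%s" % (solr_base, command)

url = replication_command_url("http://localhost:8983/solr", "disablepoll")
```

Fetching that URL (e.g. with urllib) issues the command; after a restart the slave resumes polling, which is exactly the persistence gap the thread describes.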
Re: custom query parameters
I would probably try the SearchComponent route first, translating the input into DisMax speak. But if you have a completely different query language, a QParserPlugin could be the way to go.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 24. feb. 2011, at 06.26, Michael Moores wrote:

Trying to answer my own question... It seems like it would be a good idea to create a SearchComponent and add it to the list of existing components. My component just converts query parameters to something the Solr QueryComponent understands. Is that a good way of doing it?

On Feb 23, 2011, at 8:12 PM, Michael Moores wrote:

I'm required to provide a handler with some specialized query string inputs. I'd like to translate the query inputs to a Lucene/Solr query and delegate the request to the existing Lucene/dismax handler. What's the best way to do this? Do I implement SolrRequestHandler, or a QParser? Do I extend the existing StandardRequestHandler?

thanks, --Michael
Re: Problem in full query searching
Try configuring more weight on the ps and pf parameters of the dismax request handler to boost phrase-matching documents. Or, if you do not want term frequency to be considered, use omitTermFreqAndPositions="true" in the field definition.

-----
Thanx:
Grijesh
http://lucidimagination.com
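As a sketch of the suggestion above, the extra dismax parameters could be sent like this (the field name "body" and the boost value 10 are placeholders, not recommendations):

```python
from urllib.parse import urlencode

params = {
    "defType": "dismax",
    "q": "software engineering institute",
    "qf": "body",          # fields the individual terms are matched against
    "pf": "body^10",       # boost docs where the whole query matches as a phrase
    "ps": "0",             # phrase slop: 0 = the phrase boost requires an exact phrase
}
query_string = urlencode(params)  # append to .../solr/select?
```

With a high pf boost, documents containing the complete phrase should rank above documents that merely repeat individual terms many times.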
Re: problem when search grouping word
Maybe synonyms will help.

-----
Thanx:
Grijesh
http://lucidimagination.com
Make highlighter case-insensitive
Hi,

I have an index with two fields, body and caseInsensitiveBody. body is indexed and stored, while caseInsensitiveBody is just indexed. The idea is that by not storing caseInsensitiveBody I save some space and gain some performance. So I query against caseInsensitiveBody and generate highlighting from the case-sensitive one.

The problem is that, as a result, I am missing highlighting terms. For example, when I search for "solr" and get a match in caseInsensitiveBody for "solr", but the original document has "Solr", no highlighting is done. Is there a way around this?

Currently I am using the following highlighting params:

'hl' => 'on',
'hl.fl' => 'header,body',
'hl.usePhraseHighlighter' => 'true',
'hl.highlightMultiTerm' => 'true',
'hl.fragsize' => 200,
'hl.regex.pattern' => '[-\w ,/\n\\']{20,200}',

Regards / Med vennlig hilsen
Tarjei Huse
Re: Special Circumstances for embedded Solr
Can you please show me how an HTTP implementation of SolrJ querying can be converted to one for embedded Solr, with the help of an example?
Re: embedding solr
How does one fill up the SolrParams directly? Shouldn't it be SolrQueryRequest and not SolrParams, if I am not mistaken?
Filter Query
Hi,

I know the filter query is really useful due to caching, but I am confused about how it filters results. Let's say I have the following criteria:

Text: "abc def"
Date: 24th Feb, 2011

Now "abc def" might occur in almost every document, but if Solr first filters based on date, it only has to search a few documents (instead of millions).

If I put the date parameter in fq, will Solr first filter on date and then do the text search, or will both be filtered separately and then intersected? If they are filtered separately, the issue is this: suppose "abc def" takes 20 seconds across all documents (without any filters, due to the large number of documents); it will still take the same time, whereas if the search ran only over the few documents from that specific date, it would be super fast.

If fq doesn't give me what I am looking for, is there any other parameter? There should be a way, as this is a very common scenario.

--
Regards,
Salman Akram
Re: problem when search grouping word
There are many product names. How could I list them all, when the list is growing fast as well?

On Thu, Feb 24, 2011 at 5:25 PM, Grijesh <pintu.grij...@gmail.com> wrote: [...]

--
Chhorn Chamnap
http://chamnapchhorn.blogspot.com/
synonym.txt
Hi,

I have a doubt regarding query-time synonym expansion: will changes applied to synonym.txt after index creation take effect, or will Solr keep referring to the initial synonym.txt that was present at index time?

Thanks!
Isha Garg
Re: Filter Query
Salman,

AFAIK, the query is executed first and afterwards the filter query steps in, so it's only an additional filter on your results.

Recommended wiki pages on filter queries:
* http://wiki.apache.org/solr/CommonQueryParameters#fq
* http://wiki.apache.org/solr/FilterQueryGuidance

Regards
Stefan

On Thu, Feb 24, 2011 at 12:46 PM, Salman Akram <salman.ak...@northbaysolutions.net> wrote: [...]
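For completeness, the split Salman asks about is expressed like this (the field names "text" and "date" are hypothetical); each fq clause is cached as its own document set, independently of q:

```python
from urllib.parse import urlencode

params = {
    # the relevance-scored part of the request
    "q": 'text:"abc def"',
    # the date restriction goes in fq so it is cached and reusable
    # across queries; fq clauses never affect scoring
    "fq": "date:[2011-02-24T00:00:00Z TO 2011-02-25T00:00:00Z]",
}
query_string = urlencode(params)  # append to .../solr/select?
```

On repeated queries the cached fq set is intersected with the q results, so the date restriction itself is cheap, even though q is still evaluated against the full index.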
Re: synonym.txt
Isha,

Solr will use the currently loaded synonyms file, so there is no relation to the synonyms-file content that was used while indexing. But to refresh the synonyms in use, you'll have to restart your Java process (in single-core mode) or reload your core configuration (otherwise).

Regards
Stefan

On Thu, Feb 24, 2011 at 12:58 PM, Isha Garg <isha.g...@orkash.com> wrote: [...]
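The core reload mentioned above is a CoreAdmin call; a sketch of building it (host and core name are hypothetical):

```python
def core_reload_url(solr_root, core_name):
    """CoreAdmin RELOAD re-reads a core's config files, including synonym.txt."""
    return "%s/admin/cores?action=RELOAD&core=%s" % (solr_root, core_name)

url = core_reload_url("http://localhost:8983/solr", "core0")
```

Issuing a GET on that URL reloads the core without restarting the JVM, after which query-time analysis picks up the edited synonyms file.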
Question Solr Index main in RAM
Hi,

My name is Felipe and I want to keep the main Solr index in RAM memory. How is this possible? I have Solr 1.4.

Thank you!

Felipe
Re: Filter Query
Yes, I had an idea about that... Now, logically speaking, the main text search would be in the query part, so is there no way to first filter based on metadata and then do the text search only on that limited data set?

Thanks!

On Thu, Feb 24, 2011 at 5:24 PM, Stefan Matheis <matheis.ste...@googlemail.com> wrote: [...]

--
Regards,
Salman Akram
query slop issue
Hi all,

I have a search string q=water+treatment+plant and I am using the dismax request handler with qs=1. How will processing be done -- that is, within how many words of each other must "water", "treatment", or "plant" occur for a document to appear in the result set?
Free Webcast/Technical Case Study: How Bazaarvoice moved to Solr to implement Search Strategies for Social and eCommerce
I thought you might be interested in a technical webcast on Solr/Lucene and e-commerce/social media that we are sponsoring, featuring RC Johnson of Bazaarvoice. It's Wednesday, March 2, 2011 at 11:00am PST / 2:00pm EST / 19:00 GMT.

RC has been leading efforts at Bazaarvoice to build out their Solr search applications, moving beyond a more traditional RDBMS-centered data strategy. If you've not heard of Bazaarvoice, they provide user-generated content and ratings in a white-label service offering. They use Solr to index and search millions of online customer conversations that deliver billions of monthly impressions for leading companies in retail, manufacturing, financial services, health care, travel and media.

Key topics this webcast will cover include:

* Iterative expansion of search features and content collections
* Migrating from simplistic database search to Solr-based search
* Integrating statistical analytics into search at scale
* Considering NoSQL for scalability and deployability of big data, to make data easier to consume across applications

You can sign up here: http://www.eventsvc.com/lucidimagination/030211?trk=ap and mark your calendars for Wednesday, March 2, 2011 at 11:00am PST / 2:00pm EST / 19:00 GMT.

-Grant
Re: Special Circumstances for embedded Solr
On 02/24/2011 12:16 PM, Devangini wrote:
> Can you please show me how an HTTP implementation of SolrJ querying can be converted to one for embedded Solr with the help of an example?

Hi,

Here's an example that almost compiles. You should be able to get going with this. T

import java.io.File;
import java.io.IOException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.ORDER;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.CoreContainer;

class EmbeddedSolrExample {

    private EmbeddedSolrServer server;

    private void setupSolrContainer() throws Exception {
        File home = new File("/tmp/solr");
        File f = new File(home, "solr.xml");
        CoreContainer container = new CoreContainer();
        container.load("/tmp/solr", f);
        server = new EmbeddedSolrServer(container, "model");
    }

    private void addDocument() throws SolrServerException, IOException {
        SolrInputDocument doc = new SolrInputDocument();
        doc.setField("body", "test");
        doc.setField("id", "12");
        UpdateResponse s = server.add(doc);
        if (s.getStatus() != 0) {
            throw new IOException("add failed with status " + s.getStatus());
        }
        server.commit();
        SolrDocumentList res = getResults(search("test"));
        System.out.println("I got " + res.size() + " documents");
    }

    private SolrDocumentList getResults(QueryResponse response) {
        if (response.getStatus() != 0) {
            return new SolrDocumentList();
        }
        return response.getResults();
    }

    private QueryResponse search(String words) throws SolrServerException {
        SolrQuery query = new SolrQuery();
        query.addField("id").addField("body").addField("score");
        query.setTimeAllowed(1000);
        query.setRows(50);
        query.set("q", words);
        query.setSortField("timestamp", ORDER.desc); // sort by date
        return server.query(query);
    }

    public static void main(String[] args) throws Exception {
        EmbeddedSolrExample example = new EmbeddedSolrExample();
        example.setupSolrContainer();
        example.addDocument();
    }
}

--
Regards / Med vennlig hilsen
Tarjei Huse
Mobil: 920 63 413
Re: Question Solr Index main in RAM
(11/02/24 21:38), Andrés Ospina wrote:
> Hi, My name is Felipe and I want to keep the main Solr index in RAM memory. How is this possible? I have Solr 1.4. Thank you! Felipe

Welcome Felipe!

If I understand your question correctly, you can use RAMDirectoryFactory:
https://hudson.apache.org/hudson/job/Solr-3.x/javadoc/org/apache/solr/core/RAMDirectoryFactory.html

But I believe it is only available in 3.1 (to be released soon...).

Koji
--
http://www.rondhuit.com/en/
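If the javadoc above is right, wiring this up in 3.1 would be a one-line change in solrconfig.xml, something like the following sketch (note that a RAM-held index is lost on restart):

```xml
<!-- Hold the entire index in JVM heap; contents are NOT persisted across restarts. -->
<directoryFactory name="DirectoryFactory" class="solr.RAMDirectoryFactory"/>
```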
Re: Question about Nested Span Near Query
Hi,

To narrow down the issue, I indexed a single document containing one of the sample queries (given below) which was causing the problem:

"evaluation of loan and lease portfolios for purposes of assessing the adequacy of"

Now when I perform a search query (TextContents:evaluation of loan and lease portfolios for purposes of assessing the adequacy of), the parsed query is

spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([Contents:evaluation, Contents:of], 0, true), Contents:loan], 0, true), Contents:and], 0, true), Contents:lease], 0, true), Contents:portfolios], 0, true), Contents:for], 0, true), Contents:purposes], 0, true), Contents:of], 0, true), Contents:assessing], 0, true), Contents:the], 0, true), Contents:adequacy], 0, true), Contents:of], 0, true)

and the search is not successful. If I remove "evaluation" from the start, or "assessing the adequacy of" from the end, it works fine. The issue seems to appear on relatively long phrases, but I have not been able to find a pattern, and it's really mind-boggling, because I thought this issue might be due to a large position list -- but this is a single document with one phrase, so it's definitely not related to the size of the index. Any ideas what's going on?

On Thu, Feb 24, 2011 at 10:25 AM, Ahsan |qbal <ahsan.iqbal...@gmail.com> wrote:

Hi, it didn't search (no results found, even though results exist). One observation is that it works well even with long phrases, but when a long phrase contains stop words and the same stop word exists two or more times in the phrase, then Solr can't search with the query parsed in this way.

On Wed, Feb 23, 2011 at 11:49 PM, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:

Hi, What do you mean by "this doesn't work fine"? Does it not work correctly, or is it slow, or ...? I was going to suggest you look at the Surround QP, but it looks like you already did that. Wouldn't it be better to get the Surround QP to work?

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
From: Ahsan |qbal <ahsan.iqbal...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Tue, February 22, 2011 10:59:26 AM
Subject: Question about Nested Span Near Query

Hi All,

I had a requirement to implement queries that involve phrase proximity, e.g. the user should be able to search "ab cd" w/5 "de fg": both phrases, as a whole, should be within 5 words of each other. For this I implemented a query parser that makes use of nested span queries, so the above query would be parsed as

spanNear([spanNear([Contents:ab, Contents:cd], 0, true), spanNear([Contents:de, Contents:fg], 0, true)], 5, false)

Queries like this seem to work really well when the phrases are small, but when the phrases are large this doesn't work fine. Now my question: is there any limitation of SpanNearQuery that prevents handling large phrases in this way? Please help.

Regards,
Ahsan
Re: Question about Nested Span Near Query
Send schema and document in XML format and I'll look at it.

Bill Bell
Sent from mobile

On Feb 24, 2011, at 7:26 AM, Ahsan |qbal <ahsan.iqbal...@gmail.com> wrote: [...]
Re: Question Solr Index main in RAM
How to use this?

Bill Bell
Sent from mobile

On Feb 24, 2011, at 7:19 AM, Koji Sekiguchi <k...@r.email.ne.jp> wrote: [...]
Re: DataImportHandler in Solr 4.0
It seems this thread has been hijacked. My initial posting was in regards to my custom Evaluators always receiving a null Context. The same Evaluators work in 1.4.1.

On 2/23/11 5:47 PM, Alexandre Rocco wrote:

I got it working by building the DIH from the contrib folder, and I made a change to the <lib> statements to map the folder that contains the .jar files. Thanks! Alexandre

On Wed, Feb 23, 2011 at 8:55 PM, Smiley, David W. <dsmi...@mitre.org> wrote:

The DIH is no longer supplied embedded in the Solr war file. You need to get it on the classpath somehow. You could add another <lib .../> statement to solrconfig.xml to resolve this.

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

On Feb 23, 2011, at 4:11 PM, Alexandre Rocco wrote:

Hi guys,

I'm having some issues when trying to use the DataImportHandler on Solr 4.0. I've downloaded the latest nightly build of Solr 4.0 and configured the solrconfig.xml file (in the example folder) normally, like this:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

At this point I noticed that the DIH jar was not being loaded correctly, causing exceptions like:

Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler'

and

java.lang.ClassNotFoundException: org.apache.solr.handler.dataimport.DataImportHandler

Do I need to build anything to get DIH running on Solr 4.0? Thanks! Alexandre
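For reference, David's suggestion amounts to adding <lib/> entries like the following to solrconfig.xml (this directory layout follows the example distribution and may differ in a nightly build):

```xml
<!-- paths are relative to the core's instanceDir -->
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-.*\.jar" />
<lib dir="../../contrib/dataimporthandler/lib/" regex=".*\.jar" />
```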
Order Facet on ranking score
Hello everybody,

Is it possible to order the facet results by some ranking score? I was doing a query with the OR operator, and sometimes the first facets contain only results with a small, unimportant rank. This leads users toward other, unimportant searches.

--
Jenny Arduini
I.T.T. S.r.l.
Strada degli Angariari, 25
47891 Falciano
Repubblica di San Marino
Tel 0549 941183 Fax 0549 974280
email: jardu...@ittweb.net
http://www.ittweb.net
facet.offset with facet.sort=lex and shards problem?
Hi all,

I'm having a problem using distributed search in conjunction with the facet.offset parameter and lexical facet-value sorting. Is there an incompatibility between these? I'm using Solr 1.4.1.

I have a facet with ~100k values in one index, and I want to page through them alphabetically. When not using distributed search, everything works just fine, and very quickly. A query like this works, returning 10 facet values starting at the 50,001st:

http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=5 # "Butterflies - Indiana"!

However, if I enable distributed search, using a single shard (which is the same index), I get no facet values returned:

http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=5&shards=server:port/solr # empty list :(

Doing a little more testing, I'm finding that with sharding I often get an empty list any time facet.offset >= facet.limit. Also, by example, if I do facet.limit=100 and facet.offset=90, I get 10 facet values. Doing so without sharding, I get the expected (by me, at least) 100 values (starting at what would normally be the 91st).

Can anybody shed any light on this for me?

Thanks,
Peter
Re: Question about Nested Span Near Query
Hi, schema and document are attached. On Thu, Feb 24, 2011 at 8:24 PM, Bill Bell billnb...@gmail.com wrote: Send schema and document in XML format and I'll look at it. Bill Bell Sent from mobile On Feb 24, 2011, at 7:26 AM, Ahsan |qbal ahsan.iqbal...@gmail.com wrote: Hi, To narrow down the issue I indexed a single document containing one of the sample queries (given below) that was causing the issue: *evaluation of loan and lease portfolios for purposes of assessing the adequacy of* Now when I perform a search query (*TextContents:evaluation of loan and lease portfolios for purposes of assessing the adequacy of*) the parsed query is *spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([spanNear([Contents:evaluation, Contents:of], 0, true), Contents:loan], 0, true), Contents:and], 0, true), Contents:lease], 0, true), Contents:portfolios], 0, true), Contents:for], 0, true), Contents:purposes], 0, true), Contents:of], 0, true), Contents:assessing], 0, true), Contents:the], 0, true), Contents:adequacy], 0, true), Contents:of], 0, true)* and the search is not successful. If I remove '*evaluation*' from the start OR '*assessing the adequacy of*' from the end, it works fine. The issue seems to appear on relatively long phrases, but I have not been able to find a pattern, and it's really mind-boggling because I thought this might be due to a large position list, but this is a single document with one phrase, so it's definitely not related to the size of the index. Any ideas what's going on? On Thu, Feb 24, 2011 at 10:25 AM, Ahsan |qbal ahsan.iqbal...@gmail.com wrote: Hi, It didn't search (no results found even though results exist). One observation is that it works well even on long phrases, but when a long phrase contains stop words and the same stop word exists two or more times in the phrase, then Solr can't search with a query parsed in this way.
On Wed, Feb 23, 2011 at 11:49 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, What do you mean by "this doesn't work fine"? Does it not work correctly, or is it slow, or ...? I was going to suggest you look at the Surround QP, but it looks like you already did that. Wouldn't it be better to get the Surround QP to work? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Ahsan |qbal ahsan.iqbal...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, February 22, 2011 10:59:26 AM Subject: Question about Nested Span Near Query Hi All, I had a requirement to implement queries that involve phrase proximity, e.g. a user should be able to search "ab cd" w/5 "de fg", where both phrases as a whole should be within 5 words of each other. For this I implemented a query parser that makes use of nested span queries, so the above query would be parsed as spanNear([spanNear([Contents:ab, Contents:cd], 0, true), spanNear([Contents:de, Contents:fg], 0, true)], 5, false) Queries like this seem to work really well when the phrases are small, but when the phrases are large this doesn't work fine. Now my question: is there any limitation of SpanNearQuery that means we cannot handle large phrases this way? Please help. Regards Ahsan <doc> <field name="DocID">3369660</field> <field name="Contents">evaluation of loan and lease portfolios for purposes of assessing the adequacy of</field> </doc> <?xml version="1.0" encoding="UTF-8"?>
<schema name="example" version="1.2"> <types> <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/> <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/> <fieldtype name="binary" class="solr.BinaryField"/> <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/> <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/> <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/> <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/> <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/> <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/> <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/> <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/> <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/> <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/> <fieldType name="pint" class="solr.IntField" omitNorms="true"/> <fieldType name="plong"
Re: Filter Query
On Thu, Feb 24, 2011 at 6:46 AM, Salman Akram salman.ak...@northbaysolutions.net wrote: Hi, I know the Filter Query is really useful due to caching, but I am confused about how it filters results. Let's say I have the following criteria: Text: "Abc def" Date: 24th Feb, 2011 Now "abc def" might occur in almost every document, but if Solr first filters based on date it will only have to search a few documents (instead of millions). Yes, this is the way Solr works. The filters are executed separately, but the query is executed last with the filters (i.e. it will be faster if the filter cuts down the number of documents). -Yonik http://lucidimagination.com
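In request-parameter form, Salman's criteria split into the main query plus a separately cached filter, roughly like this (the field names come from his example; the range syntax assumes Date is indexed as a Solr date field):

```
q=Text:"Abc def"&fq=Date:[2011-02-24T00:00:00Z TO 2011-02-25T00:00:00Z]
```

Because the fq result set is cached in the filterCache, repeating the same date filter across different q values reuses the cached set.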
Re: Filter Query
So you are agreeing that it does what I want? So in my example, "Abc def" would only be searched in the 24th Feb 2011 documents? When you say 'last with the filters', does it mean it first filters with the Filter Query and then applies the Query to the result? On Thu, Feb 24, 2011 at 9:29 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Thu, Feb 24, 2011 at 6:46 AM, Salman Akram salman.ak...@northbaysolutions.net wrote: Hi, I know the Filter Query is really useful due to caching, but I am confused about how it filters results. Let's say I have the following criteria: Text: "Abc def" Date: 24th Feb, 2011 Now "abc def" might occur in almost every document, but if Solr first filters based on date it will only have to search a few documents (instead of millions). Yes, this is the way Solr works. The filters are executed separately, but the query is executed last with the filters (i.e. it will be faster if the filter cuts down the number of documents). -Yonik http://lucidimagination.com -- Regards, Salman Akram
Re: Filter Query
On Thu, Feb 24, 2011 at 11:56 AM, Salman Akram salman.ak...@northbaysolutions.net wrote: So you are agreeing that it does what I want? So in my example Abc def would only be searched on 24th Feb 2010 documents? Pretty much, but not exactly. It's close enough to what you want though. The details are that the scorer and the filter are leapfrogged, but always starting with the filter again after a match. If you're interested in further details, look at the source code of IndexSearcher for a filtered query. This was added in 1.4: http://www.lucidimagination.com/blog/2009/05/27/filtered-query-performance-increases-for-solr-14/ -Yonik http://lucidimagination.com
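The leapfrog behaviour Yonik describes can be pictured with a toy intersection of two sorted doc-ID lists. This is an illustration of the idea only, not Solr's actual IndexSearcher code:

```python
def leapfrog_intersect(scorer_docs, filter_docs):
    """Toy leapfrog intersection over two sorted doc-ID lists: each
    side skips ahead to the other's position, and a doc is collected
    only when both land on the same ID."""
    results = []
    i = j = 0
    while i < len(scorer_docs) and j < len(filter_docs):
        if scorer_docs[i] == filter_docs[j]:
            results.append(scorer_docs[i])
            i += 1
            j += 1
        elif scorer_docs[i] < filter_docs[j]:
            i += 1  # scorer leaps forward to catch up with the filter
        else:
            j += 1  # filter leaps forward to catch up with the scorer
    return results
```

The point of the pattern is that the expensive side (the scorer) never visits a document the cheap, cached filter has already ruled out.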
Re: Solr 4.0 DIH
(11/02/22 6:58), Mark wrote: I downloaded Solr 4.0 from trunk today and I tried using a custom Evaluator during my full/delta-importing. Within the evaluate method, though, the Context is always null. When using this same class with Solr 1.4.1 the context always exists. Is this a bug or is this behavior expected? Thanks public class MyEvaluator extends Evaluator { @Override public String evaluate(String argument, Context context) { // Argument is present however context is always null! } } I tried my test Evaluator on Solr 4.0 and it worked as expected; the context is not null. What I did on example-DIH is this: 1. add the following tag to db-data-config.xml: <function name="toLowerCase" class="LowerCaseFunctionEvaluator"/> 2. use the above evaluator: <entity name="feature" query="select DESCRIPTION from FEATURE where ITEM_ID='${dih.functions.toLowerCase(item.ID)}'"> 3. do a full-import. My test evaluator looks like this: public class LowerCaseFunctionEvaluator extends Evaluator { public String evaluate(String expression, Context context) { System.out.println("* exp = " + expression); System.out.println("* context = " + context); return null; } } and the context was not null. Koji -- http://www.rondhuit.com/en/
Re: facet.offset with facet.sort=lex and shards problem?
On Thu, Feb 24, 2011 at 10:57 AM, Peter Cline pcl...@pobox.upenn.edu wrote: Hi all, I'm having a problem using distributed search in conjunction with the facet.offset parameter and lexical facet value sorting. Is there an incompatibility between these? I'm using Solr 1.4.1. I have a facet with ~100k values in one index. I'm wanting to page through them alphabetically. When not using distributed search, everything works just fine, and very quick. A query like this works, returning 10 facet values starting at the 50,001st: http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=50000 # Butterflies - Indiana ! However, if I enable distributed search, using a single shard (which is the same index), I get no facet values returned. http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=50000&shards=server:port/solr # empty list :( Doing a little more testing, I'm finding that with sharding I often get an empty list any time facet.offset >= facet.limit. Also, by example, if I do facet.limit=100 and facet.offset=90, I get 10 facet values. Doing so without sharding, I get the expected (by me, at least) 100 values (starting at what would normally be the 91st). Can anybody shed any light on this for me? Sounds like a bug. Have you tried a 3x or trunk development build to see if it's fixed there? -Yonik http://lucidimagination.com
Re: facet.offset with facet.sort=lex and shards problem?
On 02/24/2011 12:37 PM, Yonik Seeley wrote: On Thu, Feb 24, 2011 at 10:57 AM, Peter Cline pcl...@pobox.upenn.edu wrote: Hi all, I'm having a problem using distributed search in conjunction with the facet.offset parameter and lexical facet value sorting. Is there an incompatibility between these? I'm using Solr 1.4.1. I have a facet with ~100k values in one index. I'm wanting to page through them alphabetically. When not using distributed search, everything works just fine, and very quick. A query like this works, returning 10 facet values starting at the 50,001st: http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=50000 # Butterflies - Indiana ! However, if I enable distributed search, using a single shard (which is the same index), I get no facet values returned. http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=50000&shards=server:port/solr # empty list :( Doing a little more testing, I'm finding that with sharding I often get an empty list any time facet.offset >= facet.limit. Also, by example, if I do facet.limit=100 and facet.offset=90, I get 10 facet values. Doing so without sharding, I get the expected (by me, at least) 100 values (starting at what would normally be the 91st). Can anybody shed any light on this for me? Sounds like a bug. Have you tried a 3x or trunk development build to see if it's fixed there? -Yonik http://lucidimagination.com I haven't. I'll try the current trunk and get back to you. Thanks, Peter
Re: facet.offset with facet.sort=lex and shards problem?
On 02/24/2011 02:58 PM, Peter Cline wrote: On 02/24/2011 12:37 PM, Yonik Seeley wrote: On Thu, Feb 24, 2011 at 10:57 AM, Peter Cline pcl...@pobox.upenn.edu wrote: Hi all, I'm having a problem using distributed search in conjunction with the facet.offset parameter and lexical facet value sorting. Is there an incompatibility between these? I'm using Solr 1.4.1. I have a facet with ~100k values in one index. I'm wanting to page through them alphabetically. When not using distributed search, everything works just fine, and very quick. A query like this works, returning 10 facet values starting at the 50,001st: http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=50000 # Butterflies - Indiana ! However, if I enable distributed search, using a single shard (which is the same index), I get no facet values returned. http://server:port/solr/select/?q=*:*&facet.field=subject_full_facet&facet=true&f.subject_full_facet.facet.limit=10&facet.sort=lex&facet.offset=50000&shards=server:port/solr # empty list :( Doing a little more testing, I'm finding that with sharding I often get an empty list any time facet.offset >= facet.limit. Also, by example, if I do facet.limit=100 and facet.offset=90, I get 10 facet values. Doing so without sharding, I get the expected (by me, at least) 100 values (starting at what would normally be the 91st). Can anybody shed any light on this for me? Sounds like a bug. Have you tried a 3x or trunk development build to see if it's fixed there? -Yonik http://lucidimagination.com I haven't. I'll try the current trunk and get back to you. Thanks, Peter I tried today's builds for the 3.x branch and the trunk. The problem persists in both. Peter
dataimport
Hi all, First of all, I'm quite new to Solr. I have the server set up and everything appears to work. I set it up so that the indexed data comes through a MySQL connection: <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler"> <lst name="defaults"> <str name="config">db-data-config.xml</str> </lst> </requestHandler> And here is the contents of db-data-config.xml: <dataConfig> <dataSource type="JdbcDataSource" name="mystuff" batchSize="-1" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/database?characterEncoding=UTF8&amp;zeroDateTimeBehavior=convertToNull" user="user" password="password"/> <document> <entity name="id" dataSource="mystuff" query="SELECT p.id, p.fielda, p.fieldb, p.fieldc, p.fieldd FROM mytable p"> </entity> </document> </dataConfig> When I point my browser at localhost:8983/solr/dataimport, the server produces the following message: Feb 24, 2011 8:58:24 PM org.apache.solr.core.SolrCore execute INFO: [] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=10 Feb 24, 2011 8:58:24 PM org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import Feb 24, 2011 8:58:24 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties Feb 24, 2011 8:58:24 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll INFO: [] REMOVING ALL DOCUMENTS FROM INDEX Feb 24, 2011 8:58:24 PM org.apache.solr.core.SolrDeletionPolicy onInit INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=/wwwroot/apps/apache-solr-1.4.1/example/solr/data/index,segFN=segments_p,version=1297781919778,generation=25,filenames=[_n.nrm, _n.tis, _n.prx, segments_p, _n.fdt, _n.frq, _n.tii, _n.fdx, _n.fnm] Feb 24, 2011 8:58:24 PM org.apache.solr.core.SolrDeletionPolicy updateCommits INFO: newest commit = 1297781919778 Feb 24, 2011 8:58:24 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Creating a connection for entity id with URL:
jdbc:mysql://localhost/researchsquare_beta_library?characterEncoding=UTF8&zeroDateTimeBehavior=convertToNull Feb 24, 2011 8:58:25 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call INFO: Time taken for getConnection(): 137 Killed So it looks like, for whatever reason, the server crashes trying to do a full import. When I add a LIMIT clause to the query, it works fine when the LIMIT is only 250 records, but if I try to do 500 records, I get the same message. The field types are: SHOW CREATE TABLE mytable; CREATE TABLE mytable ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `fielda` varchar(650) COLLATE utf8_unicode_ci DEFAULT NULL, `fieldb` varchar(500) COLLATE utf8_unicode_ci DEFAULT NULL, `fieldc` text COLLATE utf8_unicode_ci, `fieldd` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL, PRIMARY KEY (`id`) ); How can I get Solr to do a full import without crashing? Doing it 250 records at a time is not going to be feasible because there are about 50 records.
Re: query slop issue
qs only applies the slop to phrase queries explicitly specified in q, for the qf fields. So only if the search is q="water treatment plant" (quoted, as a phrase) would qs come into the picture. The maximum allowable positional distance between terms for a match is called slop, and distance is the number of positional moves of terms needed to reconstruct the phrase in the same order. So with qs=1 you are allowed only one positional move to recreate the exact phrase. You may also want to check the pf and ps params for dismax. Regards, Jayendra On Thu, Feb 24, 2011 at 8:31 AM, Bagesh Sharma mail.bag...@gmail.com wrote: Hi all, I have a search string q=water+treatment+plant and I am using the dismax request handler, where I have qs = 1. How will processing be done, i.e. within how many words must water, treatment or plant occur to come into the result set? -- View this message in context: http://lucene.472066.n3.nabble.com/query-slop-issue-tp2567418p2567418.html Sent from the Solr - User mailing list archive at Nabble.com.
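A rough way to picture slop (a simplification for intuition, not Lucene's exact sloppy-phrase algorithm): for an in-order match, the slop consumed is how far the phrase had to stretch beyond its exact length.

```python
def phrase_slop(token_positions):
    """Rough illustration: for an in-order match, the slop consumed is
    the number of extra positions the phrase stretched over beyond an
    exact, gap-free match. Not Lucene's actual implementation."""
    span = token_positions[-1] - token_positions[0]
    return span - (len(token_positions) - 1)

# Hypothetical document: "large water management treatment plant"
doc = "large water management treatment plant".split()
positions = [doc.index(t) for t in ("water", "treatment", "plant")]

# One intervening word ("management") costs one positional move, so this
# document matches the phrase "water treatment plant" with qs=1 but not qs=0.
phrase_slop(positions)
```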
Re: Problem in full query searching
With the dismax or extended dismax parser you should be able to achieve this. Dismax: qf, qs, pf and ps should give you exact control over the fields and boosts. Extended dismax: in addition to qf, qs, pf and ps, you have pf2 and pf3 for two- and three-word shingles. As Grijesh mentioned, use more weight for phrase or proximity matches. Regards, Jayendra On Thu, Feb 24, 2011 at 4:03 AM, Grijesh pintu.grij...@gmail.com wrote: Try to configure more weight on the ps and pf parameters of the dismax request handler to boost phrase-matching documents. Or, if you do not want to consider term frequency, use omitTermFreqAndPositions=true in the field definition - Thanx: Grijesh http://lucidimagination.com -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-in-full-query-searching-tp2566054p2566230.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: facet.offset with facet.sort=lex and shards problem?
On Thu, Feb 24, 2011 at 3:53 PM, Peter Cline pcl...@pobox.upenn.edu wrote: I tried today's builds for the 3.x branch and the trunk. The problem persists in both. Thanks Peter, I was now also able to duplicate the bug. Could you open a JIRA issue for this? -Yonik http://lucidimagination.com
DIH regex remove email + extract url
Hi, I'm trying to remove all email addresses in my content field with the following line: <field column="description" xpath="/product/content" regex="[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[A-Z]{2,4}" replaceWith="" /> But it doesn't seem to remove emails. Is the syntax right? Second thing: I would like to extract the domain name from a URL via a regex: <field column="source" xpath="/product/url" regex="http://(.*?)\\/(.*)" /> Example: url=http://www.abcd.com/product.php?id=324 -- I want to index source = abcd.com What's the syntax for this one? Thanks for your help Rosa
Re: Order Facet on ranking score
No, Solr returns facets ordered either alphabetically or by count. Hello everybody, Is it possibile to order the facet results on some ranking score? I was doing a query with or operator and sometimes the first facet have inside of them only result with small rank and not important. This cause that users are led to other reasearch not important. //
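For reference, the two orderings Solr does support are selected per request with the facet.sort parameter (value names as used in the 1.4 era; ordering facets by the relevance score of the documents inside them is not built in):

```
facet.sort=count   # highest facet counts first (the default when a facet.limit is in effect)
facet.sort=lex     # lexicographic (alphabetical) order; later releases spell this facet.sort=index
```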
Re: CUSTOM JSP FOR APACHE SOLR
Hello list, as suggested below, I tried to implement a custom ResponseWriter that would evaluate a JSP, but that seems impossible: the HttpServletRequest and the HttpServletResponse are not available anymore. Have I missed something? Should I rather write a RequestHandler? Does anyone know an artificial way to run a JSP? (I'd rather not.) thanks in advance paul On 2 Feb 2011, at 20:42, Tomás Fernández Löbbe wrote: Hi Paul, I don't fully understand what you want to do. The way, I think, SolrJ is intended to be used is from a client application (outside Solr). If what you want is something like what's done with Velocity, I think you could implement a response writer that renders the JSP and sends it in the response. Tomás On Mon, Jan 31, 2011 at 6:25 PM, Paul Libbrecht p...@hoplahup.net wrote: Tomas, I also know Velocity can be used and works well. I would be interested in a simpler way to have the objects of Solr available in a JSP than writing a custom JSP processor as a request handler; indeed, this seems to be the way SolrJ is expected to be used per the wiki page. Actually I migrated to Velocity (which I like less than JSP) just because I did not find a response to this question. paul On 31 Jan 2011, at 21:53, Tomás Fernández Löbbe wrote: Hi John, you can use whatever you want for building your application, using Solr on the backend (JSP included).
You should find all the information you need on Solr's wiki page: http://wiki.apache.org/solr/ including some client libraries to easily integrate your application with Solr: http://wiki.apache.org/solr/IntegratingSolr For fast prototyping you could use Velocity: http://wiki.apache.org/solr/VelocityResponseWriter Anyway, I recommend you start with Solr's tutorial: http://lucene.apache.org/solr/tutorial.html Good luck, Tomás 2011/1/31 JOHN JAIRO GÓMEZ LAVERDE jjai...@hotmail.com SOLR LUCENE DEVELOPERS Hi, I am new to Solr and I would like to make a custom search page for enterprise users in JSP that takes the results of Apache Solr. - Where can I find some useful examples for that topic? - Is JSP the correct approach to solve my requirement? - If not, what is the best solution to build a customized search page for my users? Thanks from South America JOHN JAIRO GOMEZ LAVERDE Bogotá - Colombia
query results filter
Hi everyone, I have some existing solr cores that for one reason or another have documents that I need to filter from the query results page. I would like to do this inside Solr instead of doing it on the receiving end, in the client. After searching the mailing list archives and Solr wiki, it appears you do this by registering a custom SearchHandler / SearchComponent with Solr. Still, I don't quite understand how this machinery fits together. Any suggestions / ideas / pointers much appreciated! Cheers, -Babak ~~ Ideally, I'd like to find / code a solution that does the following: 1. A request handler that works like the StandardRequestHandler but which allows an optional DocFilter (say, modeled like the java.io.FileFilter interface) 2. Allows current pagination to work transparently. 3. Works transparently with distributed/sharded queries.
RE: query results filter
Hmm, depending on what you are actually needing to do, can you do it with a simple fq param to filter out what you want filtered out, instead of needing to write custom Java as you are suggesting? It would be a lot easier to just use an fq. How would you describe the documents you want to filter from the query results page? Can that description be represented by a Solr query you can already represent using the lucene, dismax, or any other existing query? If so, why not just use a negated fq describing what to omit from the results? From: Babak Farhang [farh...@gmail.com] Sent: Thursday, February 24, 2011 6:58 PM To: solr-user Subject: query results filter Hi everyone, I have some existing solr cores that for one reason or another have documents that I need to filter from the query results page. I would like to do this inside Solr instead of doing it on the receiving end, in the client. After searching the mailing list archives and Solr wiki, it appears you do this by registering a custom SearchHandler / SearchComponent with Solr. Still, I don't quite understand how this machinery fits together. Any suggestions / ideas / pointers much appreciated! Cheers, -Babak ~~ Ideally, I'd like to find / code a solution that does the following: 1. A request handler that works like the StandardRequestHandler but which allows an optional DocFilter (say, modeled like the java.io.FileFilter interface) 2. Allows current pagination to work transparently. 3. Works transparently with distributed/sharded queries.
Re: DataImportHandler in Solr 4.0
: It seems this thread has been hijacked. My initial posting was in regards to : my custom Evaluators always receiving a null context. Same Evaluators work in : 1.4.1 I'm pretty sure you are talking about a completely different thread, with a completely different subject (Solr 4.0 DIH) -Hoss
Re: DIH regex remove email + extract url
Hi Rosa, <field column="description" xpath="/product/content" regex="[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[A-Z]{2,4}" replaceWith="" /> Shouldn't it be regex="[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-z]{2,4}"? <field column="source" xpath="/product/url" regex="http://(.*?)\\/(.*)" /> Example: url=http://www.abcd.com/product.php?id=324 -- i want to index source = abcd.com Probably it could be regex="http:\/\/(.*?)\/(.*)" I use a regex web tool: http://www.regexplanet.com/simple/index.html Koji -- http://www.rondhuit.com/en/
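Koji's two fixes can be sanity-checked with a quick sketch. Python's re engine stands in here for the Java regexes DIH actually uses, and the email address is made up. Note the URL pattern captures www.abcd.com, not the bare abcd.com Rosa asked for; stripping the www. would need a further group or replace.

```python
import re

# Email removal: the original [A-Z]{2,4} tail never matches lower-case
# TLDs; widen the class (or compile the pattern case-insensitively).
email_re = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}")
cleaned = email_re.sub("", "contact someone@example.com for details")

# Domain extraction: group 1 is everything between "http://" and the
# first "/". This keeps the "www." prefix.
url_re = re.compile(r"http://(.*?)/(.*)")
m = url_re.match("http://www.abcd.com/product.php?id=324")
domain = m.group(1)
```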
Re: Make syntax highlighter caseinsensitive
(11/02/24 20:18), Tarjei Huse wrote: Hi, I have an index with two fields, body and caseInsensitiveBody. body is indexed and stored, while caseInsensitiveBody is just indexed. The idea is that by not storing caseInsensitiveBody I save some space and gain some performance. So I query against caseInsensitiveBody and generate highlighting from the case-sensitive one. The problem is that, as a result, I am missing highlighting terms. For example, when I search for solr and get a match in caseInsensitiveBody for solr, but it is Solr in the original document, no highlighting is done. Is there a way around this? Currently I am using the following highlighting params: 'hl' = 'on', 'hl.fl' = 'header,body', 'hl.usePhraseHighlighter' = 'true', 'hl.highlightMultiTerm' = 'true', 'hl.fragsize' = 200, 'hl.regex.pattern' = '[-\w ,/\n\\']{20,200}', Tarjei, Maybe a silly question, but why don't you make the body field case-insensitive and eliminate the caseInsensitiveBody field, and then query and highlight on just the body field? Koji -- http://www.rondhuit.com/en/
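A minimal sketch of what Koji suggests, using standard analysis factories (the text_ci type name and the tokenizer choice are illustrative, not taken from Tarjei's schema): lower-case at both index and query time so the single stored body field serves both matching and highlighting.

```xml
<!-- schema.xml sketch: one case-insensitive, stored field -->
<fieldType name="text_ci" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- one analyzer block applies at both index and query time -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="body" type="text_ci" indexed="true" stored="true"/>
```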
Ramdirectory
I could not figure out how to set up the RAMDirectory option in solrconfig.xml. Does anyone have an example for 1.4? Bill Bell Sent from mobile
Re: Ramdirectory
: I could not figure out how to setup the ramdirectory option in solrconfig.XML. Does anyone have an example for 1.4? it wasn't an option in 1.4. as Koji had already mentioned in the other thread where you chimed in and asked about this, it was added in the 3x branch... http://lucene.472066.n3.nabble.com/Question-Solr-Index-main-in-RAM-td2567166.html -Hoss
boosting based on number of terms matched?
I'm using the edismax handler, although my question is probably the same for dismax. When the user types a long query, I use the mm parameter so that only 75% of terms need to match. This works fine; however, sometimes documents that only match 75% of the terms show up higher in my results than documents that match 100%. I'd like to set a boost so that documents that match 100% will be much more likely to be placed ahead of documents that only match 75%. Can anyone give me a pointer on how to do this? Thanks, Nick
Re: Ramdirectory
Thanks - yeah, that is why I asked how to use it. But I still don't know how to use it. https://hudson.apache.org/hudson/job/Solr-3.x/javadoc/org/apache/solr/core/RAMDirectoryFactory.html https://issues.apache.org/jira/browse/SOLR-465 <directoryProvider class="org.apache.lucene.store.RAMDirectory"> <!-- Parameters as required by the implementation --> </directoryProvider> Is that right? Examples? Options? Where do I put that in solrconfig.xml? Do I put it in mainIndex/directoryProvider? I know that SOLR-465 is more generic, but https://issues.apache.org/jira/browse/SOLR-480 seems easier to use. Thanks. On 2/24/11 6:21 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I could not figure out how to setup the ramdirectory option in solrconfig.XML. Does anyone have an example for 1.4? it wasn't an option in 1.4. as Koji had already mentioned in the other thread where you chimed in and asked about this, it was added in the 3x branch... http://lucene.472066.n3.nabble.com/Question-Solr-Index-main-in-RAM-td2567166.html -Hoss
Re: query results filter
In my case, I want to filter out duplicate docs so that returned docs are unique w/ respect to a certain field (not the schema's unique field, of course): a duplicate doc here is one that has same value for a checksum field as one of the docs already in the results. It would be great if I could somehow express that w/ a query, but I don't think that would be possible. On Thu, Feb 24, 2011 at 5:11 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Hmm, depending on what you are actually needing to do, can you do it with a simple fq param to filter out what you want filtered out, instead of needing to write custom Java as you are suggesting? It would be a lot easier to just use an fq. How would you describe the documents you want to filter from the query results page? Can that description be represented by a Solr query you can already represent using the lucene, dismax, or any other existing query? If so, why not just use a negated fq describing what to omit from the results? From: Babak Farhang [farh...@gmail.com] Sent: Thursday, February 24, 2011 6:58 PM To: solr-user Subject: query results filter Hi everyone, I have some existing solr cores that for one reason or another have documents that I need to filter from the query results page. I would like to do this inside Solr instead of doing it on the receiving end, in the client. After searching the mailing list archives and Solr wiki, it appears you do this by registering a custom SearchHandler / SearchComponent with Solr. Still, I don't quite understand how this machinery fits together. Any suggestions / ideas / pointers much appreciated! Cheers, -Babak ~~ Ideally, I'd like to find / code a solution that does the following: 1. A request handler that works like the StandardRequestHandler but which allows an optional DocFilter (say, modeled like the java.io.FileFilter interface) 2. Allows current pagination to work transparently. 3. Works transparently with distributed/sharded queries.
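What Babak describes can be sketched client-side for illustration (a custom SearchComponent would apply the same logic inside Solr, before pagination; the checksum field name comes from his description):

```python
def dedupe_by_checksum(docs):
    """Keep only the first document seen for each checksum value --
    the filtering a custom component would apply before pagination.
    Each doc is assumed to be a dict with a "checksum" key."""
    seen = set()
    unique = []
    for doc in docs:
        checksum = doc["checksum"]
        if checksum not in seen:
            seen.add(checksum)
            unique.append(doc)
    return unique
```

Doing this inside Solr (rather than in the client) matters for his requirements 2 and 3: pagination and sharded queries only stay consistent if duplicates are dropped before offsets are applied.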
Re: query slop issue
Thanks very good explanation. -- View this message in context: http://lucene.472066.n3.nabble.com/query-slop-issue-tp2567418p2573185.html Sent from the Solr - User mailing list archive at Nabble.com.