Re: WordDelimiterFilter to QueryParser to MultiPhraseQuery?
On Mon, Aug 31, 2009 at 10:47 PM, jOhn net...@gmail.com wrote: This is mostly my misunderstanding of catenateAll=1, as I thought it would break down with an OR using the full concatenated word. Thus: "Jokers Wild" -> { jokers, wild } OR { jokerswild }. But really it becomes: { jokers, {wild, jokerswild} }, which will not match. And if you have a mistyped camel case like "jOkerswild" -> { j, {okerswild, jokerswild} }, again no match.

Sorry for the late reply. You still haven't given the fieldtype definition that you were using. I tried:

<fieldtype name="wdf_preserve_catenate" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

And I tried indexing "Jokers Wild", which matches when I query for jOkerswild and jokerswild. Note that if you change the tokenizer to WhitespaceTokenizerFactory then such queries won't match. -- Regards, Shalin Shekhar Mangar.
Re: Problem querying for a value with a space
On Thu, Sep 3, 2009 at 1:45 AM, Adam Allgaier allgai...@yahoo.com wrote:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
...
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>

I am indexing the specific_LIST_s field with the value "For Sale". The document indexes just fine. A query returns the document with the proper value: <str name="specific_LIST_s">For Sale</str>. However, when I try to query on that field with +specific_LIST_s:For Sale, +specific_LIST_s:For+Sale, or +specific_LIST_s:For%20Sale, I get no results with any one of those three queries.

Use +specific_LIST_s:(For Sale) or +specific_LIST_s:"For Sale" -- Regards, Shalin Shekhar Mangar.
Re: Return 2 fields per facet.. name and id, for example? / facet value search
On Fri, Aug 28, 2009 at 12:57 AM, Rihaed Tan tanrihae...@gmail.com wrote: Hi, I have a similar requirement to Matthew (from his post 2 years ago). Is this still the way to go in storing both the ID and name/value for facet values? I'm planning to use the id#name format if this is still the case and doing a prefix query. I believe this is a common requirement, so I'd appreciate it if any of you guys can share the best way to do it. Also, I'm indexing the facet values for text search as well. Should the field declaration below suffice for the requirement?

<field name="category" type="text" indexed="true" stored="true" required="true" multiValued="true"/>

There have been talks of having a pair field type in Solr but there is no patch yet. So I guess the way proposed by Yonik is a good solution. -- Regards, Shalin Shekhar Mangar.
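The id#name convention discussed above is a pure client-side encoding; a minimal sketch of the encode/decode step might look like this (class and method names are mine, not Solr API, and it assumes the id itself never contains '#'):

```java
// Encode/decode the "id#name" facet-value convention mentioned above.
// Assumes ids never contain '#'; all names here are illustrative.
public class FacetValueCodec {
    public static String encode(String id, String name) {
        return id + "#" + name;
    }

    public static String id(String stored) {
        return stored.substring(0, stored.indexOf('#'));
    }

    public static String name(String stored) {
        return stored.substring(stored.indexOf('#') + 1);
    }

    public static void main(String[] args) {
        String v = encode("42", "Board Games");
        System.out.println(v); // 42#Board Games
    }
}
```

A facet.prefix query on "42#" then narrows facet counts to that id, while the display name rides along in the returned value.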
Re: Field Collapsing (was Re: Schema for group/child entity setup)
The development on this patch is quite active. It works well for a single Solr instance, but distributed search (i.e. shards) is not yet supported. Using this patch you can group search results based on a specific field. There are two flavors of field collapsing - adjacent and non-adjacent: the former collapses only documents which happen to be located next to each other in the otherwise-non-collapsed result set, while the latter (the non-adjacent) collapses all documents with the same field value (regardless of their position in the otherwise-non-collapsed result set). Note that non-adjacent collapsing performs better than the adjacent one. There's currently discussion about extending this support so that, in addition to collapsing the documents, extra information will be returned for the collapsed documents (see the discussion on the issue page). Uri

R. Tan wrote: I think this is what I'm looking for. What is the status of this patch? On Thu, Sep 3, 2009 at 12:00 PM, R. Tan tanrihae...@gmail.com wrote: Hi Solrers, I would like to get your opinion on how to best approach a search requirement that I have. The scenario is I have a set of business listings that may be grouped into one parent business (such as 7-Eleven having several locations). On the results page, I only want 7-Eleven to show up once but also show how many locations matched the query (facet filtered by state, for example) and maybe a preview of some of the locations. Searching for the business name is straightforward but searching the locations within a result is quite tricky. I can do the opposite, searching for the locations and faceting on business names, but it will still basically be the same thing and repeat results with the same business name. Any advice? Thanks, R
Exact Word Search
Hi, Can anyone help me with the scenario below? Scenario: I have integrated Solr with Carrot2. The issue is: assuming I give "bhaskar" as the input string for a search, it should give me search results pertaining to bhaskar only. Example: it should not return results such as "chandarbhaskar" or "bhaskarc". Basically the search should happen based on an exact word match; I am not bothered about case sensitivity here. How do I achieve the above scenario in Carrot2? Regards Bhaskar
Solr question
Hi, Following the Solr tutorial, I send a doc to Solr with this request:

curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&map.content=attr_content&commit=true' -F myfi...@oxiane.pdf

<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">23717</int></lst>
</response>

The reply seems OK and the content is in the index, but afterwards no query matches the doc... TIA Regards Bruno
Re: questions about solr
On Wed, Sep 2, 2009 at 10:44 PM, Zhenyu Zhong zhongresea...@gmail.com wrote: Dear all, I am very interested in Solr and would like to deploy Solr for distributed indexing and searching. I hope you are the right Solr expert who can help me out. However, I have concerns about the scalability and management overhead of Solr. I am wondering if anyone could give me some guidance on Solr. Basically, I have the following questions. For indexing: 1. How does Solr handle distributed indexing? It seems Solr generates the index on a single box. What if the index is huge and can't sit on one box?

Solr leaves the distribution of the index up to the user. So if you think your index will not fit on one box, you figure out a sharding strategy (such as hashing or round-robin) and index your collection into each shard. Solr supports distributed search so that your query can use all the shards to give you the results.

2. Is it possible for Solr to generate the index in HDFS?

Never tried, but it seems so. See Jason's response and the Jira issue he mentioned.

For searching: 3. Solr provides a master/slave framework. How does Solr distribute the search? Does Solr know which index/shard to deliver the query to? Or does it have to multicast the query to all the nodes?

For a full-text search it is hard to figure out the correct shards because matching documents could be living anywhere (unless you shard in a very clever way and your data can be sharded in that way). Each shard is queried, and the results are merged and returned as if you had queried a single Solr server.

For fault tolerance: 4. Does Solr handle the management overhead automatically? Suppose the master goes down; how does Solr recover the master in order to get the latest index updates? Do we have to write code ourselves to handle this?

It does not. You have to handle that yourself currently. Similar topics have been discussed on this list in the past and some workarounds have been suggested. I suggest you search the archives. 5.
Suppose the master goes down immediately after an index update, while the updates haven't been replicated to the slaves; data loss seems to happen. Does Solr have any mechanism to deal with that?

No. If you want, you can set up a backup master and index on both the master and the backup machine to achieve redundancy. However, switching between the master and the backup would need to be done by you.

Performance of real-time index updating: 6. How is the performance of this real-time index updating? Suppose we are frequently updating a million records in a huge index with billions of records. Can Solr provide reasonable performance and low latency for that? (Probably it is related to the Lucene library.)

How frequently? With careful sharding, you can distribute your write load. Depending on your data, you may also be able to split your indexes into a more frequently updated one and an older archive index. A lot of work is in progress in this area. Lucene 2.9 has support for near real-time search with more improvements planned in the coming days. Solr 1.4 will not have support for these new Lucene features, but with 1.5 things should be a lot better. -- Regards, Shalin Shekhar Mangar.
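The hash-based sharding strategy Shalin mentions can be sketched as a tiny client-side router; the class and method names below are illustrative, not part of any Solr API:

```java
// Hash-based shard routing, as described above: the client decides which
// shard (Solr instance) each document is indexed into. Names are
// illustrative, not Solr API.
public class ShardRouter {
    public static int shardFor(String uniqueKey, int numShards) {
        // Mask the sign bit so the modulo result is never negative
        return (uniqueKey.hashCode() & 0x7fffffff) % numShards;
    }

    public static void main(String[] args) {
        // The same document always routes to the same shard
        System.out.println(shardFor("doc1", 4) == shardFor("doc1", 4)); // true
    }
}
```

At query time the same set of shards is then listed in the shards= parameter so distributed search can merge the results.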
Re: Field Collapsing (was Re: Schema for group/child entity setup)
Thanks Uri. How do paging and scoring work when using field collapsing? What patch works with 1.3? Is it production ready? R On Thu, Sep 3, 2009 at 3:54 PM, Uri Boness ubon...@gmail.com wrote: The development on this patch is quite active. It works well for a single Solr instance, but distributed search (i.e. shards) is not yet supported. Using this patch you can group search results based on a specific field. There are two flavors of field collapsing - adjacent and non-adjacent: the former collapses only documents which happen to be located next to each other in the otherwise-non-collapsed result set, while the latter (the non-adjacent) collapses all documents with the same field value (regardless of their position in the otherwise-non-collapsed result set). Note that non-adjacent collapsing performs better than the adjacent one. There's currently discussion about extending this support so that, in addition to collapsing the documents, extra information will be returned for the collapsed documents (see the discussion on the issue page). Uri R. Tan wrote: I think this is what I'm looking for. What is the status of this patch? On Thu, Sep 3, 2009 at 12:00 PM, R. Tan tanrihae...@gmail.com wrote: Hi Solrers, I would like to get your opinion on how to best approach a search requirement that I have. The scenario is I have a set of business listings that may be grouped into one parent business (such as 7-Eleven having several locations). On the results page, I only want 7-Eleven to show up once but also show how many locations matched the query (facet filtered by state, for example) and maybe a preview of some of the locations. Searching for the business name is straightforward but searching the locations within a result is quite tricky. I can do the opposite, searching for the locations and faceting on business names, but it will still basically be the same thing and repeat results with the same business name. Any advice? Thanks, R
Question: How do I run the Solr analysis tool programmatically?
From Java code I want to contact Solr through HTTP and supply a text buffer (or a URL that returns text, whichever is easier), and I want to get in return the final list of tokens (or the final text buffer) after it went through all the query-time filters defined for this Solr instance (stemming, stop words etc). Thanks in advance -- View this message in context: http://www.nabble.com/Question%3A-How-do-I-run-the-solr-analysis-tool-programtically---tp25273484p25273484.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question: How do I run the Solr analysis tool programmatically?
Hi Yatir, The FieldAnalysisRequestHandler has the same behavior as the analysis tool. It will show you the list of tokens that are created after each of the filters has been applied. It can be used through normal HTTP requests, or you can use SolrJ's support. Thanks, Chris On Thu, Sep 3, 2009 at 12:42 PM, Yatir yat...@outbrain.com wrote: From Java code I want to contact Solr through HTTP and supply a text buffer (or a URL that returns text, whichever is easier), and I want to get in return the final list of tokens (or the final text buffer) after it went through all the query-time filters defined for this Solr instance (stemming, stop words etc). Thanks in advance -- View this message in context: http://www.nabble.com/Question%3A-How-do-I-run-the-solr-analysis-tool-programtically---tp25273484p25273484.html Sent from the Solr - User mailing list archive at Nabble.com.
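A minimal sketch of how this might look (the handler path and parameter names follow the Solr 1.4 example solrconfig; adjust to your setup):

```xml
<!-- solrconfig.xml: register the handler (present in the Solr 1.4 example config) -->
<requestHandler name="/analysis/field" class="solr.FieldAnalysisRequestHandler"/>
```

A request such as http://localhost:8983/solr/analysis/field?analysis.fieldtype=text&analysis.fieldvalue=Jokers+Wild&analysis.query=jokerswild then returns the token stream produced at each analysis stage, for both index-time and query-time chains.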
Re: Solr question
On Sep 3, 2009, at 1:24 AM, SEZNEC Bruno wrote: Hi, Following the Solr tutorial, I send a doc to Solr with this request: curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&map.content=attr_content&commit=true' -F myfi...@oxiane.pdf <response><lst name="responseHeader"><int name="status">0</int><int name="QTime">23717</int></lst></response> The reply seems OK and the content is in the index, but afterwards no query matches the doc... Not even a *:* query? What queries are you trying? What's your default search field? What does the query parse to, as seen in the response using debugQuery=true? Likely the problem is that you aren't searching on the field the content was indexed into, or that it was not analyzed as you need. Erik
Re: Exact Word Search
On Thu, Sep 3, 2009 at 1:33 PM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote: Hi, Can anyone help me with the scenario below? Scenario: I have integrated Solr with Carrot2. The issue is: assuming I give "bhaskar" as the input string for a search, it should give me search results pertaining to bhaskar only. Example: it should not return results such as "chandarbhaskar" or "bhaskarc". Basically the search should happen based on an exact word match; I am not bothered about case sensitivity here. How do I achieve the above scenario in Carrot2? Bhaskar, I think this question is better suited for the Carrot2 mailing lists. Unless you yourself control how the Solr query is created, we will not be able to help you. -- Regards, Shalin Shekhar Mangar.
Re: Using SolrJ with Tika
Hi Laurent, I am not sure if this is what you need, but you can extract the content from the uploaded document (MS docs, PDF etc) using Tika and then send it to Solr for indexing:

String content = ...; // extract the content using Tika (you can use AutoDetectParser)

and then,

SolrInputDocument doc = new SolrInputDocument();
doc.addField("DOC_CONTENT", content);
solrServer.add(doc);
solrServer.commit();

On Wed, Sep 2, 2009 at 5:26 PM, Angel Ice lbil...@yahoo.fr wrote: Hi everybody. I hope it's the right place for questions; if not, sorry. I'm trying to index rich documents (PDF, MS docs etc) in Solr/Lucene. I have seen a few examples explaining how to use Tika to solve this, but most of these examples use curl to send documents to Solr, or an HTML POST with an input file. I'd like to do it in pure Java. Is there a way to use SolrJ to index the documents with the ExtractingRequestHandler of Solr, or at least to get the extracted XML back (with the extract.only option)? Many thanks. Laurent.
Indexing docs using TIKA
I am not sure if this went to the mailing list before, hence forwarding again. Hi All, I want to search for a document containing "string to search", with price between 100 and 200 and weight between 10 and 20.

SolrQuery query = new SolrQuery();
query.setQuery("DOC_CONTENT:string to search");
query.setFilterQueries("PRICE:[100 TO 200]");
query.setFilterQueries("WEIGHT:[10 TO 20]");
QueryResponse response = server.query(query);

The DOC_CONTENT field contains the content extracted from the file uploaded by the user, extracted using Tika. Is the above approach correct?
Re : Using SolrJ with Tika
Hi This is the solution I was testing. I got some difficulties with AutoDetectParser, but I think it's the solution I will use in the end. Thanks for the advice anyway :) Regards, Laurent From: Abdullah Shaikh abdullah.sha...@viithiisys.com To: solr-user@lucene.apache.org Sent: Thursday, September 3, 2009, 14:31:10 Subject: Re: Using SolrJ with Tika Hi Laurent, I am not sure if this is what you need, but you can extract the content from the uploaded document (MS docs, PDF etc) using Tika and then send it to Solr for indexing: String content = ...; // extract the content using Tika (you can use AutoDetectParser) and then, SolrInputDocument doc = new SolrInputDocument(); doc.addField("DOC_CONTENT", content); solrServer.add(doc); solrServer.commit(); On Wed, Sep 2, 2009 at 5:26 PM, Angel Ice lbil...@yahoo.fr wrote: Hi everybody. I hope it's the right place for questions; if not, sorry. I'm trying to index rich documents (PDF, MS docs etc) in Solr/Lucene. I have seen a few examples explaining how to use Tika to solve this, but most of these examples use curl to send documents to Solr, or an HTML POST with an input file. I'd like to do it in pure Java. Is there a way to use SolrJ to index the documents with the ExtractingRequestHandler of Solr, or at least to get the extracted XML back (with the extract.only option)? Many thanks. Laurent.
RE: Solr question
Thanks. My idea was that if I have

<dynamicField name="attr_*" type="textgen" indexed="true" stored="true" multiValued="true"/>

in schema.xml, everything would be stored in the index. The query "solr" and other queries work well only with the text given in the sample files. Rgds Bruno -----Original Message----- From: Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent: Thursday, September 3, 2009 13:40 To: solr-user@lucene.apache.org Subject: Re: Solr question On Sep 3, 2009, at 1:24 AM, SEZNEC Bruno wrote: Hi, Following the Solr tutorial, I send a doc to Solr with this request: curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&map.content=attr_content&commit=true' -F myfi...@oxiane.pdf <response><lst name="responseHeader"><int name="status">0</int><int name="QTime">23717</int></lst></response> The reply seems OK and the content is in the index, but afterwards no query matches the doc... Not even a *:* query? What queries are you trying? What's your default search field? What does the query parse to, as seen in the response using debugQuery=true? Likely the problem is that you aren't searching on the field the content was indexed into, or that it was not analyzed as you need. Erik
Re: score = sum of boosts
You could start with a TF formula that ignores frequencies above 1. onOffTF, I guess, returning 1 if the term is there one or more times. Or, you could tell us what you are trying to achieve. wunder On Sep 3, 2009, at 12:28 AM, Shalin Shekhar Mangar wrote: On Thu, Sep 3, 2009 at 4:09 AM, Joe Calderon calderon@gmail.com wrote: hello *, what would be the best approach to return the sum of boosts as the score? ex: a dismax handler boosts matches to field1^100 and field2^50, a query matches both fields hence the score for that row would be 150 Not really. The tf-idf score would be multiplied by 100 for field1 and by 50 for field2. The score can be more than 150 if both fields match. is this something i could do with a function query or do i need to hack up DisjunctionMaxScorer ? Can you give a little more background on what you want to achieve this way? -- Regards, Shalin Shekhar Mangar.
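In Lucene terms, wunder's suggestion would mean subclassing the Similarity in use and overriding tf(float). The core of the idea, shown as a standalone method rather than tied to any particular Lucene version:

```java
// Sketch of the "on/off" TF idea described above: term frequency
// contributes 1 if the term occurs at all, 0 otherwise. In Lucene you
// would put this logic in a Similarity subclass's tf(float) override.
public class OnOffTf {
    public static float tf(float freq) {
        return freq > 0f ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        // A term occurring 5 times scores the same as one occurring once
        System.out.println(tf(5f) == tf(1f)); // true
    }
}
```

This flattens the frequency component only; field boosts, idf, and length normalization would still apply unless those are neutralized too.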
Best way to do a lucene matchAllDocs not using q.alt=*:*
Hey there, I need a query to get the total number of documents in my index. I can get it if I do this using the DismaxRequestHandler: q.alt=*:*&facet=false&hl=false&rows=0 I have noticed this query is very memory consuming. Is there any more optimized way in trunk to get the total number of documents in my index? Thanks in advance -- View this message in context: http://www.nabble.com/Best-way-to-do-a-lucene-matchAllDocs-not-using-q.alt%3D*%3A*-tp25277585p25277585.html Sent from the Solr - User mailing list archive at Nabble.com.
Default Query Type For Facet Queries
We have a custom query parser plugin registered as the default for searches, and we'd like to have the same parser used for facet.query. Is there a way to register it as the default for FacetComponent in solrconfig.xml? I know I can add {!type=customparser} to each query as a workaround, but I'd rather register it in the config than make my code send that and strip it off on every facet query. -- Stephen Duncan Jr www.stephenduncanjr.com
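For reference, the workaround mentioned looks like this (customparser and the field/value are placeholders for your own parser name and query):

```text
facet.query={!type=customparser}myfield:myvalue
```

The {!type=...} local-params prefix selects the query parser per facet.query; as of this writing there appears to be no solrconfig.xml switch that changes the default parser used by FacetComponent.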
RE: Solr question
Response with id:doc4 is OK:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">id:doc4</str>
      <str name="version">2.2</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <arr name="attr_Author"><str>Sami Siren</str></arr>
      <arr name="attr_Content-Type"><str>application/pdf</str></arr>
      <arr name="attr_content">
        <str>Example PDF document Tika Solr Cell This is a sample piece of content for Tika Solr Cell article.</str>
      </arr>
      <arr name="attr_created"><str>Wed Dec 31 10:17:13 CET 2008</str></arr>
      <arr name="attr_creator"><str>Writer</str></arr>
      <arr name="attr_producer"><str>OpenOffice.org 3.0</str></arr>
      <arr name="attr_stream_content_type"><str>application/octet-stream</str></arr>
      <arr name="attr_stream_name"><str>SampleDocument.pdf</str></arr>
      <arr name="attr_stream_size"><str>18408</str></arr>
      <arr name="attr_stream_source_info"><str>myfile</str></arr>
      <str name="id">doc4</str>
      <str name="title">Example PDF document</str>
    </doc>
  </result>
</response>

What I don't understand is why a simple search on title or content doesn't work:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">PDF</str>
      <str name="version">2.2</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

Thanks -----Original Message----- From: Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent: Thursday, September 3, 2009 13:40 To: solr-user@lucene.apache.org Subject: Re: Solr question On Sep 3, 2009, at 1:24 AM, SEZNEC Bruno wrote: Hi, Following the Solr tutorial, I send a doc to Solr with this request: curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&uprefix=attr_&map.content=attr_content&commit=true' -F myfi...@oxiane.pdf <response><lst name="responseHeader"><int name="status">0</int><int name="QTime">23717</int></lst></response> The reply seems OK and the content is in the index, but afterwards no query matches the doc... Not even a *:* query? What queries are you trying?
What's your default search field? What does the query parse to, as seen in the response using debugQuery=true ? Likely the problem is that you aren't searching on the field the content was indexed into, or that it was not analyzed as you need. Erik
how to scan dynamic field without specifying each field in query
say I have a dynamic field called Foo* (where * can be in the hundreds) and want to search Foo* for a value of 3 (for example) I know I can do this via this: http://localhost:8994/solr/select?q=(Foo1:3 OR Foo2:3 OR Foo3:3 OR ... Foo999:3) However, is there a better way? i.e. is there some way to query by a function I create, possibly something like this: http://localhost:8994/solr/select?q=myfunction('Foo', 3) where myfunction itself iterates thru all the instances of Foo* any help appreciated -- View this message in context: http://www.nabble.com/how-to-scan-dynamic-field-without-specifying-each-field-in-query-tp25280228p25280228.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: how to scan dynamic field without specifying each field in query
You can copy the dynamic fields value into a different field and query on that field. Thanks, Kalyan Manepalli -Original Message- From: gdeconto [mailto:gerald.deco...@topproducer.com] Sent: Thursday, September 03, 2009 12:06 PM To: solr-user@lucene.apache.org Subject: how to scan dynamic field without specifying each field in query say I have a dynamic field called Foo* (where * can be in the hundreds) and want to search Foo* for a value of 3 (for example) I know I can do this via this: http://localhost:8994/solr/select?q=(Foo1:3 OR Foo2:3 OR Foo3:3 OR ... Foo999:3) However, is there a better way? i.e. is there some way to query by a function I create, possibly something like this: http://localhost:8994/solr/select?q=myfunction('Foo', 3) where myfunction itself iterates thru all the instances of Foo* any help appreciated -- View this message in context: http://www.nabble.com/how-to-scan-dynamic-field-without-specifying-each-field-in-query-tp25280228p25280228.html Sent from the Solr - User mailing list archive at Nabble.com.
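In schema.xml, Kalyan's suggestion can be declared with a wildcard copyField; field and type names below are illustrative (tint is the trie-int type from the 1.4 example schema):

```xml
<!-- Collect every Foo* dynamic field into one searchable catch-all field -->
<dynamicField name="Foo*" type="tint" indexed="true" stored="true"/>
<field name="Foo_all" type="tint" indexed="false" stored="false" multiValued="true"/>
<copyField source="Foo*" dest="Foo_all"/>
```

A query of q=Foo_all:3 then matches any document where some Foo* field holds 3, at the cost of no longer knowing which specific field matched. (Set indexed="true" on Foo_all; it is the field actually queried.)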
Re: how to scan dynamic field without specifying each field in query
I know I can do this via this: http://localhost:8994/solr/select?q=(Foo1:3 OR Foo2:3 OR Foo3:3 OR ... Foo999:3) Careful! You may hit the upper limit for MAX_BOOLEAN_CLAUSES this way. You can copy the dynamic fields value into a different field and query on that field. Good idea! Cheers Avlesh On Thu, Sep 3, 2009 at 10:47 PM, Manepalli, Kalyan kalyan.manepa...@orbitz.com wrote: You can copy the dynamic fields value into a different field and query on that field. Thanks, Kalyan Manepalli -Original Message- From: gdeconto [mailto:gerald.deco...@topproducer.com] Sent: Thursday, September 03, 2009 12:06 PM To: solr-user@lucene.apache.org Subject: how to scan dynamic field without specifying each field in query say I have a dynamic field called Foo* (where * can be in the hundreds) and want to search Foo* for a value of 3 (for example) I know I can do this via this: http://localhost:8994/solr/select?q=(Foo1:3 OR Foo2:3 OR Foo3:3 OR ... Foo999:3) However, is there a better way? i.e. is there some way to query by a function I create, possibly something like this: http://localhost:8994/solr/select?q=myfunction('Foo', 3) where myfunction itself iterates thru all the instances of Foo* any help appreciated -- View this message in context: http://www.nabble.com/how-to-scan-dynamic-field-without-specifying-each-field-in-query-tp25280228p25280228.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: how to scan dynamic field without specifying each field in query
thx for the reply. you mean into a multivalue field? possible, but was wondering if there was something more flexible than that. the ability to use a function (ie myfunction) would open up some possibilities for more complex searching and search syntax. I could write my own query parser with special extended syntax, but that is farther than I wanted to go. Manepalli, Kalyan wrote: You can copy the dynamic fields value into a different field and query on that field. Thanks, Kalyan Manepalli -- View this message in context: http://www.nabble.com/how-to-scan-dynamic-field-without-specifying-each-field-in-query-tp25280228p25280669.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to use DataImportHandler with ExtractingRequestHandler?
Hi Khai, a few weeks ago I was facing the same problem. In my case, this workaround helped (assuming you're using Solr 1.3): For each row, extract the content from the corresponding PDF file using a parser library of your choice (I suggest Apache PDFBox, or Apache Tika in case you need to process other file types as well), put it between <foo><![CDATA[ and ]]></foo>, and store it in a text file. To keep the relationship between a file and its corresponding database row, use the primary key as the file name. Within data-config.xml use the XPathEntityProcessor as follows (replace dbRow and primaryKey respectively):

<entity name="pdfcontent" processor="XPathEntityProcessor" forEach="/foo" url="${dbRow.primaryKey}.xml">
  <field column="pdftext" xpath="/foo"/>
</entity>

And, by the way, in Solr 1.4 you do not have to put your content between XML tags: use the PlainTextEntityProcessor instead of the XPathEntityProcessor. Best, Sascha

Khai Doan schrieb: Hi all, My name is Khai. I have a table in a relational database. I have successfully used DataImportHandler to import this data into Apache Solr. However, one of the columns stores the location of a PDF file. How can I configure DataImportHandler to use ExtractingRequestHandler to extract the content of the PDF? Thanks! Khai Doan
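The wrapping step Sascha describes might look like this; the PDF text extraction itself is elided, and the class name and row key are illustrative:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch of the workaround above: wrap already-extracted PDF text in a
// <foo><![CDATA[...]]></foo> envelope and write it to <primaryKey>.xml
// so the XPathEntityProcessor (forEach="/foo") can pick it up.
public class CdataEnvelope {
    public static String wrap(String extractedText) {
        // Note: a real implementation must also handle any "]]>" that
        // happens to occur inside extractedText, which would end the CDATA early.
        return "<foo><![CDATA[" + extractedText + "]]></foo>";
    }

    public static void main(String[] args) throws IOException {
        String primaryKey = "42"; // hypothetical database row key
        Files.write(Paths.get(primaryKey + ".xml"),
                wrap("text pulled out of the PDF").getBytes(StandardCharsets.UTF_8));
    }
}
```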
Re: how to scan dynamic field without specifying each field in query
A query parser, maybe. But that would not help either. At the end of the day, someone has to create those many boolean queries in your case. Cheers Avlesh On Thu, Sep 3, 2009 at 10:59 PM, gdeconto gerald.deco...@topproducer.com wrote: thx for the reply. you mean into a multivalue field? possible, but was wondering if there was something more flexible than that. the ability to use a function (ie myfunction) would open up some possibilities for more complex searching and search syntax. I could write my own query parser with special extended syntax, but that is farther than I wanted to go. Manepalli, Kalyan wrote: You can copy the dynamic fields value into a different field and query on that field. Thanks, Kalyan Manepalli -- View this message in context: http://www.nabble.com/how-to-scan-dynamic-field-without-specifying-each-field-in-query-tp25280228p25280669.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to scan dynamic field without specifying each field in query
Hi, maybe SIREn [1] can help you with this task. SIREn is a Lucene plugin that allows you to index and query tabular data. You can, for example, create a SIREn field foo, index n values in n cells, and then query a specific cell or a range of cells. Unfortunately, the Solr plugin is not yet available, and therefore you will have to write your own query syntax and parser for this task. Regards, [1] http://siren.sindice.com -- Renaud Delbru gdeconto wrote: thx for the reply. you mean into a multivalue field? possible, but was wondering if there was something more flexible than that. the ability to use a function (ie myfunction) would open up some possibilities for more complex searching and search syntax. I could write my own query parser with special extended syntax, but that is farther than I wanted to go. Manepalli, Kalyan wrote: You can copy the dynamic fields value into a different field and query on that field. Thanks, Kalyan Manepalli
Re: how to scan dynamic field without specifying each field in query
I am thinking that my example was too simple/generic :-U. It is possible for several more dynamic fields to exist and other functionality to be required. i.e. what if my example had read: http://localhost:8994/solr/select?q=((Foo1:3 OR Foo2:3 OR Foo3:3 OR ... Foo999:3) AND (Bar1:1 OR Bar2:1 OR Bar3:1 ... Bar999:1) AND (Etc1:7 OR Etc2:7 OR Etc3:7 ... Etc999:7)) obviously a nasty query (and care would be needed for MAX_BOOLEAN_CLAUSES). that said, are there other mechanisms to better handle that type of query, i.e.: http://localhost:8994/solr/select?q=(myfunction('Foo', 3) AND myfunction('Bar', 1) AND myfunction('Etc', 7)) gdeconto wrote: say I have a dynamic field called Foo* (where * can be in the hundreds) and want to search Foo* for a value of 3 (for example) I know I can do this via this: http://localhost:8994/solr/select?q=(Foo1:3 OR Foo2:3 OR Foo3:3 OR ... Foo999:3) However, is there a better way? i.e. is there some way to query by a function I create, possibly something like this: http://localhost:8994/solr/select?q=myfunction('Foo', 3) where myfunction itself iterates thru all the instances of Foo* any help appreciated -- View this message in context: http://www.nabble.com/how-to-scan-dynamic-field-without-specifying-each-field-in-query-tp25280228p25283094.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Field Collapsing (was Re: Schema for group/child entity setup)
The collapsed documents are represented by one master document, which can be part of the normal search result (the doc list), so pagination just works as expected, taking only the returned documents into account (ignoring the collapsed ones). As for the scoring, the master document is actually the document with the highest score in the collapsed group. As for Solr 1.3 compatibility... well... it's very hard to tell. All the latest patches are certainly *not* 1.3 compatible (I think they also depend on some changes in Lucene which are not available for Solr 1.3). I guess you'll have to try some of the old patches, but I'm not sure about their stability. cheers, Uri R. Tan wrote: Thanks Uri. How do paging and scoring work when using field collapsing? What patch works with 1.3? Is it production ready? R On Thu, Sep 3, 2009 at 3:54 PM, Uri Boness ubon...@gmail.com wrote: The development on this patch is quite active. It works well for a single Solr instance, but distributed search (i.e. shards) is not yet supported. Using this patch you can group search results based on a specific field. There are two flavors of field collapsing - adjacent and non-adjacent: the former collapses only documents which happen to be located next to each other in the otherwise-non-collapsed result set, while the latter (the non-adjacent) collapses all documents with the same field value (regardless of their position in the otherwise-non-collapsed result set). Note that non-adjacent collapsing performs better than the adjacent one. There's currently discussion about extending this support so that, in addition to collapsing the documents, extra information will be returned for the collapsed documents (see the discussion on the issue page). Uri R. Tan wrote: I think this is what I'm looking for. What is the status of this patch? On Thu, Sep 3, 2009 at 12:00 PM, R. Tan tanrihae...@gmail.com wrote: Hi Solrers, I would like to get your opinion on how to best approach a search requirement that I have.
The scenario is: I have a set of business listings that may be grouped under one parent business (such as 7-Eleven having several locations). On the results page, I only want 7-Eleven to show up once, but also show how many locations matched the query (facet filtered by state, for example) and maybe a preview of some of the locations. Searching for the business name is straightforward, but the locations within a result are quite tricky. I can do the opposite, searching for the locations and faceting on business names, but it will still basically be the same thing and repeat results with the same business name. Any advice? Thanks, R
Re: Best way to do a lucene matchAllDocs not using q.alt=*:*
you can use the LukeRequestHandler: http://localhost:8983/solr/admin/luke Marc Sturlese wrote: Hey there, I need a query to get the total number of documents in my index. I can get it with the DismaxRequestHandler using: q.alt=*:*&facet=false&hl=false&rows=0 I have noticed this query is very memory consuming. Is there any more optimized way in trunk to get the total number of documents in my index? Thanks in advance
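For readers following along: if all that's needed is the document count, the Luke request handler reports it as part of the index statistics without executing a search. A sketch of the request and the relevant part of the response (port and values are illustrative; `numTerms=0` keeps the response small by skipping the top-terms report):

```
http://localhost:8983/solr/admin/luke?numTerms=0

<!-- the response's index section includes, among other stats: -->
<int name="numDocs">42</int>
<int name="maxDoc">42</int>
```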
Using scoring from another program
Every document I put into Solr has a field origScore which is a floating point number between 0 and 1 that represents a score assigned by the program that generated the document. I would like it that when I do a query, it uses that origScore in the scoring, perhaps multiplying the Solr score to find a weighted score and using that to determine which are the highest scoring matches. Can I do that? -- http://www.linkedin.com/in/paultomblin
Re: Using scoring from another program
Function queries are what you need: http://wiki.apache.org/solr/FunctionQuery Paul Tomblin wrote: Every document I put into Solr has a field origScore which is a floating point number between 0 and 1 that represents a score assigned by the program that generated the document. I would like it that when I do a query, it uses that origScore in the scoring, perhaps multiplying the Solr score to find a weighted score and using that to determine which are the highest scoring matches. Can I do that?
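Conceptually, what a multiplicative function query does can be sketched in plain Java (illustration only; the document names and scores here are invented, and real function-query scoring happens inside Lucene, not like this):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class ScoreBlend {
    // Re-rank by (text relevance score) * (stored per-document origScore),
    // which is what a product-style function query effectively computes.
    static List<String> rank(Map<String, Double> textScore, Map<String, Double> origScore) {
        List<String> docs = new ArrayList<>(textScore.keySet());
        docs.sort(Comparator.comparingDouble(
                (String d) -> textScore.get(d) * origScore.get(d)).reversed());
        return docs;
    }

    public static void main(String[] args) {
        Map<String, Double> text = Map.of("docA", 0.9, "docB", 0.5);
        Map<String, Double> orig = Map.of("docA", 0.2, "docB", 0.8);
        // docB wins: 0.5 * 0.8 = 0.40 beats docA's 0.9 * 0.2 = 0.18
        System.out.println(rank(text, orig)); // prints [docB, docA]
    }
}
```

The point is only that the stored score changes the final ordering, even when the text score alone would rank the documents the other way.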
Sanity check: ResonseWriter directly to a database?
Hello all, Are there any hidden gotchas--or even basic suggestions--regarding implementing something like a DBResponseWriter that puts responses right into a database? My specific questions are: 1) Any problems adding non-trivial jars to a solr plugin? I'm thinkin JDBC and then perhaps Hibernate libraries? I don't believe so, but I have just enough understanding to be dangerous at the moment. 2) Is JSONResponseWriter a reasonable copy/paste starting point for me? Is there anything that might match better, especially regarding initialization and connection pooling? 3) Say I have a read-write single-core solr server: a vanilla-out-of-the-box example install. Can I concurrently update the underlying index safely with EmbeddedSolrServer? (This is my backup approach, less preferred) I assume no, one of them has to be read only, but I've learned not to under-estimate the lucene/solr developers. I'm starting with adapting JSONResponseWriter and the http://wiki.apache.org/solr/SolrPlugins wiki notes . The docs seem to indicate all I need to do is package up the appropriate supporting (jdbc) jar files into my MyDBResponse.jar, and drop it into the ./lib dir (e.g. c:\solr-svn\example\solr\lib). Of course, I need to update my solrconfig.xml to use the new DBResponseWriter. Straight straight JDBC seems like the easiest starting point. If that works, perhaps move the DB stuff to hibernate. Does anyone have a best practice suggestion for database access inside a plugin? I rather expect the answer might be use JNDI and well-configured hibernate; no special problems related to 'inside' a solr plugin. I will eventually be interested in saving both query results and document indexing information, so I expect to do this in both a (custom) ResponseWriter, and ... um... a DocumentAnalysisRequestHandler? I realize embedded solr might be a better choice (performance has been a big issue in my current implementation), and I am looking into that as well. 
If feasible, I'd like to keep solr in charge of the database content through plugins and extensions, rather than keeping both solr and db synced from my (grails) app. Thanks, Sean -- View this message in context: http://www.nabble.com/Sanity-check%3A-ResonseWriter-directly-to-a-database--tp25284734p25284734.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Clarifications to Synonym Filter Wiki entry? (1 of 2)
: I believe the following section is a bit misleading; I'm sure it's correct : for the case it describes, but there's another case I've tested, which on : the surface seemed similar, but where the actual results were different and : in hindsight not really a conflict, just a surprise. the crux of the issue is that *lines* in the file with only commas (no =) are ambiguous, and only have meaning once the expand property is evaluated. once that's done then you have a list of *mappings* ... and it's the mappings that get merged. : I tested this by actually looking at the word index with Luke. FYI: an easy way to test it would probably be the analysis.jsp page : If you DID want the merged behavior, where D would expand to match all 9 : letters you can either: : 1: Put the synonym filter in the pipeline twice, along with the remove : duplicates filter : OR : 2: Use the synonym filter at both index and query time using the filter at query time with expand=true would wreak havoc with phrase queries ... your best bet is to be more explicit when expressing the mappings in the file. : And what should be added to the Wiki doc? Add whatever you think would help ... users discovering behavior for the first time are the best people to write documentation, because the devs who know the code really well don't appreciate what isn't obvious. -Hoss
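To illustrate the lines-vs-mappings distinction with a hypothetical synonyms.txt (the words are made up):

```
# A comma-only line is ambiguous on its own: its meaning is fixed only
# once the filter's expand attribute is applied.
couch,sofa,divan
# With expand="true" this becomes  couch,sofa,divan => couch,sofa,divan
# With expand="false" it becomes   couch,sofa,divan => couch

# An explicit mapping is unambiguous regardless of the expand setting:
small => tiny,little
```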
Single Core or Multiple Core?
It seems like it is really hard to decide when the Multiple Core solution is more appropriate. As I could understand from this list and the wiki, the Multiple Core feature was designed to address the need of handling different sets of data within the same solr instance, where the sets of data don't need to be joined. In my case the documents are of a specific site and country. So document A can be of Site 1 / Country 1, B of Site 2 / Country 1, C of Site 1 / Country 2, and so on. For the use cases of my application I will never query across countries or sites. I will always have to provide the country id and the site id to the query. Would you suggest splitting my data into cores? I have a few sites (around 20) and more countries (around 90). Should I split my data by site (around 20 cores) and within a core filter by country? Should I split by site and country (around 1800 cores)? What should I consider when splitting my data into multiple cores? Thanks Jonathan
Re: Searching with or without diacritics
Take a look at the MappingCharFilterFactory (in Solr 1.4) and/or the ISOLatin1AccentFilterFactory. : Date: Thu, 27 Aug 2009 16:30:08 +0200 : From: György Frivolt gyorgy.friv...@gmail.com : Reply-To: solr-user@lucene.apache.org : To: solr-user solr-user@lucene.apache.org : Subject: Searching with or without diacritics : : Hello, : : I started to use solr only recently using the ruby/rails sunspot-solr : client. I use solr on a Slovak/Czech data set and realized one unwanted : behaviour of the search. When the user searches for an expression or word which : contains diacritics, letters like š, č, ť, ä, ô,... the special : characters are usually omitted in the search query. In this case solr does not : return records which contain the expression the user intended to find. : How can I configure solr in a way that it finds records containing : special characters, even if they are written without accents in the query? : : Some info about my solr instance: Solr Specification Version: 1.3.0 Solr : Implementation Version: 1.3.0 694707 - grantingersoll - 2008-09-12 : 11:06:47 Lucene Specification Version: 2.4-dev Lucene Implementation Version: : 2.4-dev 691741 - 2008-09-03 15:25:16 : : Thanks for your help, regards, : : Georg : -Hoss
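A minimal fieldtype sketch using the accent-folding filter Hoss mentions (the fieldtype name is made up; apply the same analyzer at both index and query time so "čaj" and "caj" index and search to the same term):

```xml
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- folds accented Latin-1 characters to their unaccented ASCII forms -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note this covers Latin-1 accents; in Solr 1.4 the MappingCharFilterFactory with a custom mapping file is the more general option for characters like š or ť.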
Re: SnowballPorterFilterFactory stemming word question
: If i give machine, why is it stemmed to machin? where does : this word come from? : If i give revolutionary it stems to revolutionari; i thought it should : stem to revolution. : : How does stemming work? the porter stemmer (and all of the stemmers provided with solr) are programmatic stemmers ... they don't actually know the root of any words. they use an approximate algorithm to compute a *token* from a word based on a set of rules ... these tokens aren't necessarily real words (most of the time they aren't words) but the same token tends to be produced from words with similar roots. if you want to see the actual root word, you'll have to use a dictionary-based stemmer. -Hoss
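To make concrete what a rule-based stemmer does, here is a toy sketch. This is *not* the real Porter algorithm (which has many more rules and a measure condition); it just applies two Porter-like rewrites, enough to show why the output is a token rather than a dictionary word:

```java
public class ToyStemmer {
    // Two Porter-style rewrite rules; the output is a token, not a word.
    static String stem(String w) {
        int n = w.length();
        // y -> i after a consonant: revolutionary -> revolutionari
        if (w.endsWith("y") && n > 2 && "aeiou".indexOf(w.charAt(n - 2)) < 0) {
            return w.substring(0, n - 1) + "i";
        }
        // drop a trailing e: machine -> machin
        if (w.endsWith("e") && n > 3) {
            return w.substring(0, n - 1);
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(stem("machine"));        // prints machin
        System.out.println(stem("revolutionary"));  // prints revolutionari
    }
}
```

Both outputs are non-words, but any query term with the same root gets mapped to the same token, which is all the index needs.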
Re: Impact of compressed=true attribute (in schema.xml) on Indexing/Query
: Now the question is, how the compressed=true flag impacts the indexing : and querying operations. I am sure that there will be CPU utilization : spikes as there will be operations of compressing (during indexing) and : uncompressing (during querying) of the indexed data. I am mainly looking : for any benchmarks for the above scenario. i don't have any hard numbers for you, but the stored data isn't uncompressed when executing a query -- queries are executed against the indexed terms (which are never compressed) ... the only time the data will be uncompressed is when returning results to the client -- so if you set rows=17 in your request, only the values for the 17 docs returned (or fewer, if there were fewer than 17 matches) will be uncompressed. -Hoss
Re: Optimal Cache Settings, complicated by regular commits
: I'm trying to work out the optimum cache settings for our Solr server, I'll : begin by outlining our usage. ...but you didn't give any information about what your cache settings look like ... size is only part of the picture; the autowarm counts are more significant. : Commit frequency: sometimes we do massive amounts of sequential commits, if you know you are going to be indexing more docs soon, then you can hold off on issuing a commit ... it really comes down to what kind of SLA you have to provide on how quickly an add/update is visible in the index -- don't commit any more often than that. : The problem we have is that the default cache settings result in very low : hit rates (less than 30% for documents, less than 1% for filterCache), so we under 1% for filterCache sounds like you either have some really unique filter queries, or you are using enum-based faceting on a huge field and the LRU cache is working against you by expunging values during a single request ... what version of solr are you using? what do the fieldtype declarations look like for the fields you are faceting on? what do the Luke stats look like for the fields you are faceting on? : now we have the issue of commits being very slow (more than 5 seconds for a : document), to the point where it causes a timeout elsewhere in our systems. : This is made worse by the fact that committing seems to empty the cache; : given that it takes about an hour to get the cache to a good state this is : obviously very problematic. 1) using waitSearcher=false can help speed up the commit if all you care about is not having your client time out. 2) using autowarming can help fill the caches up prior to users making requests (you may already know that, but since you didn't provide your cache configs i have no idea) ... the key is finding a good autowarm count that helps your cache stats w/o taking too long to fill up. -Hoss
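For reference, the autowarm count Hoss refers to is set per cache in solrconfig.xml; a sketch (the numbers are placeholders to tune against your own hit rates, not recommendations):

```xml
<!-- on commit, re-execute the 128 most recently used filter queries
     against the new searcher before it starts serving requests -->
<filterCache class="solr.LRUCache"
             size="512"
             initialSize="512"
             autowarmCount="128"/>
```

Larger autowarmCount values give warmer caches at the cost of longer commit-to-visible latency, which is exactly the trade-off discussed above.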
Re: Sorting performance + replication of index between cores
Did you guys find a solution? I am having a similar issue. Setup: one indexer box, two searcher boxes, each having 6 different solr cores. We have a lot of updates (in the range of a couple thousand items every few mins). The Snappuller/Snapinstaller pulls and commits every 5 mins. Query response time peaks to 60+ seconds when a new searcher is being prepared. I have disabled the caches (filter, query, document). We have a strict requirement of response times under 10 secs at all times. Thanks Sreeram sunnyfr wrote: Hi Christophe, Did you find a way to fix your problem? Even with replication we will have this problem; lots of updates mean clearing the cache and managing that. I have the same issue, and I'm wondering whether I should turn off servers during updates??? How did you fix that? Thanks, sunny christophe-2 wrote: Hi, After fully reloading my index, using another field than a Date does not help that much. Using a warmup query avoids having the first request slow, but: - Frequent commits mean that the Searcher is reloaded frequently and, as the warmup takes time, the clients must wait. - Having warmup slows down the index process (I guess this is because after a commit, the Searchers are recreated) So I'm considering, as suggested, having two instances: one for indexing and one for searching. I was wondering if there are simple ways to replicate the index in a single Solr server running two cores? Any such config already tested? I guess that the standard replication based on rsync can be simplified a lot in this case as the two indexes are on the same server. Thanks Christophe Beniamin Janicki wrote: :so you can send your updates anytime you want, and as long as you only :commit every 5 minutes (or commit on a master as often as you want, but :only run snappuller/snapinstaller on your slaves every 5 minutes) your :results will be at most 5 minutes + warming time stale. This is what I do as well (commits are done once per 5 minutes). I've got a master - slave configuration. 
Master has turned off all caches (commented out in solrconfig.xml) and set up only 2 maxWarmingSearchers. Index size is 5GB, Xmx=1GB, and committing takes around 10 secs (on the default configuration with warming it took from 30 mins up to 2 hours). Slave caches are configured to have autowarmCount=0 and maxWarmingSearchers=1, and I have new data 1 second after the snapshot is done. I haven't noticed any huge delays while serving search requests. Try to use those values - maybe they'll help in your case too. Ben Janicki -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: 22 October 2008 04:56 To: solr-user@lucene.apache.org Subject: Re: Sorting performance : The problem is that I will have hundreds of users doing queries, and a : continuous flow of documents coming in. : So a delay in warming up a cache could be acceptable if I do it a few times : per day. But not on a too regular basis (right now, the first query that loads : the cache takes 150s). : : However: I'm not sure why it looks not to be a good idea to update the caches you can refresh the caches automatically after updating; the newSearcher event is fired whenever a searcher is opened (but before it's used by clients) so you can configure warming queries for it -- it doesn't have to be done manually (or by the first user to use that reader) so you can send your updates anytime you want, and as long as you only commit every 5 minutes (or commit on a master as often as you want, but only run snappuller/snapinstaller on your slaves every 5 minutes) your results will be at most 5 minutes + warming time stale. -Hoss -- View this message in context: http://www.nabble.com/Sorting-performance-tp20037712p25286018.html Sent from the Solr - User mailing list archive at Nabble.com.
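The newSearcher hook Hoss describes is configured in solrconfig.xml roughly like this (the queries are placeholders; pick ones that exercise your common sorts and facets so their caches are populated before clients hit the new searcher):

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">solr</str><str name="start">0</str><str name="rows">10</str></lst>
    <lst><str name="q">rocks</str><str name="sort">price asc</str></lst>
  </arr>
</listener>
```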
Re: Re : Using SolrJ with Tika
See https://issues.apache.org/jira/browse/SOLR-1411 On Sep 3, 2009, at 6:47 AM, Angel Ice wrote: Hi This is the solution I was testing. I had some difficulties with AutoDetectParser but I think it's the solution I will use in the end. Thanks for the advice anyway :) Regards, Laurent From: Abdullah Shaikh abdullah.sha...@viithiisys.com To: solr-user@lucene.apache.org Sent: Thursday, September 3, 2009, 14:31:10 Subject: Re: Using SolrJ with Tika Hi Laurent, I am not sure if this is what you need, but you can extract the content from the uploaded document (MS docs, PDF etc) using Tika and then send it to Solr for indexing: String content = extract the content using Tika (you can use AutoDetectParser) and then: SolrInputDocument doc = new SolrInputDocument(); doc.addField(DOC_CONTENT, content); solrServer.add(doc); solrServer.commit(); On Wed, Sep 2, 2009 at 5:26 PM, Angel Ice lbil...@yahoo.fr wrote: Hi everybody. I hope this is the right place for questions; if not, sorry. I'm trying to index rich documents (PDF, MS docs etc) in Solr/Lucene. I have seen a few examples explaining how to use Tika to solve this. But most of these examples are using curl to send documents to Solr, or an HTML POST with an input file. But I'd like to do it in full Java. Is there a way to use SolrJ to index the documents with the ExtractingRequestHandler of Solr, or at least to get the extracted XML back (with the extract.only option)? Many thanks. Laurent. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
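The SOLR-1411 route lets Solr run Tika server-side via the ExtractingRequestHandler, so the client never parses the file itself. A sketch of what that looks like from SolrJ (this assumes a Solr 1.4-era setup with /update/extract configured and the SolrJ jars on the classpath; the URL, file name, and literal.id value are made up for illustration):

```java
import java.io.File;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Streams the raw file to /update/extract; Tika parsing happens in Solr.
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("some-document.pdf"));
        req.setParam("literal.id", "doc1");   // supply the unique key as a literal field
        server.request(req);
        server.commit();
    }
}
```

This avoids pulling Tika into the client at all; the extract-the-content-yourself approach in the quoted reply remains the alternative when you want the parsed text on the client side.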
Solr, JNDI config, dataDir, and solr home problem
Here's my problem. I'm trying to follow a multi Solr setup, straight from the Solr wiki - http://wiki.apache.org/solr/SolrTomcat#head-024d7e11209030f1dbcac9974e55106abae837ac. Here's the relevant code:

<Context docBase="/some/path/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/some/path/solr1home" override="true"/>
</Context>

Now I want to set the Solr <dataDir> in solrconfig.xml, relative to the solr home property. The instructions at http://wiki.apache.org/solr/SolrConfigXml#head-e8fbf2d748d90c5900aac712d0e3385ced5bd128 say <dataDir> is used to specify an alternate directory to hold all index data other than the default ./data under the Solr home. If replication is in use, this should match the replication configuration. If this directory is not absolute, then it is relative to the current working directory of the servlet container. However, no matter how I try to set the dataDir property, solr home is not being found. For example: <dataDir>${solr.home}/data</dataDir> What's even more confusing are these INFO notices in the log: INFO: No /solr/home in JNDI Sep 3, 2009 4:33:26 PM org.apache.solr.core.SolrResourceLoader locateSolrHome INFO: solr home defaulted to 'solr/' (could not find system property or JNDI) The JNDI instructions say to specify solr/home, the log complains about /solr/home (extra slash), and the solrconfig.xml file seems to expect ${solr.home} - how much more confusing can it get? This person is having the same issue: http://mysolr.com/tips/setting-solr-home-solrhome-in-jndi-on-tomcat-55/ So, how does one refer to solr home from solrconfig.xml in a JNDI configuration scenario? Also, is there a way to debug/see variables that are defined in a specific context, such as solrconfig.xml? I feel like I'm completely blind here. Thank you! 
-- View this message in context: http://www.nabble.com/Solr%2C-JNDI-config%2C-dataDir%2C-and-solr-home-problem-tp25286277p25286277.html Sent from the Solr - User mailing list archive at Nabble.com.
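One commonly used workaround (hedged: the substitution-with-default syntax is a Solr 1.4 feature, and solr.data.dir is a conventional property name here, not something Solr defines for you) is to pass the location as a JVM system property rather than relying on the JNDI value being visible to solrconfig.xml:

```xml
<!-- solrconfig.xml: uses -Dsolr.data.dir=/some/path when the JVM is
     started with that property, otherwise falls back to ./solr/data -->
<dataDir>${solr.data.dir:./solr/data}</dataDir>
```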
Re: Logging solr requests
: - I think that the use of log files is discouraged, but i don't know if i : can modify solr settings to log to a server (via rmi or http) : - Don't want to drop down solr response performance discouraged by who? ... having a separate process tail your log file and build an index that way is the simplest way to do this without impacting Solr's performance ... alternately you could write a custom LogHandler that sends the data anywhere you want (so you never need a log file) but that would require some non-trivial async code in your LogHandler to keep the building of your new index from affecting the performance (log calls are synchronous) -Hoss
Re: Problem querying for a value with a space
: Use +specific_LIST_s:(For Sale) : or : +specific_LIST_s:"For Sale" those are *VERY* different queries. The first is just syntactic sugar for... +specific_LIST_s:For +specific_LIST_s:Sale ...which is not the same as the second (phrase) query (especially when using StrField or KeywordTokenizer) -Hoss
Re: Sanity check: ResonseWriter directly to a database?
Are there any hidden gotchas--or even basic suggestions--regarding implementing something like a DBResponseWriter that puts responses right into a database? Absolutely not! A QueryResponseWriter with an empty write method fulfills all interface obligations. My only question is, why do you want a ResponseWriter to do this for you? Why not write something outside Solr which gets the response and then puts it in the database? If it has to be a Solr utility, then maybe a RequestHandler. The only reason I am asking is that your QueryResponseWriter will have to implement a method called getContentType, which sounds illogical in your case. Any problems adding non-trivial jars to a solr plugin? None. I have tonnes of them. Is JSONResponseWriter a reasonable copy/paste starting point for me? Is there anything that might match better, especially regarding initialization and connection pooling? As I have tried to explain above, a QueryResponseWriter with an empty write method is just perfect. You can use any one of the well-known writers as a starting point. Say I have a read-write single-core solr server: a vanilla-out-of-the-box example install. Can I concurrently update the underlying index safely with EmbeddedSolrServer? Yes you can! Other searchers will only come to know of changes when they are re-opened. Cheers Avlesh On Fri, Sep 4, 2009 at 3:26 AM, seanoc5 sean...@gmail.com wrote: Hello all, Are there any hidden gotchas--or even basic suggestions--regarding implementing something like a DBResponseWriter that puts responses right into a database? My specific questions are: 1) Any problems adding non-trivial jars to a solr plugin? I'm thinking JDBC and then perhaps Hibernate libraries? I don't believe so, but I have just enough understanding to be dangerous at the moment. 2) Is JSONResponseWriter a reasonable copy/paste starting point for me? Is there anything that might match better, especially regarding initialization and connection pooling? 
3) Say I have a read-write single-core solr server: a vanilla-out-of-the-box example install. Can I concurrently update the underlying index safely with EmbeddedSolrServer? (This is my backup approach, less preferred.) I assume no, one of them has to be read only, but I've learned not to underestimate the lucene/solr developers. I'm starting with adapting JSONResponseWriter and the http://wiki.apache.org/solr/SolrPlugins wiki notes. The docs seem to indicate all I need to do is package up the appropriate supporting (jdbc) jar files into my MyDBResponse.jar, and drop it into the ./lib dir (e.g. c:\solr-svn\example\solr\lib). Of course, I need to update my solrconfig.xml to use the new DBResponseWriter. Straight JDBC seems like the easiest starting point. If that works, perhaps move the DB stuff to hibernate. Does anyone have a best practice suggestion for database access inside a plugin? I rather expect the answer might be use JNDI and well-configured hibernate; no special problems related to 'inside' a solr plugin. I will eventually be interested in saving both query results and document indexing information, so I expect to do this in both a (custom) ResponseWriter, and ... um... a DocumentAnalysisRequestHandler? I realize embedded solr might be a better choice (performance has been a big issue in my current implementation), and I am looking into that as well. If feasible, I'd like to keep solr in charge of the database content through plugins and extensions, rather than keeping both solr and db synced from my (grails) app. Thanks, Sean -- View this message in context: http://www.nabble.com/Sanity-check%3A-ResonseWriter-directly-to-a-database--tp25284734p25284734.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Exact Word Search
Hi Shalin, Thanks for your reply. I am not sure how the query is formed in Solr. If you could throw some light on this, it would be helpful. Is it achievable? Regards Bhaskar --- On Thu, 9/3/09, Shalin Shekhar Mangar shalinman...@gmail.com wrote: From: Shalin Shekhar Mangar shalinman...@gmail.com Subject: Re: Exact Word Search To: solr-user@lucene.apache.org Date: Thursday, September 3, 2009, 5:14 AM On Thu, Sep 3, 2009 at 1:33 PM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote: Hi, Can anyone help me with the scenario below? Scenario: I have integrated Solr with Carrot2. The issue is: assuming I give bhaskar as the input string for a search, it should give me search results pertaining to bhaskar only. Example: it should not return results like chandarbhaskar or bhaskarc. Basically the search should happen based on an exact word match. I am not bothered about case sensitivity here. How do I achieve the above scenario in Carrot2? Bhaskar, I think this question is better suited for the Carrot mailing lists. Unless you yourself control how the solr query is created, we will not be able to help you. -- Regards, Shalin Shekhar Mangar.
Re: Sanity check: ResonseWriter directly to a database?
Avlesh, Great response, just what I was looking for. As far as QueryResponseWriter vs RequestHandler: you're absolutely right, request handling is the way to go. It looks like I can start with something like: public class SearchSavesToDBHandler extends RequestHandlerBase implements SolrCoreAware I am still weighing keeping this logic in my app. However, with solr-cell coming along nicely, and the nature of my queries (95% pre-defined for content analysis), I am leaning toward the extra work of embedding the processing in solr. I'm still unclear on the best path, but I think that's fairly specific to my app. Great news about the flexibility of having both approaches be able to work on the same index. That may well save me if I run out of time on the plugin development. Thanks for your reply, it was a great help, Sean Avlesh Singh wrote: Are there any hidden gotchas--or even basic suggestions--regarding implementing something like a DBResponseWriter that puts responses right into a database? Absolutely not! A QueryResponseWriter with an empty write method fulfills all interface obligations. My only question is, why do you want a ResponseWriter to do this for you? Why not write something outside Solr which gets the response and then puts it in the database? If it has to be a Solr utility, then maybe a RequestHandler. The only reason I am asking is that your QueryResponseWriter will have to implement a method called getContentType, which sounds illogical in your case. Any problems adding non-trivial jars to a solr plugin? None. I have tonnes of them. Is JSONResponseWriter a reasonable copy/paste starting point for me? Is there anything that might match better, especially regarding initialization and connection pooling? As I have tried to explain above, a QueryResponseWriter with an empty write method is just perfect. You can use any one of the well-known writers as a starting point. 
Say I have a read-write single-core solr server: a vanilla-out-of-the-box example install. Can I concurrently update the underlying index safely with EmbeddedSolrServer? Yes you can! Other searchers will only come to know of changes when they are re-opened. Cheers Avlesh On Fri, Sep 4, 2009 at 3:26 AM, seanoc5 sean...@gmail.com wrote: Hello all, Are there any hidden gotchas--or even basic suggestions--regarding implementing something like a DBResponseWriter that puts responses right into a database? My specific questions are: 1) Any problems adding non-trivial jars to a solr plugin? I'm thinking JDBC and then perhaps Hibernate libraries? I don't believe so, but I have just enough understanding to be dangerous at the moment. 2) Is JSONResponseWriter a reasonable copy/paste starting point for me? Is there anything that might match better, especially regarding initialization and connection pooling? 3) Say I have a read-write single-core solr server: a vanilla-out-of-the-box example install. Can I concurrently update the underlying index safely with EmbeddedSolrServer? (This is my backup approach, less preferred.) I assume no, one of them has to be read only, but I've learned not to underestimate the lucene/solr developers. I'm starting with adapting JSONResponseWriter and the http://wiki.apache.org/solr/SolrPlugins wiki notes. The docs seem to indicate all I need to do is package up the appropriate supporting (jdbc) jar files into my MyDBResponse.jar, and drop it into the ./lib dir (e.g. c:\solr-svn\example\solr\lib). Of course, I need to update my solrconfig.xml to use the new DBResponseWriter. Straight JDBC seems like the easiest starting point. If that works, perhaps move the DB stuff to hibernate. Does anyone have a best practice suggestion for database access inside a plugin? I rather expect the answer might be use JNDI and well-configured hibernate; no special problems related to 'inside' a solr plugin. 
I will eventually be interested in saving both query results and document indexing information, so I expect to do this in both a (custom) ResponseWriter, and ... um... a DocumentAnalysisRequestHandler? I realize embedded solr might be a better choice (performance has been a big issue in my current implementation), and I am looking into that as well. If feasible, I'd like to keep solr in charge of the database content through plugins and extensions, rather than keeping both solr and db synced from my (grails) app. Thanks, Sean -- View this message in context: http://www.nabble.com/Sanity-check%3A-ResonseWriter-directly-to-a-database--tp25284734p25284734.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://www.nabble.com/Sanity-check%3A-ResonseWriter-directly-to-a-database--tp25284734p25288206.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Responses getting truncated
So we have been running LucidWorks for Solr for about a week now and have seen no problems - so I believe it was due to that buffering issue in Jetty 6.1.3, as suggested here: It really looks like you're hitting a lower-level IO buffering bug (esp when you see a response starting off with the tail of another response). That doesn't look like it could be a Solr bug... but rather smells like a thread safety bug in the servlet container. Thanks for everyone's help and input. LucidWorks For The Win. -Rupert On Fri, Aug 28, 2009 at 4:07 PM, Rupert Fiascorufia...@gmail.com wrote: I deployed LucidWorks with my existing solrconfig / schema and re-indexed my data into it and pushed it out to production; we'll see how it stacks up over the weekend. Already queries that were breaking on the prior Jetty/stock Solr setup are now working - but I have seen it before where upon an initial re-index things work OK and then a couple of days later they break. Keep y'all posted. Thanks -Rupert On Fri, Aug 28, 2009 at 3:12 PM, Rupert Fiascorufia...@gmail.com wrote: Yes, I am hitting the Solr server directly (medsolr1.colo:9007) Versions / architectures: Jetty(6.1.3) o...@medsolr1 ~ $ uname -a Linux medsolr1 2.6.18-xen-r12 #9 SMP Tue Mar 3 15:34:08 PST 2009 x86_64 Intel(R) Xeon(R) CPU L5420 @ 2.50GHz GenuineIntel GNU/Linux o...@medsolr1 ~ $ java -version java version 1.6.0_11 Java(TM) SE Runtime Environment (build 1.6.0_11-b03) Java HotSpot(TM) 64-Bit Server VM (build 11.0-b16, mixed mode) I was thinking of trying LucidWorks for Solr (1.3.02) x64 - worth a try. -Rupert On Fri, Aug 28, 2009 at 3:08 PM, Yonik Seeleyysee...@gmail.com wrote: On Mon, Aug 24, 2009 at 6:30 PM, Rupert Fiascorufia...@gmail.com wrote: If I run these through curl on the command line it's truncated, and if I run the search through the web-based admin panel then I get an XML parse error. Are you running curl directly against the solr server, or going through a load balancer? 
Cutting out the middle-men using curl was a great idea - just make sure to go all the way. At first I thought it could possibly be a FastWriter bug (internal Solr class), but that's only used on the TextWriter (JSON, Python, Ruby) based formats, not on the original XML format. It really looks like you're hitting a lower-level IO buffering bug (esp when you see a response starting off with the tail of another response). That doesn't look like it could be a Solr bug... but rather smells like a thread safety bug in the servlet container. What type of machine are you running on? What JVM? You could try upgrading your version of Jetty, the JVM, or try switching to Tomcat. -Yonik http://www.lucidimagination.com This appears to have just started recently and the only thing we have done is change our indexer from a PHP one to a Java one, but functionally they are identical. Any thoughts? Thanks in advance. - Rupert