Knowledge about contents of a page
Hi,

My question is about crawling. I know it is not really relevant here, but I asked the Nutch people and didn't get any response, so I thought of posting it here.

I'm trying to crawl reviews for businesses.

a. Is there any way to tell whether the content of a web page is a review or not? Is it possible to do this in an automated fashion?
b. How could we map a block of text to a particular business (for example, like Google reviews)?

Thanks,
Ram

--
View this message in context: http://old.nabble.com/Knowledge-about-contents-of-a-page-tp27358779p27358779.html
Sent from the Solr - User mailing list archive at Nabble.com.
Dismax and Standard Queries together
Hi,

I have three fields, business_name, category_name, and sub_category_name, in my solrconfig file.

My query is: pet clinic

Example sub_category_names: Veterinarians, Kennels, Veterinary Clinics Hospitals, Pet Grooming, Pet Stores, Clinics

My ideal requirement is:

a. a dismax search over two or three of the fields,
b. followed by a Boolean match, where a match on any one of the fields is acceptable.

I played around with the minimum match (mm) attribute, but it doesn't seem to help; I guess dismax requires at least two fields. Nested queries take only one qf field, so they don't help much either.

Any suggestions will be helpful.

Thanks,
Ram

--
View this message in context: http://old.nabble.com/Dismax-and-Standard-Queries-together-tp26157830p26157830.html
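One possible way to combine the two, sketched under assumptions: Solr's nested-query syntax (_query_:"{!dismax ...}...") allows a dismax clause inside a standard Lucene query, and the qf local param can be quoted so that it lists several fields. The helper name build_params and the choice to OR the dismax clause with a plain match on sub_category_name are illustrative only, not a confirmed fix for the setup described above.

```python
from urllib.parse import urlencode


def build_params(user_query, dismax_fields, boolean_field):
    """Build request params: a nested dismax clause OR a plain field match.

    Assumes user_query contains no quotes or backslashes; real user input
    would need escaping before being embedded in the query string.
    """
    # Quoting qf with single quotes lets it carry more than one field.
    qf = " ".join(dismax_fields)
    dismax_clause = '_query_:"{!dismax qf=\'%s\'}%s"' % (qf, user_query)
    # Standard-parser clause: grouped terms searched in boolean_field.
    boolean_clause = "%s:(%s)" % (boolean_field, user_query)
    return {
        "defType": "lucene",  # the outer query must use the standard parser
        "q": "%s OR %s" % (dismax_clause, boolean_clause),
    }


params = build_params(
    "pet clinic",
    ["business_name", "category_name"],
    "sub_category_name",
)
print(params["q"])
print(urlencode(params))  # URL-encoded form, ready to append to /select?
```

Whether the terms inside the parentheses are ORed or ANDed depends on the default operator configured in the schema, so that part may need an explicit OR between terms.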
Dismax params, mm lt explanation
Hi,

Consider this minimum match param in the dismax query handler:

<str name="mm">2&lt;-1 3&lt;-2 6&lt;100%</str>

I asked Solr to match at least two fields, which is what I understood from the documentation. Can someone explain the other params in it?

&lt;-1 3
&lt;-2 6
&lt;100%

How are they significant?

Thanks,
Ram

--
View this message in context: http://www.nabble.com/Dismax-params%2C--mm--lt-explanation-tp26049472p26049472.html
Re: Dismax params, mm lt explanation
After reading the explanation in the book, it's very clear now. Thank you for citing it with the page number.

Ram

hossman wrote:

What you are looking at is an XML-escaped version of this string...

2<-1 3<-2 6<100%

...the syntax is documented here...

http://wiki.apache.org/solr/DisMaxRequestHandler#mm_.28Minimum_.27Should.27_Match.29
http://lucene.apache.org/solr/api/org/apache/solr/util/doc-files/min-should-match.html

...note that the string you have listed there actually makes very little sense because of the 100% condition. It says that for queries of more than 6 clauses all of them are required (usually the mm param gets less strict as the number of clauses increases).

(FYI: As the creator of the 'mm' param syntax, one of my favorite parts of the new Solr 1.4 book is the explanation of mm options with multiple clauses. It describes it in a completely different way from anything I'd ever thought of before (I was convinced it was a huge mistake the first two times I read that section, before the light bulb went off and I realized how brilliant it was) and is probably a lot easier for many people to understand -- if you have the book it's on p139.)

: 2&lt;-1 3&lt;-2 6&lt;100%
...
: I requested solr to match atleast two fields, which i understood from the
: documents. Can someone give me explanations for other params in it?
:
: &lt;-1 3
:
: &lt;-2 6
:
: &lt;100%

-Hoss

--
View this message in context: http://www.nabble.com/Dismax-params%2C--mm--lt-explanation-tp26049472p26052492.html
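To make the conditional syntax concrete, here is a small sketch that works out how many optional clauses must match for a given clause count. It follows the rules described in the min-should-match documentation linked above as I understand them; it is not Solr's own code. The function name min_should_match is made up for illustration, and the sketch assumes the spec has no spaces around "<".

```python
def min_should_match(clause_count, spec):
    """Return how many of clause_count optional clauses must match."""
    spec = spec.strip()
    if "<" in spec:
        # Conditional spec: "N<X" means "if there are more than N clauses,
        # apply X". Conditions are checked left to right; the last one whose
        # bound is exceeded wins. At or below the first bound, all clauses
        # are required.
        result = clause_count
        for condition in spec.split():
            bound, sub_spec = condition.split("<", 1)
            if clause_count <= int(bound):
                return result
            result = min_should_match(clause_count, sub_spec)
        return result
    if spec.endswith("%"):
        # Percentage: rounded toward zero; negative means "all but this share".
        raw = clause_count * int(spec[:-1]) / 100
        result = clause_count + int(raw) if raw < 0 else int(raw)
    else:
        # Plain integer: negative means "all but this many".
        value = int(spec)
        result = clause_count + value if value < 0 else value
    # Never require more clauses than exist, never fewer than zero.
    return max(0, min(clause_count, result))


for n in range(1, 8):
    print(n, min_should_match(n, "2<-1 3<-2 6<100%"))
```

Under these rules the spec requires all clauses for 1 or 2 clauses, all but one for 3, all but two for 4 to 6, and all of them again above 6, which is the jump back to strictness that Hoss points out.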
Geo Coding Service
Hi,

Can someone suggest a good geocoding service or software for commercial use? I want to find geocodes for a large collection of addresses, and I'm looking for a good long-term service.

Thanks,
Ram

--
View this message in context: http://www.nabble.com/Geo-Coding-Service-tp25774277p25774277.html
dismax matches ranking
Hi,

I have the following questions about the dismax query handler. Can someone clarify them for me?

1. dismax query handler and filter query (fq)

If query = coffee and fq = yiw_bus_city: san jose, I get 0 results. The same query works fine if I specify qt=standard (the standard query handler).

2. dismax and ranking

q = san jose, but my collection has more documents for San Francisco and fewer for San Jose.

a. Sometimes San Francisco is listed, or is listed before San Jose. I guess this is because of the term frequency of San Francisco. How can I present the results for the exact query match first? For some reasons I don't want to manually boost the particular keyword. Listing the exact matches first, followed by the other results, would be good.

configs:

<requestHandler name="dismax" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">
      yiw_bus_name^1.0 yiw_bus_city^1.0 yiw_bus_ps_info^0.2 yiw_bus_description^0.2 yiw_bus_general_information^0.2 yiw_bus_zip^0.5 yiw_bus_street^0.5
    </str>
    <str name="pf">
      yiw_bus_city^1.0 yiw_bus_zip^0.5 yiw_bus_street^0.5
    </str>
    <str name="bf">
      ord(yiw_bus_name)^0.5 recip(rord(yiw_bus_city),1,1000,1000)^0.3
    </str>
    <!-- <str name="fl"></str> -->
    <str name="mm">
      2&lt;-1 5&lt;-2 6&lt;70%
    </str>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>
    <!-- example highlighter config, enable per-query with hl=true -->
    <str name="hl.fl"></str>
    <!-- for this field, we want no fragmenting, just highlighting -->
    <str name="f.name.hl.fragsize">0</str>
    <!-- instructs Solr to return the field itself if no query terms are found -->
    <str name="f.name.hl.alternateField">yiw_bus_name</str>
    <str name="f.text.hl.fragmenter">regex</str> <!-- defined below -->
  </lst>
</requestHandler>

schema:

<field name="yiw_bus_general_information" type="text" indexed="true" stored="true" default="NA" />
<field name="yiw_bus_ps_info" type="string" indexed="true" stored="true" default="NA" />
<field name="yiw_bus_city" type="string" indexed="true" stored="true" multiValued="false" default="NA" />
<field name="yiw_bus_state" type="string" indexed="true" stored="true" multiValued="false" default="NA" />
<field name="yiw_bus_country" type="string" indexed="true" stored="true" multiValued="false" default="NA" />
<field name="yiw_bus_street" type="string" indexed="true" stored="true" multiValued="false" default="NA" />
<field name="yiw_bus_zip" type="string" indexed="true" stored="true" multiValued="false" default="0" />

--
View this message in context: http://www.nabble.com/dismax-matches---ranking-tp25300011p25300011.html
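On the first question, one thing that may be relevant (an assumption, not a confirmed diagnosis): fq is parsed with the standard Lucene syntax even when the main query uses dismax, so an unquoted multi-word value is split into separate clauses, and a field of type string is not analyzed, so it only matches the exact indexed value, including case. The sketch below builds the request with the city quoted as a single phrase. The helper name exact_filter and the casing "San Jose" are assumptions about how the data was indexed.

```python
from urllib.parse import urlencode, parse_qs


def exact_filter(field, value):
    """Return an fq clause that treats value as one quoted phrase."""
    # Escape backslashes and double quotes so the phrase stays intact.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '%s:"%s"' % (field, escaped)


params = {
    "q": "coffee",
    "defType": "dismax",
    # Assumed casing; it has to match the indexed value exactly.
    "fq": exact_filter("yiw_bus_city", "San Jose"),
}
query_string = urlencode(params)
print(query_string)
```

If the indexed values turn out to use a different casing or include extra whitespace, the filter would still return nothing, so it is worth checking a stored value first.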
Determining Search Query Category
Hi,

I have more than 20 categories in my search application. I'm interested in dynamically determining the category of the query entered by the user, instead of asking the user to filter the results through a long list of categories.

It's a general question, not specific to Solr, but any suggestions about how to approach this problem will be helpful.

Thanks,
Ram

--
View this message in context: http://www.nabble.com/Determining-Search-Query-Category-tp23878965p23878965.html