Knowledge about contents of a page

2010-01-28 Thread ram_sj

Hi,

My question is about crawling. I know it is not really on topic here, but I
asked on the Nutch list and didn't get any response, so I thought I'd post it here.

I'm trying to crawl reviews of businesses.

a. Is there any way to tell whether the content of a web page is a review or
not? Is it possible to do this in an automated fashion?

b. How could we map a block of text to a particular business? For example,
the way Google reviews does.

   

Thanks
Ram
-- 
View this message in context: 
http://old.nabble.com/Knowledge-about-contents-of-a-page-tp27358779p27358779.html
Sent from the Solr - User mailing list archive at Nabble.com.



Dismax and Standard Queries together

2009-11-02 Thread ram_sj

Hi,

I have three fields, business_name, category_name and sub_category_name, in
my solrconfig file.

My query: pet clinic

Example sub_category_names: Veterinarians, Kennels, Veterinary Clinics &
Hospitals, Pet Grooming, Pet Stores, Clinics

Ideally, I want:

a. a dismax search over two or three of the fields,
b. followed by results with a Boolean match on any one of the fields, which
is also acceptable.

I played around with the minimum match (mm) parameter, but it doesn't seem
to help. I guess dismax requires at least two fields.

Nested queries take only one qf field, so they don't help much either.

Any suggestions would be helpful.
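For reference, a minimal sketch of what a dismax request over all three fields could look like, using only the Python standard library. The host, port and /solr/select path are assumptions, and the boosts are placeholders. One point worth keeping in mind: dismax's mm counts query clauses (here the two terms of "pet clinic"), not fields. Each term is searched across every field listed in qf, so mm=1 asks for at least one of the two terms to match in any of those fields.

```python
from urllib.parse import urlencode

# Assumed Solr location; adjust to your installation.
SOLR_SELECT = "http://localhost:8983/solr/select"


def dismax_url(user_query: str, fields: dict[str, float], mm: str) -> str:
    """Build a dismax request URL that searches every field in `fields`.

    `fields` maps field name -> boost; the boosts used below are placeholders.
    """
    qf = " ".join(f"{name}^{boost}" for name, boost in fields.items())
    params = {
        "defType": "dismax",  # parse q with the dismax query parser
        "q": user_query,      # raw user input, no field syntax needed
        "qf": qf,             # fields each query term is searched in
        "mm": mm,             # minimum number of query *terms* that must match
    }
    return SOLR_SELECT + "?" + urlencode(params)


url = dismax_url(
    "pet clinic",
    {"business_name": 1.0, "category_name": 0.5, "sub_category_name": 0.5},
    mm="1",
)
print(url)
```

Raising mm to "2" (or "100%") would require both terms, each of which may still match in a different field.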

Thanks
Ram
-- 
View this message in context: 
http://old.nabble.com/Dismax-and-Standard-Queries-together-tp26157830p26157830.html



Dismax params, mm lt explanation

2009-10-25 Thread ram_sj

Hi,
Consider these minimum match (mm) params in the dismax query handler:

  <str name="mm">
2&lt;-1 3&lt;-2 6&lt;100%
  </str>

I told Solr to match at least two fields, which is what I understood from the
documentation. Can someone explain the other parts of it?

&lt;-1 3

&lt;-2 6

&lt;100%

What is their significance?
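To make the conditional syntax concrete, here is a small Python sketch of the rules as documented for Solr's min-should-match syntax. It is written for illustration and is not Solr's own code. Each "n<spec" pair means "if there are more than n optional clauses, apply spec"; the last pair whose n is exceeded wins, and at or below the first n every clause is required. It assumes the pairs are listed in ascending order of n, as the syntax expects.

```python
import re


def min_should_match(clauses: int, spec: str) -> int:
    """Return how many of `clauses` optional clauses must match for `spec`.

    Sketch of the documented mm rules (not Solr's source):
      "3"    -> 3 clauses          "-2"   -> all but 2
      "75%"  -> 75%, rounded down  "-25%" -> up to 25% may be missing (rounded down)
      "n<x"  -> all clauses if there are n or fewer, otherwise apply x
    """
    spec = spec.strip()
    if "<" in spec:
        # Conditional form, e.g. "2<-1 3<-2 6<100%"; pairs must be ascending.
        spec = re.sub(r"\s*<\s*", "<", spec)
        result = clauses  # at or below the first bound, everything is required
        for pair in spec.split():
            bound, value = pair.split("<", 1)
            if clauses <= int(bound):
                break
            result = min_should_match(clauses, value)
        return result
    if spec.endswith("%"):
        calc = clauses * int(spec[:-1]) / 100
        # int() truncates toward zero, so "-25%" of 3 clauses gives 3 - 0 = 3.
        result = clauses + int(calc) if calc < 0 else int(calc)
    else:
        calc = int(spec)
        result = clauses + calc if calc < 0 else calc
    # Never require more clauses than exist, nor fewer than zero.
    return max(0, min(clauses, result))


for n in range(1, 9):
    print(n, min_should_match(n, "2<-1 3<-2 6<100%"))
```

For this spec the required counts for 1 through 8 clauses come out as 1, 2, 2, 2, 3, 4, 7, 8: above six clauses every clause becomes required, so the rule gets stricter rather than looser as queries grow. In solrconfig.xml each "<" has to be written as "&lt;"; the function takes the unescaped form.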

thanks
ram
-- 
View this message in context: 
http://www.nabble.com/Dismax-params%2C--mm--lt-explanation-tp26049472p26049472.html



Re: Dismax params, mm lt explanation

2009-10-25 Thread ram_sj

After reading the explanation in the book, it's very clear now. Thank you for
citing it with the page number.

Ram


hossman wrote:
 
 
 What you are looking at is an XML escaped version of this string...
 
   2<-1 3<-2 6<100%
 
 ...the syntax is documented here...
 
 http://wiki.apache.org/solr/DisMaxRequestHandler#mm_.28Minimum_.27Should.27_Match.29
 http://lucene.apache.org/solr/api/org/apache/solr/util/doc-files/min-should-match.html
 
 ...note that the string you have listed there actually makes very little 
 sense because of the 100% condition.  It says that for queries of more 
 than 6 clauses all of them are required (usually the mm param gets less 
 strict as the number of clauses increases)
 
 (FYI: As the creator of the 'mm' param syntax, one of my favorite parts of 
 the new Solr 1.4 book is the explanation of mm options with multiple 
 clauses.  It describes them in a completely different way from anything I'd 
 ever thought of before (I was convinced it was a huge mistake the first 
 two times I read that section before the light bulb went off and I 
 realized how brilliant it was) and is probably a lot easier for many 
 people to understand -- if you have the book it's on p139)
 
 : 2&lt;-1 3&lt;-2 6&lt;100%
   ...
 : I requested solr to match atleast two fields, which i understood from the
 : documents. Can someone give me explanations for other params in it? 
 : 
 : &lt;-1 3
 : 
 : &lt;-2 6
 : 
 : &lt;100%
 
 
 
 -Hoss
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Dismax-params%2C--mm--lt-explanation-tp26049472p26052492.html



Geo Coding Service

2009-10-06 Thread ram_sj

Hi,

Can someone suggest a good geocoding service or software for commercial
use? I want to find geocodes for a large collection of addresses. I'm
looking for a good long-term service.

Thanks
Ram
-- 
View this message in context: 
http://www.nabble.com/Geo-Coding-Service-tp25774277p25774277.html



dismax matches ranking

2009-09-04 Thread ram_sj

Hi, I have the following questions about the dismax query handler. Can
someone clarify them for me?

1. Dismax query handler and filter query (fq)

If query=coffee and fq= yiw_bus_city: san jose,

I get 0 results for this query, but the same request works fine if I
specify qt=standard (the standard query handler).
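One thing worth checking, assuming yiw_bus_city is the string field declared in the schema in this message: a string field is indexed as a single untokenized, case-sensitive value, while the query parser splits unquoted input on whitespace. So "yiw_bus_city: san jose" becomes a clause yiw_bus_city:san plus a separate clause jose against the default field, and yiw_bus_city:san can never match a stored value such as "San Jose". Quoting the exact indexed value keeps it together. The helper below is hypothetical, not a Solr API, and assumes the cities are indexed with their original capitalization.

```python
def exact_string_fq(field: str, value: str) -> str:
    """Hypothetical helper: build an fq that matches one exact string-field value.

    The value is wrapped in double quotes so the query parser keeps
    "San Jose" together; backslashes and quotes inside it are escaped.
    String fields are case-sensitive, so pass the value exactly as indexed.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


print(exact_string_fq("yiw_bus_city", "San Jose"))
```

The resulting string is what would go into the fq parameter; URL-encode it when building the request by hand.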

2. Dismax and ranking

q=san jose

My collection has more documents for San Francisco and fewer for San Jose.

a. San Francisco results sometimes show up, or are listed before San Jose; I
guess this is because of the term frequency of San Francisco.

How can I present the results that exactly match the query first? For some
reasons I don't want to manually boost particular keywords. Listing the exact
matches first, followed by the other results, would be good.
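If query-time boosting is ruled out, one fallback, shown here as an illustration rather than as a dismax feature, is to reorder the returned page on the client: a stable sort that moves documents whose yiw_bus_city equals the query (ignoring case) to the top while keeping Solr's score order within each group. It only reorders the rows actually fetched, and the sample documents are made up.

```python
def exact_matches_first(docs: list[dict], field: str, query: str) -> list[dict]:
    """Move docs whose `field` equals `query` (case-insensitively) to the front.

    Python's sort is stable, so Solr's score order is preserved within the
    exact-match group and within the remaining documents.
    """
    target = query.strip().casefold()
    # False (exact match) sorts before True (everything else).
    return sorted(docs, key=lambda d: str(d.get(field, "")).strip().casefold() != target)


# Hypothetical page of results, already in Solr's score order.
docs = [
    {"id": "1", "yiw_bus_city": "San Francisco"},
    {"id": "2", "yiw_bus_city": "San Jose"},
    {"id": "3", "yiw_bus_city": "San Francisco"},
    {"id": "4", "yiw_bus_city": "San Jose"},
]
print([d["id"] for d in exact_matches_first(docs, "yiw_bus_city", "san jose")])
```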


configs:

<requestHandler name="dismax" class="solr.SearchHandler" default="true">
<lst name="defaults">
  <str name="defType">dismax</str>
  <str name="echoParams">explicit</str>
  <float name="tie">0.01</float>
  <str name="qf">
yiw_bus_name^1.0 yiw_bus_city^1.0 yiw_bus_ps_info^0.2
yiw_bus_description^0.2 yiw_bus_general_information^0.2 yiw_bus_zip^0.5
yiw_bus_street^0.5
  </str>
  <str name="pf">
yiw_bus_city^1.0 yiw_bus_zip^0.5 yiw_bus_street^0.5
  </str>
  <str name="bf">
ord(yiw_bus_name)^0.5 recip(rord(yiw_bus_city),1,1000,1000)^0.3
  </str>
  <!--
 <str name="fl"></str>
 -->
  <str name="mm">
2&lt;-1 5&lt;-2 6&lt;70%
  </str>
  <int name="ps">100</int>
  <str name="q.alt">*:*</str>
  <!-- example highlighter config, enable per-query with hl=true -->
  <str name="hl.fl"></str>
  <!-- for this field, we want no fragmenting, just highlighting -->
  <str name="f.name.hl.fragsize">0</str>
  <!-- instructs Solr to return the field itself if no query terms are
  found -->
  <str name="f.name.hl.alternateField">yiw_bus_name</str>
  <str name="f.text.hl.fragmenter">regex</str>
  <!-- defined below -->
</lst>
  </requestHandler>
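Two of the scoring parameters in this config can be worked through by hand. For each query term, dismax scores a document as the best single-field score plus tie times the sum of the other field scores, so tie=0.01 is close to a pure maximum. The bf function recip(x,m,a,b) computes a/(m*x+b), so recip(rord(yiw_bus_city),1,1000,1000) falls from 1.0 at x=0 to 0.5 at x=1000. The per-field scores below are made-up numbers used only to show the arithmetic.

```python
def dismax_term_score(field_scores: list[float], tie: float) -> float:
    """Score of one query term across the qf fields (disjunction max).

    best field score + tie * (sum of the other field scores)
    """
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


def recip(x: float, m: float, a: float, b: float) -> float:
    """The recip(x,m,a,b) function query: a / (m*x + b)."""
    return a / (m * x + b)


# Made-up per-field scores for one term, e.g. name, city, description.
print(dismax_term_score([2.0, 0.5, 0.25], tie=0.01))  # 2.0 + 0.01 * 0.75
print(recip(0, 1, 1000, 1000), recip(1000, 1, 1000, 1000))
```

With tie close to 0, a term that matches weakly in many fields barely outscores one that matches in a single field, which is the usual reason for keeping it small.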

schema:

<field name="yiw_bus_general_information" type="text" indexed="true"
stored="true" default="NA" />
<field name="yiw_bus_ps_info" type="string" indexed="true" stored="true"
default="NA" />
<field name="yiw_bus_city" type="string" indexed="true" stored="true"
multiValued="false" default="NA" />
<field name="yiw_bus_state" type="string" indexed="true" stored="true"
multiValued="false" default="NA" />
<field name="yiw_bus_country" type="string" indexed="true" stored="true"
multiValued="false" default="NA" />
<field name="yiw_bus_street" type="string" indexed="true" stored="true"
multiValued="false" default="NA" />
<field name="yiw_bus_zip" type="string" indexed="true" stored="true"
multiValued="false" default="0" />







 
-- 
View this message in context: 
http://www.nabble.com/dismax-matches---ranking-tp25300011p25300011.html



Determining Search Query Category

2009-06-04 Thread ram_sj

Hi,

I have more than 20 categories in my search application. I'm interested in
determining the category of the user's query dynamically, instead of asking
the user to filter the results through a long list of categories.

It's a general question, not specific to Solr, but any suggestions about
how to approach this problem would be helpful.
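One common way to approach this, sketched here under assumptions: run the user's query, facet on the category field (for example facet=true&facet.field=category, where "category" is a hypothetical field name), and treat a category as the query's category only when it holds a clear majority of the matching documents. The 60% threshold and the counts are made up; the function covers only the decision step on counts Solr has already returned.

```python
from typing import Optional


def dominant_category(facet_counts: dict[str, int], min_share: float = 0.6) -> Optional[str]:
    """Return the category holding at least `min_share` of the hits, else None.

    `facet_counts` maps category name -> number of matching documents,
    e.g. the facet counts for a (hypothetical) category field.
    """
    total = sum(facet_counts.values())
    if total == 0:
        return None
    category, count = max(facet_counts.items(), key=lambda item: item[1])
    return category if count / total >= min_share else None


print(dominant_category({"Veterinarians": 42, "Pet Stores": 9, "Kennels": 4}))
print(dominant_category({"Restaurants": 30, "Coffee Shops": 28}))
```

When no category dominates, the facet counts themselves can serve as a short list of suggested filters instead of the full list of 20+ categories.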

Thanks
Ram 
-- 
View this message in context: 
http://www.nabble.com/Determining-Search-Query-Category-tp23878965p23878965.html