Re: Snippets of indexed text

2007-03-30 Thread Pierre-Yves LANDRON

hello,

thanks for the info; it's exactly what I need. I can't manage to make it 
work, though. It's strange, because I have the same problem with facets: it 
seems that some options are not taken into account...


for example, here is my request to solr:
q=%28%28titre:moulin%29+OR+%28texte:moulin%29+OR+%28sujet:moulin%29+OR+%28desc:moulin%29%29&version=2.1&start=0&rows=12&fl=*+score&qt=standard&hl=true&hl.fl=texte,desc&hl.snippets=3&hl.fragsize=150

and an extract of the response is :

0.0151801035
bml:8071

Les Grands Moulins
Le chemin de la Bouteille n'est pas, comme son nom semblerait l'indiquer, le 
chemin préféré des ivrognes. En l'occurrence, c'est plutôt le chemin des 
Boulangers ou mieux encore (... cut by me; in fact the whole field is 
returned)


http://10.208.0.215:8080/fedora/get/bml:8071/Thumb
page


Obviously the hl parameters haven't been taken into account. I've got the 
same problem with the facet.mincount parameter: facets work fine, but this 
parameter is not taken into account for some reason...


Did I do something wrong?

thanks,
kind regards,
p-y





From: "Thierry Collogne" <[EMAIL PROTECTED]>
Reply-To: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
Subject: Re: Snippets of indexed text
Date: Thu, 29 Mar 2007 08:56:51 +0200

It is possible. You need to pass highlighting parameters. Look here :

 http://wiki.apache.org/solr/HighlightingParameters

Hope this helps.







Re: Snippets of indexed text

2007-03-30 Thread Thierry Collogne

I can't see anything wrong. But perhaps you are looking at the wrong part of
the response. It is the same with facets.
You need to look further down in the XML response. Here I asked Solr to
highlight the field "content" and I used a facet called "type".

This is a sample of an xml response in our application





0
5

 10
 0

 true
 stamp AND site:3
 2.2
 content
 type
 on

 true




 col_36863_NL
 ALL
 
 HR





 
   1
 




 
   
 





If you look at the end you see the following for facets




 
   1
 




And this is the part for the highlighted text :



 
   
 



I hope this helps a bit. By the way, if you are using java, it may be good
to check out the java client here

  http://issues.apache.org/jira/browse/SOLR-20

There is a comment with some code that I added. This code can be added to
the java client to support highlighting.

If you need any more help, just post and I will try to help further.
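
A minimal sketch of reading the facet and highlighting sections described above, assuming the standard Solr XML layout in which `facet_counts` and `highlighting` are `lst` elements that follow the `result` element. The sample response is invented for illustration: only the field names `content` and `type` and the id `col_36863_NL` come from the message, while the `id` field name, the facet value and the snippet text are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical response, shaped like a Solr XML reply with hl=on and facet=true.
SAMPLE = """<response>
  <lst name="responseHeader"><int name="status">0</int></lst>
  <result name="response" numFound="1" start="0">
    <doc><str name="id">col_36863_NL</str></doc>
  </result>
  <lst name="facet_counts">
    <lst name="facet_fields">
      <lst name="type"><int name="HR">1</int></lst>
    </lst>
  </lst>
  <lst name="highlighting">
    <lst name="col_36863_NL">
      <arr name="content"><str>a rare &lt;em&gt;stamp&lt;/em&gt; from 1900</str></arr>
    </lst>
  </lst>
</response>"""


def parse_facets(root):
    """Return {facet field: {value: count}} from the facet_counts section."""
    path = "./lst[@name='facet_counts']/lst[@name='facet_fields']/lst"
    return {
        field.get("name"): {c.get("name"): int(c.text) for c in field.findall("int")}
        for field in root.findall(path)
    }


def parse_highlighting(root):
    """Return {document id: {field: [snippets]}} from the highlighting section."""
    return {
        doc.get("name"): {
            arr.get("name"): [s.text for s in arr.findall("str")]
            for arr in doc.findall("arr")
        }
        for doc in root.findall("./lst[@name='highlighting']/lst")
    }


root = ET.fromstring(SAMPLE)
facets = parse_facets(root)
snippets = parse_highlighting(root)
```

The snippets are keyed by document id, so they have to be joined back to the documents in the `result` element by the application; the stored fields inside each `doc` are still returned in full.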






Re: storing results

2007-03-30 Thread Joan Codina




Thanks for your answers,

Yes, it's true that with boolean queries things are much easier...
+(query1) +(query2) should do an AND
or
(query1) (query2) should do an OR

and this does not need any special ability to parse the queries.
I like the dismax approach; I think it is interesting, but then merging
queries is a bit difficult. For this option I think it is better
to try the filter, but then there is no OR option, only AND. And bq
(from what I understand from the docs) also seems to perform an AND.
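
The two combinations above can be sketched as plain string manipulation. This is only an illustration: the helper name `combine` is hypothetical, and the OR form relies on the query parser's default operator being OR.

```python
def combine(queries, operator="AND"):
    """Wrap each query in parentheses and join them.

    AND marks every clause as required with a leading '+';
    OR leaves the clauses optional (assumes the default operator is OR).
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    prefix = "+" if operator == "AND" else ""
    return " ".join(f"{prefix}({q})" for q in queries)


both = combine(["titre:moulin", "texte:paris"], "AND")
either = combine(["titre:moulin", "texte:paris"], "OR")
```

Wrapping each stored query in parentheses keeps its internal operators from mixing with the clauses of the other query.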

It's a pity I have to do it for a demo in a few days (and nights) and 
I don't think the resulting code will be nice enough to be shown.

Joan

Chris Hostetter wrote:

: To do so I need to store the results as a filter, with a given name, so
: the user can use it later on. But I need to store this on disk, as I
: cannot rely on the cache or the web session.
: The user should then indicate that the query being run now has a
: filter (a previous query), and this filter should be added to the query
: (this is allowed in solr, I think) but as a filter_ID, to be loaded to
: solve the query.

if you really want to be able to refer to these previous searches using
some sort of identifier, and have them persist for an indefinite amount of
time, it's really out of Solr's hands -- if someone were to try and add a
feature like this to Solr, how would it know which queries to remember and
generate names for? how long would it store each name? ... these are the
kinds of questions that your app can understand more easily ... you could
conceivably use Solr to store the name=>querystring mappings in little
custom solr docs if you wanted, but you have to decide when to create
those mappings and when to expire them.

in general though, all you really need to remember is the query string,
remembering all of the results really isn't necessary.  The next time
your user wants to "refine" his search -- whether it's 10 seconds later or
10 days later -- just take the old query string and combine it with the
new query string.  how you combine it depends on how you want the scoring
to work: use an "fq" param if the old query shouldn't affect the score and
should just define the superset of docs; use a BooleanQuery if you want all
the clauses from both searches to impact the score.
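
A small sketch of that idea, assuming the application keeps its own name=>querystring mapping (here just a dict; the names `saved_searches` and `refine` are hypothetical). It builds the request parameters either with the old query as an `fq` filter, or with both queries as required clauses of a single `q`.

```python
from urllib.parse import urlencode

# Application-side storage of named searches; persistence is up to the app.
saved_searches = {"mills": "titre:moulin"}


def refine(saved, name, new_query, affect_score=False):
    """Build Solr request parameters that refine a saved search.

    affect_score=False: the old query only restricts the result set (fq).
    affect_score=True: both queries become required clauses of q,
    so both contribute to the score.
    """
    old_query = saved[name]
    if affect_score:
        params = [("q", f"+({old_query}) +({new_query})")]
    else:
        params = [("q", new_query), ("fq", old_query)]
    return urlencode(params)


filtered = refine(saved_searches, "mills", "texte:paris")
scored = refine(saved_searches, "mills", "texte:paris", affect_score=True)
```

Only the query string is stored, so the refined search always runs against the current index.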

it's important to understand that trying to keep track of the actual
results would be very, very, very bad (especially if you want to remember
them for a really long time) because when the user comes back, the index
may have changed, docs may have shifted ids, or been edited so they no
longer match the criteria, or have been deleted completely.



-Hoss

  


-- 

Joan Codina Filbà
Departament de Tecnologia
Universitat Pompeu Fabra





Re: Solr finding doc by one field but not by another

2007-03-30 Thread Theodan


Mike Klaas wrote:
> 
> This is almost certainly due to a mismatch between the index- and
> query-time analysis of the fields.  For instance, your schema defines
> the title field to be "string" (unanalyzed), but it is likely that
> some tokenization (perhaps via StandardAnalyzer) occurred in the
> original index.
> 

Yep, that was exactly the problem.  I changed all of my field types from
"string" to "text", and things still didn't work right when querying.  So I
asked the guy who created the Lucene index what analyzers he used, and he
had used the StandardAnalyzer, whereas my Solr configuration was using the
default advanced analyzer setup that Solr comes with in schema.xml.  So I
changed my schema.xml to use just StandardAnalyzer, and the searches now
seem to be returning expected results.
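
A toy illustration of the mismatch, not Lucene code: both "analyzers" below are simplified stand-ins (a lowercasing tokenizer for StandardAnalyzer, and a pass-through for an unanalyzed "string" field), and all function names are hypothetical.

```python
import re


def tokenizing_analyze(text):
    """Toy tokenizing analyzer: lowercase, split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def string_analyze(text):
    """Toy unanalyzed field: the whole value is one token."""
    return [text]


def build_index(docs, analyze):
    """Map each token to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in analyze(text):
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index, query, analyze):
    """Return ids of documents containing every token of the analyzed query."""
    tokens = analyze(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


docs = {1: "Les Grands Moulins"}
index = build_index(docs, tokenizing_analyze)   # index built with tokenization
mismatched = search(index, "Les Grands Moulins", string_analyze)
matched = search(index, "Les Grands Moulins", tokenizing_analyze)
```

The index only holds the tokens `les`, `grands` and `moulins`, so the unanalyzed query token `Les Grands Moulins` finds nothing, while the same query analyzed the same way as the index does match.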

-Dan



Re: Question: how to config memory with SolrPerformanceFactor

2007-03-30 Thread Chris Hostetter

: if we mv index.tmp$$ index, is it truly deleted?

it's not truly deleted until no running processes have the file handles
open anymore.

: if we notify solr to open a new searcher, solr just redirects to the new index?

that will cause Solr to close the existing filehandles and open new ones,
so then the files will be deleted.

: Do old indexes cost memory and hard disk space?

if they are open, then yes they cost memory -- they cost disk until truly
deleted.
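
The file-handle behaviour can be observed directly on a POSIX system. This is a generic sketch unrelated to Solr's own code (the function name is hypothetical); on Windows the `os.remove` call would typically fail while the file is open.

```python
import os
import tempfile


def read_after_unlink(payload: bytes) -> bytes:
    """Write a file, open it, delete its name, then read through the handle."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as writer:
        writer.write(payload)

    reader = open(path, "rb")      # plays the role of the old searcher
    try:
        os.remove(path)            # the directory entry is gone...
        assert not os.path.exists(path)
        return reader.read()       # ...but the data is still readable
    finally:
        reader.close()             # only now can the disk space be reclaimed


data = read_after_unlink(b"old index segment")
```

Until the last handle is closed, the removed file keeps occupying disk space even though it no longer appears in the directory listing.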

: if it is not cached in memory, does that mean we need not worry about
: OutOfMemory when the index files grow?

i don't understand your question.

: if it is cached in memory, how to limit it? use autowarmCount?

autowarmCount is a solr cache option .. solr caches are for very specific
things -- they have no control over how much memory Lucene uses for the
bulk of your index.

changing an autowarmCount actually has very little to do with the amount
of total memory Solr uses -- the *size* of each of your caches is a much
more significant factor in how much ram is used by the process.



-Hoss



Re: How can you perform a fuzzy search on a phrase without it turning into a word distance search?

2007-03-30 Thread Chris Hostetter

: Is it possible to do this without manually splitting up the title
: string I'm searching for into terms and then making a compound query
: with each of the terms as a fuzzy?

not out of the box ... Lucene has no native concept of a "fuzzy phrase
query" ... you would either need to implement one, or come up with a
custom QueryParser or Analyzer to do the bulk of the work.

writing that QueryParser might not be that hard, Lucene already has a
FuzzyTermEnum class for getting the list of all Terms similar to a
specific term, and a MultiPhraseQuery for making phrase queries where
each position in the phrase can match any one of several terms you
specify ... it would just be a matter of overriding the appropriate
QueryParser method for dealing with PhraseQueries, and taking each "raw"
term and running it through FuzzyTermEnum to get all the variations to put
in your MultiPhraseQuery.
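
The idea can be modelled outside Lucene in a few lines. This is a toy sketch, not the Lucene API: `expand_term` stands in for what FuzzyTermEnum provides (using a plain edit-distance cutoff instead of Lucene's similarity threshold), and `fuzzy_phrase_match` stands in for a MultiPhraseQuery with no slop. All names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def expand_term(term, vocabulary, max_edits=1):
    """All indexed terms within max_edits of the raw query term."""
    return {t for t in vocabulary if edit_distance(term, t) <= max_edits}


def fuzzy_phrase_match(phrase, doc_tokens, vocabulary, max_edits=1):
    """True if consecutive doc tokens match the phrase position by position,
    where each position accepts any fuzzy variant of its term."""
    positions = [expand_term(t, vocabulary, max_edits)
                 for t in phrase.lower().split()]
    n = len(positions)
    if n == 0:
        return False
    return any(
        all(doc_tokens[start + k] in positions[k] for k in range(n))
        for start in range(len(doc_tokens) - n + 1)
    )


doc = "les grands moulins de paris".split()
vocab = set(doc)
in_order = fuzzy_phrase_match("grand moulin", doc, vocab)
reversed_order = fuzzy_phrase_match("moulin grand", doc, vocab)
```

Each term is fuzzy, but the order and adjacency of the terms are still enforced, which is what separates this from a word-distance (slop) search.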



-Hoss



Re: Auto index update

2007-03-30 Thread Chris Hostetter

The simplest approach is to poll your DB for changes in a cron job, and
any time you find some, format them and send them to solr.

the specific details of the "best" way to keep an index up to date depend
largely on the nature of your data.
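
A rough sketch of such a polling job, under several assumptions: `sqlite3` stands in for the MySQL connection, the table `articles` with columns `id`, `title` and `updated_at` is hypothetical, and the update URL is that of a default local install. The XML it builds follows Solr's `<add><doc><field name="...">` update format.

```python
import sqlite3
import urllib.request
import xml.etree.ElementTree as ET


def fetch_changed_rows(conn, since):
    """Return (id, title, updated_at) rows modified after `since`."""
    cursor = conn.execute(
        "SELECT id, title, updated_at FROM articles "
        "WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    )
    return cursor.fetchall()


def build_add_message(rows):
    """Format rows as a Solr XML <add> message, one <doc> per row."""
    add = ET.Element("add")
    for doc_id, title, _updated_at in rows:
        doc = ET.SubElement(add, "doc")
        ET.SubElement(doc, "field", name="id").text = str(doc_id)
        ET.SubElement(doc, "field", name="title").text = title
    return ET.tostring(add, encoding="unicode")


def post_to_solr(message, url="http://localhost:8983/solr/update"):
    """POST an XML update message; the URL is an assumption."""
    request = urllib.request.Request(
        url,
        data=message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def sync_once(conn, since, send=post_to_solr):
    """One polling pass: send changed rows, commit, return the new watermark."""
    rows = fetch_changed_rows(conn, since)
    if rows:
        send(build_add_message(rows))
        send("<commit/>")
        since = rows[-1][2]
    return since
```

A cron entry would call `sync_once` with the watermark saved from the previous run. Deleted rows are not covered by this sketch; they would need their own tracking and a `<delete>` message.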

: Can anybody suggest the best method to implement automatic index
: updates in Solr from a MySQL database?


-Hoss



Re: Auto index update

2007-03-30 Thread Ken Krugler

Can anybody suggest the best method to implement automatic index
updates in Solr from a MySQL database?


As Hoss noted, there are a lot of different approaches.

You could search this email list archive for the discussion re using 
Solr with Compass (subject "Does Solr support integration with the 
Compass framework?").


Also, Jira issue SOLR-20 talks about the steps Ryan is taking towards 
providing a Compass-like mechanism for keeping a Solr index in sync 
with a DB, via a HibernateEventListener.


-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"


mainIndex and indexDefaults

2007-03-30 Thread Koji Sekiguchi
Hello,

This is a trivial question, but I'm curious about mainIndex and indexDefaults
in solrconfig.xml. They imply that I can have a second index other than
mainIndex, but SolrConfig instantiates a mainIndex statically.
Are there any plans to improve the index settings in the future?

Thanks in advance,

Koji