Re: Restrict access to localhost

2010-12-03 Thread Tom

If you are using another app to create the index, I think you can remove the
update servlet mapping in the web.xml.


highlighting wiki confusion

2010-12-03 Thread Lance Norskog
http://wiki.apache.org/solr/HighlightingParameters?#hl.highlightMultiTerm

If the SpanScorer is also being used, enables highlighting for
range/wildcard/fuzzy/prefix queries. Default is false.  Solr1.4. This
parameter makes sense for Highlighter only.

I think this meant 'for PhraseHighlighter only'?


-- 
Lance Norskog
goks...@gmail.com


Re: score from two cores

2010-12-03 Thread Paul
On Fri, Dec 3, 2010 at 4:47 PM, Erick Erickson  wrote:
> But why do you have two cores in the first place? Is it really necessary or
> is it just
> making things more complex?

I don't know why the OP wants two cores, but I ran into this same
problem and had to abandon using a second core. My use case is: I have
lots of slowly-changing documents, and a few often-changing
documents. Those classes of documents are updated by different people
using different processes. I wanted to split them into separate cores
so that:

1) The large core wouldn't change except deliberately so there would
be less chance of a bug creeping in. Also, that core is the same on
different servers, so it can be replicated.

2) The small core would update and optimize quickly and the data in it
is different on different servers.

The problem is that the search results should return relevancy as if
there were only one core.


Re: score from two cores

2010-12-03 Thread Erick Erickson
The scores will not be comparable. Scores are only relevant within one
search
on one core, so comparing them across two queries (even if it's the same
query
but against two different cores) is meaningless.

So, given your setup I would just use the results from one of the cores and
fill in
data from the other...

But why do you have two cores in the first place? Is it really necessary or
is it just
making things more complex?

Best
Erick

On Fri, Dec 3, 2010 at 1:36 PM, Ma, Xiaohui (NIH/NLM/LHC) [C] <
xiao...@mail.nlm.nih.gov> wrote:

> Please correct me if I am doing something wrong. I really appreciate your
> help!
>
> I have a core for metadata (xml files) and a core for pdf documents.
> Sometimes I need to search them separately, sometimes I need to search both of
> them together. There is a common key which relates them for each item.
>
> For example, the xml files look like following:
> 
> 
>
>rmaaac.pdf
>something
>rmaaac
>
>
>   .
> 
>
> I index rmaaac.pdf file with same Key and UI field in another core. Here is
> the example after I index rmaaac.pdf.
>  
>  
>  
>  0
>  3
>  
>  on
>  0
>  collectionid: RM
>  10
>  2.2
>  
>  
>  
>  
>rm
>rm.pdf
>something
>  
>  
>
> The result information which is displayed to the user comes from metadata, not
> from pdf files. If I search a term from the documents, in order to display
> search results to the user, I have to get the Keys from the documents and then redo
> the search against the metadata. Then the score is different.
>
> Please give me some suggestions!
>
> Thanks so much,
> Xiaohui
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, December 03, 2010 12:37 PM
> To: solr-user@lucene.apache.org
> Subject: Re: score from two cores
>
> Uhhm, what are you trying to do? What do you want to do with the scores
> from
> two cores?
>
> Best
> Erick
>
> On Fri, Dec 3, 2010 at 11:21 AM, Ma, Xiaohui (NIH/NLM/LHC) [C] <
> xiao...@mail.nlm.nih.gov> wrote:
>
> > I have multiple cores. How can I deal with score?
> >
> > Thanks so much for help!
> > Xiaohui
> >
>


Re: Batch Update Fields

2010-12-03 Thread Erick Erickson
That will certainly work. Another option, assuming the country codes are
in their own field, would be to put the transformations into a synonym file
that was only used on that field. That way you'd get this without having
to do the pre-processing step on the raw data...

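As a rough illustration of that idea (file name, field name and codes here are
made up for the example, not taken from this thread), the synonym file could
contain lines like

  AF => Afghanistan
  US => United States

and the country-code field's index-time analyzer could include

  <filter class="solr.SynonymFilterFactory" synonyms="country_codes.txt"
          ignoreCase="true" expand="false"/>

so the indexed terms (which is what faceting uses) become the full names
instead of the codes.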
That said, if your pre-processing is working for you it may not be worth your while
to worry about doing it differently

Best
Erick

On Fri, Dec 3, 2010 at 12:51 PM, Adam Estrada  wrote:

> First off...I know enough about Solr to be VERY dangerous so please bear
> with me ;-) I am indexing the geonames database which only provides country
> codes. I can facet the codes but to the end user who may not know all 249
> codes, it isn't really all that helpful. Therefore, I want to map the full
> country names to the country codes provided in the geonames db.
> http://download.geonames.org/export/dump/
>
> I used a simple split function to
> chop the 850 MB txt file into manageable CSVs that I can import into
> Solr. Now that all 7 million+ documents are in there, I want to change the
> country codes to the actual country names. I would have liked to do it
> in the index, but finding and replacing the strings in the CSVs seems to be
> working fine. After that I can just reindex the entire thing.
>
> Adam
>
> On Fri, Dec 3, 2010 at 12:42 PM, Erick Erickson  >wrote:
>
> > Have you consider defining synonyms for your code <->country
> > conversion at index time (or query time for that matter)?
> >
> > We may have an XY problem here. Could you state the high-level
> > problem you're trying to solve? Maybe there's a better solution...
> >
> > Best
> > Erick
> >
> > On Fri, Dec 3, 2010 at 12:20 PM, Adam Estrada <
> > estrada.adam.gro...@gmail.com
> > > wrote:
> >
> > > I wonder...I know that sed would work to find and replace the terms in
> > all
> > > of the csv files that I am indexing but would it work to find and
> replace
> > > key terms in the index?
> > >
> > > find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g' {}
> \;
> > >
> > > That command would iterate through all the files in the data directory
> > and
> > > replace the country code with the full country name. I many just back
> up
> > > the
> > > directory and try it. I have it running on csv files right now and it's
> > > working wonderfully. For those of you interested, I am indexing the
> > entire
> > > Geonames dataset
> > http://download.geonames.org/export/dump/(allCountries.zip)
> > > which gives me a pretty comprehensive world gazetteer. My next step is
> > > gonna
> > > be to display the results as KML to view over a google globe.
> > >
> > > Thoughts?
> > >
> > > Adam
> > >
> > > On Fri, Dec 3, 2010 at 7:57 AM, Erick Erickson <
> erickerick...@gmail.com
> > > >wrote:
> > >
> > > > No, there's no equivalent to SQL update for all values in a column.
> > > You'll
> > > > have to reindex all the documents.
> > > >
> > > > On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada <
> > > > estrada.adam.gro...@gmail.com
> > > > > wrote:
> > > >
> > > > > OK part 2 of my previous question...
> > > > >
> > > > > Is there a way to batch update field values based on a certain
> > > criteria?
> > > > > For example, if thousands of documents have a field value of 'US'
> can
> > I
> > > > > update all of them to 'United States' programmatically?
> > > > >
> > > > > Adam
> > > >
> > >
> >
>


Re: boosting certain docs based on a field value

2010-12-03 Thread Ahmet Arslan
> 
> thanks!! that worked..
> 
> Can i enter the sequence too like "postpaid,free,costly"?
> 

Does that mean you want to display postpaid first, then free, and lastly
costly?

If that's what you want, I think it is better to create a tint field using these
types and then sort by this field.

postpaid=300
free=200
costly=100

sort=newTintField desc, score desc

http://wiki.apache.org/solr/CommonQueryParameters#sort


  


Re: Highlighting parameters

2010-12-03 Thread Ahmet Arslan
> Is there a way I can specify separate
> configuration for 2 different fields.
> 
> For field 1 I want to display only 100 chars, Field 2 200
> chars
> 

Yes, the parameter accepts per-field overrides. The syntax is described at
http://wiki.apache.org/solr/HighlightingParameters#HowToOverride

&f.TEXT.hl.maxAlternateFieldLength=80&f.CATEGORY.hl.maxAlternateFieldLength=100


  


Re: Highlighting parameters

2010-12-03 Thread Markus Jelsma
Yes


Some parameters may be overridden on a per-field basis with the following
syntax:

  f.<fieldName>.<originalParam>=<value>

http://wiki.apache.org/solr/HighlightingParameters


> Is there a way I can specify separate configuration for 2 different fields.
> 
> For field 1 I wan to display only 100 chars, Field 2 200 chars


Re: boosting certain docs based on a field value

2010-12-03 Thread abhayd

thanks!! that worked..

Can i enter the sequence too like "postpaid,free,costly"?




Syncing 'delta-import' with 'select' query

2010-12-03 Thread Juan Manuel Alvarez
Hello everyone! I would like to ask you a question about DIH.

I am using a database and DIH to sync against Solr, and a GUI to
display and operate on the items retrieved from Solr.
When I change the state of an item through the GUI, the following happens:
a. The item is updated in the DB.
b. A delta-import command is fired to sync the DB with Solr.
c. The GUI is refreshed by making a query to Solr.

My problem comes between (b) and (c). The delta-import operation is
executed in a new thread, so my call returns immediately, refreshing
the GUI before the Solr index is updated and causing the item state in the
GUI to be outdated.

I had two ideas so far:
1. Query the status of the DIH after the delta-import operation and
do not return until it is "idle". The problem I see with this is that
if other users execute delta-imports, the status will be "busy" until
all operations are finished.
2. Use Zoie. The first problem is that configuring it is not as
straightforward as it seems, so I don't want to spend more time trying
it until I am sure that this will solve my issue. On the other hand, I
think that I may suffer the same problem since the delta-import is
still firing in another thread, so I can't be sure it will be called
fast enough.

Am I pointing on the right direction or is there another way to
achieve my goal?

Thanks in advance!
Juan M.


Highlighting parameters

2010-12-03 Thread Mark

Is there a way I can specify separate configuration for 2 different fields.

For field 1 I want to display only 100 chars, Field 2 200 chars




Re: Negative fl param

2010-12-03 Thread Mark
Ok simple enough. I just created a SearchComponent that removes values 
from the fl param.


On 12/3/10 9:32 AM, Ahmet Arslan wrote:

When returning results is there a way
I can say to return all fields except a certain one?

So say I have stored fields foo, bar and baz but I only
want to return foo and bar. Is it possible to do this
without specifically listing out the fields I do want?


There were a similar discussion. http://search-lucene.com/m/2qJaU1wImo3/

A workaround can be getting all stored field names from 
http://wiki.apache.org/solr/LukeRequestHandler and construct fl accordingly.





RE: score from two cores

2010-12-03 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Please correct me if I am doing something wrong. I really appreciate your help!

I have a core for metadata (xml files) and a core for pdf documents. Sometimes
I need to search them separately, sometimes I need to search both of them together.
There is a common key which relates them for each item.

For example, the xml files look like following:


  
rmaaac.pdf
something
rmaaac


   .


I index the rmaaac.pdf file with the same Key and UI fields in another core. Here is the
example after I index rmaaac.pdf.
   
  
  
  0 
  3 
  
  on 
  0 
  collectionid: RM 
  10 
  2.2 
  
  
  
  
rm 
rm.pdf  
something
  
  

The result information which is displayed to the user comes from metadata, not from
pdf files. If I search a term from the documents, in order to display search
results to the user, I have to get the Keys from the documents and then redo the search
against the metadata. Then the score is different.

Please give me some suggestions!

Thanks so much,
Xiaohui 

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, December 03, 2010 12:37 PM
To: solr-user@lucene.apache.org
Subject: Re: score from two cores

Uhhm, what are you trying to do? What do you want to do with the scores from
two cores?

Best
Erick

On Fri, Dec 3, 2010 at 11:21 AM, Ma, Xiaohui (NIH/NLM/LHC) [C] <
xiao...@mail.nlm.nih.gov> wrote:

> I have multiple cores. How can I deal with score?
>
> Thanks so much for help!
> Xiaohui
>


Re: dataimports response returns before done?

2010-12-03 Thread Ahmet Arslan

--- On Fri, 12/3/10, Tri Nguyen  wrote:

> From: Tri Nguyen 
> Subject: dataimports response returns before done?
> To: "solr user" 
> Date: Friday, December 3, 2010, 7:55 PM
> Hi,
>  
> After issuing a dataimport, I've noticed Solr returns a
> response prior to finishing the import. Is this correct?
> Is there any way I can make Solr not return until it
> finishes?
>  
> If not, how do I ping for the status whether it finished or
> not?
>  

So you want to do something at the end of the import?
http://wiki.apache.org/solr/DataImportHandler#EventListeners may help.

Also you can always poll solr/dataimport url and check status (busy,idle)
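A rough polling sketch along those lines (the handler path /solr/dataimport and
the host are assumptions based on the default setup, not taken from this thread):

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.net.URL;

  public class WaitForDataImport {
      public static void main(String[] args) throws Exception {
          URL status = new URL("http://localhost:8983/solr/dataimport?command=status");
          while (true) {
              // read the whole status response
              StringBuilder body = new StringBuilder();
              BufferedReader in = new BufferedReader(
                      new InputStreamReader(status.openStream(), "UTF-8"));
              for (String line; (line = in.readLine()) != null; ) {
                  body.append(line);
              }
              in.close();
              // DIH reports "busy" while an import runs and "idle" once it is done
              if (!body.toString().contains("busy")) {
                  break;
              }
              Thread.sleep(1000); // poll once a second
          }
          System.out.println("delta-import finished, safe to refresh the GUI");
      }
  }

The caveat from the original question still applies: the status is handler-wide,
so concurrent imports from other users will also keep it "busy".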





Question about Solr Fieldtypes, Chaining of Tokenizers

2010-12-03 Thread Matthew Hall
Hey folks, I'm working with a fairly specific set of requirements for 
our corpus that needs a somewhat tricky text type for both indexing and 
searching.


The chain currently looks like this:





protected="protwords.txt"/>




Now you will notice that I'm trying to add a second tokenizer to this
chain at the very end; this is due to the final replacement of
punctuation with whitespace. At that point I'd like to further break up
these tokens to smaller tokens.


The reason for this is that we have a mixed corpus of normal English words and
scientific terms. For example you could expect strings like "The
symposium of Tg(RX3fg+and) gene studies" being added to the index,
and parts of those phrases being searched on.


We want to be able to remove the stopwords in the mostly english parts 
of these types of statements, which the whitespace tokenizer, followed 
by removing trailing punctuation,  followed by the stopfilter takes care 
of.  We do not want to remove references to genetic information 
contained in allele symbols and the like.


Sadly as far as I can tell, you cannot chain tokenizers in the 
schema.xml, so does anyone have some suggestions on how this could be 
accomplished?


Oh, and let me add that the WordDelimiterFilter comes really close to 
what I want, but since we are unwilling to promote our solr version to 
the trunk (we are on the 1.4.x version atm), the inability to turn off
the automatic phrase queries makes it a no go.  We need to be able to 
make searches on "left/right" match "right/left."


My searches through the old material on this subject aren't really
showing me much except some advice on using the copyField attribute.  
But my understanding is that this will simply take your original input 
to the field, and then analyze it in two different ways depending on the 
field definitions.  It would be very nice if it were copying the already 
analyzed version of the text... but that's not what its doing, right?


Thanks for any advice on this matter.

Matt




dataimports response returns before done?

2010-12-03 Thread Tri Nguyen
Hi,
 
After issuing a dataimport, I've noticed Solr returns a response prior to
finishing the import. Is this correct? Is there any way I can make Solr not
return until it finishes?
 
If not, how do I ping for the status whether it finished or not?
 
thanks,
 
tri

Re: Batch Update Fields

2010-12-03 Thread Adam Estrada
First off...I know enough about Solr to be VERY dangerous so please bear
with me ;-) I am indexing the geonames database which only provides country
codes. I can facet the codes but to the end user who may not know all 249
codes, it isn't really all that helpful. Therefore, I want to map the full
country names to the country codes provided in the geonames db.
http://download.geonames.org/export/dump/

I used a simple split function to
chop the 850 MB txt file into manageable CSVs that I can import into
Solr. Now that all 7 million+ documents are in there, I want to change the
country codes to the actual country names. I would have liked to do it
in the index, but finding and replacing the strings in the CSVs seems to be
working fine. After that I can just reindex the entire thing.

Adam

On Fri, Dec 3, 2010 at 12:42 PM, Erick Erickson wrote:

> Have you consider defining synonyms for your code <->country
> conversion at index time (or query time for that matter)?
>
> We may have an XY problem here. Could you state the high-level
> problem you're trying to solve? Maybe there's a better solution...
>
> Best
> Erick
>
> On Fri, Dec 3, 2010 at 12:20 PM, Adam Estrada <
> estrada.adam.gro...@gmail.com
> > wrote:
>
> > I wonder...I know that sed would work to find and replace the terms in
> all
> > of the csv files that I am indexing but would it work to find and replace
> > key terms in the index?
> >
> > find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g' {} \;
> >
> > That command would iterate through all the files in the data directory
> and
> > replace the country code with the full country name. I many just back up
> > the
> > directory and try it. I have it running on csv files right now and it's
> > working wonderfully. For those of you interested, I am indexing the
> entire
> > Geonames dataset
> http://download.geonames.org/export/dump/(allCountries.zip)
> > which gives me a pretty comprehensive world gazetteer. My next step is
> > gonna
> > be to display the results as KML to view over a google globe.
> >
> > Thoughts?
> >
> > Adam
> >
> > On Fri, Dec 3, 2010 at 7:57 AM, Erick Erickson  > >wrote:
> >
> > > No, there's no equivalent to SQL update for all values in a column.
> > You'll
> > > have to reindex all the documents.
> > >
> > > On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada <
> > > estrada.adam.gro...@gmail.com
> > > > wrote:
> > >
> > > > OK part 2 of my previous question...
> > > >
> > > > Is there a way to batch update field values based on a certain
> > criteria?
> > > > For example, if thousands of documents have a field value of 'US' can
> I
> > > > update all of them to 'United States' programmatically?
> > > >
> > > > Adam
> > >
> >
>


can solrj swap cores?

2010-12-03 Thread Will Milspec
hi all,

Does solrj support "swapping cores"?

One of our developers had initially tried swapping solr cores (e.g. core0
and core1) using the solrj api, but it failed. (don't have the exact error)
He subsequently replaced the call with straight http (i.e. http client).

Unfortunately I don't have the exact error in front of me...

Solrj code:

   CoreAdminRequest car = new CoreAdminRequest();
   car.setCoreName("production");
   car.setOtherCoreName("reindex");
   car.setAction(CoreAdminParams.CoreAdminAction.SWAP);

  SolrServer solrServer = SolrUtil.getSolrServer();
  car.process(solrServer);
  solrServer.commit();

Finally, can someone comment on the solrj javadoc on CoreAdminRequest:
 * This class is experimental and subject to change.

thanks,

will
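For comparison, a minimal SolrJ sketch of a swap issued against the core admin
handler (the base URL and core names below are assumptions; note the request has
to go to the Solr root URL, where /admin/cores lives, rather than to one of the
individual cores):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.request.CoreAdminRequest;
  import org.apache.solr.common.params.CoreAdminParams;

  public class SwapCores {
      public static void main(String[] args) throws Exception {
          // point at the Solr root, not at .../production or .../reindex
          SolrServer admin = new CommonsHttpSolrServer("http://localhost:8983/solr");

          CoreAdminRequest swap = new CoreAdminRequest();
          swap.setCoreName("production");
          swap.setOtherCoreName("reindex");
          swap.setAction(CoreAdminParams.CoreAdminAction.SWAP);
          swap.process(admin);  // sends action=SWAP to the /admin/cores handler
      }
  }

If the original call was made against a core-level URL, that alone could explain
the failure, but without the exact error this is only a guess.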


Re: Batch Update Fields

2010-12-03 Thread Erick Erickson
Have you consider defining synonyms for your code <->country
conversion at index time (or query time for that matter)?

We may have an XY problem here. Could you state the high-level
problem you're trying to solve? Maybe there's a better solution...

Best
Erick

On Fri, Dec 3, 2010 at 12:20 PM, Adam Estrada  wrote:

> I wonder...I know that sed would work to find and replace the terms in all
> of the csv files that I am indexing but would it work to find and replace
> key terms in the index?
>
> find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g' {} \;
>
> That command would iterate through all the files in the data directory and
> replace the country code with the full country name. I many just back up
> the
> directory and try it. I have it running on csv files right now and it's
> working wonderfully. For those of you interested, I am indexing the entire
> Geonames dataset http://download.geonames.org/export/dump/(allCountries.zip)
> which gives me a pretty comprehensive world gazetteer. My next step is
> gonna
> be to display the results as KML to view over a google globe.
>
> Thoughts?
>
> Adam
>
> On Fri, Dec 3, 2010 at 7:57 AM, Erick Erickson  >wrote:
>
> > No, there's no equivalent to SQL update for all values in a column.
> You'll
> > have to reindex all the documents.
> >
> > On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada <
> > estrada.adam.gro...@gmail.com
> > > wrote:
> >
> > > OK part 2 of my previous question...
> > >
> > > Is there a way to batch update field values based on a certain
> criteria?
> > > For example, if thousands of documents have a field value of 'US' can I
> > > update all of them to 'United States' programmatically?
> > >
> > > Adam
> >
>


Re: boosting certain docs based on a field value

2010-12-03 Thread Ahmet Arslan
> I was looking to boost certain docs based on some values in
> an indexed field.
> 
> e.g.
> pType
> -
> post paid
> go phone
> 
> Would like to have post paid docs first and then go phone.
> I checked the functional query but could not figure out.

You can use 
http://wiki.apache.org/solr/DisMaxQParserPlugin#bq_.28Boost_Query.29 if you are 
using dismax. 

bq=pType:"post paid"^100

If you are using default query parser then you can append this optional clause 
to your query q = some other query pType:"post paid"^100
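For instance (field and query values are purely illustrative, assuming a dismax
handler), a request could look like:

  q=phones&defType=dismax&qf=name&bq=pType:"post paid"^100 pType:"go phone"^50

where the two bq clauses push "post paid" documents ahead of "go phone" ones.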


  


Re: score from two cores

2010-12-03 Thread Erick Erickson
Uhhm, what are you trying to do? What do you want to do with the scores from
two cores?

Best
Erick

On Fri, Dec 3, 2010 at 11:21 AM, Ma, Xiaohui (NIH/NLM/LHC) [C] <
xiao...@mail.nlm.nih.gov> wrote:

> I have multiple cores. How can I deal with score?
>
> Thanks so much for help!
> Xiaohui
>


Re: finding exact case insensitive matches on single and multiword values

2010-12-03 Thread Erick Erickson
Arrrgh, Geert-Jan is right, that's the 15th time at least this has tripped
me up.

I'm pretty sure that text will work if you escape the space, e.g.
city:(den\ haag). The debug output is a little confusing since it has a line
like
city:den haag

which almost looks wrong... but it worked
out OK on a couple of queries I tried.

Geert-Jan is also right in that filters aren't applied to string types
so there's two possibilities, either handle the casing on the client
side as he suggests and use string or make the text type work.


Sorry for the confusion
Erick

On Fri, Dec 3, 2010 at 11:54 AM, Geert-Jan Brits  wrote:

> when you went from strField to TextField in your config you enabled
> tokenizing (which I believe splits on spaces by default),
> which is why you see seperate 'words' / terms in the
> debugQuery-explanation.
>
> I believe you want to keep your old strField config and try quoting:
>
> fq=city:"den+haag" or fq=city:"den haag"
>
> Concerning the lower-casing: wouldn't if be easiest to do that at the
> client? (I'm not sure at the moment how to do lowercasing with a strField)
> .
>
> Geert-jan
>
>
> 2010/12/3 PeterKerk 
>
> >
> >
> > You are right, this is what I see when I append the debug query (very
> very
> > useful btw!!!) in old situation:
> > 
> >city:den title:haag
> >PhraseQuery(themes:"hotel en restaur")
> > 
> >
> >
> >
> > I then changed the schema.xml to:
> >
> >  > omitNorms="true">
> > 
> >
> >
> > 
> > 
> >
> >  
> >
> >
> > I then tried adding parentheses:
> >
> >
> http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:(den+haag)&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city
> > also tried (without +):
> > http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:(den
> > haag)&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city
> >
> > Then I get:
> >
> > 
> >city:den city:haag
> > 
> >
> > And still 0 results
> >
> > But as you can see the query is split up into 2 separate words, I dont
> > think
> > that is what I need?
> >
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/finding-exact-case-insensitive-matches-on-single-and-multiword-values-tp2012207p2012509.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>


Re: Negative fl param

2010-12-03 Thread Ahmet Arslan
> When returning results is there a way
> I can say to return all fields except a certain one?
> 
> So say I have stored fields foo, bar and baz but I only
> want to return foo and bar. Is it possible to do this
> without specifically listing out the fields I do want?


There were a similar discussion. http://search-lucene.com/m/2qJaU1wImo3/

A workaround can be getting all stored field names from 
http://wiki.apache.org/solr/LukeRequestHandler and construct fl accordingly.
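(For example, something like http://localhost:8983/solr/admin/luke?numTerms=0
lists the fields and their schema flags; drop the ones you don't want and pass
the rest as fl=field1,field2,... — the URL and field names here are only
illustrative.)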


  


Re: Batch Update Fields

2010-12-03 Thread Markus Jelsma


On Friday 03 December 2010 18:20:44 Adam Estrada wrote:
> I wonder...I know that sed would work to find and replace the terms in all
> of the csv files that I am indexing but would it work to find and replace
> key terms in the index?

It'll most likely corrupt your index. Offsets, positions etc won't have the 
proper meaning anymore.

> find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g' {} \;
> 
> That command would iterate through all the files in the data directory and
> replace the country code with the full country name. I many just back up
> the directory and try it. I have it running on csv files right now and
> it's working wonderfully. For those of you interested, I am indexing the
> entire Geonames dataset http://download.geonames.org/export/dump/
> (allCountries.zip) which gives me a pretty comprehensive world gazetteer.
> My next step is gonna be to display the results as KML to view over a
> google globe.
> 
> Thoughts?
> 
> Adam
> 
> On Fri, Dec 3, 2010 at 7:57 AM, Erick Erickson 
wrote:
> > No, there's no equivalent to SQL update for all values in a column.
> > You'll have to reindex all the documents.
> > 
> > On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada <
> > estrada.adam.gro...@gmail.com
> > 
> > > wrote:
> > > 
> > > OK part 2 of my previous question...
> > > 
> > > Is there a way to batch update field values based on a certain
> > > criteria? For example, if thousands of documents have a field value of
> > > 'US' can I update all of them to 'United States' programmatically?
> > > 
> > > Adam

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


nexus of synonyms and stemming, take 2

2010-12-03 Thread Will Milspec
hi all,

[This is a second attempt at emailing. The apache mailing list spam filter
apparently did not like my synonyms entry, ie.. classified my email as spam.
I have replaced phone with 'foo' , 'cell' with 'sell' and 'mobile' with
'nubile' ]

This is a fairly basic synonyms question: how do synonyms handle stemming?


Example: Synonyms.txt has entry:
  sell,sell foo,nubile,nubile foo,wireless foo

If I want to match on 'sell foos'...

a) do I need to add an entry for 'sell foos' (i.e. in addition to sell foo)
b) or will the stemmer (porter/snowball) handle this already


thanks

will


Re: Batch Update Fields

2010-12-03 Thread Adam Estrada
I wonder...I know that sed would work to find and replace the terms in all
of the csv files that I am indexing but would it work to find and replace
key terms in the index?

find C:\\tmp\\index\\data -type f -exec sed -i 's/AF/AFGHANISTAN/g' {} \;

That command would iterate through all the files in the data directory and
replace the country code with the full country name. I may just back up the
directory and try it. I have it running on csv files right now and it's
working wonderfully. For those of you interested, I am indexing the entire
Geonames dataset http://download.geonames.org/export/dump/ (allCountries.zip)
which gives me a pretty comprehensive world gazetteer. My next step is gonna
be to display the results as KML to view over a google globe.

Thoughts?

Adam

On Fri, Dec 3, 2010 at 7:57 AM, Erick Erickson wrote:

> No, there's no equivalent to SQL update for all values in a column. You'll
> have to reindex all the documents.
>
> On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada <
> estrada.adam.gro...@gmail.com
> > wrote:
>
> > OK part 2 of my previous question...
> >
> > Is there a way to batch update field values based on a certain criteria?
> > For example, if thousands of documents have a field value of 'US' can I
> > update all of them to 'United States' programmatically?
> >
> > Adam
>


Re: Limit number of characters returned

2010-12-03 Thread Ahmet Arslan
> Couldn't I just use the highlighter and configure it to use
> the 
> alternative field to return the first 200 characters? 
> In cases where 
> there is a highlighter match I would prefer to show the
> excerpts anyway.
> 
> http://wiki.apache.org/solr/HighlightingParameters#hl.alternateField
> http://wiki.apache.org/solr/HighlightingParameters#hl.maxAlternateFieldLength
> 
> Is this something wrong with this method?

No, you can do that. It is perfectly fine.
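For instance (the field name is just a placeholder), the request could carry:

  hl=true&hl.fl=body&hl.alternateField=body&hl.maxAlternateFieldLength=200

so documents without a highlighting match fall back to the first 200 characters
of the stored field.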





boosting certain docs based on a field value

2010-12-03 Thread abhayd

hi 

I was looking to boost certain docs based on some values in an indexed field.

e.g.
pType
-
post paid
go phone

Would like to have post paid docs first and then go phone.
I checked the functional query but could not figure out.

Any help?


Re: solr 1.4 suggester component

2010-12-03 Thread abhayd

thanks ..

i used
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

with fuzzy operator..


Re: finding exact case insensitive matches on single and multiword values

2010-12-03 Thread Geert-Jan Brits
when you went from strField to TextField in your config you enabled
tokenizing (which I believe splits on spaces by default),
which is why you see separate 'words' / terms in the debugQuery-explanation.

I believe you want to keep your old strField config and try quoting:

fq=city:"den+haag" or fq=city:"den haag"

Concerning the lower-casing: wouldn't it be easiest to do that at the
client? (I'm not sure at the moment how to do lowercasing with a strField)
.

Geert-jan


2010/12/3 PeterKerk 

>
>
> You are right, this is what I see when I append the debug query (very very
> useful btw!!!) in old situation:
> 
>city:den title:haag
>PhraseQuery(themes:"hotel en restaur")
> 
>
>
>
> I then changed the schema.xml to:
>
>  omitNorms="true">
> 
>
>
> 
> 
>
>  
>
>
> I then tried adding parentheses:
>
> http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:(den+haag)&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city
> also tried (without +):
> http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:(den
> haag)&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city
>
> Then I get:
>
> 
>city:den city:haag
> 
>
> And still 0 results
>
> But as you can see the query is split up into 2 separate words, I dont
> think
> that is what I need?
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/finding-exact-case-insensitive-matches-on-single-and-multiword-values-tp2012207p2012509.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: spellchecker results not as desired

2010-12-03 Thread abhayd

Thanks,
I was able to fix this issue with combination of EdgeNGrams and fuzzy query.

here are details 
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

I just added fuzzyquery operator and seems to be working so far


Re: finding exact case insensitive matches on single and multiword values

2010-12-03 Thread PeterKerk


You are right, this is what I see when I append the debug query (very very
useful btw!!!) in old situation:

city:den title:haag
PhraseQuery(themes:"hotel en restaur")




I then changed the schema.xml to:

 
 
 
 
 
 

 


I then tried adding parentheses:
http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:(den+haag)&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city
also tried (without +):
http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:(den
haag)&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city

Then I get:


city:den city:haag


And still 0 results

But as you can see the query is split up into 2 separate words, I don't think
that is what I need?




Re: finding exact case insensitive matches on single and multiword values

2010-12-03 Thread Erick Erickson
The root of your problem, I think, is fq=city:den+haag which parses into
city:den +defaultfield:haag

Try parens, i.e. city:(den haag).

Attaching &debugQuery=on is often a way to see things like this quickly

Also, if you haven't seen the analysis page from the admin page, it's really
valuable
for figuring out the effects of analyzers. You can probably do something
like:

<fieldType name="string_lowercase" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

to get what you want.

Best
Erick

On Fri, Dec 3, 2010 at 10:46 AM, PeterKerk  wrote:

>
>
> Users call this URL on my site:
> /?search=1&city=den+haag
> or even /?search=1&city=Den+Haag (casing of ctyname can be anything)
>
>
> Under water I call Solr:
>
> http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:den+haag&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city
>
>
> but this returns 0 results, even though I KNOW there are exactly 54 records
> that have an exact match on "den haag" (in this case even with lower casing
> in DB).
>
> citynames are stored with various casings in DB, so when searching with
> solr, the search must ignore casing.
>
>
> my schema.xml
>
>  omitNorms="true" />
> 
>
>
> To check what was going on, I opened my analysis.jsp,
>
> for field  I provide: "city"
> for Field value (Index)  I provide: "den haag"
> When I analyze this I get:
> "den haag"
>
> So that seems correct to me. Why is it that no results are returned?
>
> My requirements summarized:
> - I want to search independant of case on cityname:
>when user searches on "DEn HaAG" he will get the records that have
> value
> "Den Haag", but also records that have "den haag" etc.
> - citynames may consists of multiple words but only an exact match is
> valid,
> so when user searches for "den", he will not find "den haag" records. And
> when searched on "den haag" it will only return match on that and not other
> cities like "den bosch".
>
> How can I achieve this?
>
> I think I need a new fieldtype  in my schema.xml, but am not sure which
> tokenizers and analyzers I need, here's what I tried:
>
>  positionIncrementGap="100" >
>  
>
> ignoreCase="true" expand="false"/>
> words="stopwords_dutch.txt" />
>
>
>  
> 
>
>
> Help is really appreciated!
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/finding-exact-case-insensitive-matches-on-single-and-multiword-values-tp2012207p2012207.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Limit number of characters returned

2010-12-03 Thread Mark

Thanks for the response.

Couldn't I just use the highlighter and configure it to use the 
alternate field to return the first 200 characters?  In cases where
there is a highlighter match I would prefer to show the excerpts anyway.


http://wiki.apache.org/solr/HighlightingParameters#hl.alternateField
http://wiki.apache.org/solr/HighlightingParameters#hl.maxAlternateFieldLength

Is there something wrong with this method?

On 12/3/10 8:03 AM, Erick Erickson wrote:

Yep, you're correct. CopyField is probably your simplest option here as
Ahmet suggested.
A more complex solution would be your own response writer, but unless and
until your
index gets cumbersome, I'd avoid that. Plus, storing the copied contents
only shouldn't
impact search much, since this doesn't add any terms...

Best
Erick

On Fri, Dec 3, 2010 at 10:32 AM, Mark  wrote:


Correct me if I am wrong but I would like to return highlighted excerpts
from the document so I would still need to index and store the whole
document right (ie.. highlighting only works on stored fields)?


On 12/3/10 3:51 AM, Ahmet Arslan wrote:


--- On Fri, 12/3/10, Mark   wrote:

  From: Mark

Subject: Limit number of characters returned
To: solr-user@lucene.apache.org
Date: Friday, December 3, 2010, 5:39 AM
Is there way to limit the number of
characters returned from a stored field?

For example:

Say I have a document (~2K words) and I search for a word
that's somewhere in the middle. I would like the document to
match the search query but the stored field should only
return the first 200 characters of the document. Is there
anyway to accomplish this that doesn't involve two fields?


I don't think it is possible out-of-the-box. May be you can hack
highlighter to return that first 200 characters in highlighting response.
Or a custom response writer can do that.

But if you will be always returning first 200 characters of documents, I
think creating additional field with indexed="false" stored="true" will be
more efficient. And you can make your original field indexed="true"
stored="false", your index size will be diminished.








Re: Limit number of characters returned

2010-12-03 Thread Erick Erickson
Yep, you're correct. CopyField is probably your simplest option here as
Ahmet suggested.
A more complex solution would be your own response writer, but unless and
until your
index gets cumbersome, I'd avoid that. Plus, storing the copied contents
only shouldn't
impact search much, since this doesn't add any terms...

Best
Erick

On Fri, Dec 3, 2010 at 10:32 AM, Mark  wrote:

> Correct me if I am wrong but I would like to return highlighted excerpts
> from the document so I would still need to index and store the whole
> document right (ie.. highlighting only works on stored fields)?
>
>
> On 12/3/10 3:51 AM, Ahmet Arslan wrote:
>
>>
>> --- On Fri, 12/3/10, Mark  wrote:
>>
>>  From: Mark
>>> Subject: Limit number of characters returned
>>> To: solr-user@lucene.apache.org
>>> Date: Friday, December 3, 2010, 5:39 AM
>>> Is there way to limit the number of
>>> characters returned from a stored field?
>>>
>>> For example:
>>>
>>> Say I have a document (~2K words) and I search for a word
>>> that's somewhere in the middle. I would like the document to
>>> match the search query but the stored field should only
>>> return the first 200 characters of the document. Is there
>>> anyway to accomplish this that doesn't involve two fields?
>>>
>> I don't think it is possible out-of-the-box. May be you can hack
>> highlighter to return that first 200 characters in highlighting response.
>> Or a custom response writer can do that.
>>
>> But if you will be always returning first 200 characters of documents, I
>> think creating additional field with indexed="false" stored="true" will be
>> more efficient. And you can make your original field indexed="true"
>> stored="false", your index size will be diminished.
>>
>> 
>>
>>
>>
>>


Negative fl param

2010-12-03 Thread Mark
When returning results is there a way I can say to return all fields 
except a certain one?


So say I have stored fields foo, bar and baz but I only want to return 
foo and bar. Is it possible to do this without specifically listing out 
the fields I do want?


finding exact case insensitive matches on single and multiword values

2010-12-03 Thread PeterKerk


Users call this URL on my site:
/?search=1&city=den+haag
or even /?search=1&city=Den+Haag (casing of cityname can be anything)


Under the hood I call Solr:
http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=city:den+haag&q=*:*&start=0&rows=25&fl=id,title,friendlyurl,city&facet.field=city


but this returns 0 results, even though I KNOW there are exactly 54 records
that have an exact match on "den haag" (in this case even with lower casing
in DB).

citynames are stored with various casings in DB, so when searching with
solr, the search must ignore casing.


my schema.xml





To check what was going on, I opened my analysis.jsp, 

for field  I provide: "city"
for Field value (Index)  I provide: "den haag"
When I analyze this I get:
"den haag"

So that seems correct to me. Why is it that no results are returned?

My requirements summarized:
- I want to search independent of case on cityname:
when user searches on "DEn HaAG" he will get the records that have value
"Den Haag", but also records that have "den haag" etc.
- citynames may consist of multiple words but only an exact match is valid,
so when user searches for "den", he will not find "den haag" records. And
when searched on "den haag" it will only return match on that and not other
cities like "den bosch".

How can I achieve this?

I think I need a new fieldtype  in my schema.xml, but am not sure which
tokenizers and analyzers I need, here's what I tried:


  





  



Help is really appreciated!


Re: Limit number of characters returned

2010-12-03 Thread Mark
Correct me if I am wrong but I would like to return highlighted excerpts 
from the document so I would still need to index and store the whole 
document right (ie.. highlighting only works on stored fields)?


On 12/3/10 3:51 AM, Ahmet Arslan wrote:


--- On Fri, 12/3/10, Mark  wrote:


From: Mark
Subject: Limit number of characters returned
To: solr-user@lucene.apache.org
Date: Friday, December 3, 2010, 5:39 AM
Is there way to limit the number of
characters returned from a stored field?

For example:

Say I have a document (~2K words) and I search for a word
that's somewhere in the middle. I would like the document to
match the search query but the stored field should only
return the first 200 characters of the document. Is there
anyway to accomplish this that doesn't involve two fields?

I don't think it is possible out-of-the-box. May be you can hack highlighter to 
return that first 200 characters in highlighting response.
Or a custom response writer can do that.

But if you will be always returning first 200 characters of documents, I think creating additional field with 
indexed="false" stored="true" will be more efficient. And you can make your original field 
indexed="true" stored="false", your index size will be diminished.







Re: Problem with dismax mm

2010-12-03 Thread Em

Thank you both!

Erick,

what you said was absolutely correct.
I missunderstood the definition completely.

Now it works as intended.

Thank you!

Kind regards


Erick Erickson wrote:
> 
> from:
> http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
> "If there are less than 3 optional clauses, they all must match; for 3 to
> 5
> clauses, one less than the number of clauses must match, for 6 or more
> clauses, 80% must match, rounded down: "2<-1 5<80%""
> 
> Personally, the mm parameter makes my head hurt.
> As I read it, there are actually 4 buckets that rules apply to, not three
> in your mm definition, see below.
> 
> 
> Your mm param says, I think, that
> 
> clauses  number  rule
>  required
> 1   1 We haven't gotten to a rule yet, this is the
> default
> 2   2 We haven't gotten to a rule yet, this is the
> default
> 
> 3   2 2<-1
> 4   3 2<-1
> 
> 5   2 4<50% rounded down
> 
> 6   3 5<66% (6 * 0.66 = 3.96)
> 7   4 5<66% rounded down
> 
> Personally, I think the percentages are mind warping and lead to
> "interesting" behavior. I prefer to explicitly list the number of causes
> required or relatively constant numbers of required clauses, something
> like "between 3 and 5, one less. 6 to 9 two less" etc. you don't get
> weird steps like between 4 and 5 above. Plus, by the time you get to,
> say, 7 clauses nobody can keep track of what correct behavior is anyway
> .
> 
> So I think you're off by one position when applying your rules. Or the
> Wiki
> page is misleading. Or the Wiki page is exactly correct and I'm
> mis-reading
> it.
> Like I said, mm makes my head hurt.
> 
> Best
> Erick
> 
> 
> On Fri, Dec 3, 2010 at 8:18 AM, Em  wrote:
> 
>>
>> Hi list,
>>
>> I got a little problem with my mm definition:
>>
>> 2<-1 4<50% 5<66%
>>
>> Here is what it *should* mean:
>>
>> If there are 2 clauses, at least one has to match.
>> If there are more than 2 clauses, at least 50% should match (both rules
>> seem
>> to mean the same, don't they?).
>> And if there are 5 or more than 5 claues, at least 66% should match.
>>
>> In case of 5 clauses, 3 should match, in case of 6 at least 4 should
>> match
>> and so on.
>>
>> However in some test-case I get only the intended behaviour with a
>> 2-clause-query when I say mm=1.
>> If I got longer queries this would lead to very bad
>> search-quality-results.
>>
>> What is wrong with this mm-definition?
>>
>> Thanks for suggestions.
>> - Em
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Problem-with-dismax-mm-tp2011496p2011496.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
> 
> 

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-dismax-mm-tp2011496p2012079.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem with dismax mm

2010-12-03 Thread Erick Erickson
from:
http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
"If there are less than 3 optional clauses, they all must match; for 3 to 5
clauses, one less than the number of clauses must match, for 6 or more
clauses, 80% must match, rounded down: "2<-1 5<80%""

Personally, the mm parameter makes my head hurt.
As I read it, there are actually 4 buckets that rules apply to, not three
in your mm definition, see below.


Your mm param says, I think, that

clauses   required   rule applied
   1          1      we haven't gotten to a rule yet, this is the default
   2          2      we haven't gotten to a rule yet, this is the default

   3          2      2<-1
   4          3      2<-1

   5          2      4<50%, rounded down

   6          3      5<66% (6 * 0.66 = 3.96, rounded down)
   7          4      5<66%, rounded down

Personally, I think the percentages are mind warping and lead to
"interesting" behavior. I prefer to explicitly list the number of clauses
required, or relatively constant numbers of required clauses, something
like "between 3 and 5, one less; 6 to 9, two less" etc., so you don't get
weird steps like between 4 and 5 above. Plus, by the time you get to,
say, 7 clauses nobody can keep track of what correct behavior is anyway .
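Spelled out that way, a spec in that style might look like (illustrative only):

   2<-1 5<-2 9<-3

i.e. up to 2 clauses all must match, 3 to 5 clauses need one less, 6 to 9 need
two less, and 10 or more need three less.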

So I think you're off by one position when applying your rules. Or the Wiki
page is misleading. Or the Wiki page is exactly correct and I'm mis-reading
it.
Like I said, mm makes my head hurt.

Best
Erick


On Fri, Dec 3, 2010 at 8:18 AM, Em  wrote:

>
> Hi list,
>
> I got a little problem with my mm definition:
>
> 2<-1 4<50% 5<66%
>
> Here is what it *should* mean:
>
> If there are 2 clauses, at least one has to match.
> If there are more than 2 clauses, at least 50% should match (both rules
> seem
> to mean the same, don't they?).
> And if there are 5 or more than 5 clauses, at least 66% should match.
>
> In case of 5 clauses, 3 should match, in case of 6 at least 4 should match
> and so on.
>
> However in some test-case I get only the intended behaviour with a
> 2-clause-query when I say mm=1.
> If I got longer queries this would lead to very bad search-quality-results.
>
> What is wrong with this mm-definition?
>
> Thanks for suggestions.
> - Em
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Problem-with-dismax-mm-tp2011496p2011496.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Problem with dismax mm

2010-12-03 Thread Shawn Heisey

On 12/3/2010 6:18 AM, Em wrote:

I got a little problem with my mm definition:

2<-1 4<50% 5<66%


Are you defining this in a request handler in solrconfig.xml?  If you 
have it entered just like that, I think it may not be understanding it.  
You need to encode the < character.  Here's an excerpt from my dismax 
handler:


  <str name="mm">2&lt;-1 4&lt;-50%</str>

If that's not the problem, then I am not sure what it is, and the 
experts will need more information - version, query URL, configs.


Shawn



Problem with dismax mm

2010-12-03 Thread Em

Hi list,

I got a little problem with my mm definition:

2<-1 4<50% 5<66%

Here is what it *should* mean:

If there are 2 clauses, at least one has to match.
If there are more than 2 clauses, at least 50% should match (both rules seem
to mean the same, don't they?). 
And if there are 5 or more than 5 clauses, at least 66% should match.

In case of 5 clauses, 3 should match, in case of 6 at least 4 should match
and so on.

However in some test-case I get only the intended behaviour with a
2-clause-query when I say mm=1.
If I got longer queries this would lead to very bad search-quality-results.

What is wrong with this mm-definition?

Thanks for suggestions.
- Em


Re: Batch Update Fields

2010-12-03 Thread Erick Erickson
No, there's no equivalent to SQL update for all values in a column. You'll
have to reindex all the documents.
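If a full rebuild from the source data isn't convenient, here is a hedged sketch
of "reindexing programmatically" with SolrJ. It only works when every field you
need is stored (and assumes single-valued fields); the URL and field names are
assumptions for illustration:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.SolrInputDocument;

  public class RewriteCountryField {
      public static void main(String[] args) throws Exception {
          SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

          // page through every document whose country field holds the code
          SolrQuery q = new SolrQuery("country:US").setRows(500);
          int start = 0;
          while (true) {
              QueryResponse rsp = solr.query(q.setStart(start));
              if (rsp.getResults().isEmpty()) break;
              for (SolrDocument found : rsp.getResults()) {
                  // copy all stored fields, then overwrite the one to change
                  SolrInputDocument doc = new SolrInputDocument();
                  for (String name : found.getFieldNames()) {
                      doc.addField(name, found.getFieldValue(name));
                  }
                  doc.setField("country", "United States");
                  solr.add(doc); // re-adding with the same uniqueKey replaces the old doc
              }
              start += 500;
          }
          solr.commit();
      }
  }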

On Thu, Dec 2, 2010 at 10:52 PM, Adam Estrada  wrote:

> OK part 2 of my previous question...
>
> Is there a way to batch update field values based on a certain criteria?
> For example, if thousands of documents have a field value of 'US' can I
> update all of them to 'United States' programmatically?
>
> Adam


Re: [Wildcard query] Weird behaviour

2010-12-03 Thread Robert Muir
On Fri, Dec 3, 2010 at 7:49 AM, Tanguy Moal  wrote:
> Thank you very much Robert for replying that fast and accurately.
>
> I have effectively an other idea in mind to provide similar
> suggestions less expansively, I was balancing between the work around
> and the report issue options.
>
> I don't regret it since you came with a possible fix. I'll give it a
> try as soon as possible, and let the list know.

I'm afraid the patch is only a hack for the case where you have more
than 1 * sequentially (e.g. foobar).
It doesn't fix the more general problem, which is that WildcardQuery
itself uses an inefficient algorithm: this more general problem is
only fixed in lucene/solr trunk.

If you really need these queries i definitely suggest at least trying
trunk, because you should get much better performance.

But it sounds like you might already have an idea to avoid using these
queries so this is of course the best.


Facet same field with different prefix

2010-12-03 Thread Eric Grobler
Hi Everyone,

Can I facet the same field twice with a different prefix as per example
below?

facet.field=myfield
f.myfield.facet.prefix=*make*
f.myfield.facet.sort=count

facet.field=myfield
f.myfield.facet.prefix=*model*
f.myfield.facet.sort=count


Thanks and Regards
Ericz


Re: Joining Fields in an Index

2010-12-03 Thread Jan Høydahl / Cominvent
Hi,

I made a MappingUpdateRequestHandler which lets you map country codes to full 
country names with a config file. See 
https://issues.apache.org/jira/browse/SOLR-2151

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 3. des. 2010, at 00.03, Adam Estrada wrote:

> Hi,
> 
> I was hoping to do it directly in the index but it was more out of curiosity 
> than anything. I can certainly map it in the DAO but again...I was hoping to 
> learn if it was possible in the index.
> 
> Thanks for the feedback!
> 
> Adam
> 
> On Dec 2, 2010, at 5:48 PM, Savvas-Andreas Moysidis wrote:
> 
>> Hi,
>> 
>> If you are able to do a full re-index then you could index the full names
>> and not the codes. When you later facet on the Country field you'll get the
>> actual name rather than the code.
>> If you are not able to re-index then probably this conversion could be added
>> at your application layer prior to displaying your results.(e.g. in your DAO
>> object)
>> 
>> On 2 December 2010 22:05, Adam Estrada wrote:
>> 
>>> All,
>>> 
>>> I have an index that has a field with country codes in it. I have 7 million
>>> or so documents in the index and when displaying facets the country codes
>>> don't mean a whole lot to me. Is there any way to add a field with the full
>>> country names then join the codes in there accordingly? I suppose I can do
>>> this before updating the records in the index but before I do that I would
>>> like to know if there is a way to do this sort of join.
>>> 
>>> Example: US -> United States
>>> 
>>> Thanks,
>>> Adam
> 



Re: Solr Multi-thread Update Transaction Control

2010-12-03 Thread Erick Erickson
From Solr's perspective, the fact that multiple threads are
sending data to be indexed is invisible, Solr is just
reading http requests. So I don't think what you're asking
for is possible.

Could you outline the reason you want to do this? Perhaps
there's another way to accomplish it.

Best
Erick

2010/12/2 wangjb 

> Hi,
>  Now we are using Solr 1.4.1, and have encountered a problem.
>  When multiple threads update Solr data at the same time, can every thread
> have its own separate transaction?
>  If this is possible, how can we achieve it?
>  Is there any suggestion here?
>  Waiting online.
>  Thank you for any useful reply.
>
>
>
>


Re: [Wildcard query] Weird behaviour

2010-12-03 Thread Tanguy Moal
Thank you very much Robert for replying that fast and accurately.

I do indeed have another idea in mind to provide similar
suggestions less expensively; I was weighing the work-around
and report-the-issue options.

I don't regret it since you came up with a possible fix. I'll give it a
try as soon as possible, and let the list know.

Regards,

Tanguy

2010/12/3 Robert Muir :
> Actually, i took a look at the code again, the queries you mentioned:
> "I send queries to that field in the form (*term1*term2*)"
>
> I think the patch will not fix your problem... The only way i know you
> can fix this would be to upgrade to lucene/solr trunk, where wildcard
> comparison is linear to the length of the string.
>
> In all other versions, it has much worse runtime, and thats what you
> are experiencing.
>
> Separately, even better than this would be to see if you can index
> your content in a way to avoid these expensive queries. But this is
> just a suggestion, what you are doing should still work fine.
>
> On Fri, Dec 3, 2010 at 6:56 AM, Robert Muir  wrote:
>> On Fri, Dec 3, 2010 at 6:28 AM, Tanguy Moal  wrote:
>>> However suddenly CPU usage simply doubles, and sometimes eventually
>>> start using all 16 cores of the server, whereas the number of handled
>>> request is pretty stable, and even starts decreasing because of
>>> degraded user experience due to dramatic response times.
>>>
>>
>> Hi Tanguy: This was fixed here:
>> https://issues.apache.org/jira/browse/LUCENE-2620.
>>
>> You can apply the patch file there
>> (https://issues.apache.org/jira/secure/attachment/12452947/LUCENE-2620_3x.patch)
>> and recompile your own lucene 2.9.x, or you can replace the lucene jar
>> file in your solr war with the newly released lucene-2.9.4 core jar...
>> which I think is due to be released later today!
>>
>> Thanks for spending the time to report the problem... let us know the
>> patch/lucene 2.9.4 doesnt fix it!
>>
>


Re: [Wildcard query] Weird behaviour

2010-12-03 Thread Robert Muir
Actually, i took a look at the code again, the queries you mentioned:
"I send queries to that field in the form (*term1*term2*)"

I think the patch will not fix your problem... The only way i know you
can fix this would be to upgrade to lucene/solr trunk, where wildcard
comparison is linear to the length of the string.

In all other versions, it has much worse runtime, and thats what you
are experiencing.

Separately, even better than this would be to see if you can index
your content in a way to avoid these expensive queries. But this is
just a suggestion, what you are doing should still work fine.

On Fri, Dec 3, 2010 at 6:56 AM, Robert Muir  wrote:
> On Fri, Dec 3, 2010 at 6:28 AM, Tanguy Moal  wrote:
>> However suddenly CPU usage simply doubles, and sometimes eventually
>> start using all 16 cores of the server, whereas the number of handled
>> request is pretty stable, and even starts decreasing because of
>> degraded user experience due to dramatic response times.
>>
>
> Hi Tanguy: This was fixed here:
> https://issues.apache.org/jira/browse/LUCENE-2620.
>
> You can apply the patch file there
> (https://issues.apache.org/jira/secure/attachment/12452947/LUCENE-2620_3x.patch)
> and recompile your own lucene 2.9.x, or you can replace the lucene jar
> file in your solr war with the newly released lucene-2.9.4 core jar...
> which I think is due to be released later today!
>
> Thanks for spending the time to report the problem... let us know the
> patch/lucene 2.9.4 doesnt fix it!
>


Re: [Wildcard query] Weird behaviour

2010-12-03 Thread Robert Muir
On Fri, Dec 3, 2010 at 6:28 AM, Tanguy Moal  wrote:
> However suddenly CPU usage simply doubles, and sometimes eventually
> start using all 16 cores of the server, whereas the number of handled
> request is pretty stable, and even starts decreasing because of
> degraded user experience due to dramatic response times.
>

Hi Tanguy: This was fixed here:
https://issues.apache.org/jira/browse/LUCENE-2620.

You can apply the patch file there
(https://issues.apache.org/jira/secure/attachment/12452947/LUCENE-2620_3x.patch)
and recompile your own lucene 2.9.x, or you can replace the lucene jar
file in your solr war with the newly released lucene-2.9.4 core jar...
which I think is due to be released later today!

Thanks for spending the time to report the problem... let us know the
patch/lucene 2.9.4 doesnt fix it!


Re: Limit number of characters returned

2010-12-03 Thread Ahmet Arslan


--- On Fri, 12/3/10, Mark  wrote:

> From: Mark 
> Subject: Limit number of characters returned
> To: solr-user@lucene.apache.org
> Date: Friday, December 3, 2010, 5:39 AM
> Is there way to limit the number of
> characters returned from a stored field?
> 
> For example:
> 
> Say I have a document (~2K words) and I search for a word
> that's somewhere in the middle. I would like the document to
> match the search query but the stored field should only
> return the first 200 characters of the document. Is there
> anyway to accomplish this that doesn't involve two fields?

I don't think it is possible out-of-the-box. Maybe you can hack the highlighter to
return the first 200 characters in the highlighting response.
Or a custom response writer can do that.

But if you will always be returning the first 200 characters of documents, I think
creating an additional field with indexed="false" stored="true" will be more
efficient. And you can make your original field indexed="true" stored="false", so
your index size will be diminished.
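In schema.xml terms, a rough sketch of that layout (field names are placeholders;
maxChars on copyField, if your version supports it, keeps only the leading
characters):

  <field name="body"       type="text"   indexed="true"  stored="false"/>
  <field name="body_intro" type="string" indexed="false" stored="true"/>

  <copyField source="body" dest="body_intro" maxChars="200"/>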




  


Re: Batch Update Fields

2010-12-03 Thread Markus Jelsma
You must reindex the complete document, even if you just want to update a 
single field.

On Friday 03 December 2010 04:52:04 Adam Estrada wrote:
> OK part 2 of my previous question...
> 
> Is there a way to batch update field values based on a certain criteria?
> For example, if thousands of documents have a field value of 'US' can I
> update all of them to 'United States' programmatically?
> 
> Adam

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350