Re: phpnative response writer in SOLR 3.1 ?

2011-04-15 Thread Ralf Kraus

On 14.04.2011 09:53, Ralf Kraus wrote:


I just updated to SOLR 3.1 and am wondering if the phpnative response 
writer plugin is part of it?

When I try to compile the source files I get some errors: 
org.apache.solr.request.PHPNativeResponseWriter is not abstract and 
does not override abstract method 
in org.apache.solr.response.QueryResponseWriter

public class PHPNativeResponseWriter implements QueryResponseWriter {
   ^ method does not override a method 
from its superclass


Is there a new JAR file or something I could use with SOLR 3.1? 
Because the SOLR PECL package only uses XML or PHPNATIVE as response 
writer.

No hints at all ?

Ralf Kraus

Dismax Minimum Match/Stopwords Bug

2011-04-15 Thread Jan Høydahl
A thread with this same subject from 2008/2009 is here:

We're seeing customers being bitten by this bug now and then, and normally my 
workaround is to simply not use stopwords at all.
However, is there an actual fix in the 3.1 eDisMax parser which solves the 
problem for real? Cannot find a JIRA issue for it.

Jan Høydahl, search solution architect
Cominvent AS -

Re: SOLR support for unicode?

2011-04-15 Thread Sivasakthivel

Thanks for your response. I am currently working on this issue. 

When I ran the script, I got the following result. 
Solr server is up. 
HTTP GET is accepting UTF-8 
HTTP POST is accepting UTF-8 
HTTP POST defaults to UTF-8 
ERROR: HTTP GET is not accepting UTF-8 beyond the basic multilingual plane 
ERROR: HTTP POST is not accepting UTF-8 beyond the basic multilingual plane 
ERROR: HTTP POST + URL params is not accepting UTF-8 beyond the basic
multilingual plane
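
For context on the three errors above: "beyond the basic multilingual plane" refers to code points above U+FFFF, which need four bytes in UTF-8 and a surrogate pair in UTF-16. The sketch below is only an illustration of that distinction. It is not the test script mentioned in this message, and the two sample characters are arbitrary choices.

```python
# Illustration only: how a character outside the Basic Multilingual Plane
# (BMP) differs from one inside it. The sample characters are arbitrary.

def is_beyond_bmp(ch: str) -> bool:
    """True if the single character has a code point above U+FFFF."""
    return ord(ch) > 0xFFFF


def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


e_acute = "\u00e9"      # e with acute accent, inside the BMP
g_clef = "\U0001D11E"   # musical G clef, outside the BMP

for sample in (e_acute, g_clef):
    # Prints: beyond-BMP flag, UTF-8 byte count, UTF-16 code unit count
    print(is_beyond_bmp(sample), len(sample.encode("utf-8")), utf16_code_units(sample))
```

A client or servlet container that handles text as 16-bit units without pairing surrogates correctly can pass the first kind of character and still mangle the second, which would be consistent with the mixed results listed above.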

I also placed a TM symbol and a – symbol in one of the example XML docs and
indexed that with post.jar, 
with the wt=python param. 

  Good unicode support: héllo (hello with an™ accent OLB – Account over the e)

Good unicode support: héllo (hello with an� accent OLB � Account over the e)  

View this message in context:
Sent from the Solr - User mailing list archive at

Re: Search and index Result

2011-04-15 Thread Erick Erickson
You're possibly getting hit by server caching. Are you by chance
submitting the exact same query after your commit? What
happens if you change your query to one you haven't used before?

Turning off http caching might help. Solr should be searching
the new contents after a commit (and any attendant warmup


On Fri, Apr 15, 2011 at 1:43 AM, satya swaroop satya.yada...@gmail.com wrote:

 Hi all,
   I just made a duplicate of solrdispatchfilter as
 solrdispatchfilter1 and solrdispatchfilter2 such that all the /update or
 /update/extract things are passed through solrdispatchfilter1
 and all search (/select) things are passed through
 solrdispatchfilter2. It is because I need to establish a privacy concern
 the search result.
 I need to check whether the required user has access to the particular
 or not.. I was successful in implementing the privacy of results.
 One major problem I am getting is that after indexing some documents and
 committing, I am not getting the committed data in the search result; I am
 getting the old data from before the commit...
 I get the new result only after restarting the server. Can anyone tell me
 where to modify such that the search will give the results from the recent

 Thanks and Regards,

newbie - filter to only show queried field when query is free text

2011-04-15 Thread bryan rasmussen

If I want to filter a search result to not return all fields as per
the default but I don't know what field my hits will be in.

This is basically for unstructured document type data, for example
large HTML or DOCBOOK documents.

Bryan Rasmussen

Re: newbie - filter to only show queried field when query is free text

2011-04-15 Thread Marek Tichy
There may be better ways, but as far as my knowledge goes, I'd try to use
the highlighting component. With hl.requireFieldMatch, the highlighting
response only includes fields where highlights were applied (a match was
found), which is probably what you want.
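
A minimal sketch of such a request, assuming a Solr instance at a hypothetical localhost URL and a made-up query term. The parameters hl, hl.fl and hl.requireFieldMatch are standard highlighting parameters; whether hl.fl=* is appropriate depends on which fields are stored in your schema.

```python
from urllib.parse import urlencode


def highlight_query_url(base_url: str, query: str) -> str:
    """Build a select URL that asks Solr to highlight only matching fields."""
    params = {
        "q": query,
        "hl": "true",                    # turn the highlighting component on
        "hl.fl": "*",                    # consider every field for highlighting
        "hl.requireFieldMatch": "true",  # only highlight fields the query matched
    }
    return base_url + "?" + urlencode(params)


# Hypothetical host, port and query term, used only for the example.
url = highlight_query_url("http://localhost:8983/solr/select", "docbook")
print(url)
```

The field names that come back in the highlighting section of the response then tell you where the hits were.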

 Marek Tichy

 If I want to filter a search result to not return all fields as per
 the default but I don't know what field my hits will be in.

 This is basically for unstructured document type data, for example
 large HTML or DOCBOOK documents.

 Bryan Rasmussen


DataImportHandler - importing XML documents, undeclared general entity - DTD right there

2011-04-15 Thread bryan rasmussen
I am importing a number of XML documents from the filesystem. The
dataimporthandler finds them, but returns an undeclared general entity
error - even though my DTD is present and findable by other parsers.

DTD Declaration
In XML file in the same folder as the DTD allartikel.dtd

Bryan Rasmussen

Using autocomplete with the new Suggest component

2011-04-15 Thread openvictor Open
Hi everybody,

Recently I implemented an autocomplete mechanism for my website using a
custom TermsComponent. I was quite happy with that because it also enables
me to do a Google-like feature where complete sentences were suggested to
the user when he typed in the search field. I used Shingles to search
against pieces of sentences.
(I have resources for French people if somebody asks)

Then came solr 3.1 and its new suggest component. I have looked at the
documentation but it's still unclear how it works exactly. So please let me
ask some questions :

   - Are there performance improvements over TermsComponent ?
   - Is it able to autosuggest sentences and not only words ? If yes, how ?
   Should I keep my shingles ?
   - What is this threshold value that I see ? Is it a mandatory field to
   complete ? I want to have suggestion no matter what the frequency is in the
   document !

Thank you all. If I succeed in doing that, I will try to provide a tutorial to
do that with jQuery UI autocomplete + Suggest component if anyone's
Best regards.


Strange DisMax results

2011-04-15 Thread Daniel Persson

I've got a strange result of a DisMax search function. I might have
understood the functionality wrong. But after I read the manual I
understood it is used to do ranked results with simple search terms.

Solr Version 1.4.0

I've got the setup

Schema fields
<field name="name" type="wc_text" indexed="true" stored="true"/>
<field name="shortDescription" type="wc_text" indexed="true" stored="true"/>
<field name="longDescription" type="wc_keywordTextLowerCase" indexed="true"
stored="false" multiValued="false"/>
<field name="prodShortDescription" type="wc_keywordTextLowerCase"
indexed="true" stored="false" multiValued="false"/>
<field name="prodLongDescription" type="wc_keywordTextLowerCase"
indexed="true" stored="false" multiValued="false"/>

<copyField source="name" dest="defaultSearch"/>
<copyField source="shortDescription" dest="defaultSearch"/>
<copyField source="longDescription" dest="defaultSearch"/>
<copyField source="prodShortDescription" dest="defaultSearch"/>
<copyField source="prodLongDescription" dest="defaultSearch"/>

DisMax config
<requestHandler name="dismax" class="solr.DisMaxRequestHandler">
<lst name="defaults">
 <str name="echoParams">explicit</str>
 <float name="tie">0.01</float>
 <str name="qf">
name^1.2 shortDescription^1.0 longDescription^1.0
prodShortDescription^0.5 prodLongDescription^0.5
 <str name="pf">
name^1.2 shortDescription^1.0 longDescription^1.0
prodShortDescription^0.5 prodLongDescription ^0.5
 <str name="q.alt">*:*</str>
 <int name="ps">100</int>
 <arr name="last-components">

Standard config
<requestHandler name="standard" class="solr.StandardRequestHandler">
 <lst name="defaults">
   <str name="echoParams">explicit</str>
 <arr name="last-components">

When I search for a term q=term I get 68 hits. But when I search for
q=term&qt=dismax I get 0 hits.

Of course I got more fields and search parameters. But the only difference I
could see is that in one case I use dismax and the other I don't.

What have I missed? Any suggestions?

Best regards


Re: Strange DisMax results

2011-04-15 Thread Erick Erickson
If you haven't modified your schema.xml, you'll find that the
defaultSearchField is set to the text field. So when
you issue the q=term you're going against your default
search field.

Assuming you've changed the default search field to
defaultSearch, then the problem is probably that your
analysis chain for default search is different from that
applied to your individual fields. Which I absolutely
guarantee since you have two different fieldTypes in
your 5 fields. I'm extremely suspicious of your fieldTypes
that involve the word keyword, because if this indicates
the KeywordTokenizer is being used, then everything
in the input is a single token, the input stream isn't being
split up...
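
To illustrate the point about KeywordTokenizer, here is a simplified model in Python. It is not Solr's analysis code; the field value is hypothetical and the two functions only mimic the difference between splitting the input into tokens and keeping it as a single token.

```python
def whitespace_tokens(text):
    """Split on whitespace and lowercase, roughly like a tokenized text field."""
    return [token.lower() for token in text.split()]


def keyword_tokens(text):
    """Keep the whole input as one lowercased token, like a keyword-style field."""
    return [text.lower()]


def matches(query_term, tokens):
    """A single-term query matches only if it equals one of the indexed tokens."""
    return query_term.lower() in tokens


value = "Blue cotton shirt"  # hypothetical field value

print(matches("shirt", whitespace_tokens(value)))           # tokenized field
print(matches("shirt", keyword_tokens(value)))              # keyword-style field
print(matches("blue cotton shirt", keyword_tokens(value)))  # exact whole value
```

In this model a one-word query finds the document in the tokenized field but not in the keyword-style field, where only the complete value matches.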

But the best way to understand this is in the admin/analysis page.
If you check the verbose box and put in some text you'll see
the effects of each part of the chain. Try this for the field you
expect Dismax to find your term in, and also for your
defaultSearch field and I suspect you'll see what's going on


On Fri, Apr 15, 2011 at 10:35 AM, Daniel Persson mailto.wo...@gmail.com wrote:


 I've got a strange result of a DisMax search function. I might have
 understood the functionality wrong. But after I read the manual I
 understood it is used to do ranked results with simple search terms.

 Solr Version 1.4.0

 I've got the setup

 Schema fields
 <field name="name" type="wc_text" indexed="true" stored="true"/>
 <field name="shortDescription" type="wc_text" indexed="true" stored="true"/>
 <field name="longDescription" type="wc_keywordTextLowerCase" indexed="true"
 stored="false" multiValued="false"/>
 <field name="prodShortDescription" type="wc_keywordTextLowerCase"
 indexed="true" stored="false" multiValued="false"/>
 <field name="prodLongDescription" type="wc_keywordTextLowerCase"
 indexed="true" stored="false" multiValued="false"/>

 <copyField source="name" dest="defaultSearch"/>
 <copyField source="shortDescription" dest="defaultSearch"/>
 <copyField source="longDescription" dest="defaultSearch"/>
 <copyField source="prodShortDescription" dest="defaultSearch"/>
 <copyField source="prodLongDescription" dest="defaultSearch"/>

 DisMax config
 <requestHandler name="dismax" class="solr.DisMaxRequestHandler">
 <lst name="defaults">
  <str name="echoParams">explicit</str>
  <float name="tie">0.01</float>
  <str name="qf">
 name^1.2 shortDescription^1.0 longDescription^1.0
 prodShortDescription^0.5 prodLongDescription^0.5
  <str name="pf">
 name^1.2 shortDescription^1.0 longDescription^1.0
 prodShortDescription^0.5 prodLongDescription ^0.5
  <str name="q.alt">*:*</str>
  <int name="ps">100</int>
  <arr name="last-components">

 Standard config
 <requestHandler name="standard" class="solr.StandardRequestHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  <arr name="last-components">

 When I search for a term q=term I get 68 hits. But when I search for
 q=term&qt=dismax I get 0 hits.

 Of course I got more fields and search parameters. But the only difference
 could see is that in one case I use dismax and the other I don't.

 What have I missed? Any suggestions?

 Best regards


Sort by function - 400 error

2011-04-15 Thread Michael Owen

Using solr 3.1.
When I do:
sort=score desc
it works.
sort=product(typeId,2) desc (typeId is a valid attribute in document)
it works.
sort=product(score,typeId) desc
fails on 400 error? Also sort=product(score,2) desc fails too.
Must be something basic I'm missing? Tried adding fl=*,score too.


RE: Understanding the DisMax tie parameter

2011-04-15 Thread Burton-West, Tom
Thanks everyone.

I updated the wiki.  If you have a chance please take a look and check to make 
sure I got it right on the wiki.


-Original Message-
From: Chris Hostetter [] 
Sent: Thursday, April 14, 2011 5:41 PM
Cc: Burton-West, Tom
Subject: Re: Understanding the DisMax tie parameter

: Perhaps the parameter could have had a better name.  It's essentially
: max(score of matching clauses) + tie * (score of matching clauses that
: are not the max)
: So it can be used and thought of as a tiebreak only in the sense that
: if two docs match a clause (with essentially the same score), then a
: small tie value will act as a tiebreaker *if* one of those docs also
: matches some other fields.

correct.  w/o a tiebreaker value, a dismax query will only look at the 
maximum scoring clause for each doc -- the tie param is named for its 
ability to help break ties when multiple documents have the same score 
from the max scoring clause -- by adding in a small portion of the scores 
(based on the 0-1 ratio of the tie param) from the other clauses.
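
The formula quoted above can be written out as a small function. This is a sketch of the arithmetic only, with made-up clause scores; it is not Lucene's scoring code.

```python
def dismax_score(clause_scores, tie=0.0):
    """Combine per-field clause scores as described above:
    the best clause counts fully, the remaining ones are scaled by `tie`."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    rest = sum(clause_scores) - best
    return best + tie * rest


# Made-up scores: both docs share the same best clause, doc_a also matches
# a second field.
doc_a = [0.8, 0.5]
doc_b = [0.8]

print(dismax_score(doc_a, tie=0.0), dismax_score(doc_b, tie=0.0))  # tied
print(dismax_score(doc_a, tie=0.1), dismax_score(doc_b, tie=0.1))  # doc_a ahead
```

With tie=0 the two documents score the same; any small positive tie lets the extra field match break the tie, and tie=1 turns the score into a plain sum of the clause scores.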


Re: Sort by function - 400 error

2011-04-15 Thread Yonik Seeley
On Fri, Apr 15, 2011 at 11:50 AM, Michael Owen wrote:

 Using solr 3.1.
 When I do:
        sort=score desc
 it works.
        sort=product(typeId,2) desc (typeId is a valid attribute in document)
 it works.
        sort=product(score,typeId) desc
 fails on 400 error? Also sort=product(score,2) desc fails too.

You can't currently use score in function queries.
You can embed another query in a function query though.
 sort=product($qq,typeId) desc&qq=my_query_here

In your case, when you just want to multiply the score by a field,
then you can either use the edismax query parser and the boost


Or you could directly use the boost query parser

q={!boost b=typeId}my_query_here
q={!boost b=typeId v=$qq}&qq=my_query_here
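
For anyone assembling these requests in client code, here is a small sketch that builds the two parameter sets above and URL-encodes them. Passing the user query as q in the sort variant is an assumption of this sketch; the field name typeId and the placeholder query come from the thread.

```python
from urllib.parse import urlencode


def sort_by_function_params(user_query, field="typeId"):
    """Sort by product(<score of embedded query>, field) using $qq."""
    return {
        "q": user_query,                       # assumption: main query equals qq
        "qq": user_query,                      # dereferenced as $qq in the sort
        "sort": f"product($qq,{field}) desc",
    }


def boost_parser_params(user_query, field="typeId"):
    """Multiply the score by a field using the boost query parser."""
    return {
        "q": f"{{!boost b={field} v=$qq}}",
        "qq": user_query,
    }


print(urlencode(sort_by_function_params("my_query_here")))
print(urlencode(boost_parser_params("my_query_here")))
```

Encoding the parameters this way avoids problems with the spaces, braces and dollar signs in the raw values.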

-Yonik -- Lucene/Solr User Conference, May
25-26, San Francisco

Field compression

2011-04-15 Thread Charlie Jackson
I know I'm late to the party, but I recently learned that field compression was 
removed as of Solr 1.4.1. I think a lot of sites were relying on that feature, 
so I'm curious what people are doing now that it's gone. Specifically, what are 
people doing to efficiently store *and highlight* large fulltext fields? I can 
think of ways to store the text efficiently (compress it myself), or highlight 
it (leave it uncompressed), but not both at the same time.

Also, is anyone working on anything to restore compression to Solr? I 
understand it was removed because Lucene removed support for it, but I was 
hoping to upgrade my site to 3.1 soon and we rely on that feature.

- Charlie

Solr 3.1: Old Index Files Not Removed on Optimize?

2011-04-15 Thread Trey Grainger
I was just hoping someone might be able to point me in the right direction
here.  We just upgraded from Solr 1.4 to Solr 3.1 this past week and we're
having issues running out of disk space on our Master servers.  Our Master
has dozens of cores.  We have a script that kicks off once per day to do a
rolling optimize.  The script optimizes a single core, waits 5 minutes to
give the server some breathing room to catch up on indexing in a non-i/o
intensive state, and then moves onto the next core (repeating until done).
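
For readers who want to picture the rolling optimize, here is a rough sketch of the loop described above. It is not our actual script. The base URL, the per-core /update?optimize=true request and the core names are assumptions made for the example.

```python
import time
import urllib.request


def optimize_core(base_url, core):
    """Ask one core to optimize (assumes the update handler is at /<core>/update)."""
    url = f"{base_url}/{core}/update?optimize=true"
    with urllib.request.urlopen(url) as response:
        return response.status


def rolling_optimize(cores, optimize=None, pause_seconds=300, sleep=time.sleep,
                     base_url="http://localhost:8983/solr"):
    """Optimize cores one at a time, pausing between them so indexing can catch up."""
    if optimize is None:
        optimize = lambda core: optimize_core(base_url, core)
    done = []
    for index, core in enumerate(cores):
        optimize(core)
        done.append(core)
        if index < len(cores) - 1:  # no need to wait after the last core
            sleep(pause_seconds)
    return done
```

The optimize and sleep callables are injectable so the loop can be exercised without a running Solr, for example `rolling_optimize(["core0", "core1"], optimize=print, sleep=lambda s: None)`.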

The problem we are facing is that under Solr 1.4, the old index files were
deleted very quickly after each optimize, but under Solr 3.1, the old index
files hang around for hours... in many cases they don't disappear until we
restart Solr completely.  This is leading to us running out of disk space,
as each core's index doubles in size during the optimize process and stays
that way until the next solr restart.

I was just wondering if anyone could point me to some specific changes or
settings which may be leading to the difference between solr versions (or
any other environmental issues you may know about).  I see several tickets
in Jira about similar issues, but they mostly appear to have been resolved
in the past.

Has anyone else seen this behavior under Solr 3.1, or do you think we may be
missing some kind of new configuration setting?

For reference, we are running on 64bit RedHat Linux.  This is what I have
right now: [From SolrConfig.xml]:

<requestHandler name="/replication" class="solr.ReplicationHandler">
<lst name="master">
<str name="replicateAfter">commit</str>
<str name="replicateAfter">optimize</str>
<str name="replicateAfter">startup</str>

  <updateHandler class="solr.DirectUpdateHandler2">

<deletionPolicy class="solr.SolrDeletionPolicy">
  <str name="keepOptimizedOnly">false</str>
  <str name="maxCommitsToKeep">1</str>

Thanks in advance,


Split token

2011-04-15 Thread roySolr

I want to split my string when it contains (. Example:

spurs (London)
Internationale (milan)



What tokenizer can I use to fix this problem?


Re: partial optimize does not reduce the segment number to maxNumSegments

2011-04-15 Thread Renee Sun

It seems the file count in index directory is the segment# * 8 in my dev

I see there are .fnm .frq .fdt .fdx .nrm .prx .tii .tis (8) file extensions,
and each has as many as segment# files.

Is it always safe to calculate the file count by multiplying the segment number
by 8? Of course this excludes the segments_N, segments.gen and xxx_del files.

I found that most of the cores have a file count that can be calculated
using the above formula, but a few cores do not match... 
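
One way to check the "segments times 8" assumption against a real core is to count the files per extension directly. The sketch below does that for any directory; the demo uses made-up file names in a temporary directory, not a real index.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_index_files(index_dir):
    """Count files in an index directory, grouped by file extension.
    Files without an extension (such as segments_N) are keyed by full name."""
    counts = Counter()
    for entry in Path(index_dir).iterdir():
        if entry.is_file():
            counts[entry.suffix or entry.name] += 1
    return counts


def demo():
    """Build a throwaway directory with made-up segment files and count them."""
    names = ("_0.fdt", "_0.fdx", "_1.fdt", "_1.fdx", "segments_2", "segments.gen")
    with tempfile.TemporaryDirectory() as tmp:
        for name in names:
            (Path(tmp) / name).touch()
        return count_index_files(tmp)


print(demo())
```

Comparing the per-extension counts between cores should show which extensions are missing or extra in the cores that do not fit the formula.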



Re: partial optimize does not reduce the segment number to maxNumSegments

2011-04-15 Thread Renee Sun
yeah, I can figure out the segment number by going to stat page of solr...
but my question was how to figure out exact total number of files in 'index'
folder for each core.

Like I mentioned in previous message, I currently have 8 files per segment
(.prx .tii etc), but it seems this might change if I use term vector for
example.  So I need suggestions on how to accurately figure out the total
file number.



most stable way to get facet pivoting

2011-04-15 Thread Nikolas Tautenhahn

I want to evaluate (and probably use in production) facet pivoting -
what is the best approach to get an as-stable-as-can-be version of solr
which is able to do facet pivoting? I was hoping to see this in Solr
3.1, but apparently it is only in the dev versions/nightlies...

Is it possible to patch this feature into Solr 3.1 stable?

best regards,

Nikolas Tautenhahn

LivingLogic AG
Markgrafenallee 44
95448 Bayreuth
District Court (Amtsgericht) Bayreuth ++ HRB 3274
Chairman of the Supervisory Board: Achim Lindner
Executive Board: Philipp Ambrosch, Alois Kastner-Maresch (Chair)

How to combine Deduplication and Elevation

2011-04-15 Thread shamex
Hi, I have a question: how can I combine the Deduplication and Elevation
implementations in Solr? Currently, I have managed to implement only one or the other.


Solr 3.1.0 core not reloading with RamDirectoryFactory

2011-04-15 Thread nskmda

We just tried core reloading on a freshly installed Solr 3.1.0 with
It doesn't seem to happen.
With the FSDirectoryFactory everything works fine.

It looks like the RamDirectoryFactory implementation caches the directory and,
if a cached one is available, doesn't really reopen it, so the updated index
is never loaded into memory.

Can anyone comment on this?
Should we implement our own RamDirectoryFactory?

Here is the code snippet from Solr 3.1.0. It looks a bit confusing.

public Directory open(String path) throws IOException {
  synchronized (RAMDirectoryFactory.class) {
    RefCntRamDirectory directory = directories.get(path);
    if (directory == null || !directory.isOpen()) {
      directory = (RefCntRamDirectory) openNew(path);
      directories.put(path, directory);
    } else {
      // (the body of this branch is missing from the archived message)
    }

    return directory;
  }
}
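
The behaviour described above can be modelled with a few lines of Python. This is a toy model of a per-path cache, not the Solr implementation: it leaves out reference counting and the isOpen() check, and the loader callable and sample data are invented for the example.

```python
class CachingDirectoryFactory:
    """Toy model: one cached 'directory' object per path."""

    def __init__(self, loader):
        self._loader = loader     # callable that reads index data for a path
        self._directories = {}

    def open(self, path):
        directory = self._directories.get(path)
        if directory is None:
            directory = self._loader(path)  # only runs on the first open
            self._directories[path] = directory
        return directory


# Made-up "on disk" state that changes between the two opens.
on_disk = {"idx": ["doc1"]}
factory = CachingDirectoryFactory(lambda path: list(on_disk[path]))

first = factory.open("idx")
on_disk["idx"].append("doc2")   # the index on disk is updated
second = factory.open("idx")    # a "reload" gets the cached copy back

print(second is first, second)
```

In this model the second open returns the same object as the first and never sees the update, which matches what we observe on core reload.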



Avoiding corrupted index

2011-04-15 Thread Laurent Vaills
Hi everyone,

We are using Solr 1.4.1 in my company and we need to do some backups of the

After some googling, I'm quite confused about the different ways of backing
up the index.

First, I tried the scripts provided in the Solr distribution, without success.
I untarred apache-solr-1.4.1.tar.gz into /opt, then launched the backup script, but I get
this error:
$ /opt/apache-solr-1.4.1/src/scripts/backup
/opt/apache-solr-1.4.1/src/scripts/backup: line 26:
/opt/apache-solr-1.4.1/src/bin/scripts-util: No such file or directory
And that's true : there is no /opt/apache-solr-1.4.1/src/bin/scripts-util
but a /opt/apache-solr-1.4.1/src/scripts/scripts-util
Is it normal to distribute the scripts with a bad path?

Then I discovered that these utility scripts are no longer distributed
with version 3.1.0: were they not reliable? Can we get corrupted
backups with these scripts?

Finally, we found the page about SolrReplication on the Solr wiki also this
in particular the answer advising to use the replication.
So we tried to use this replication mechanism (and call the URL on the slave
with the query parameters command=backup and location=/backup) but this
method requires lots of I/O for a big index.
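
For reference, the URL we call looks roughly like the one built below. The host name, port and /solr/replication path are placeholders for this example; only the command=backup and location=/backup parameters are the ones mentioned above.

```python
from urllib.parse import urlencode


def backup_url(slave_base_url, location):
    """Build the replication handler URL that requests a backup of the index."""
    params = {"command": "backup", "location": location}
    # safe="/" keeps the slashes of the target path readable in the URL.
    return f"{slave_base_url}/replication?" + urlencode(params, safe="/")


# Placeholder host and port, used only for the example.
print(backup_url("http://slave:8983/solr", "/backup"))
```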

Is it the best way to get an uncorrupted backup of the index?

Is there another way to do the backup with Solr 3.1 ?

Thanks in advance for your time.


Indexing relations for sorting

2011-04-15 Thread derk.h
Hi everybody,

I have the following problem/question:

In our system we have some categories and products in those categories. Our
structure looks a bit like this:

product X belongs to category: cat1_subcat1 (10)
product X belongs to category: cat2_subcat1 (20)
product Y belongs to category: cat1_subcat2 (30)
product Z belongs to category: cat2_subcat1 (15)

Every product-to-category relation has its own sorting order which we would
like to index in solr. 

To make the problem more complex, we have two ways of searching for a

We want all products of subcat1 (no matter what the parent category is)
ordered by their sorting order
We want all products of cat2_subcat1 ordered by their sorting order

This probably is not what solr is designed for, but everything else in our
system is indexed and searched by solr. 
So it would be very helpful if someone has an idea or suggestion to make
this work.

Our solr version is 1.3.0

Many thanks!


how to import data from database combine with file content in solr

2011-04-15 Thread

I am new to solr,

my requirements are,

1. at regular intervals, Solr needs to fetch data from a SQL Server database and
index it.
2. fetch only those records which are not yet indexed
3. for each record there is one associated file, so along with the database table
fields I also want to index the content of that particular file

e.g. there is one table Customer in the database and customerid is the primary key;
   for each customerid there is an associated customer profile file
named with the customerid,

4. as I mentioned above, when Solr fetches data from the SQL Server database
table, it should fetch only data which is not yet indexed. (We have some older
Lucene code, in which there is a field isindexed in the table, so when
fetching data the select clause has the condition isindexed=false,
and when indexing is done the particular database record is updated with
isindexed=true.) Is there any mechanism in Solr for that?

How can I achieve the same?
Do I need to write custom code for that, or can it be done with the configuration
provided by Solr?
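
To make point 4 concrete, the flag-based approach in our older code works roughly like the sketch below. SQLite stands in for SQL Server here, the name column and the index_document callable are hypothetical, and the handling of the associated file is left out.

```python
import sqlite3


def index_pending_customers(conn, index_document):
    """Index rows whose isindexed flag is 0, then set the flag to 1."""
    rows = conn.execute(
        "SELECT customerid, name FROM customer "
        "WHERE isindexed = 0 ORDER BY customerid"
    ).fetchall()
    for customerid, name in rows:
        index_document({"id": customerid, "name": name})
        conn.execute(
            "UPDATE customer SET isindexed = 1 WHERE customerid = ?",
            (customerid,),
        )
    conn.commit()
    return len(rows)


def demo():
    """Run two passes over an in-memory table; the second finds nothing new."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer ("
        "customerid INTEGER PRIMARY KEY, name TEXT, isindexed INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO customer (customerid, name) VALUES (?, ?)",
        [(1, "Ann"), (2, "Bob")],
    )
    indexed = []
    first_pass = index_pending_customers(conn, indexed.append)
    second_pass = index_pending_customers(conn, indexed.append)
    conn.close()
    return first_pass, second_pass, indexed


print(demo())
```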


Vishal Parekh 


Re: Using autocomplete with the new Suggest component

2011-04-15 Thread Quentin Proust
Hi Victor,

I have the same questions about the new Suggest component.
I can't really help you as I didn't really manage to understand how it
Sometimes, I had more results, sometimes less.

Even so, I would really be interested in your resources using Terms and
shingles to implement auto-complete.
I am myself a French student and it could help me improve the solution for
one of my projects.

Best regards,

2011/4/15 openvictor Open

 Hi everybody,

 Recently I implemented an autocomplete mechanism for my website using a
 custom TermsComponent. I was quite happy with that because it also enables
 me to do a Google-like feature where complete sentences were suggested to
 the user when he typed in the search field. I used Shingles to search
 against pieces of sentences.
 (I have resources for French people if somebody asks)

 Then came solr 3.1 and its new suggest component. I have looked at the
 documentation but it's still unclear how it works exactly. So please let me
 ask some questions :

   - Are there performance improvements over TermsComponent ?
   - Is it able to autosuggest sentences and not only words ? If yes, how ?
   Should I keep my shingles ?
   - What is this threshold value that I see ? Is it a mandatory field to
   complete ? I want to have suggestion no matter what the frequency is in
   document !

 Thank you all. If I succeed in doing that, I will try to provide a tutorial to
 do that with jQuery UI autocomplete + Suggest component if anyone's
 Best regards.



Quentin Proust
Email :
Tel :

Re: Split token

2011-04-15 Thread Erick Erickson
What you've shown would be handled with WhitespaceTokenizer, but you'd have to
prevent filters from stripping the parens. If you have to handle things like
blah ( stuff )
WhitespaceTokenizer wouldn't work.

PatternTokenizerFactory might work for you, see:


On Tue, Apr 12, 2011 at 6:02 AM, roySolr wrote:


 I want to split my string when it contains (. Example:

 spurs (London)
 Internationale (milan)



 What tokenizer can i use to fix this problem?


Re: Using autocomplete with the new Suggest component

2011-04-15 Thread openvictor Open
Hi Quentin, well, stick with this thread; I will try to see how it works and
get input from other people.

Here is the link to my blog, which shows how to do it:

Note that I used Tomcat + Solr, but it can easily be done with PHP. Also, solrj
in 1.4.1 didn't have the terms component, so I had to find a way around that
problem, but it's provided.

2011/4/15 Quentin Proust

 Hi Victor,

 I have the same questions about the new Suggest component.
 I can't really help you as I didn't really manage to understand how it
 Sometimes, I had more results, sometimes less.

 Even so, I would really be interested in your resources using Terms and
 shingles to implement auto-complete.
 I am myself a French student and it could help me improve the solution for
 one of my projects.

 Best regards,

 2011/4/15 openvictor Open

  Hi everybody,
  Recently I implemented an autocomplete mechanism for my website using a
  custom TermsComponent. I was quite happy with that because it also
  me to do a Google-like feature where complete sentences were suggested
  the user when he typed in the search field. I used Shingles to search
  against pieces of sentences.
  (I have resources for French people if somebody asks)
  Then came solr 3.1 and its new suggest component. I have looked at the
  documentation but it's still unclear how it works exactly. So please let
  ask some questions :
- Are there performance improvements over TermsComponent ?
- Is it able to autosuggest sentences and not only words ? If yes, how
Should I keep my shingles ?
- What is this threshold value that I see ? Is it a mandatory field
complete ? I want to have suggestion no matter what the frequency is in
document !
  Thank you all, if I succeed to do that I will try to provide a tutorial
  do what with Jquery UI autocomplete + Suggest component if anyone's
  Best regards.

 Quentin Proust
 Email :
 Tel :

Re: partial optimize does not reduce the segment number to maxNumSegments

2011-04-15 Thread Erick Erickson
Why do you care? You haven't outlined why having the precise numbers
here is necessary. Perhaps with a higher-level statement of the problem
you're trying to solve we could make some better suggestions


On Wed, Apr 13, 2011 at 5:23 PM, Renee Sun wrote:

 yeah, I can figure out the segment number by going to stat page of solr...
 but my question was how to figure out exact total number of files in
 folder for each core.

 Like I mentioned in previous message, I currently have 8 files per segment
 (.prx .tii etc), but it seems this might change if I use term vector for
 example.  So I need suggestions on how to accurately figure out the total
 file number.



RE: Split token

2011-04-15 Thread Steven A Rowe
This pattern splits tokens *only* in the presence of parentheses with adjoining 
whitespace, and includes the parentheses with the tokens:


So you'll get this kind of behavior:

   Tottenham Hotspur (London)
   F.C. Internationale (milan)
   FC Midtjylland (Herning) (Ikast)


   Tottenham Hotspur
   F.C. Internationale
   FC Midtjylland 
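
The pattern itself did not survive in this archived copy. As a rough stand-in, the sketch below uses a hypothetical regular expression (not the one from the original message) that splits only on whitespace sitting directly before an opening parenthesis or directly after a closing one, and keeps the parentheses with their tokens. It is written in Python for brevity; the lookahead and lookbehind constructs it uses are also available in Java regular expressions.

```python
import re

# Hypothetical pattern, not the one from the original message:
# split on whitespace directly before "(" or directly after ")".
PAREN_BOUNDARY = re.compile(r"\s+(?=\()|(?<=\))\s+")


def split_on_parens(text):
    """Split text at parenthesis boundaries, keeping the parentheses."""
    return [token for token in PAREN_BOUNDARY.split(text) if token]


for sample in (
    "Tottenham Hotspur (London)",
    "F.C. Internationale (milan)",
    "FC Midtjylland (Herning) (Ikast)",
):
    print(split_on_parens(sample))
```

Whitespace elsewhere, such as between "Tottenham" and "Hotspur", is left alone, so multi-word names stay in one token.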

 -Original Message-
 From: Erick Erickson []
 Sent: Friday, April 15, 2011 1:51 PM
 Subject: Re: Split token
 What you've shown would be handled with WhitespaceTokenizer, but you'd
 prevent filters from stripping the parens. If you have to handle things
 blah ( stuff )
 WhitespaceTokenizer wouldn't work.
 PatternTokenizerFactory might work for you, see:
 On Tue, Apr 12, 2011 at 6:02 AM, roySolr wrote:
  I want to split my string when it contains (. Example:
  spurs (London)
  Internationale (milan)
  What tokenizer can i use to fix this problem?

Re: how to import data from database combine with file content in solr

2011-04-15 Thread Erick Erickson
Sorry if this comes through twice, but my first got rejected (this one
is plain text,
should come through better).

Part of this is solved by the Data Import Handler (DIH) see:

And think about a database data source. This can be combined
with the TikaEntityProcessor, and maybe some transformers to assemble
the file name and send it through parsing. Don't overlook the possibility
of parameters (the ${ reference pattern).

If you need some custom code, you can also implement a custom
Transformer that gets into the transformation chain in DIH, but you
should only approach that after you exhaust the above approach.

Hope this helps

On Fri, Apr 15, 2011 at 10:24 AM, wrote:


 I am new to solr,

 my requirements are,

 1. at regular interval need solr to fetch data from sql server database and
 do indexing on it.
 2. fetch only those records which is not yet indexed
 3. for each record there is one file associated, so with database table
 fields also want to index content of that particular file

 e.g. there is one table Customer in database and customerid is primary key
       for each customerid there is associated file of that customerprofile
 named with customerid,

 4. as I mentioned above, when Solr fetches data from the SQL Server database
 table , should fetch only data which is not yet indexed, (we have one older
 lucene code, in which there is one field in table that isindexed so when
 fetching data in select clause there is one condition that isindexed=false,
 and when indexing is done update particular record of database with
 isindexed=true) is there any mechanism in solr for that?

 how to achieve same ?
 do i need to write custom code for that or it can be done with configuration
 provided by solr?


 Vishal Parekh


Re: Solr 3.1: Old Index Files Not Removed on Optimize?

2011-04-15 Thread Yonik Seeley
I can reproduce this with the example server w/ your deletionPolicy
and replicationHandler configs.
I'll dig further to see what's behind this behavior.

-Yonik -- Lucene/Solr User Conference, May
25-26, San Francisco

On Fri, Apr 15, 2011 at 1:14 PM, Trey Grainger wrote:
 I was just hoping someone might be able to point me in the right direction
 here.  We just upgraded from Solr 1.4 to Solr 3.1 this past week and we're
 having issues running out of disk space on our Master servers.  Our Master
 has dozens of cores.  We have a script that kicks off once per day to do a
 rolling optimize.  The script optimizes a single core, waits 5 minutes to
 give the server some breathing room to catch up on indexing in a non-i/o
 intensive state, and then moves onto the next core (repeating until done).
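
A rolling optimize like the one described above could be scripted roughly as
follows. This is a sketch, not the script actually in use: the base URL, the
core names, and the fetch/sleep hooks are assumptions, and it relies on each
core's /update handler accepting optimize=true as a request parameter.

```python
import time
from urllib.parse import quote
from urllib.request import urlopen


def optimize_url(base_url, core):
    """Build the optimize request URL for one core."""
    return "%s/%s/update?optimize=true" % (base_url.rstrip("/"), quote(core))


def rolling_optimize(cores, base_url, pause_seconds=300, fetch=None,
                     sleep=time.sleep):
    """Optimize cores one at a time, pausing between consecutive cores.

    fetch and sleep are injectable so the loop can be exercised without a
    running Solr server; by default fetch issues a plain HTTP GET.
    """
    if fetch is None:
        def fetch(url):
            return urlopen(url).read()
    visited = []
    for position, core in enumerate(cores):
        url = optimize_url(base_url, core)
        fetch(url)
        visited.append(url)
        # Give the server breathing room, but do not wait after the last core.
        if position < len(cores) - 1:
            sleep(pause_seconds)
    return visited
```

For example, rolling_optimize(["core1", "core2"], "http://localhost:8983/solr")
would request an optimize on core1, wait five minutes, then do core2.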

 The problem we are facing is that under Solr 1.4, the old index files were
 deleted very quickly after each optimize, but under Solr 3.1, the old index
 files hang around for hours... in many cases they don't disappear until we
 restart Solr completely.  This is leading to us running out of disk space,
 as each core's index doubles in size during the optimize process and stays
 that way until the next solr restart.

 I was just wondering if anyone could point me to some specific changes or
 settings which may be leading to the difference between solr versions (or
 any other environmental issues you may know about).  I see several tickets
 in Jira about similar issues, but they mostly appear to have been resolved
 in the past.

 Has anyone else seen this behavior under Solr 3.1, or do you think we may be
 missing some kind of new configuration setting?

 For reference, we are running on 64-bit RedHat Linux.  This is what I have
 right now: [From SolrConfig.xml]:

 <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="replicateAfter">optimize</str>
        <str name="replicateAfter">startup</str>

  <updateHandler class="solr.DirectUpdateHandler2">

    <deletionPolicy class="solr.SolrDeletionPolicy">
      <str name="keepOptimizedOnly">false</str>
      <str name="maxCommitsToKeep">1</str>

 Thanks in advance,


Re: partial optimize does not reduce the segment number to maxNumSegments

2011-04-15 Thread Renee Sun
Sorry, I should have elaborated on that earlier...

In our production environment we have multiple cores that ingest
continuously all day long; we only optimize periodically, with one optimize
a day at midnight.

So sometimes we see the 'too many open files' error. To prevent it from
happening, in production we maintain a script that monitors the total number
of segment files across all cores and sends out warnings if that number
exceeds a threshold... it is a kind of preventive measure.  Currently we are
using a Linux command to count the files. We are wondering whether we could
simply use a formula to work out this number, which would be better. It seems
we could use the stat URL to get the segment count and multiply it by 8 (the
number of files per segment we get with our schema).

Any better way to approach this? Thanks a lot!
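
The formula mentioned above (segment count times files per segment) can be
sketched as below. The function names, the per-core dictionary, and the
threshold are hypothetical; the factor of 8 is the figure quoted for this
particular schema and is not a general constant.

```python
def estimate_segment_files(segments_per_core, files_per_segment=8):
    """Estimate total segment files as (sum of segment counts) * files each."""
    return sum(segments_per_core.values()) * files_per_segment


def exceeds_threshold(segments_per_core, threshold, files_per_segment=8):
    """Return True when the estimated file count is above the threshold."""
    estimate = estimate_segment_files(segments_per_core, files_per_segment)
    return estimate > threshold
```

With 10 segments in one core and 5 in another, the estimate is 15 * 8 = 120
files, which can then be compared against the warning threshold.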



2011-04-15 Thread Juan Grande
Hi John,

 How can I split the Solr index file into multiple files?

Actually, the index is organized in a set of files called segments. It's not
just a single file, unless you tell Solr to do so.

 That's because some file systems only support a maximum
 size for a single file; for example, some UNIX file systems only support
 a maximum of 2GB per file.

As far as I know, Solr will never reach a segment file larger than 2GB,
so this shouldn't be a problem.

 What is the recommended storage strategy for big Solr index files?

I guess that it depends on the indexing/querying performance that you're
getting, the performance that you want, and what "big" exactly means for you.
If your index is so big that individual queries take too long, sharding may
be what you're looking for.

To better understand the index format, you can see

Also, you can take a look at my blog; in my last post I talk about
segment merging.





 I have a question about the maximum file size of the Solr index
 when I have a lot of data in the index.

 - How can I split the Solr index file into multiple files?

 That's because some file systems only support a maximum
 size for a single file; for example, some UNIX file systems only support
 a maximum of 2GB per file.

 - What is the recommended storage strategy for big Solr index files?

 Thanks for the reply.

 Bogotá - Colombia - South America

Re: Solr 3.1: Old Index Files Not Removed on Optimize?

2011-04-15 Thread Trey Grainger
Thank you, Yonik!

I see the Jira issue you created and am guessing our problem is due to it.
We're going to remove replicateAfter=startup in the meantime to see if
that helps (assuming this is the issue the Jira ticket describes).

I appreciate you taking a look at this.



On Fri, Apr 15, 2011 at 2:58 PM, Yonik Seeley yo...@lucidimagination.com wrote:

 I can reproduce this with the example server w/ your deletionPolicy
 and replicationHandler configs.
 I'll dig further to see what's behind this behavior.

 -Yonik -- Lucene/Solr User Conference, May
 25-26, San Francisco



2011-04-15 Thread François Schiettecatte
Regarding file size support specifically: all the file systems on current
releases of Linux (and Unixes too) support large files with 64-bit offsets,
and I am pretty sure that the Java VM supports 64-bit offsets in files, so
there is no 2GB file size limit anymore.


On Apr 15, 2011, at 4:31 PM, JOHN JAIRO GÓMEZ LAVERDE wrote:

 I have a question about the maximum file size of the Solr index
 when I have a lot of data in the index.
 - How can I split the Solr index file into multiple files?
 That's because some file systems only support a maximum
 size for a single file; for example, some UNIX file systems only support
 a maximum of 2GB per file.
 - What is the recommended storage strategy for big Solr index files?
 Thanks for the reply.
 Bogotá - Colombia - South America

Re: Understanding the DisMax tie parameter

2011-04-15 Thread Jay Hill
Looks good, thanks Tom.


On Fri, Apr 15, 2011 at 8:55 AM, Burton-West, Tom tburt...@umich.edu wrote:

 Thanks everyone.

 I updated the wiki.  If you have a chance please take a look and check to
 make sure I got it right on the wiki.


 -----Original Message-----
 From: Chris Hostetter
 Sent: Thursday, April 14, 2011 5:41 PM
 Cc: Burton-West, Tom
 Subject: Re: Understanding the DisMax tie parameter

 : Perhaps the parameter could have had a better name.  It's essentially
 : max(score of matching clauses) + tie * (score of matching clauses that
 : are not the max)
 : So it can be used and thought of as a tiebreak only in the sense that
 : if two docs match a clause (with essentially the same score), then a
 : small tie value will act as a tiebreaker *if* one of those docs also
 : matches some other fields.

 Correct.  Without a tiebreaker value, a dismax query will only look at the
 maximum scoring clause for each doc -- the tie param is named for its
 ability to help break ties when multiple documents have the same score
 from the max scoring clause -- by adding in a small portion of the scores
 (based on the 0-1 ratio of the tie param) from the other clauses.
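
The formula quoted above can be written out as a small sketch. The function
name and the example scores are made up for illustration; this is not Solr's
code, only the arithmetic: the best clause score plus tie times the sum of
the other matching clause scores.

```python
def dismax_score(clause_scores, tie=0.0):
    """Combine per-clause scores the way the quoted formula describes.

    clause_scores holds the scores of the clauses that matched a document;
    tie is the 0-1 fraction of the non-maximum scores that gets added in.
    """
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    others = sum(clause_scores) - best
    return best + tie * others


# Two documents tie on their best clause, but doc_b also matches a second
# field with a lower score.
doc_a = [0.8]
doc_b = [0.8, 0.3]
```

With tie=0.0 both documents score 0.8 and stay tied. With a small tie such as
0.1, doc_b gets 0.8 + 0.1 * 0.3 and ranks first. With tie=1.0 the result is
simply the sum of all matching clause scores.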


Re: Solr 3.1: Old Index Files Not Removed on Optimize?

2011-04-15 Thread Yonik Seeley
On Fri, Apr 15, 2011 at 5:28 PM, Trey Grainger wrote:
 Thank you, Yonik!
 I see the Jira issue you created and am guessing it's due to this issue.
  We're going to remove replicateAfter=startup in the mean-time to see if
 that helps (assuming this is the issue the jira ticket described).

Yes, removing replicateAfter=startup will avoid this bug. The change tracked
in the Jira issue fixes the bug, if you need to replicate after startup.

-Yonik -- Lucene/Solr User Conference, May
25-26, San Francisco