SnapPuller error : Unable to move index file

2010-11-22 Thread kafka0102
My replication got errors like:
Unable to move index file from: 
/home/data/tuba/search-index/eshequn.post.db_post/index.20101122034500/_21.frq 
to: 
/home/data/tuba/search-index/eshequn.post.db_post/index.20101122031000/_21.frq

I looked at log and found the last slave replication commit before the error is 
:
[2010-11-22 
15:10:18][INFO][pool-6-thread-1][SolrDeletionPolicy.java(114)]SolrDeletionPolicy.onInit:
 commits:num=4

commit{dir=/home/data/tuba/search-index/eshequn.post.db_post/index.20101122031000,segFN=segments_3,version=1290358965331,generation=3,filenames=[_21.fdt,
 _21.frq, _21.prx, _21.tii, _21.nrm, _21.fdx, _21.tis, segments_3, _21.fnm]

commit{dir=/home/data/tuba/search-index/eshequn.post.db_post/index.20101122031000,segFN=segments_kq,version=1290358966074,generation=746,filenames=[_21.frq,
 _21.prx, _q8.frq, _21.tii, _q8.prx, _q8.tii, _q8.fdt, _21.nrm, _q8.fnm, 
_21.tis, _21.fdt, _q8.nrm, _q8.fdx, segments_kq, _q8.tis, _21.fdx, _21_1r.del, 
_21.fnm]

commit{dir=/home/data/tuba/search-index/eshequn.post.db_post/index.20101122031000,segFN=segments_ky,version=1290358966082,generation=754,filenames=[_21.frq,
 _qg.fnm, _qe.tis, _21.tii, _qe.nrm, _qg.nrm, _qg.fdt, _21_1u.del, _qd.tii, 
_qd.nrm, _qg.tii, _21.tis, _21.fdt, _qe.fdx, _qe.prx, _qf.tii, _21.fdx, 
_qf.nrm, segments_ky, _qf.fdt, _qe.fdt, _qd.fdt, _qf.tis, _21.prx, _qd_2.del, 
_qd.fnm, _qd.fdx, _qf.fdx, _qe.frq, _qd.prx, _21.nrm, _qd.frq, _qg.prx, 
_qg.tis, _qf.frq, _qd.tis, _qf.prx, _qe.tii, _qf.fnm, _qg.fdx, _qe.fnm, 
_qg.frq, _21.fnm]

commit{dir=/home/data/tuba/search-index/eshequn.post.db_post/index.20101122031000,segFN=segments_l3,version=1290358966087,generation=759,filenames=[_21.frq,
 _21.prx, _21.tii, _qn.fnm, _qn.fdt, _21_1u.del, _qn.fdx, _21.nrm, _qn.nrm, 
_qn.frq, _21.tis, _qn.prx, _21.fdt, segments_l3, _qn.tis, _qn.tii, _21.fdx, 
_21.fnm]

When the error happened, the dir index.20101122031000 had already been deleted.
Does the SolrDeletionPolicy delete the whole index dir, not only the files? The
problem has happened several times. Does anyone know the reason?



How to write custom component

2010-11-22 Thread sivaprasad

Hi,

I want to write a custom component which will be invoked before the query
parser. The output of this component should go to the query parser.

How can I configure it in solrconfig.xml?

How can I get a SynonymFilterFactory object programmatically?

Please share your ideas.

Regards,
Siva


Re: Dismax - Boosting

2010-11-22 Thread Solr User
Hi Ahmet,

In the past we used /spell, and if there was no match we would get a list of
suggestions and then make another call with the first suggestion to get search
results. After that we show the user both the suggestions for the spelling
mistake and the results of the first suggestion.

I think the plugin at the URL you provided will help do that.

Is there a way for Solr to directly return the spelling suggestions as well as
the results for the first suggestion at the same time?

For example:

if the search keyword is mooon (typed by mistake instead of moon)

then we need all suggestions like:

Did you mean:  moon, mo, mooing, moonen, soon, mood, moose, moore,
spoon, moons?

and also the search results for the first suggestion moon.

Thanks,
Solr User

On Fri, Nov 19, 2010 at 6:41 PM, Ahmet Arslan  wrote:

> > The below is my previous configuration which use to work
> > correctly.
> >
> >  > class="solr.SpellCheckComponent">
> >   > name="queryAnalyzerFieldType">textSpell
> >  
> >   default
> >   searchFields
> >> name="spellcheckIndexDir">/solr/qa/tradedata/spellchecker
> >   true
> >  
> > 
> >
> > We used to search only in one field, which is "searchFields",
> > but after
> > implementing dismax we are searching in different fields
> > like
> >
> > title^9.0 subtitle^3.0 author^2.0 desc shortdesc imprint
> > category isbn13
> > isbn10 format series season bisacsub award.
> >
> > Do we need to modify the above configuration to include all
> > the above
> > fields? Please give me an example.
>
> Searching and spell checking are independent. For example you can search on
> 10 fields, and create suggestions from 2 fields. The spell checker accepts one
> field in its configuration, so you need to populate this field with
> copyField, using the fields that you want spell checking on. The type of
> this field should be textSpell in your case. You can use the above config.
>
> >
> > In the past we use to query twice to get first the
> > suggestions and then we
> > use to query using the first suggestion to show the data.
> >
> > Is there a way that we can do it in one step?
>
> Are you talking about queries that return 0 numFound? Re-executing the
> search, as described here:
> http://sematext.com/products/dym-researcher/index.html
>
> Not out-of-the-box.
>
>
>
>
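
To illustrate Ahmet's copyField suggestion above, here is a minimal sketch of a
dedicated spell field in schema.xml (the field and source names are examples,
not taken from this thread's actual schema):

  <field name="spell" type="textSpell" indexed="true" stored="false"
         multiValued="true"/>

  <copyField source="title" dest="spell"/>
  <copyField source="author" dest="spell"/>

The spellchecker's field option would then point at "spell" rather than at one
of the individual search fields.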


Special Characters

2010-11-22 Thread Solr User
Hi,

I am searching for j.r.r. tolkien and getting results back but if I search
for jrr I am not getting any results. Also not getting any results if I am
searching for jrr tolkien. I am using AND as the default operator.

The search results should work for both j.r.r. tolkien and jrr tolkien.

What configuration changes do I need to make so that special characters like
hyphen (-) and period (.) are ignored while indexing? Or any other suggestions?

Thanks,
Solr User


Re: Can a URL based datasource in DIH return non xml

2010-11-22 Thread Erick Erickson
DIH does some good stuff, but it doesn't handle bad input very robustly
(actually, how could it intuit what "the right thing" is?). I'd consider
SolrJ coupled with a "forgiving" HTML parser, e.g.
http://sourceforge.net/projects/nekohtml/

Best
Erick

On Sun, Nov 21, 2010 at 7:46 PM, lee carroll
wrote:

> Hi,
>
> Can a URL based datasource in DIH return non xml. My pages being indexed
> are
> written by many authors and will
> often be invalid xhtml. Can DIH cope with this or will I need another
> approach?
>
> thanks in advance Lee C
>


Re: sort desc and out of memory exception

2010-11-22 Thread Erick Erickson
Peter's point is that sorting on a tokenized field is meaningless. Say you
index "erick xu peter" and it's tokenized. You have three tokens:
"erick", "xu", and "peter". What does sorting mean now? Should the
document be in the e's? x's? p's?

So if you're sorting on a tokenized field, trying to understand why you
get OOMs sorting desc (which I agree is kinda strange) is a waste
of time.

If you're NOT sorting on a tokenized field, can you answer some questions
about your environment? How much memory are you giving the JVM? What
version of Solr? etc. You might want to review:
http://wiki.apache.org/solr/HowToContribute

Best
Erick

On Sun, Nov 21, 2010 at 9:43 PM, xu cheng  wrote:

> thanks for replying
>
> but when it's sorted with asc, it runs pretty well;
> only if I sort with desc does it hit the out of memory exception
>
> 2010/11/17 Peter Karich 
>
> >  You are applying the sort against a (tokenized) text field?
> > You should better sort against a number or a string. Probably using the
> > copyField directive.
> >
> > Regards,
> > Peter.
> >
> >
> >  hi all:
> >>  I configure a solr application and there is a field of type text,and
> some
> >> kind like this 123456, that is a string of number
> >> and I wanna solr to sort the result on this field
> >> however, when I use sort asc , it works perfectly ,and when I sort it
> with
> >> desc, the application became unacceptably slow
> >> and finally an OutOfMemoryException was thrown.
> >> does anyone have the same kind of problem?or any suggestions?
> >>
> >> thanks
> >>
> >>
> >
> > --
> > http://jetwick.com twitter search prototype
> >
> >
>
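
A minimal sketch of the copyField approach Peter and Erick describe, assuming
the field really only ever holds a plain number and that the example schema's
"slong" (sortable long) type is defined; the field names here are illustrative:

  <field name="num_text" type="text" indexed="true" stored="true"/>
  <field name="num_sort" type="slong" indexed="true" stored="false"/>

  <copyField source="num_text" dest="num_sort"/>

Sorting would then use the untokenized field, e.g. sort=num_sort desc, instead
of the tokenized text field.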


Re: Phrase Search & Multiple Keywords with Double quotes

2010-11-22 Thread Erick Erickson
In general, just escape things. See the "Escaping Special Characters" section of:
http://lucene.apache.org/java/2_4_0/queryparsersyntax.html

But I have to say that you might want to consider carefully
whether
this is a good idea. Do your users really expect
"search for quoted phrase" to fail to match "search for "quoted" phrase"?

And how will you figure out what to quote? Or will you *require* that users
enter completely syntactically correct searches? That is, will
"search for "quoted phrase" be transformed into
"search for "quoted" phrase" or "search for "quoted phrase""?

You may be better off just dropping everything that's not alphanum from
your processing

Best
Erick

On Mon, Nov 22, 2010 at 12:39 AM, Pawan Darira wrote:

> Hi
>
> I want to do phrase searching with single/double quotes. Also there are
> cases where those phrases include special characters like & etc.
>
> What all do I need to do while indexing such special characters and while
> searching them? How do I handle phrase search with quotes?
>
> Please suggest
>
> --
> Thanks,
> Pawan Darira
>


Re: SnapPuller error : Unable to move index file

2010-11-22 Thread Erick Erickson
What operating system are you on? What version of Solr? What filesystem?

It's really hard to help without more information, you might want to review:
http://wiki.apache.org/solr/UsingMailingLists

Best
Erick

2010/11/22 kafka0102 

> my replication got errors like :
> Unable to move index file from:
> /home/data/tuba/search-index/eshequn.post.db_post/index.20101122034500/_21.frq
> to:
> /home/data/tuba/search-index/eshequn.post.db_post/index.20101122031000/_21.frq
>
> I looked at log and found the last slave replication commit before the
> error is :
> [2010-11-22
> 15:10:18][INFO][pool-6-thread-1][SolrDeletionPolicy.java(114)]SolrDeletionPolicy.onInit:
> commits:num=4
>
>  
> commit{dir=/home/data/tuba/search-index/eshequn.post.db_post/index.20101122031000,segFN=segments_3,version=1290358965331,generation=3,filenames=[_21.fdt,
> _21.frq, _21.prx, _21.tii, _21.nrm, _21.fdx, _21.tis, segments_3, _21.fnm]
>
>  
> commit{dir=/home/data/tuba/search-index/eshequn.post.db_post/index.20101122031000,segFN=segments_kq,version=1290358966074,generation=746,filenames=[_21.frq,
> _21.prx, _q8.frq, _21.tii, _q8.prx, _q8.tii, _q8.fdt, _21.nrm, _q8.fnm,
> _21.tis, _21.fdt, _q8.nrm, _q8.fdx, segments_kq, _q8.tis, _21.fdx,
> _21_1r.del, _21.fnm]
>
>  
> commit{dir=/home/data/tuba/search-index/eshequn.post.db_post/index.20101122031000,segFN=segments_ky,version=1290358966082,generation=754,filenames=[_21.frq,
> _qg.fnm, _qe.tis, _21.tii, _qe.nrm, _qg.nrm, _qg.fdt, _21_1u.del, _qd.tii,
> _qd.nrm, _qg.tii, _21.tis, _21.fdt, _qe.fdx, _qe.prx, _qf.tii, _21.fdx,
> _qf.nrm, segments_ky, _qf.fdt, _qe.fdt, _qd.fdt, _qf.tis, _21.prx,
> _qd_2.del, _qd.fnm, _qd.fdx, _qf.fdx, _qe.frq, _qd.prx, _21.nrm, _qd.frq,
> _qg.prx, _qg.tis, _qf.frq, _qd.tis, _qf.prx, _qe.tii, _qf.fnm, _qg.fdx,
> _qe.fnm, _qg.frq, _21.fnm]
>
>  
> commit{dir=/home/data/tuba/search-index/eshequn.post.db_post/index.20101122031000,segFN=segments_l3,version=1290358966087,generation=759,filenames=[_21.frq,
> _21.prx, _21.tii, _qn.fnm, _qn.fdt, _21_1u.del, _qn.fdx, _21.nrm, _qn.nrm,
> _qn.frq, _21.tis, _qn.prx, _21.fdt, segments_l3, _qn.tis, _qn.tii, _21.fdx,
> _21.fnm]
>
> When the error happened, the dir index.20101122031000 had been deleted.
> Does the SolrDeletionPolicy delete the whole index dir, not only the files? The problem
> has happened several times. Does anyone know the reason?
>
>


Re: sort desc and out of memory exception

2010-11-22 Thread Erick Erickson
Need more coffee... That link should have been:
http://wiki.apache.org/solr/UsingMailingLists

Erick

On Mon, Nov 22, 2010 at 8:03 AM, Erick Erickson wrote:

> Peter's point is that sorting on a tokenized field is meaningless. Say you
> index "erick xu peter" and it's tokenized. You have three tokens:
> "erick", "xu", and "peter". What does sorting mean now? Should the
> document be in the e's? x's? p's?
>
> So if you're sorting on a tokenized field, trying to understand why you
> get OOMs sorting desc (which I agree is kinda strange) is a waste
> of time.
>
> If you're NOT sorting on a tokenized field, can you answer some questions
> about your environment? How much memory are you giving the JVM? What
> version of Solr? etc. You might want to review:
> http://wiki.apache.org/solr/HowToContribute
>
> Best
> Erick
>
> On Sun, Nov 21, 2010 at 9:43 PM, xu cheng  wrote:
>
>> thanks for replying
>>
>> but when it's sorted with asc, it runs pretty well;
>> only if I sort with desc does it hit the out of memory exception
>>
>> 2010/11/17 Peter Karich 
>>
>> >  You are applying the sort against a (tokenized) text field?
>> > You should better sort against a number or a string. Probably using the
>> > copyField directive.
>> >
>> > Regards,
>> > Peter.
>> >
>> >
>> >  hi all:
>> >>  I configure a solr application and there is a field of type text,and
>> some
>> >> kind like this 123456, that is a string of number
>> >> and I wanna solr to sort the result on this field
>> >> however, when I use sort asc , it works perfectly ,and when I sort it
>> with
>> >> desc, the application became unacceptably slow
>> >> and finally an OutOfMemoryException was thrown.
>> >> does anyone have the same kind of problem?or any suggestions?
>> >>
>> >> thanks
>> >>
>> >>
>> >
>> > --
>> > http://jetwick.com twitter search prototype
>> >
>> >
>>
>
>


Re: Special Characters

2010-11-22 Thread Erick Erickson
What version of Solr are you using? You can think about
PatternReplaceCharFilterFactory if you're using the right
version of Solr.

But you have other problems than that. Let's claim you
get the periods removed. Do you tokenize three tokens or
one? I.e. jrr or j r r? In the latter case your search still won't
match.

Best
Erick

On Mon, Nov 22, 2010 at 7:45 AM, Solr User  wrote:

> Hi,
>
> I am searching for j.r.r. tolkien and getting results back but if I search
> for jrr I am not getting any results. Also not getting any results if I am
> searching for jrr tolkien. I am using AND as the default operator.
>
> The search results should work for both j.r.r. tolkien and jrr tolkien.
>
> What configuration changes I need to make so that special characters like
> hypen (-), period (.) are ignored while indexing? or any other suggestions?
>
> Thanks,
> Solr User
>


Re: Special Characters

2010-11-22 Thread Solr User
Hi Eric,

I use Solr version 1.4.0, and below is my schema.xml:





















It creates 3 tokens (j, r, r); "j.r.r. tolkien" works fine but not "jrr tolkien".

I will read about PatternReplaceCharFilterFactory and try it. Please let me
know if I need to do anything differently.

Thanks,
Solr User



On Mon, Nov 22, 2010 at 8:19 AM, Erick Erickson wrote:

> What version of Solr are you using? You can think about
> PatternReplaceCharFilterFactory if you're using the right
> version of Solr.
>
> But you have other problems than that. Let's claim you
> get the periods removed. Do you tokenize three tokens or
> one? I.e. jrr or j r r? In the latter case your search still won't
> match.
>
> Best
> Erick
>
> On Mon, Nov 22, 2010 at 7:45 AM, Solr User  wrote:
>
> > Hi,
> >
> > I am searching for j.r.r. tolkien and getting results back but if I
> search
> > for jrr I am not getting any results. Also not getting any results if I
> am
> > searching for jrr tolkien. I am using AND as the default operator.
> >
> > The search results should work for both j.r.r. tolkien and jrr tolkien.
> >
> > What configuration changes I need to make so that special characters like
> > hypen (-), period (.) are ignored while indexing? or any other
> suggestions?
> >
> > Thanks,
> > Solr User
> >
>


Re: How to write custom component

2010-11-22 Thread Grant Ingersoll

On Nov 22, 2010, at 6:21 AM, sivaprasad wrote:

> 
> Hi,
> 
> I want to write a custom component which will be invoked before the query
> parser. The output of this component should go to the query parser.

Probably best to start with http://wiki.apache.org/solr/SolrPlugins.  Also, 
have a look at the existing components, such as the TermsComponent or 
QueryComponent or SpellCheckComponent.

You might also want to consider explaining more what you are after 
(http://people.apache.org/~hossman/#xyproblem) before you go down the path.  
Perhaps there is a way to do what you are after already?  Or perhaps you could 
just write a QParser that does what you need?

> 
> How can I configure it in solrconfig.xml?

See the example, as it shows how to do it for the SpellCheckComponent.
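
As an illustration only (the component name and class here are hypothetical, not
an existing plugin), registering a custom component and running it before the
standard query component looks roughly like this in solrconfig.xml:

  <searchComponent name="myPreParse" class="com.example.MyPreParseComponent"/>

  <requestHandler name="/mysearch" class="solr.SearchHandler">
    <arr name="first-components">
      <str>myPreParse</str>
    </arr>
  </requestHandler>

Components listed in first-components have their prepare() called ahead of the
query component, which is where the query parser is invoked.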

> 
> How can I get a SynonymFilterFactory object programmatically?

You should be able to just construct one.


--
Grant Ingersoll
http://www.lucidimagination.com



Re: Spell-Check Component Functionality

2010-11-22 Thread Grant Ingersoll

On Nov 21, 2010, at 7:14 AM, rajini maski wrote:

> If anyone knows of articles or blogs on Solr spell-check component configuration,
> please let me know; the Solr wiki is not helping me solve the maze.

Might be helpful: 
http://www.lucidimagination.com/blog/2010/08/31/getting-started-spell-checking-with-apache-lucene-and-solr/

BTW, in your schema, how are you populating the spell field?  Do you have a 
copy field setup or are you sending in directly to the spell field?

> 
> On Fri, Nov 19, 2010 at 12:40 PM, rajini maski wrote:
> 
>> And If I am trying to do :
>> 
>> http://localhost:8909/solr/select/?spellcheck.q=Curst&version=2.2&start=0&rows=10&indent=on&spellcheck=true
>> &q=Curst&
>> 
>> The XML OUTPUT IS
>> 
>> -
>> 
>> -
>> 
>>   0
>>   0
>> -
>> 
>>   on
>>   0
>>   Curst
>>   Curst
>>   10
>>   2.2
>>  
>>  
>>   
>>  
>> 
>> No suggestion Tags also...
>> 
>> If I am trying to do :
>> 
>> http://localhost:8909/solr/select/?spellcheck.q=Curst&version=2.2&start=0&rows=10&indent=on&spellcheck=true
>> &q=Crust&
>> 
>> The XML OUTPUT IS
>> 
>> -
>> 
>> -
>> 
>>   0
>>   0
>> -
>> 
>>   on
>>   0
>>   Crust
>>   Curst
>>   10
>>   2.2
>>  
>>  
>> -
>> 
>> -
>> 
>>   Crust
>>  
>>  
>>  
>> 
>> No suggestion Tags..
>> 
>> What is the proper configuration for this? Is there any specific article
>> written on spell check-solr  other then in solr-wiki page..I am not getting
>> clear idea about this component in solr-wiki..
>> 
>> Awaiting replies..
>> Rajani Maski
>> 
>> 
>> On Fri, Nov 19, 2010 at 11:32 AM, rajini maski wrote:
>> 
>>> Hello Peter,
>>>Thanks For reply :)I did spellcheck.q=Curst as you said ...Query
>>> is like:
>>> 
>>> 
>>> http://localhost:8909/solr/select/?spellcheck.q=Curst&version=2.2&start=0&rows=10&indent=on&spellcheck=true
>>> 
>>> 
>>> 
>>> I am getting this error :(
>>> 
>>> HTTP Status 500 - null java.lang.NullPointerException at
>>> java.io.StringReader.(Unknown Source) at
>>> org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:197) at
>>> org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78) at
>>> org.apache.solr.search.QParser.getQuery(QParser.java:131) at
>>> org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
>>> at
>>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
>>> at
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at
>>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>>> at
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>>> at
>>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>> at
>>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>> at
>>> 
>>> 
>>> What is the error mean ... ? what do I need to do for this.. Any mistake
>>> in config?
>>> 
>>> The config.xml and schema I have attached in the mail below FYI..Please
>>> let me know if anyone know why is this error..
>>> 
>>> Awaiting reply
>>> Rajani Maski
>>> 
>>> 
>>> On Thu, Nov 18, 2010 at 8:09 PM, Peter Karich  wrote:
>>> 
 Hi Rajani,
 
 some notes:
 * try spellcheck.q=curst or completely without spellcheck.q but with q
 * compared to the normal q parameter spellcheck.q can have a different
 analyzer/tokenizer and is used if present
 * do not do spellcheck.build=true for every request (creating the
 spellcheck index can be very expensive)
 * if you got spellcheck working embed the spellcheck component into your
 normal query component. otherwise you need to query 2 times ...
 
 Regards,
 Peter.
 
 
 All,
> 
>I am trying apply the Solr spell check component functionality to
> our
> data.
> 
> The configuration set up I needed to make for it by updating config.xml
> and
> schema.xml is done as follows..
> Please let me know if any errors in it.
> 
> I am not getting any suggestions in suggestion tags of solr output xml.
> 
>

DisMaxQParserPlugin and Tokenization

2010-11-22 Thread jan.kurella
Hi,



Using the SearchHandler with the deftype="dismax" option enables the
DisMaxQParserPlugin. From investigating, it seems it just tokenizes on
whitespace.



Looking at the code, though, I could not find the place where this behavior is
enforced. I only found that for each field the getFieldQuery() method is called,
which either throws an "unknown field" exception or applies the correct
analyzer, including tokenizer and filters, for the given field.



We want to use a fancier tokenizer/filter setup with the dismax query handling.



Where to hook in best?



Jan



Jetwick Twitter Search now Open Source

2010-11-22 Thread Peter Karich

Jetwick is now available under the Apache 2 license:
http://www.pannous.info/2010/11/jetwick-is-now-open-source/

Regards,
Peter.


PS:
features http://www.pannous.info/products/jetwick-twitter-search/
installation https://github.com/karussell/Jetwick/wiki
for devs 
http://karussell.wordpress.com/2010/11/22/jetwick-twitter-search-is-now-free-software-wicket-and-solr-pearls-for-developers/




Re:Re: SnapPuller error : Unable to move index file

2010-11-22 Thread kafka0102
Sorry for my unclear question.
My Solr version is 1.4.1, and I may have hit a Solr bug.
In my case, the index directory my slave is using is index.20101122031000. It was
generated at 2010-11-22 03:10:00 for reasons that aren't important here.
Then at 2010-11-22 15:10:00 the slave started a replication. I found this function
in SnapPuller:
  private File createTempindexDir(final SolrCore core) {
    // the temp dir name is built from the current time using SnapShooter.DATE_FMT
    final String tmpIdxDirName = "index." + new SimpleDateFormat(SnapShooter.DATE_FMT).format(new Date());
    final File tmpIdxDir = new File(core.getDataDir(), tmpIdxDirName);
    tmpIdxDir.mkdirs();
    return tmpIdxDir;
  }
and SnapShooter.DATE_FMT = "yyyyMMddhhmmss" (note the 12-hour "hh").
So in this replication the tmpIndexDir and the existing indexDir are both
"index.20101122031000", because 03:10:00 and 15:10:00 format identically with "hh".
At the end of the replication, delTree(tmpIndexDir) then deletes the live index dir.

So changing SnapShooter.DATE_FMT to "yyyyMMddHHmmss" (24-hour "HH") should fix it.






At 2010-11-22 21:13:41,"Erick Erickson"  wrote:

>what op system are you on? what version of Solr? what filesystem?
>
>It's really hard to help without more information, you might want to review:
>http://wiki.apache.org/solr/UsingMailingLists
>
>Best
>Erick
>
>2010/11/22 kafka0102 
>
>> my replication got errors like :
>> Unable to move index file from:
>> /home/data/tuba/search-index/eshequn.post.db_post/index.20101122034500/_21.frq
>> to:
>> /home/data/tuba/search-index/eshequn.post.db_post/index.20101122031000/_21.frq
>>
>> I looked at log and found the last slave replication commit before the
>> error is :
>> [2010-11-22
>> 15:10:18][INFO][pool-6-thread-1][SolrDeletionPolicy.java(114)]SolrDeletionPolicy.onInit:
>> commits:num=4
>>
>>  
>> commit{dir=/home/data/tuba/search-index/eshequn.post.db_post/index.20101122031000,segFN=segments_3,version=1290358965331,generation=3,filenames=[_21.fdt,
>> _21.frq, _21.prx, _21.tii, _21.nrm, _21.fdx, _21.tis, segments_3, _21.fnm]
>>
>>  
>> commit{dir=/home/data/tuba/search-index/eshequn.post.db_post/index.20101122031000,segFN=segments_kq,version=1290358966074,generation=746,filenames=[_21.frq,
>> _21.prx, _q8.frq, _21.tii, _q8.prx, _q8.tii, _q8.fdt, _21.nrm, _q8.fnm,
>> _21.tis, _21.fdt, _q8.nrm, _q8.fdx, segments_kq, _q8.tis, _21.fdx,
>> _21_1r.del, _21.fnm]
>>
>>  
>> commit{dir=/home/data/tuba/search-index/eshequn.post.db_post/index.20101122031000,segFN=segments_ky,version=1290358966082,generation=754,filenames=[_21.frq,
>> _qg.fnm, _qe.tis, _21.tii, _qe.nrm, _qg.nrm, _qg.fdt, _21_1u.del, _qd.tii,
>> _qd.nrm, _qg.tii, _21.tis, _21.fdt, _qe.fdx, _qe.prx, _qf.tii, _21.fdx,
>> _qf.nrm, segments_ky, _qf.fdt, _qe.fdt, _qd.fdt, _qf.tis, _21.prx,
>> _qd_2.del, _qd.fnm, _qd.fdx, _qf.fdx, _qe.frq, _qd.prx, _21.nrm, _qd.frq,
>> _qg.prx, _qg.tis, _qf.frq, _qd.tis, _qf.prx, _qe.tii, _qf.fnm, _qg.fdx,
>> _qe.fnm, _qg.frq, _21.fnm]
>>
>>  
>> commit{dir=/home/data/tuba/search-index/eshequn.post.db_post/index.20101122031000,segFN=segments_l3,version=1290358966087,generation=759,filenames=[_21.frq,
>> _21.prx, _21.tii, _qn.fnm, _qn.fdt, _21_1u.del, _qn.fdx, _21.nrm, _qn.nrm,
>> _qn.frq, _21.tis, _qn.prx, _21.fdt, segments_l3, _qn.tis, _qn.tii, _21.fdx,
>> _21.fnm]
>>
>> When the error happened, the dir index.20101122031000 had been deleted.
>> Does the SolrDeletionPolicy delete the whole index dir, not only the files? The problem
>> has happened several times. Does anyone know the reason?
>>
>>


Re: Special Characters

2010-11-22 Thread Erick Erickson
As I remember, PatternReplace... isn't in 1.4, so you'd have to move to 3.x
or trunk.

You could always write a custom class that did what you wanted, it's
actually
pretty easy.

Best
Erick

On Mon, Nov 22, 2010 at 8:37 AM, Solr User  wrote:

> Hi Eric,
>
> I use solr version 1.4.0 and below is my schema.xml
>
> 
> 
> 
> 
> 
>  ignoreCase="true"
> words="stopwords.txt"
> enablePositionIncrements="true"
> />
>  generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1"/>
> 
>  protected="protwords.txt"/>
> 
> 
> 
>  ignoreCase="true" expand="true"/>
>  ignoreCase="true"
> words="stopwords.txt"
> enablePositionIncrements="true"
> />
>  generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="1"/>
> 
>  protected="protwords.txt"/>
> 
> 
>
> It creates 3 tokens j r r tolkien works fine but not jrr tolkien.
>
> I will read about PatternReplaceCharFilterFactory and try it. Please let me
> know if I need to do anything differently.
>
> Thanks,
> Solr User
>
>
>
> On Mon, Nov 22, 2010 at 8:19 AM, Erick Erickson  >wrote:
>
> > What version of Solr are you using? You can think about
> > PatternReplaceCharFilterFactory if you're using the right
> > version of Solr.
> >
> > But you have other problems than that. Let's claim you
> > get the periods removed. Do you tokenize three tokens or
> > one? I.e. jrr or j r r? In the latter case your search still won't
> > match.
> >
> > Best
> > Erick
> >
> > On Mon, Nov 22, 2010 at 7:45 AM, Solr User  wrote:
> >
> > > Hi,
> > >
> > > I am searching for j.r.r. tolkien and getting results back but if I
> > search
> > > for jrr I am not getting any results. Also not getting any results if I
> > am
> > > searching for jrr tolkien. I am using AND as the default operator.
> > >
> > > The search results should work for both j.r.r. tolkien and jrr tolkien.
> > >
> > > What configuration changes I need to make so that special characters
> like
> > > hypen (-), period (.) are ignored while indexing? or any other
> > suggestions?
> > >
> > > Thanks,
> > > Solr User
> > >
> >
>


passing arguments to analyzer/filter at runtime

2010-11-22 Thread jan.kurella
Hi,

I’m trying to find a solution to search only in a given language.

At index time the language of each string to be tokenized is known, so I would
like to write a filter that prefixes each token according to its language.
First question: what is the best way to pass the language argument to the filter?

I'm going to use multivalued fields, and each value I put in that field can be
in a different language.
What is the best way to pass several languages on to the filter?

On the search side it gets a bit trickier: here I do not know the exact language
of the input query, only several possible ones. So instead of prefixing each
token with one language code, I need to prefix each token with every possible
language code.
How do I pass parameters to the filter at query time?

I'm not using the URL variant; I am using the SolrServer.query(SolrQuery)
interface.

Jan


Re: passing arguments to analyzer/filter at runtime

2010-11-22 Thread Markus Jelsma
Hi,

I wouldn't use a multiValued field for this, because then you would have the same
analyzers (and possibly stemmers) for different languages.

The usual method is to have fieldTypes for each language (en_text, de_text etc) 
and then create specific fields that map to them (en_content, de_content etc).

Since you know the language at index time, you can simply add the content to 
the proper LANG_content field.
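
A rough sketch of that layout in schema.xml (the type and field names follow the
naming above; the analyzer chains are just placeholders to adapt):

  <fieldType name="en_text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    </analyzer>
  </fieldType>
  <fieldType name="de_text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="German"/>
    </analyzer>
  </fieldType>

  <field name="en_content" type="en_text" indexed="true" stored="true"/>
  <field name="de_content" type="de_text" indexed="true" stored="true"/>

At index time the application writes each value into the field that matches its
language; at query time it searches only the LANG_content fields for the
candidate languages.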

Cheers,

On Monday 22 November 2010 15:58:41 jan.kure...@nokia.com wrote:
> Hi,
> 
> I’m trying to find a solution to search only in a given language.
> 
> On index time the language is known per string to be tokenized so I would
> like to write a filter that prefixes each token according to its language.
> First question: how to pass the language argument to the filter best?
> 
> I’m going to use multivalued fields, and each value I put in that field has
> another language. How do I pass several languages on to the filter best?
> 
> on search side it gets a bit trickier, here I do not know exactly the
> language of the input query but several possible. So instead of prefixing
> each token with one language code I need to prefix each token with every
> possible language code. How do I pass parameters to the filter at query
> time?
> 
> I’m not using the URL variant I am using the SolrServer.query(SolrQuery)
> interface.
> 
> Jan

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: Problem with synonyms

2010-11-22 Thread sivaprasad


In the synonyms.txt file I have the below synonyms:

ipod, i-pod, i pod

If expand=false at index time, is it going to replace all the occurrences of
"i-pod" and "i pod" with "ipod"?




Re: Dismax - Boosting

2010-11-22 Thread Ahmet Arslan
> In the past we used /spell and if there is not match then
> we use to get a
> list of suggestions and then we use to make another call
> with the first
> suggestion to get search results. After that we show user
> both suggestions
> for the spelling mistake and results of the first
> suggestion.
> 
> I think the URL that you provided which has plug in will do
> help doing that.

Yes, it does exactly what you describe.

> Is there a way from Solr to directly get the spelling
> suggestions as well as
> first suggestion data at the same time?

You can't do that in one step with out-of-the-box solr. You need a plugin for 
that.



  


Re: Problem with synonyms

2010-11-22 Thread Yonik Seeley
On Sat, Nov 20, 2010 at 5:59 AM, sivaprasad  wrote:
> Even after expanding the synonyms also i am unable to get same results.

What you are trying to do should work with index-time synonym expansion.
Just make sure to remove the synonym filter at query time (or use a
synonym filter w/o multi-word synonyms).

What's the original text in the document you are trying to match?

-Yonik
http://www.lucidimagination.com


RE: Empty value/string matching

2010-11-22 Thread Bob Sandiford
One possibility to consider - if you really need documents with specifically
empty or undefined values (if that's not an oxymoron :)), and you have control
over the values you send into the indexing, you could set a special value that
means 'no value'. We've done that in a similar vein, using something like
'@@EMPTY@@' for a given field, meaning that the original document didn't
actually have a value for that field. I.e. it is something very unlikely to be
a 'real' value - and then we can easily select documents by querying for
field:@@EMPTY@@ instead of the negated form of the select. However, we haven't
considered things like what it does to index size. It's relatively rare for us
(that there not be a value), so our 'gut feel' is that it's not impacting the
indexes very much size-wise or performance-wise.
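
For instance (a sketch; the field name here is hypothetical), a source record
with no publisher value would be indexed as:

  <doc>
    <field name="id">1234</field>
    <field name="publisher">@@EMPTY@@</field>
  </doc>

and such documents can then be selected with a plain query on
publisher:@@EMPTY@@ rather than a negated range query.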

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com 

> -Original Message-
> From: Viswa S [mailto:svis...@hotmail.com]
> Sent: Saturday, November 20, 2010 5:38 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Empty value/string matching
> 
> 
> Erick,
> Thanks for the quick response. The output I showed is on a test
> instance I created to simulate this issue. I intentionally tried to
> create documents with no values by creating empty xml nodes like
> <field name="fieldName"></field>, but having values in the other fields in a
> document.
> Are you saying that there is no way to have a field with no value? With
> text fields that seems to make more sense than for string.
> You are right on fieldName:[* TO *] results, which basically returned
> all the documents which included the couple of documents in question.
> -Viswa
> > Date: Sat, 20 Nov 2010 17:20:53 -0500
> > Subject: Re: Empty value/string matching
> > From: erickerick...@gmail.com
> > To: solr-user@lucene.apache.org
> >
> > I don't think that's correct. The documents wouldn't be showing
> > up in the facets if they had no value for the field. So I think
> you're
> > being mislead by the printout from the faceting. Perhaps you
> > have unprintable characters in there or some such. Certainly the
> > name:" " is actually a value, admittedly just a space. As for the
> > other, I suspect something similar.
> >
> > What results do you get back when you just search for
> > FieldName:[* TO *]? I'm betting you get all the docs back,
> > but I've been very wrong before.
> >
> > Best
> > Erick
> >
> > On Sat, Nov 20, 2010 at 5:02 PM, Viswa S  wrote:
> >
> > >
> > > Yes I do have a couple of documents with no values and one with an
> empty
> > > string. Find below the output of a facet on the fieldName.
> > > ThanksViswa
> > >
> > >
> > > 22 > > name="GDOGPRODY.424">22 name="
> > > ">1
> > > > Date: Sat, 20 Nov 2010 15:29:06 -0500
> > > > Subject: Re: Empty value/string matching
> > > > From: erickerick...@gmail.com
> > > > To: solr-user@lucene.apache.org
> > > >
> > > > Are you absolutely sure your documents really don't have any
> values for
> > > > "FieldName"? Because your results are perfectly correct if every
> doc has
> > > a
> > > > value for "FieldName".
> > > >
> > > > Or are you saying there no such field as "FieldName"?
> > > >
> > > > Best
> > > > Erick
> > > >
> > > > On Sat, Nov 20, 2010 at 3:12 PM, Viswa S 
> wrote:
> > > >
> > > > >
> > > > > Folks,Am trying to query documents which have no values
> present, I have
> > > > > used the following constructs and it doesn't seem to work on
> the solr
> > > dev
> > > > > tip (as of 09/22) or the 1.4 builds.1. (*:* AND -FieldName[* TO
> *]) -
> > > > > returns no documents, parsedquery was "+MatchAllDocsQuery(*:*)
> > > -FieldName:[*
> > > > > TO *]"2. -FieldName:[* TO *] -  returns no documents,
> parsedquery was
> > > > > "-FieldName:[* TO *]"3. FieldName:"" - returns no documents,
> > > parsedquery was
> > > > > empty ()The field is type string,
> using the
> > > > > LuceneQParser, I have also tried to see if "FieldName:[* TO *]"
> if the
> > > > > documents with no terms are ignored and didn't seem to be the
> case, the
> > > > > result set was everything.Any help would be appreciated.-Viswa
> > > > >
> > >
> > >
> 



Re: Problem with synonyms

2010-11-22 Thread Yonik Seeley
On Mon, Nov 22, 2010 at 10:29 AM, Yonik Seeley
 wrote:
> On Sat, Nov 20, 2010 at 5:59 AM, sivaprasad  
> wrote:
>> Even after expanding the synonyms also i am unable to get same results.
>
> What you are trying to do should work with index-time synonym expansion.
> Just make sure to remove the synonym filter at query time (or use a
> synonym filter w/o multi-word synonyms).

Actually, to be more precise, the current query-time restriction is
that you can't produce synonyms of different lengths.
Hence you could normalize "High Definition TV" to "hdtv" at both query
time and index time.

Optionally you can expand to both "High Definition TV" and "hdtv" at
index time (in which case you would normally turn off query time
synonym processing).
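
A sketch of that second option, with a synonyms.txt line such as
"hdtv, high definition tv" and expansion applied only in the index analyzer:

  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>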

-Yonik
http://www.lucidimagination.com


RE: passing arguments to analyzer/filter at runtime

2010-11-22 Thread jan.kurella
Hi,

Yes, this is one of the four options I am going to evaluate. Why your suggestion
might be problematic:

We have about 12 language-sensitive fields and support about 200 distinct
languages, which makes roughly 2,400 fields.
Might a multi-field/dismax query spanning 2,400 fields become problematic?

We will try this approach as well, but we are not sure it will be the best for
roughly 20 GB of raw data with (due to the many languages and names) on the
order of 100 billion separate tokens.

Is my approach possible?

Jan

-Original Message-
From: ext Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: Montag, 22. November 2010 16:10
To: solr-user@lucene.apache.org
Subject: Re: passing arguments to analyzer/filter at runtime

Hi,

I wouldn't use a multiValued field for this because you then you would have the 
same analyzers (and possibly stemmers) for different languages.

The usual method is to have fieldTypes for each language (en_text, de_text etc) 
and then create specific fields that map to them (en_content, de_content etc).

Since you know the language at index time, you can simply add the content to 
the proper LANG_content field.

Cheers,

On Monday 22 November 2010 15:58:41 jan.kure...@nokia.com wrote:
> Hi,
> 
> I’m trying to find a solution to search only in a given language.
> 
> On index time the language is known per string to be tokenized so I would
> like to write a filter that prefixes each token according to its language.
> First question: how to pass the language argument to the filter best?
> 
> I’m going to use multivalued fields, and each value I put in that field has
> another language. How do I pass several languages on to the filter best?
> 
> on search side it gets a bit trickier, here I do not know exactly the
> language of the input query but several possible. So instead of prefixing
> each token with one language code I need to prefix each token with every
> possible language code. How do I pass parameters to the filter at query
> time?
> 
> I’m not using the URL variant I am using the SolrServer.query(SolrQuery)
> interface.
> 
> Jan

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


SOLR and secure content

2010-11-22 Thread Jos Janssen

Hi,

We are currently investigating how to set up a correct Solr server for our
goals.
The problem I'm running into is how to design the Solr setup so that we can
check whether a user is authorized to view a document. Let me explain the
situation.

We have a website with some pages and documents which are accessible by
everyone (public).
We also have some sort of extranet; these pages and documents are not
accessible to everyone.
In this extranet we have different user groups. Access is defined by the user
group.

What I'm looking for is some sort of best practice for designing/configuring a
Solr setup for this situation.
I searched the internet but could not find any examples or documentation for
this situation.

Maybe I'm not looking for the right documentation; that's why I post this
message.
Can someone give me some information about this?

Regards,

Jos 





Re: Special Characters

2010-11-22 Thread Shawn Heisey

On 11/22/2010 7:40 AM, Erick Erickson wrote:

As I remember, PatternReplace... isn't in 1.4, so you'd have to move to 3.x
or trunk.

You could always write a custom class that did what you wanted, it's
actually
pretty easy.


PatternReplaceCharFilterFactory isn't in 1.4, but PatternReplaceFilterFactory 
is.  I'm using it in my 1.4.1 installation.  The CharFilter version gets 
applied before tokenization, which caused problems for me in my testing of 
branch_3x.  In situations where the order of operations isn't important, the 
CharFilter option would be great.

Based on their description, I'd think what they actually want is 
WordDelimiterFilterFactory with preserveOriginal and catenateWords 
turned on at a minimum.  That should match on any likely representation 
of J.R.R. Tolkien.  The other options can also be useful.


In my schema, the index analyzer has WordDelimiterFilterFactory with 
everything turned on except catenateAll, and the query analyzer is the 
same except all three catenate options are turned off.
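
A sketch of an index analyzer along those lines (all attributes shown are
standard WordDelimiterFilterFactory options; adjust the rest of the chain to
your own schema):

  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>

With catenateWords and preserveOriginal on, "J.R.R." indexes the parts j, r, r,
the catenated jrr, and the original token, so both "j.r.r. tolkien" and
"jrr tolkien" have something to match.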


Shawn



Re: SOLR and secure content

2010-11-22 Thread Savvas-Andreas Moysidis
Hi,

Could you elaborate a bit more on how you access Solr? Are you making direct
Solr calls, or is the communication directed through an application layer?

On 22 November 2010 11:05, Jos Janssen  wrote:

>
> Hi,
>
> We are currently investigating how to setup a correct solr server for our
> goals.
> The problem i'm running into is how to design the solr setup so that we can
> check if a user is authenticated for viewing the document.  Let me explain
> the situation.
>
> We have a website with some pages and documents which are accesible by
> everyone (Public).
> We also have some sort of extranet, thse pages and documents are not
> accesible for everyone.
> In this extranet we have different user groups. Acces is defined by the
> user
> group.
>
> What i'm looking for is some sort of best practices to design/configure
> solr
> setup for this situation.
> I searched the internet but could find any examples or documentation for
> this situation.
>
> Maybe i'm not looking for the right documentation, that why i post this
> message.
> Can someone give me some information for this.
>
> Regards,
>
> Jos
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SOLR-and-secure-content-tp1945028p1945028.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Can a URL based datasource in DIH return non xml

2010-11-22 Thread lee carroll
Hi Erik,

Thank you for the response. Just for completeness of the thread:
I'm going to process the xhtml off-line. Another approach could be to set up
a web service which DIH could call and which returned xml from an html parser.
However, for my purposes it's just as easy to use curl and perl and then use
DIH.

cheers Lee

On 22 November 2010 12:59, Erick Erickson  wrote:

> DIH does some good stuff, but it doesn't handle bad input very robustly
> (actually, how could it intuit what "the right thing" is?). I'd consider
> SolrJ coupled with a "forgiving" HTML parser, e.g.
> http://sourceforge.net/projects/nekohtml/
>
> Best
> Erick
>
> On Sun, Nov 21, 2010 at 7:46 PM, lee carroll
> wrote:
>
> > Hi,
> >
> > Can a URL based datasource in DIH return non xml. My pages being indexed
> > are
> > writen by many authors and will
> > often be invalid xhtml. Can DIH cope with htis or will i need another
> > approach ?
> >
> > thanks in advance Lee C
> >
>


Re: Special Characters

2010-11-22 Thread Erick Erickson
Hmmm, good point on WordDelimiterFilterFactory. You're right, that should
work.

Although there'd still be a problem with J. R. R. never matching
jrr. But that wouldn't be solved by Pattern either. I'd try to
define the problem away ...

good catch
Erick

On Mon, Nov 22, 2010 at 12:15 PM, Shawn Heisey  wrote:

> On 11/22/2010 7:40 AM, Erick Erickson wrote:
>
>> As I remember, PatternReplace... isn't in 1.4, so you'd have to move to
>> 3.x
>> or trunk.
>>
>> You could always write a custom class that did what you wanted, it's
>> actually
>> pretty easy.
>>
>
> PatternReplaceCharFilterFactory isn't in 1.4, but
> PatternReplaceFilterFactory is.  I'm using it in my 1.4.1 installation.  The
> CharFilter version gets applied before tokenization, which caused problems
> for me in my testing of branch_3x.  In situations where the order of
> operations isn't important, the CharFilter option would be great.
>
> Based on their description, I'd think what they actually want is
> WordDelimiterFilterFactory with preserveOriginal and catenateWords turned on
> at a minimum.  That should match on any likely representation of J.R.R.
> Tolkien.  The other options can also be useful.
>
> In my schema, the index analyzer has WordDelimiterFilterFactory with
> everything turned on except catenateAll, and the query analyzer is the same
> except all three catenate options are turned off.
>
> Shawn
>
>


Facet - Range Query issue

2010-11-22 Thread Solr User
Hi,

I am having an issue with querying and faceting.

This was working fine earlier:

/spell/?q=(sun) AND (pubyear:[1991 TO
2011])&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear&facet.field=format&facet.field=series&facet.field=season&facet.field=imprint&facet.field=category&facet.field=award&facet.field=age&facet.field=reading&facet.field=grade&facet.field=price&spellcheck=true&debugQuery=on

After modifying it to use the dismax handler with the new schema, the below
query does not work:

/select/?q=(sun) AND (pubyear:[1991 TO
2011])&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear_facet&facet.field=format_facet&facet.field=series_facet&facet.field=season_facet&facet.field=imprint_facet&facet.field=category_facet&facet.field=award_facet&facet.field=age_facet&facet.field=reading_facet&facet.field=grade_facet&facet.field=price_facet&spellcheck=true&debugQuery=on


  (sun) AND (pubyear:[1991 TO 2011])
  (sun) AND (pubyear:[1991 TO 2011])
  +((+DisjunctionMaxQuery((series:sun | desc:sun |
bisacsub:sun | award:sun | format:sun | shortdesc:sun | pubyear:sun |
author:sun^2.0 | category:sun | title:sun^9.0 | isbn10:sun | season:sun |
imprint:sun | subtitle:sun^3.0 | isbn13:sun))
+DisjunctionMaxQuery((series:"pubyear 1991" | desc:"pubyear 1991" |
bisacsub:"pubyear 1991" | award:"pubyear 1991" | format:"pubyear 1991" |
shortdesc:"pubyear 1991" | pubyear:"pubyear 1991" | author:"pubyear
1991"^2.0 | category:"pubyear 1991" | title:"pubyear 1991"^9.0 |
isbn10:"pubyear 1991" | season:"pubyear 1991" | imprint:"pubyear 1991" |
subtitle:"pubyear 1991"^3.0 | isbn13:"pubyear 1991"))
DisjunctionMaxQuery((series:2011 | desc:2011 | bisacsub:2011 | award:2011 |
format:2011 | shortdesc:2011 | pubyear:2011 | author:2011^2.0 |
category:2011 | title:2011^9.0 | isbn10:2011 | season:2011 | imprint:2011 |
subtitle:2011^3.0 | isbn13:2011)))~1) ()
  +((+(series:sun | desc:sun | bisacsub:sun
| award:sun | format:sun | shortdesc:sun | pubyear:sun | author:sun^2.0 |
category:sun | title:sun^9.0 | isbn10:sun | season:sun | imprint:sun |
subtitle:sun^3.0 | isbn13:sun) +(series:"pubyear 1991" | desc:"pubyear 1991"
| bisacsub:"pubyear 1991" | award:"pubyear 1991" | format:"pubyear 1991" |
shortdesc:"pubyear 1991" | pubyear:"pubyear 1991" | author:"pubyear
1991"^2.0 | category:"pubyear 1991" | title:"pubyear 1991"^9.0 |
isbn10:"pubyear 1991" | season:"pubyear 1991" | imprint:"pubyear 1991" |
subtitle:"pubyear 1991"^3.0 | isbn13:"pubyear 1991") (series:2011 |
desc:2011 | bisacsub:2011 | award:2011 | format:2011 | shortdesc:2011 |
pubyear:2011 | author:2011^2.0 | category:2011 | title:2011^9.0 |
isbn10:2011 | season:2011 | imprint:2011 | subtitle:2011^3.0 |
isbn13:2011))~1) ()
  
  DisMaxQParser

Basically we are trying to pass the query string along with a facet field
and the range. Is there any syntax issue? Please help; this is urgent as I
am stuck.

Thanks,
Solr user


Re: Facet - Range Query issue

2010-11-22 Thread Erick Erickson
Well, without seeing the changes you made to the schema, it's hard to tell
much.
Also, could you define "not work"? What, exactly, fails to do what you
expect?

But the first question I have is "did you reindex after changing your
schema?".

And have you checked your index to verify that there are values in the fields
you
changed?

Best
Erick

On Mon, Nov 22, 2010 at 1:42 PM, Solr User  wrote:

> Hi,
>
> I am having issue with querying and using facet.
>
> This was working fine earlier:
>
> /spell/?q=(sun) AND (pubyear:[1991 TO
>
> 2011])&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear&facet.field=format&facet.field=series&facet.field=season&facet.field=imprint&facet.field=category&facet.field=award&facet.field=age&facet.field=reading&facet.field=grade&facet.field=price&spellcheck=true&debugQuery=on
>
> After modifying to use dismax handler with new schema the below query does
> not work:
>
> /select/?q=(sun) AND (pubyear:[1991 TO
>
> 2011])&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear_facet&facet.field=format_facet&facet.field=series_facet&facet.field=season_facet&facet.field=imprint_facet&facet.field=category_facet&facet.field=award_facet&facet.field=age_facet&facet.field=reading_facet&facet.field=grade_facet&facet.field=price_facet&spellcheck=true&debugQuery=on
>
> 
>  (sun) AND (pubyear:[1991 TO 2011])
>  (sun) AND (pubyear:[1991 TO 2011])
>  +((+DisjunctionMaxQuery((series:sun | desc:sun |
> bisacsub:sun | award:sun | format:sun | shortdesc:sun | pubyear:sun |
> author:sun^2.0 | category:sun | title:sun^9.0 | isbn10:sun | season:sun |
> imprint:sun | subtitle:sun^3.0 | isbn13:sun))
> +DisjunctionMaxQuery((series:"pubyear 1991" | desc:"pubyear 1991" |
> bisacsub:"pubyear 1991" | award:"pubyear 1991" | format:"pubyear 1991" |
> shortdesc:"pubyear 1991" | pubyear:"pubyear 1991" | author:"pubyear
> 1991"^2.0 | category:"pubyear 1991" | title:"pubyear 1991"^9.0 |
> isbn10:"pubyear 1991" | season:"pubyear 1991" | imprint:"pubyear 1991" |
> subtitle:"pubyear 1991"^3.0 | isbn13:"pubyear 1991"))
> DisjunctionMaxQuery((series:2011 | desc:2011 | bisacsub:2011 | award:2011 |
> format:2011 | shortdesc:2011 | pubyear:2011 | author:2011^2.0 |
> category:2011 | title:2011^9.0 | isbn10:2011 | season:2011 | imprint:2011 |
> subtitle:2011^3.0 | isbn13:2011)))~1) ()
>  +((+(series:sun | desc:sun | bisacsub:sun
> | award:sun | format:sun | shortdesc:sun | pubyear:sun | author:sun^2.0 |
> category:sun | title:sun^9.0 | isbn10:sun | season:sun | imprint:sun |
> subtitle:sun^3.0 | isbn13:sun) +(series:"pubyear 1991" | desc:"pubyear
> 1991"
> | bisacsub:"pubyear 1991" | award:"pubyear 1991" | format:"pubyear 1991" |
> shortdesc:"pubyear 1991" | pubyear:"pubyear 1991" | author:"pubyear
> 1991"^2.0 | category:"pubyear 1991" | title:"pubyear 1991"^9.0 |
> isbn10:"pubyear 1991" | season:"pubyear 1991" | imprint:"pubyear 1991" |
> subtitle:"pubyear 1991"^3.0 | isbn13:"pubyear 1991") (series:2011 |
> desc:2011 | bisacsub:2011 | award:2011 | format:2011 | shortdesc:2011 |
> pubyear:2011 | author:2011^2.0 | category:2011 | title:2011^9.0 |
> isbn10:2011 | season:2011 | imprint:2011 | subtitle:2011^3.0 |
> isbn13:2011))~1) ()
>  
>  DisMaxQParser
>
> Basically we are trying to pass the query string along with a facet field
> and the range. Is there any syntax issue? Please help this is urgent as I
> got stuck.
>
> Thanks,
> Solr user
>


Re: SOLR and secure content

2010-11-22 Thread Jos Janssen

Hi,

We plan to make an application layer in PHP which will communicate with the
Solr server.

Direct calls will be made for administration purposes only.

regards,

jos


Re: Facet - Range Query issue

2010-11-22 Thread Solr User
Eric,

I solved the issue by adding fq parameter in the query. Thank you so much
for your reply.
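
For reference, the fix amounts to moving the range clause out of the dismax q
and into a filter query, so the request looks something like:

  /select/?q=sun&fq=pubyear:[1991 TO 2011]&rows=9&facet=true&facet.field=pubyear_facet&...

with the remaining facet and spellcheck parameters unchanged.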

Thanks,
Murali

On Mon, Nov 22, 2010 at 1:51 PM, Erick Erickson wrote:

> Well, without seeing the changes you made to the schema, it's hard to tell
> much.
> Also, could you define "not work"? What, exactly, fails to do what you
> expect?
>
> But the first question I have is "did you reindex after changing your
> schema?".
>
> And have you checked your index to verify that there values in the fields
> you
> changed?
>
> Best
> Erick
>
> On Mon, Nov 22, 2010 at 1:42 PM, Solr User  wrote:
>
> > Hi,
> >
> > I am having issue with querying and using facet.
> >
> > This was working fine earlier:
> >
> > /spell/?q=(sun) AND (pubyear:[1991 TO
> >
> >
> 2011])&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear&facet.field=format&facet.field=series&facet.field=season&facet.field=imprint&facet.field=category&facet.field=award&facet.field=age&facet.field=reading&facet.field=grade&facet.field=price&spellcheck=true&debugQuery=on
> >
> > After modifying to use dismax handler with new schema the below query
> does
> > not work:
> >
> > /select/?q=(sun) AND (pubyear:[1991 TO
> >
> >
> 2011])&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear_facet&facet.field=format_facet&facet.field=series_facet&facet.field=season_facet&facet.field=imprint_facet&facet.field=category_facet&facet.field=award_facet&facet.field=age_facet&facet.field=reading_facet&facet.field=grade_facet&facet.field=price_facet&spellcheck=true&debugQuery=on
> >
> > 
> >  (sun) AND (pubyear:[1991 TO 2011])
> >  (sun) AND (pubyear:[1991 TO 2011])
> >  +((+DisjunctionMaxQuery((series:sun | desc:sun |
> > bisacsub:sun | award:sun | format:sun | shortdesc:sun | pubyear:sun |
> > author:sun^2.0 | category:sun | title:sun^9.0 | isbn10:sun | season:sun |
> > imprint:sun | subtitle:sun^3.0 | isbn13:sun))
> > +DisjunctionMaxQuery((series:"pubyear 1991" | desc:"pubyear 1991" |
> > bisacsub:"pubyear 1991" | award:"pubyear 1991" | format:"pubyear 1991" |
> > shortdesc:"pubyear 1991" | pubyear:"pubyear 1991" | author:"pubyear
> > 1991"^2.0 | category:"pubyear 1991" | title:"pubyear 1991"^9.0 |
> > isbn10:"pubyear 1991" | season:"pubyear 1991" | imprint:"pubyear 1991" |
> > subtitle:"pubyear 1991"^3.0 | isbn13:"pubyear 1991"))
> > DisjunctionMaxQuery((series:2011 | desc:2011 | bisacsub:2011 | award:2011
> |
> > format:2011 | shortdesc:2011 | pubyear:2011 | author:2011^2.0 |
> > category:2011 | title:2011^9.0 | isbn10:2011 | season:2011 | imprint:2011
> |
> > subtitle:2011^3.0 | isbn13:2011)))~1) ()
> >  +((+(series:sun | desc:sun |
> bisacsub:sun
> > | award:sun | format:sun | shortdesc:sun | pubyear:sun | author:sun^2.0 |
> > category:sun | title:sun^9.0 | isbn10:sun | season:sun | imprint:sun |
> > subtitle:sun^3.0 | isbn13:sun) +(series:"pubyear 1991" | desc:"pubyear
> > 1991"
> > | bisacsub:"pubyear 1991" | award:"pubyear 1991" | format:"pubyear 1991"
> |
> > shortdesc:"pubyear 1991" | pubyear:"pubyear 1991" | author:"pubyear
> > 1991"^2.0 | category:"pubyear 1991" | title:"pubyear 1991"^9.0 |
> > isbn10:"pubyear 1991" | season:"pubyear 1991" | imprint:"pubyear 1991" |
> > subtitle:"pubyear 1991"^3.0 | isbn13:"pubyear 1991") (series:2011 |
> > desc:2011 | bisacsub:2011 | award:2011 | format:2011 | shortdesc:2011 |
> > pubyear:2011 | author:2011^2.0 | category:2011 | title:2011^9.0 |
> > isbn10:2011 | season:2011 | imprint:2011 | subtitle:2011^3.0 |
> > isbn13:2011))~1) ()
> >  
> >  DisMaxQParser
> >
> > Basically we are trying to pass the query string along with a facet field
> > and the range. Is there any syntax issue? Please help this is urgent as I
> > got stuck.
> >
> > Thanks,
> > Solr user
> >
>


git repo for branch_3x + SOLR-1873 (Solr Cloud)

2010-11-22 Thread Jeremy Hinegardner
Hi all,

I've done an initial backport of SOLR-1873 (Solr Cloud) to branch_3x.  I will do
merges from branch_3x periodically.  Currently this passes all tests.  

https://github.com/collectiveintellect/lucene-solr/tree/branch_3x-cloud

We need a stable Solr Cloud system and this was our best guess on how that
should be done.  Does that sound right?

enjoy,

-jeremy

-- 

 Jeremy Hinegardner  jer...@hinegardner.org 



Using WhitespaceTokenizer but still wanting to match when all fields are concatenated

2010-11-22 Thread Eric Caron
Problem:
Indexed phrase: JetBlue Airlines
Ideal matching queries: jetblue, "jet blue", "jetblue airway", "jetblue
company"

I'd like to be able to use synonyms (to convert airway to airline),
stopwords (to drop "company"), strip periods and use ASCII folding, and
split on case.

I'm close with the following:
***







***
Except for the problem that I can't do synonyms or stopwords because of the
non-tokenizing tokenizer. There's also the problem that a wildcard at the
end of an exact-match query returns nothing.

Does anyone have suggestions on how this could be accomplished? The dataset
is under 100k entries and none of the docs are more than 200 characters.
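
One possible starting point (a sketch only, not tested against this dataset;
the type name and the synonyms.txt/stopwords.txt file names are made up, and
the query-time chain would typically drop catenateWords). It takes a different
approach from the non-tokenizing one above: split on whitespace and case
change, but also keep the catenated form, then apply synonyms and stopwords to
the resulting real tokens:

<fieldType name="text_company" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- "JetBlue" becomes jet, blue and (catenated) jetblue; periods are dropped -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" splitOnCaseChange="1"
            catenateWords="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- synonyms.txt would map airway => airline -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <!-- stopwords.txt would drop terms like "company" -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>

With this, both "jet blue" and jetblue can match the indexed "JetBlue", and the
synonym and stopword filters see individual tokens rather than one
concatenated string.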


Shingles and Delimiter Help

2010-11-22 Thread Jessy Kate
Hello Solr community,

I'm using Solr for an app to index documents, with shingles to index n-grams
(right now 2-, 3- and 4-grams). This is Solr 1.4.1 with Lucene 2.9.3. I'm
having two challenges:

1. The shingles configuration is not respecting the lower limit set in the
config file:



I still see bi-grams and tri-grams in the 4-gram results, for example.
This install was assembled a few months ago, so perhaps it is a bug
that has since been fixed? (I looked then and did not find anything, but I
know it was a relatively new feature.)


2. The second is that, for some reason, the delimiters appear to be
getting indexed with my n-gram tokens (except unigrams), so I get
a lot of search results for "_ x", where x is a real word in
my documents. I'm sure this is just a misunderstanding of the docs on
my part, but I just can't seem to figure out how to do this right.
Here is the configuration stanza for bigrams (it is equivalent for
tri-grams and 4-grams):








 
   




an example output for bigrams:


facet_counts: {
    facet_queries: {},
    facet_fields: {
        bigrams: [
            "_ _", 67567,
            "_ speaker", 18932,
            "speaker _", 16186,
            "_ bill", 14513,
            "_ house", 14058,
            "bill _", 13205,
            "_ time", 13021,
            "time _", 12239,
            "house _", 10704,
            "today _", 10577
        ]
    }
}



the "positionIncrementGap" for the copyField i use to store the main
searchable fields in, is actually set to 100, so i thought that might
be it, but i tried modifying that and it didn't solve the problem.


Any help on either issue would be greatly appreciated. I'm happy to
provide any other details. The full config file is available at:

https://github.com/sunlightlabs/Capitol-Words/blob/master/solr/schema.xml


thank you in advance!

jessy


-- 
Jessy Cowan-Sharp
http://jessykate.com


RE: Shingles and Delimiter Help

2010-11-22 Thread Steven A Rowe
Hi Jessy,

Several ShingleFilter(Factory) improvements, including the ability to specify 
minShingleSize, were introduced on the Solr/Lucene 3.x branch, and so are not 
available in Solr 1.4.x/Lucene 2.9.x. (This is your #1 issue.)

For details about the changes and when they were introduced: 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
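
(For reference, once on 3.x the declaration can combine sizes in one filter;
this is only an illustrative sketch, not a drop-in config:)

  <filter class="solr.ShingleFilterFactory"
          minShingleSize="2" maxShingleSize="4"
          outputUnigrams="false"/>

With the minimum and maximum set independently, a single field can carry the
2-, 3- and 4-gram shingles instead of needing one field per shingle size.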

The "_"-only tokens you're seeing, which are likely the result of placeholder 
tokens where stopwords used to be, is also fixed under Solr/Lucene 3.x, so that 
only shingles with at least one "real" token are output.  (This is your #2 
issue.)

Steve

> -Original Message-
> From: Jessy Kate [mailto:jessy.cowansh...@gmail.com]
> Sent: Monday, November 22, 2010 3:33 PM
> To: solr-user@lucene.apache.org
> Subject: Shingles and Delimiter Help
> 
> Hello Solr community,
> 
> I'm using Solr for an app to index documents, with shingles to index n-
> grams
> (right now 2- 3- and 4-grams). this is solr 1.4.1 with lucene 2.9.3. i'm
> having two challenges:
> 
> 1. the shingles configuration is not respecting the lower limit set in the
> config file:
> 
>  minShingleSize="3"
> maxShingleSize="3"
> outputUnigrams="false"
> />
> 
> I still see bi-grams and tri-grams in the 4-gram results, for example.
> This install was assembled a few months ago-- so perhaps it was a bug
> that's been fixed? (I looked then and did not find anything, but know
> it was a relatively new feature).
> 
> 
> 2. the second is that for some reason the delimiters appear to be
> getting indexed with my n-gram tokens (except unigrams), so that i get
> a lot of search results for  x, where x is a real word in
> my documents. i'm sure this is just a misunderstanding of the docs on
> my part, but i just can't seem to figure out how to do this right.
> Here is the configuration stanza for bigrams (it is equivalent for
> tri-grams and 4-grams):
> 
> 
>  >
> 
> 
>  generateWordParts="1"
> generateNumberParts="1" catenateWords="0"
> catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="0"/>
> 
>  ignoreCase="true"
> words="stopwords.txt"
> enablePositionIncrements="true"
> />
>   minShingleSize="2"
> maxShingleSize="2"
> outputUnigrams="false"
> />
>
> 
> 
> 
> 
> an example output for bigrams:
> 
> 
> facet_counts: {
> 
>- facet_queries: { }
>- -
>facet_fields: {
>   - -
>   bigrams: [
>  - "_ _"
>  - 67567
>  - "_ speaker"
>  - 18932
>  - "speaker _"
>  - 16186
>  - "_ bill"
>  - 14513
>  - "_ house"
>  - 14058
>  - "bill _"
>  - 13205
>  - "_ time"
>  - 13021
>  - "time _"
>  - 12239
>  - "house _"
>  - 10704
>  - "today _"
>  - 10577
>   ]
>}
> 
> 
> 
> the "positionIncrementGap" for the copyField i use to store the main
> searchable fields in, is actually set to 100, so i thought that might
> be it, but i tried modifying that and it didn't solve the problem.
> 
> 
> any help on either issue would be greatly appreciated. happy to
> provide any other details. the full config file is available at:
> 
> https://github.com/sunlightlabs/Capitol-Words/blob/master/solr/schema.xml
> 
> 
> thank you in advance!
> 
> jessy
> 
> 
> --
> Jessy Cowan-Sharp
> http://jessykate.com


What tokenizer is good for breaking host names

2010-11-22 Thread sara motahari
Hello Solr community,

I have a "host" field in my documents which keep the host from which the page 
was crawled. for example, yahoo.com, or sports.yahoo.com. I want this field to 
be searchable so if I search yahoo, I can find sports.yahoo.com. 

I have used these tokenizers and it does not work:



It seems they do not break the host name at the dots, so searching for yahoo 
does not find sports.yahoo.com.
What tokenizer should I use so that the host name is broken at the dots?

Thanks,
Sara


  

Re: Shingles and Delimiter Help

2010-11-22 Thread Jessy Kate
Fantastic, thanks! I'll update the release and keep my fingers crossed. Many
thanks for the speedy response.

jessy

On Mon, Nov 22, 2010 at 4:53 PM, Steven A Rowe  wrote:

> Hi Jessy,
>
> Several ShingleFilter(Factory) improvements, including the ability to
> specify minShingleSize, were introduced on the Solr/Lucene 3.x, and so are
> not available in Solr 1.4.X/Lucene 2.9.X. (This is your #1 issue.)
>
> For details about the changes and when they were introduced:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
>
> The "_"-only tokens you're seeing, which are likely the result of
> placeholder tokens where stopwords used to be, is also fixed under
> Solr/Lucene 3.x, so that only shingles with at least one "real" token are
> output.  (This is your #2 issue.)
>
> Steve
>
> > -Original Message-
> > From: Jessy Kate [mailto:jessy.cowansh...@gmail.com]
> > Sent: Monday, November 22, 2010 3:33 PM
> > To: solr-user@lucene.apache.org
> > Subject: Shingles and Delimiter Help
> >
> > Hello Solr community,
> >
> > I'm using Solr for an app to index documents, with shingles to index n-
> > grams
> > (right now 2- 3- and 4-grams). this is solr 1.4.1 with lucene 2.9.3. i'm
> > having two challenges:
> >
> > 1. the shingles configuration is not respecting the lower limit set in
> the
> > config file:
> >
> >  > minShingleSize="3"
> > maxShingleSize="3"
> > outputUnigrams="false"
> > />
> >
> > I still see bi-grams and tri-grams in the 4-gram results, for example.
> > This install was assembled a few months ago-- so perhaps it was a bug
> > that's been fixed? (I looked then and did not find anything, but know
> > it was a relatively new feature).
> >
> >
> > 2. the second is that for some reason the delimiters appear to be
> > getting indexed with my n-gram tokens (except unigrams), so that i get
> > a lot of search results for  x, where x is a real word in
> > my documents. i'm sure this is just a misunderstanding of the docs on
> > my part, but i just can't seem to figure out how to do this right.
> > Here is the configuration stanza for bigrams (it is equivalent for
> > tri-grams and 4-grams):
> >
> >
> >  > >
> > 
> > 
> >  > generateWordParts="1"
> > generateNumberParts="1" catenateWords="0"
> > catenateNumbers="0"
> > catenateAll="0" splitOnCaseChange="0"/>
> > 
> >  > ignoreCase="true"
> > words="stopwords.txt"
> > enablePositionIncrements="true"
> > />
> >   > minShingleSize="2"
> > maxShingleSize="2"
> > outputUnigrams="false"
> > />
> >
> > 
> >
> >
> >
> > an example output for bigrams:
> >
> >
> > facet_counts: {
> >
> >- facet_queries: { }
> >- -
> >facet_fields: {
> >   - -
> >   bigrams: [
> >  - "_ _"
> >  - 67567
> >  - "_ speaker"
> >  - 18932
> >  - "speaker _"
> >  - 16186
> >  - "_ bill"
> >  - 14513
> >  - "_ house"
> >  - 14058
> >  - "bill _"
> >  - 13205
> >  - "_ time"
> >  - 13021
> >  - "time _"
> >  - 12239
> >  - "house _"
> >  - 10704
> >  - "today _"
> >  - 10577
> >   ]
> >}
> >
> >
> >
> > the "positionIncrementGap" for the copyField i use to store the main
> > searchable fields in, is actually set to 100, so i thought that might
> > be it, but i tried modifying that and it didn't solve the problem.
> >
> >
> > any help on either issue would be greatly appreciated. happy to
> > provide any other details. the full config file is available at:
> >
> >
> https://github.com/sunlightlabs/Capitol-Words/blob/master/solr/schema.xml
> >
> >
> > thank you in advance!
> >
> > jessy
> >
> >
> > --
> > Jessy Cowan-Sharp
> > http://jessykate.com
>



-- 
Jessy Cowan-Sharp
http://jessykate.com


Re: What tokenizer is good for breaking host names

2010-11-22 Thread Ahmet Arslan
> I have a "host" field in my documents which keep the host
> from which the page 
> was crawled. for example, yahoo.com, or sports.yahoo.com. I
> want this field to 
> be searchable so if I search yahoo, I can find
> sports.yahoo.com. 
> 
> I have used these tokenizers and it does not work:
> 
> 
>  class="solr.RemoveDuplicatesTokenFilterFactory"/>
> Now, it seems they do not break the host name at the dots
> and does not match 
> find yahoo in sports.yahoo.com.
> What tokenizer should I use so it breaks the host name at
> dots?

LetterTokenizerFactory or MappingCharFilterFactory with "."=> " "
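
In schema.xml terms, either option could look roughly like this (sketches only;
the type names and the mapping file name are made up, and the
LowerCaseFilterFactory is an extra assumption so that case does not matter):

<!-- Option 1: split on anything that is not a letter -->
<fieldType name="host_letters" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.LetterTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Option 2: map the dots to spaces before tokenizing;
     host-mapping.txt would contain the single line:  "." => " "  -->
<fieldType name="host_mapped" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="host-mapping.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>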


  


Re: SOLR and secure content

2010-11-22 Thread Savvas-Andreas Moysidis
Maybe this older thread on modelling access control might help:

http://lucene.472066.n3.nabble.com/Modelling-Access-Control-td1756817.html#a1761482

Regards,
-- Savvas

On 22 November 2010 18:53, Jos Janssen  wrote:

>
> Hi,
>
> We plan to make an application layer in PHP which will communicate to the
> solr server.
>
> Direct calls will only be made for administration purposes only.
>
> regards,
>
> jos
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SOLR-and-secure-content-tp1945028p1947970.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Question on replication

2010-11-22 Thread Mark
After I perform a delta-import on my master, the slave replicates the 
whole index, which can be quite time consuming. Is there any way for the 
slave to replicate only the parts that have changed? Do I need to change 
some setting on the master not to commit/optimize to get this to work?


Thanks


Re:Re:Re: SnapPuller error : Unable to move index file

2010-11-22 Thread kafka0102
Does anyone care about the bug?




At 2010-11-22 22:28:39,kafka0102  wrote:

>Sorry for my unclear question.
>My Solr version is 1.4.1, and I may have hit a Solr bug.
>In my case, my slave's current index directory is index.20101122031000. It was
>generated at 2010-11-22 03:10:00 for reasons that are not important here.
>Then, at 2010-11-22 15:10:00, the slave ran a replication. I found this
>function in SnapPuller:
>
>  private File createTempindexDir(final SolrCore core) {
>    final String tmpIdxDirName = "index." +
>        new SimpleDateFormat(SnapShooter.DATE_FMT).format(new Date());
>    final File tmpIdxDir = new File(core.getDataDir(), tmpIdxDirName);
>    tmpIdxDir.mkdirs();
>    return tmpIdxDir;
>  }
>
>with SnapShooter.DATE_FMT = "yyyyMMddhhmmss".
>So in this replication the tmpIndexDir and the existing indexDir both come out
>as "index.20101122031000". At the end of the replication, delTree(tmpIndexDir)
>then deletes the index dir that is still in use.
>
>So SnapShooter.DATE_FMT = "yyyyMMddHHmmss" should be fine.
>
>
>
>
>
>
>At 2010-11-22 21:13:41,"Erick Erickson"  wrote:
>
>>what op system are you on? what version of Solr? what filesystem?
>>
>>It's really hard to help without more information, you might want to review:
>>http://wiki.apache.org/solr/UsingMailingLists
>>
>>Best
>>Erick
>>
>>2010/11/22 kafka0102 
>>
>>> my replication got errors like :
>>> Unable to move index file from:
>>> /home/data/tuba/search-index/eshequn.post.db_post/index.20101122034500/_21.frq
>>> to:
>>> /home/data/tuba/search-index/eshequn.post.db_post/index.20101122031000/_21.frq
>>>
>>> I looked at log and found the last slave replication commit before the
>>> error is :
>>> [2010-11-22
>>> 15:10:18][INFO][pool-6-thread-1][SolrDeletionPolicy.java(114)]SolrDeletionPolicy.onInit:
>>> commits:num=4
>>>
>>>  
>>> commit{dir=/home/data/tuba/search-index/eshequn.post.db_post/index.20101122031000,segFN=segments_3,version=1290358965331,generation=3,filenames=[_21.fdt,
>>> _21.frq, _21.prx, _21.tii, _21.nrm, _21.fdx, _21.tis, segments_3, _21.fnm]
>>>
>>>  
>>> commit{dir=/home/data/tuba/search-index/eshequn.post.db_post/index.20101122031000,segFN=segments_kq,version=1290358966074,generation=746,filenames=[_21.frq,
>>> _21.prx, _q8.frq, _21.tii, _q8.prx, _q8.tii, _q8.fdt, _21.nrm, _q8.fnm,
>>> _21.tis, _21.fdt, _q8.nrm, _q8.fdx, segments_kq, _q8.tis, _21.fdx,
>>> _21_1r.del, _21.fnm]
>>>
>>>  
>>> commit{dir=/home/data/tuba/search-index/eshequn.post.db_post/index.20101122031000,segFN=segments_ky,version=1290358966082,generation=754,filenames=[_21.frq,
>>> _qg.fnm, _qe.tis, _21.tii, _qe.nrm, _qg.nrm, _qg.fdt, _21_1u.del, _qd.tii,
>>> _qd.nrm, _qg.tii, _21.tis, _21.fdt, _qe.fdx, _qe.prx, _qf.tii, _21.fdx,
>>> _qf.nrm, segments_ky, _qf.fdt, _qe.fdt, _qd.fdt, _qf.tis, _21.prx,
>>> _qd_2.del, _qd.fnm, _qd.fdx, _qf.fdx, _qe.frq, _qd.prx, _21.nrm, _qd.frq,
>>> _qg.prx, _qg.tis, _qf.frq, _qd.tis, _qf.prx, _qe.tii, _qf.fnm, _qg.fdx,
>>> _qe.fnm, _qg.frq, _21.fnm]
>>>
>>>  
>>> commit{dir=/home/data/tuba/search-index/eshequn.post.db_post/index.20101122031000,segFN=segments_l3,version=1290358966087,generation=759,filenames=[_21.frq,
>>> _21.prx, _21.tii, _qn.fnm, _qn.fdt, _21_1u.del, _qn.fdx, _21.nrm, _qn.nrm,
>>> _qn.frq, _21.tis, _qn.prx, _21.fdt, segments_l3, _qn.tis, _qn.tii, _21.fdx,
>>> _21.fnm]
>>>
>>> When the error happened, the dir index.20101122031000 had been deleted.
>>> Does the SolrDeletionPolicy delete the index dir not only files? The problem
>>> happend some times.Does anyone know the reason?
>>>
>>>
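
To make the clash described above concrete: in SimpleDateFormat, "hh" is the
12-hour clock and "HH" the 24-hour clock, so a replication pulled at 15:10
produces the same directory name as one created at 03:10. A small stand-alone
illustration (the class name is just for the demo; the timestamps are the ones
from the report):

import java.text.SimpleDateFormat;
import java.util.Date;

public class DateFormatClash {
    public static void main(String[] args) throws Exception {
        SimpleDateFormat parse = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        Date morning   = parse.parse("2010-11-22 03:10:00");
        Date afternoon = parse.parse("2010-11-22 15:10:00");

        // 12-hour clock ("hh"): both timestamps produce the same directory name
        SimpleDateFormat twelveHour = new SimpleDateFormat("yyyyMMddhhmmss");
        System.out.println("index." + twelveHour.format(morning));    // index.20101122031000
        System.out.println("index." + twelveHour.format(afternoon));  // index.20101122031000

        // 24-hour clock ("HH"): the two names no longer collide
        SimpleDateFormat dayClock = new SimpleDateFormat("yyyyMMddHHmmss");
        System.out.println("index." + dayClock.format(afternoon));    // index.20101122151000
    }
}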


solr response xsd

2010-11-22 Thread Tri Nguyen
Hi,
 
I'm trying to find the Solr response XSD.
 
Is this it here?
 
https://issues.apache.org/jira/browse/SOLR-17
 
I basically want to know if the data import passed or failed.  I can get the 
XML string and search for "completed", but I was wondering if I can use an XSD 
to parse the response.
 
Or is there another way?
 
Here's the response I have; in the XSD I don't see the lst element for 
statusMessages.
 
xml version="1.0" encoding="UTF-8" ?> 

- 


+ 


  0 

  15 
  

+ 


- 


  data-config.xml 
  
  

  full-import 

  idle 

   

- 


  0 

  0 

  0 

  2010-11-22 17:20:42 

  Indexing completed. Added/Updated: 0 documents. Deleted 0 
documents. 

  2010-11-22 17:20:43 

  2010-11-22 17:20:43 

  0 

  0:0:0.375 
  

  This response format is experimental. It is likely to 
change in the future. 
  
 
Thanks,
 
Tri
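
There is no official XSD for the DIH status response (the response itself warns
that the format may change). One pragmatic option, sketched below, is to fetch
the status page and pull the interesting pieces out with XPath instead of
validating it. The host/core in the URL are placeholders, and the exact element
names and paths are assumptions based on the 1.4 status response rather than a
documented contract:

import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class DihStatusCheck {
    public static void main(String[] args) throws Exception {
        // placeholder URL: point this at your own DIH handler
        String url = "http://localhost:8983/solr/dataimport";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new URL(url).openStream());
        XPath xp = XPathFactory.newInstance().newXPath();

        // "idle" once the import has finished, "busy" while it is running
        String status = xp.evaluate("/response/str[@name='status']", doc);
        // the unnamed <str> under statusMessages holds the summary line,
        // e.g. "Indexing completed. Added/Updated: 0 documents. ..."
        String summary = xp.evaluate(
                "/response/lst[@name='statusMessages']/str[@name='']", doc);

        boolean succeeded = "idle".equals(status)
                && summary.startsWith("Indexing completed");
        System.out.println(succeeded ? "import succeeded"
                                     : "import failed or still running");
    }
}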

Re: Question on replication

2010-11-22 Thread Shawn Heisey

On 11/22/2010 5:45 PM, Mark wrote:
After I perform a delta-import on my master the slave replicates the 
whole index which can be quite time consuming. Is there any way for 
the slave to replicate only partials that have changed? Do I need to 
change some setting on master not to commit/optimize to get this to work?


Anytime you optimize the index, whether it's done separately or as part 
of an import, the slave will have to copy the entire index, because the 
entire index will have changed.


If you include &optimize=false on your delta-import URL, it should do 
exactly as you are expecting.  If you are doing a lot of delta-imports, 
you'll eventually start auto-merging segments according to the value 
specified in mergeFactor, which will still be faster than a full optimize.
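
For example, with the host, port, core and handler name as placeholders, the
delta-import request would look something like:

http://master:8983/solr/yourcore/dataimport?command=delta-import&commit=true&optimize=false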


You normally don't have to do optimizes at all unless you are deleting 
documents, or updating documents in place, which deletes the old one 
before inserting the new one.  If you are not relying on relevancy sort, 
you don't even need to do it then, unless the index size begins to get 
out of control.


Shawn