Re: noobie question: sorting

2011-03-15 Thread David Smiley (@MITRE.org)
Hi.  Where did you find such an obtuse example?

Solr recently added support for sorting by function query.  One such function is
named "query" which takes a query and uses the score of the result of that
query as the function's result.  Due to constraints of where this query is
placed within a function query, it is necessary to use the local-params
syntax (e.g. {!v=...}) since you can't simply state "category:445".  Or,
there could have been a parameter dereference like $sortQ where sortQ is
another parameter holding category:445.  Anyway, the net effect is that
documents are score-sorted based on the query category:445 instead of the
user query (the "q" param). I'd expect category:445 docs to come up on top and all
others to appear randomly afterwards.  It would be nice if the sort query
could simply be "category:445 desc" but that's not supported.
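To make the two shapes concrete, here is a sketch of what the request parameters might look like (the endpoint and field values are illustrative, not from the thread):

```python
from urllib.parse import urlencode, parse_qs

# Form 1: the sort query embedded via local-params syntax
params_inline = {
    "q": "*:*",
    "sort": "query({!v='category:445'}) desc",
}

# Form 2: parameter dereferencing; $sortQ points at another request param
params_deref = {
    "q": "*:*",
    "sort": "query($sortQ) desc",
    "sortQ": "category:445",
}

url = "http://localhost:8983/solr/select?" + urlencode(params_inline)

# Round-trip check: the sort clause survives URL encoding intact
decoded = parse_qs(url.split("?", 1)[1])
print(decoded["sort"][0])  # -> query({!v='category:445'}) desc
```

Parameter dereferencing keeps the sort expression static while the actual query lives in its own parameter, which is handy when the sort is hardcoded in solrconfig.xml.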

Complicated?  You bet!  But fear not; this is about as complicated as it
gets.

References:
http://wiki.apache.org/solr/SolrQuerySyntax
http://wiki.apache.org/solr/CommonQueryParameters#sort
http://wiki.apache.org/solr/FunctionQuery#query

~ David Smiley
 Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book

--
View this message in context: 
http://lucene.472066.n3.nabble.com/noobie-question-sorting-tp2685250p2685617.html
Sent from the Solr - User mailing list archive at Nabble.com.


Stemming question

2011-03-15 Thread Bill Bell
When I use the Porter stemmer in Solr, it appears to take words that are
stemmed and replace them with the root word in the index.
I verified this by looking at analysis.jsp.

Is there an option to expand the stemmer to include all combinations of the
word? Like include 's, ly, etc?

Other options besides protection?

Bill





Re: Tokenizing Chinese & multi-language search

2011-03-15 Thread Andy
Hi Otis,

It doesn't look like the last 2 options would work for me. So I guess my best 
bet is to ask the user to specify the language when they type in the query.

Once I get that information from the user, how do I dynamically pick an 
analyzer for the query string?

Thanks

Andy
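A minimal sketch of that routing, assuming the per-field scheme described in the quoted message below (the field names, and the fallback to English, are illustrative assumptions):

```python
# Map a user-declared language code to the per-language field.
LANG_FIELD = {"en": "text_en", "zh": "text_zh", "ja": "text_ja", "fr": "text_fr"}

def build_query(user_query: str, lang: str) -> str:
    # Fall back to the English field for unknown languages (an assumption).
    field = LANG_FIELD.get(lang, "text_en")
    # Scoping the query to one field means Solr applies that field's own
    # query analyzer, so no analyzer has to be picked client-side.
    return f"{field}:({user_query})"

print(build_query("白宫", "zh"))  # -> text_zh:(白宫)
```

The point of the sketch: once each language lives in its own field, "picking an analyzer" reduces to picking a field name in the query string.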

--- On Tue, 3/15/11, Otis Gospodnetic  wrote:

> From: Otis Gospodnetic 
> Subject: Re: Tokenizing Chinese & multi-language search
> To: solr-user@lucene.apache.org
> Date: Tuesday, March 15, 2011, 11:51 PM
> Hi Andy,
> 
> Is the "I don't know what language the query is in"
> something you could change 
> by...
> - asking the user
> - deriving from HTTP request headers
> - identifying the query language (if queries are long
> enough and "texty")
> - ...
> 
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 
> 
> - Original Message 
> > From: Andy 
> > To: solr-user@lucene.apache.org
> > Sent: Tue, March 15, 2011 9:07:36 PM
> > Subject: Tokenizing Chinese & multi-language
> search
> > 
> > Hi,
> > 
> > I remember reading in this list a while ago that Solr
> will only  tokenize on 
> >whitespace even when using CJKAnalyzer. That would make
> Solr  unusable on 
> >Chinese or any other languages that don't use
> whitespace as  separator.
> > 
> > 1) I remember reading about a workaround.
> Unfortunately I  can't find the post 
> >that mentioned it. Could someone give me pointers on
> how to  address this issue?
> > 
> > 2) Let's say I have fixed this issue and have 
> properly analyzed and indexed 
> >the Chinese documents. My documents are in 
> multiple languages. I plan to use 
> >separate fields for documents in different 
> languages: text_en, text_zh, 
> >text_ja, text_fr, etc. Each field will be 
> associated with the appropriate 
> >analyzer. 
> >
> > My problem now is how to deal with  the query
> string. I don't know what 
> >language the query is in, so I won't be able  to
> select the appropriate analyzer 
> >for the query string. If I just use the  standard
> analyzer on the query string, 
> >any query that's in Chinese won't be  tokenized
> correctly. So would the whole 
> >system still work in this  case?
> > 
> > This must be a pretty common use case, handling
> multi-language  search. What is 
> >the recommended way of dealing with this 
> problem?
> > 
> > Thanks.
> > Andy
> > 
> > 
> >       
> > 
> 





Re: Problem with field collapsing of patched Solr 1.4

2011-03-15 Thread Otis Gospodnetic
Kai, try SOLR-1086 with Solr trunk instead if trunk is OK for you.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Kai Schlamp 
> To: solr-user@lucene.apache.org
> Sent: Sun, March 13, 2011 11:58:56 PM
> Subject: Problem with field collapsing of patched Solr 1.4
> 
> Hello.
> 
> I just tried to patch Solr 1.4 with the field collapsing patch  of
> https://issues.apache.org/jira/browse/SOLR-236. The patching and  build
> process seemed to be ok (below are the steps I did), but the  field
> collapsing feature doesn't seem to work.
> When I go to `http://localhost:8982/solr/select/?q=*:*` I correctly
> get 10 documents  as result.
> When going to 
>`http://localhost:8982/solr/select/?q=*:*&collapse=true&collapse.field=tag_name_ss&collapse.max=1`
>
> (tag_name_ss  is surely a field with content) I get the same 10 docs as
> result back. No  further information regarding the field collapsing.
> What am I missing? Do I have  to activate it somehow?
> 
> * Downloaded 
>[Solr](http://apache.lauf-forum.at//lucene/solr/1.4.1/apache-solr-1.4.1.tgz)
> *  Downloaded 
>[SOLR-236-1_4_1-paging-totals-working.patch](https://issues.apache.org/jira/secure/attachment/12459716/SOLR-236-1_4_1-paging-totals-working.patch)
>
> *  Changed line 2837 of that patch to `@@ -0,0 +1,511 @@` (regarding
> this  
>[comment](https://issues.apache.org/jira/browse/SOLR-236?focusedCommentId=12932905&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12932905))
>
> *  Downloaded 
>[SOLR-236-1_4_1-NPEfix.patch](https://issues.apache.org/jira/secure/attachment/12470202/SOLR-236-1_4_1-NPEfix.patch)
>
> *  Extracted the Solr archive
> * Applied both patches:
> ** `cd  apache-solr-1.4.1`
> ** `patch -p0 <  ../SOLR-236-1_4_1-paging-totals-working.patch`
> ** `patch -p0 <  ../SOLR-236-1_4_1-NPEfix.patch`
> * Build Solr
> ** `ant clean`
> ** `ant  example` ... tells me "BUILD SUCCESSFUL"
> * Reindexed everything (using  Sunspot Solr)
> * Solr info tells me correctly "Solr Specification  Version:
> 1.4.1.2011.03.14.04.29.20"
> 
> Kai
> 


Re: Different options for autocomplete/autosuggestion

2011-03-15 Thread Otis Gospodnetic
Hi,

I actually don't follow how field collapsing helps with autocompletion...?

Over at http://search-lucene.com we eat our own autocomplete dog food: 
http://sematext.com/products/autocomplete/index.html .  Tasty stuff.

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Kai Schlamp 
> To: solr-user@lucene.apache.org
> Sent: Mon, March 14, 2011 11:52:48 PM
> Subject: Re: Different options for autocomplete/autosuggestion
> 
> @Robert: That sounds interesting and very flexible, but also like a
> lot of  work. This approach also doesn't seem to allow querying Solr
> directly by  using Ajax ... one of the big benefits in my opinion when
> using  Solr.
> @Bill: There are some things I don't like about the  Suggester
> component. It doesn't seem to allow infix searches (at least it is  not
> mentioned in the Wiki or elsewhere). It also uses a separate  index
> that has to be rebuilt independently of the main index. And it  doesn't
> support any filter queries.
> 
> The Lucid Imagination blog also  describes a further autosuggest
> approach 
>(http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/).
>
> The  disadvantage here is that the source documents must have distinct
> fields  (resp. the DIH selects must provide distinct data). Otherwise
> duplicates  would show up in the Solr query result, because of the
> document-oriented nature of  Solr.
> 
> In my opinion field collapsing seems to be the most promising for a
> full-featured autosuggestion solution. Unfortunately it is not  available
> for Solr 1.4.x or 3.x (I tried patching those branches several  times
> without success).
> 
> 2011/3/15 Bill Bell :
> > http://lucidworks.lucidimagination.com/display/LWEUG/Spell+Checking+and+Aut
> >  omatic+Completion+of+User+Queries
> >
> > For Auto-Complete, find the  following section in the solrconfig.xml file
> > for the collection:
> >   <searchComponent name="autocomplete" class="solr.SpellCheckComponent">
> >     <lst name="spellchecker">
> >       <str name="name">autocomplete</str>
> >       <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
> >       <str name="lookupImpl">org.apache.solr.spelling.suggest.jaspell.JaspellLookup</str>
> >       <str name="field">autocomplete</str>
> >       <str name="buildOnCommit">true</str>
> >     </lst>
> >   </searchComponent>
> >
> >
> >
> >
> >
> > On  3/14/11 8:16 PM, "Andy"   wrote:
> >
> >>Can you provide more details? Or a  link?
> >>
> >>--- On Mon, 3/14/11, Bill Bell   wrote:
> >>
> >>> See how Lucid Enterprise does it...  A
> >>> bit differently.
> >>>
> >>> On 3/14/11  12:14 AM, "Kai Schlamp" 
> >>>  wrote:
> >>>
> >>> >Hi.
> >>>  >
> >>> >There seems to be several options for implementing  an
> >>> >autocomplete/autosuggestions feature with Solr. I  am
> >>> trying to
> >>> >summarize those possibilities  together with their
> >>> advantages and
> >>>  >disadvantages. It would be really nice to read some of
> >>> your  opinions.
> >>> >
> >>> >* Using N-Gram filter + text  field query
> >>> >+ available in stable 1.4.x
> >>>  >+ results can be boosted
> >>> >+ sorted by best  matches
> >>> >- may return duplicate results
> >>>  >
> >>> >* Facets
> >>> >+ available in stable  1.4.x
> >>> >+ no duplicate entries
> >>> >- sorted by  count
> >>> >- may need an extra N-Gram field for infix  queries
> >>> >
> >>> >* Terms
> >>> >+  available in stable 1.4.x
> >>> >+ infix query by using regex in  3.x
> >>> >- only prefix query in 1.4.x
> >>> >-  regexp may be slow (just a guess)
> >>> >
> >>> >*  Suggestions
> >>> >? Did not try that yet. Does it allow infix  queries?
> >>> >
> >>> >* Field  Collapsing
> >>> >+ no duplications
> >>> >- only  available in 4.x branch
> >>> >? Does it work together with  highlighting? That would
> >>> be a big plus.
> >>>  >
> >>> >What are your experiences regarding
> >>>  autocomplete/autosuggestion with
> >>> >Solr? Any additions,  suggestions or corrections? What
> >>> do you prefer?
> >>>  >
> >>>  >Kai
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >
> >
> >
> 
> 
> 
> -- 
> Dr. med. Kai Schlamp
> Am Fort Elisabeth 17
> 55131  Mainz
> Germany
> Phone +49-177-7402778
> Email: schl...@gmx.de
> 
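
For reference, the EdgeNGram approach from the Lucid Imagination blog post mentioned above usually comes down to a fieldType along these lines. This is a sketch: the fieldType name and gram sizes are chosen for illustration, not taken from the thread.

```xml
<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit prefixes at index time: "so", "sol", "solr", ... -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- query side: no n-gramming, so the user's prefix matches the stored grams -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

As the thread notes, the catch is deduplication: because suggestions come back as documents, the source data must be distinct, which is what makes field collapsing attractive here.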


SOLR DIH importing MySQL "text" column as a BLOB

2011-03-15 Thread Kaushik Chakraborty
I've a column for posts in MySQL of type `text`, I've tried corresponding
`field-type` for it in Solr `schema.xml` e.g. `string, text, text-ws`. But
whenever I'm importing it using the DIH, it's getting imported as a BLOB
object. I checked, this thing is happening only for columns of type `text`
and not for `varchar`(they are getting indexed as string). Hence, the posts
field is not becoming searchable.

I found about this issue, after repeated search failures, when I did a `*:*`
query search on Solr. A sample response:



1.0
[B@10a33ce2
2011-02-21T07:02:55Z
test.acco...@gmail.com
Test
Account
[B@2c93c4f1
1


The `data-config.xml` :


 
 
 
 
 
 
 
 
 
   
  

The `schema.xml` :



 
 
 
 
 
 

solr_post_status_message_id
solr_post_message


Thanks,
Kaushik
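Two workarounds often suggested for this symptom, neither confirmed in this thread, are setting convertType="true" on the JdbcDataSource or casting the column to CHAR in the SQL itself. A sketch (the driver, URL, and table/column names are illustrative):

```xml
<dataConfig>
  <!-- convertType="true" asks DIH to convert JDBC column values to the
       type of the target Solr field (an assumption: supported by the
       DataImportHandler that ships with Solr 1.4+). -->
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/db" user="user" password="pass"
              convertType="true"/>
  <document>
    <entity name="post"
            query="SELECT id, CAST(message AS CHAR) AS message FROM posts">
      <!-- CAST(... AS CHAR) is the alternative: the text column arrives
           as a string before DIH ever sees it, so no BLOB is indexed. -->
      <field column="message" name="solr_post_message"/>
    </entity>
  </document>
</dataConfig>
```

Either way, the `[B@...]` values in the search response (the toString of a Java byte array) should become readable, searchable strings.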


Re: Sorting on multiValued fields via function query

2011-03-15 Thread David Smiley (@MITRE.org)
Hi Harish. 
Did sorting on multiValued fields actually work correctly for you before?
I'd be surprised if so.  I could be wrong but I think you previously always
got the sorting effects of whatever was the last indexed value. It is indeed
the case that the FieldCache only supports up to one indexed value per
field. Recently Hoss added sanity checks that you are seeing the results of: 
https://issues.apache.org/jira/browse/SOLR-2339   You might want to comment
on that issue with proof (e.g. a simple test) that it worked before but not
now.

~ David

-
 Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-on-multiValued-fields-via-function-query-tp2681833p2685485.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Tokenizing Chinese & multi-language search

2011-03-15 Thread Otis Gospodnetic
Hi Andy,

Is the "I don't know what language the query is in" something you could change 
by...
- asking the user
- deriving from HTTP request headers
- identifying the query language (if queries are long enough and "texty")
- ...

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Andy 
> To: solr-user@lucene.apache.org
> Sent: Tue, March 15, 2011 9:07:36 PM
> Subject: Tokenizing Chinese & multi-language search
> 
> Hi,
> 
> I remember reading in this list a while ago that Solr will only  tokenize on 
>whitespace even when using CJKAnalyzer. That would make Solr  unusable on 
>Chinese or any other languages that don't use whitespace as  separator.
> 
> 1) I remember reading about a workaround. Unfortunately I  can't find the 
> post 
>that mentioned it. Could someone give me pointers on how to  address this 
>issue?
> 
> 2) Let's say I have fixed this issue and have  properly analyzed and indexed 
>the Chinese documents. My documents are in  multiple languages. I plan to use 
>separate fields for documents in different  languages: text_en, text_zh, 
>text_ja, text_fr, etc. Each field will be  associated with the appropriate 
>analyzer. 
>
> My problem now is how to deal with  the query string. I don't know what 
>language the query is in, so I won't be able  to select the appropriate 
>analyzer 
>for the query string. If I just use the  standard analyzer on the query 
>string, 
>any query that's in Chinese won't be  tokenized correctly. So would the whole 
>system still work in this  case?
> 
> This must be a pretty common use case, handling multi-language  search. What 
> is 
>the recommended way of dealing with this  problem?
> 
> Thanks.
> Andy
> 
> 
>   
> 


noobie question: sorting

2011-03-15 Thread James Lin
Hi Guys,

came across this sorting query

query ({!v="category: 445"}) desc

I understand it is sorting on exact match of category = 445, I don't quite
understand the syntax, could someone please elaborate a bit for me? So I can
reuse this syntax in the future.

Regards

James


Trunk Compile failure/ hang

2011-03-15 Thread Viswa S

Hello,
I am trying to build source out of trunk (to apply a patch) and ran into an 
issue where the build process hangs (output below) during the Lucene build at 
sanity-load-lib. 
Just when sanity-load-lib starts, I see a dialog box asking for applet 
access permission: "The applet is attempting to invoke the 
java/lang/System.loadLibrary() operation on db_java-4.7". I click on "Allow" 
and the build process never resumes (I waited for more than an hour).
Any thoughts would be very helpful.
-Viswa
...build-lucene:
contrib-build.init:
get-db-jar:
check-and-get-db-jar:
init:
clover.setup:
clover.info: [echo]  [echo]   Clover not found. Code coverage 
reports disabled. [echo]  
clover:
compile-core:
compile-test-framework:
common.compile-test:
sanity-load-lib:
  

Re: stopFilterFactor and SnowballPorterFilterFactory not work for Spanish

2011-03-15 Thread cyang2010
I just tried with some real spanish text:

"Alquileres"  -->

=
org.apache.solr.analysis.LowerCaseFilterFactory {}
term position   1
term text   alquileres
term type   word
source start,end4,14
payload 

org.apache.solr.analysis.SnowballPorterFilterFactory {language=Spanish}
term position   1
term text   alquiler
term type   word
source start,end4,14
payload 



Looks like the Spanish stemmer is working.   Thanks,

cyang

--
View this message in context: 
http://lucene.472066.n3.nabble.com/stopFilterFactor-and-SnowballPorterFilterFactory-not-work-for-Spanish-tp2684322p2684814.html
Sent from the Solr - User mailing list archive at Nabble.com.


Tokenizing Chinese & multi-language search

2011-03-15 Thread Andy
Hi,

I remember reading in this list a while ago that Solr will only tokenize on 
whitespace even when using CJKAnalyzer. That would make Solr unusable on 
Chinese or any other languages that don't use whitespace as separator.

1) I remember reading about a workaround. Unfortunately I can't find the post 
that mentioned it. Could someone give me pointers on how to address this issue?

2) Let's say I have fixed this issue and have properly analyzed and indexed the 
Chinese documents. My documents are in multiple languages. I plan to use 
separate fields for documents in different languages: text_en, text_zh, 
text_ja, text_fr, etc. Each field will be associated with the appropriate 
analyzer. 
My problem now is how to deal with the query string. I don't know what language 
the query is in, so I won't be able to select the appropriate analyzer for the 
query string. If I just use the standard analyzer on the query string, any 
query that's in Chinese won't be tokenized correctly. So would the whole system 
still work in this case?

This must be a pretty common use case, handling multi-language search. What is 
the recommended way of dealing with this problem?

Thanks.
Andy


  


Re: stopFilterFactor and SnowballPorterFilterFactory not work for Spanish

2011-03-15 Thread cyang2010
Sorry Robert.  I just used some text translated by someone.  Maybe that
translation is not right.

Could you please give me a Spanish term with which I can show the Spanish
stemming factory is working?

Thanks,


cyang

--
View this message in context: 
http://lucene.472066.n3.nabble.com/stopFilterFactor-and-SnowballPorterFilterFactory-not-work-for-Spanish-tp2684322p2684775.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: stopFilterFactor and SnowballPorterFilterFactory not work for Spanish

2011-03-15 Thread Robert Muir
On Tue, Mar 15, 2011 at 8:50 PM, cyang2010  wrote:
> Robert,
>
> Thanks for your advice.  I modified my stopword text file.  Now the
> stopwordFilter has started to work.
>
> But the stemming-related filter (SnowballPorterFilterFactory with
> language=Spanish) is still not working.  Does anyone have any idea on that?
>

how is it not working? how is "cöcktäils" a spanish word!


Re: stopFilterFactor and SnowballPorterFilterFactory not work for Spanish

2011-03-15 Thread cyang2010
Robert,

Thanks for your advice.  I modified my stopword text file.  Now the
stopwordFilter has started to work.

But the stemming-related filter (SnowballPorterFilterFactory with
language=Spanish) is still not working.  Does anyone have any idea on that?

Thanks,


cyang

--
View this message in context: 
http://lucene.472066.n3.nabble.com/stopFilterFactor-and-SnowballPorterFilterFactory-not-work-for-Spanish-tp2684322p2684713.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Sorting 0 values last

2011-03-15 Thread Chris Hostetter

: Not sure how you are indexing, but in addition to the above
: suggestion by Yonik, one could ignore 0's at indexing time,
: i.e., ensure that 0 values for that field are not indexed, and
: use sortMissingLast.

Once upon a time i had a usecase where i was indexing product data, and in 
that product data a null price meant not currently for sale, but "0" was a 
legal price for products that were being given away free.  i had a similar 
requirement that the default sort should be based on price, but "free" 
products should come last ... except if the user explicitly said "sort by 
price".

what i did was to index a "hasPrice" field that was true if the price 
field existed and was non-zero.  my default sort was "hasPrice desc, price 
asc" but if the user clicked "sort by price" it was "price asc"

that might give you ideas about your own usecase.

-Hoss
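Hoss's two-field trick can be sketched in plain Python, standing in for the index-time derivation and the sort clause:

```python
# Sample product data; None = not for sale, 0.0 = legitimately free.
products = [
    {"name": "widget", "price": 9.99},
    {"name": "freebie", "price": 0.0},
    {"name": "mystery", "price": None},
    {"name": "gadget", "price": 4.50},
]

# Index-time: derive hasPrice = price exists and is non-zero.
for p in products:
    p["hasPrice"] = bool(p["price"])

# Default sort: "hasPrice desc, price asc" -- priced items first,
# cheapest first; free and unpriced items pushed to the end.
default_sort = sorted(products, key=lambda p: (not p["hasPrice"], p["price"] or 0.0))
print([p["name"] for p in default_sort])
# -> ['gadget', 'widget', 'freebie', 'mystery']
```

An explicit "sort by price" would simply drop the hasPrice key and sort on price alone, exactly as in the email.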


Re: background merge hit exception

2011-03-15 Thread Erick Erickson
The stack trace indicates there's an issue with optimizing. How are
you kicking off
optimizing? Is it automatic or manual?

34G for an index with that many documents does seem excessive. Are you
re-indexing the same
content with the same unique key? In other words do you have a large
number of deleted
documents in your index? You can tell by looking at your admin page,
"schema browser"
link and see if MaxDoc and NumDocs are wildly different.

Is there any chance that more than one process is writing to your index?

Best
Erick

On Tue, Mar 15, 2011 at 8:02 AM, Isha Garg  wrote:
> On Tuesday 15 March 2011 01:31 PM, Anurag wrote:
>>
>> Do you mean that earlier it was doing indexing well then all of sudden you
>> started getting this exception?
>>
>> -
>> Kumar Anurag
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/background-merge-hit-exception-tp2680625p2680979.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
> ya earlier it was fine but i think as the index size grows the problem
> starts occurring.
> also I have a doubt related to the index size. Isn't it too large as
> compared to the no. of documents?
>


Re: Getting Category ID (primary key)

2011-03-15 Thread Chris Hostetter

: If it works, it's performant and not too messy it's a good way :-) . You can
: also consider just faceting on Id, and use the id to fetch the categoryname
: through sql / nosql.
: That way your logic is seperated from your presentation, which makes
: extending (think internationalizing, etc.) easier. Not sure if that's
: appropriate for your 'category' field but anyway.

That's the advice i generally give *if* you already have access to the 
system of record for your categorization in your application (if not, the 
tradeoff of the extra data lookup may not be worth the cost in 
flexibility)

From the notes in my "Many Facets" talk (slide #24: "Pretty" 
facet.field Terms)...

>> If you can use unique identifiers (instead of pretty, long, string 
>> labels) it can reduce the memory and simplify the request parsing (see 
>> next Tip) but it adds work to your front end application -- keeping 
>> track of the mapping between id => pretty label. If your application 
>> already needs to know about these mappings for other purposes, then 
>> it's much simpler to take advantage of that.

http://people.apache.org/~hossman/apachecon2010/facets/



-Hoss
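The id-versus-label tradeoff above, sketched in Python (the mapping store, whether SQL, NoSQL, or in-memory, is the application's choice; all names here are illustrative):

```python
# Facet counts come back keyed by opaque ids (small terms, cheap to facet on).
facet_counts = {"17": 42, "3": 9, "88": 1}

# The application's own id -> label mapping (here just a dict; in a real
# app this might be a DB table, and it is where i18n would plug in).
labels = {"17": "Legislation", "3": "Guidance/Policies", "88": "Complaints"}

# Presentation layer: resolve ids to pretty labels, highest count first.
display = [(labels.get(fid, fid), count)
           for fid, count in sorted(facet_counts.items(), key=lambda kv: -kv[1])]
print(display)
# -> [('Legislation', 42), ('Guidance/Policies', 9), ('Complaints', 1)]
```

The extra lookup is the cost; the payoff is that relabeling or translating a category never requires reindexing.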


Re: stopFilterFactor and SnowballPorterFilterFactory not work for Spanish

2011-03-15 Thread Robert Muir
On Tue, Mar 15, 2011 at 7:07 PM, cyang2010  wrote:

> I just copied the text from this URL to form my stopwords_es.txt:
>
> http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt
>

you cannot use the files from analysis/snowball/* directly with Solr.
These files are not in a format that Solr understands... so you will
have to modify them (e.g. the comment character for solr is # not |,
and there can only be one word per line)
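The conversion Robert describes (comment character # instead of |, one word per line) is mechanical; a sketch:

```python
def snowball_to_solr(text: str) -> str:
    """Convert a snowball-format stop list to Solr's one-word-per-line format."""
    out = []
    for line in text.splitlines():
        # '|' starts a comment in the snowball files; strip it.
        word_part = line.split("|", 1)[0]
        # A snowball line may carry several words; Solr wants one per line.
        out.extend(word_part.split())
    return "\n".join(out)

snowball = "de la que  | common determiners\n | a full-line comment\nel en y"
print(snowball_to_solr(snowball))
```

Running this over spanish_stop.txt produces a file Solr's StopFilterFactory can load directly.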


Re: Dismax: field not returned unless in sort clause?

2011-03-15 Thread Chris Hostetter

: We have a "D" field (string, indexed, stored, not required) that is returned
: * when we search with the standard request handler
: * when we search with dismax request handler _and the field is specified in
: the sort parameter_
: 
: but is not returned when using the dismax handler and the field is not
: specified in the sort param.

are you using one of the "sortMissing" options on D or its fieldType?

I'm guessing you have sortMissingLast="true" for D, so anytime you sort on 
it the docs that do have a value appear first.  but when you don't sort on 
it, other factors probably lead docs that don't have a value for the D 
field to appear first -- solr doesn't include fields in docs that don't 
have any value for that field.

if my guess is correct, adding "fq=D:[* TO *]" to any of your queries will 
cause the total number of results to shrink, but the first page of results 
for your requests that don't sort on D will look exactly the same.

the LukeRequestHandler will help you see how many docs in your index don't 
have any values indexed in the "D" field.


-Hoss


stopFilterFactor and SnowballPorterFilterFactory not work for Spanish

2011-03-15 Thread cyang2010
I am using solr 1.4.1.   I am trying to index a spanish field using the
following tokenizer/filters (the XML was stripped by the mail archive;
reconstructed here from the analysis output below):

  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords_es.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>

Using the field analysis page in the Solr admin, I can tell that StopFilterFactory
and SnowballPorterFilterFactory with language=Spanish are not working right:

1. after stopFilter, "la" should be gone, but it is not.
2. after snowballporterFilterFactory(language=Spanish), "cöcktäils" should
become "cöcktäil".  But i still see the token "cöcktäils" coming out.

I configured a spanish stopword list for the StopFilterFactory.

Field name: title_name
field value:  la Cöcktäils


Index Analyzer
=
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position   1   2
term text   la  Cöcktäils
term type   wordword
source start,end0,2 3,12
payload 

=
org.apache.solr.analysis.StopFilterFactory {words=stopwords_es.txt,
ignoreCase=true}
term position   1   2
term text   la  Cöcktäils
term type   wordword
source start,end0,2 3,12
payload 
==  
org.apache.solr.analysis.LowerCaseFilterFactory {}
term position   1   2
term text   la  cöcktäils
term type   wordword
source start,end0,2 3,12
payload
=== 

org.apache.solr.analysis.SnowballPorterFilterFactory {language=Spanish}
term position   1   2
term text   la  cöcktäils
term type   wordword
source start,end0,2 3,12
payload 
===
org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory {}
term position   1   2
term text   la  cöcktäils
term type   wordword
source start,end0,2 3,12
payload 
==



I just copied the text from this URL to form my stopwords_es.txt:

http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/resources/org/apache/lucene/analysis/snowball/spanish_stop.txt



Look forward to your help...

--
View this message in context: 
http://lucene.472066.n3.nabble.com/stopFilterFactor-and-SnowballPorterFilterFactory-not-work-for-Spanish-tp2684322p2684322.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: accessing the analyzers in a component?

2011-03-15 Thread Paul Libbrecht
Thanks Ahmet, I indicated that in the wiki at
  http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

My solution was a little bit different since I wanted to get the analyzer per 
field name:

rb.getSchema().getField("name").getFieldType().getAnalyzer()

thanks again!

paul

Le 15 mars 2011 à 02:44, Ahmet Arslan a écrit :

>> Within my custom query-component, I wish to obtain an
>> instance of the analyzer for a given named field.
>> Is a schema object I can access?
> 
> 
> public void process(ResponseBuilder rb) throws IOException {  
> 
>   Map map =   rb.req.getSchema().getFieldTypes();
> 
>   Analyzer analyzer =  map.get("myFieldName").getAnalyzer();
>   Analyzer queryAnalyzer = map.get("myFieldName").getQueryAnalyzer();
> 
> 
> 



Re: Faceting help

2011-03-15 Thread Upayavira
I'm not sure if I get what you are trying to achieve. What do you mean
by "constraint"?

Are you saying that you effectively want to filter the facets that are
returned?

e.g. for source field, you want to show html/pdf/email, but not, say xls
or doc?

Upayavira

On Tue, 15 Mar 2011 15:38 +, "McGibbney, Lewis John"
 wrote:
> Hello list,
> 
> I'm trying to use facet's via widget's within Ajax-Solr. I have tried the
> wiki for general help on configuring facets and constraints and also
> attended the recent Lucidworks webinar on faceted search. Can anyone
> please direct me to some reading on how to formally configure facets for
> searching.
> 
> Currently my facets are configured as follows
> 
>   'facet.field': [ 'topics', 'organisations', 'exchanges',
>   'countryCodes' ],
>   'facet.limit': 20,
>   'facet.mincount': 1,
>   'f.topics.facet.limit': 50,
>   'f.countryCodes.facet.limit': -1,
>   'facet.date': 'date',
>   'facet.date.start': '1987-02-26T00:00:00.000Z/DAY',
>   'facet.date.end': '1987-10-20T00:00:00.000Z/DAY+1DAY',
>   'facet.date.gap': '+1DAY',
>   'json.nl': 'map'
> 
> However I wish to change the fields to contain some constraints such as
> 
> Topics < field
>   Legislation < constraint
>   Guidance/Policies < constraint
>   Customer Service information/complaints procedure < constraint
>   financial information < constraint
>   etc etc
> 
> Source < field
>   html < constraint < constraint
>   pdf < constraint
>   email < constraint
>   etc etc
> 
> Date < field
>< constraint
> 
> Basically I need resources to understand how to implement the above
> instead of the example I currently have.
> Some guidance would be great
> Thank you kindly
> 
> Lewis
> 
> Glasgow Caledonian University is a registered Scottish charity, number
> SC021474
> 
> Winner: Times Higher Education’s Widening Participation Initiative of the
> Year 2009 and Herald Society’s Education Initiative of the Year 2009.
> http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html
> 
> Winner: Times Higher Education’s Outstanding Support for Early Career
> Researchers of the Year 2010, GCU as a lead with Universities Scotland
> partners.
> http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html
> 
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source



Re: problem using dataimporthandler

2011-03-15 Thread Peter Sturge
Could possibly be your original xml file was in unicode (with a BOM
header - FFFE or FEFF) - xml will see it as content if the underlying
file system doesn't handle it.
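A quick way to check a file's first bytes for the BOMs Peter mentions (EF BB BF for UTF-8, FF FE / FE FF for UTF-16); a sketch:

```python
import codecs

# Known byte-order marks and a label for each.
BOMS = {
    codecs.BOM_UTF8: "utf-8-sig",      # EF BB BF
    codecs.BOM_UTF16_LE: "utf-16-le",  # FF FE
    codecs.BOM_UTF16_BE: "utf-16-be",  # FE FF
}

def sniff_bom(raw: bytes):
    """Return (encoding, payload-without-BOM); encoding is None if no BOM found."""
    for bom, enc in BOMS.items():
        if raw.startswith(bom):
            return enc, raw[len(bom):]
    return None, raw

enc, payload = sniff_bom(b"\xef\xbb\xbf<dataConfig/>")
print(enc, payload)  # -> utf-8-sig b'<dataConfig/>'
```

If a BOM turns up, re-saving the file as plain UTF-8 without a BOM (or stripping it as above) typically makes the "content not allowed in prolog" error go away.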


On Tue, Mar 15, 2011 at 10:00 PM, sivaram  wrote:
> I got rid of the problem by just copying the other schema and config files
> (which sounds like it has nothing to do with the error on the dataconfig file,
> but I gave it a try) and it worked. I don't know if I'm missing something
> here, but it's working now.
>
> Thanks,
> Ram.
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/problem-using-dataimporthandler-tp495415p2684044.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: problem using dataimporthandler

2011-03-15 Thread sivaram
I got rid of the problem by just copying the other schema and config files
(which sounds like it has nothing to do with the error on the dataconfig file,
but I gave it a try) and it worked. I don't know if I'm missing something
here, but it's working now. 

Thanks,
Ram.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-using-dataimporthandler-tp495415p2684044.html
Sent from the Solr - User mailing list archive at Nabble.com.


partial optimize does not reduce the segment number to maxNumSegments

2011-03-15 Thread Renee Sun
I have a core with 120+ segment files, and I tried a partial optimize
specifying maxNumSegments=10; after the optimize the segment files were
reduced to 64 files.

I did the same optimize again, and it reduced to thirty-something;

this keeps going, and eventually it drops to a number in the teens.

I was expecting the optimize to result in exactly 10 segment files, or
somewhere near that. Why do I have to manually repeat the optimize to reach
that number?

thanks
Renee 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/partial-optimize-does-not-reduce-the-segment-number-to-maxNumSegments-tp2682195p2682195.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Query

2011-03-15 Thread Ahmet Arslan
> I am running below query in query browser admin interface
> 
> +RetailPriceCodeID:1 +MSRP:[16001.00 TO 32000.00]
> 
> I think it should return only results with RetailPriceCode
> = 1 ad MSRP
> between 16001 and 32000.
> 
> But it returns all resuts with MSRP = 1 and doesnt consider
> 2nd query at
> all.
> 
> Am i doing something wrong here? Please help

Your query is perfectly fine.  However, range queries require trie-based or
sortable-based field types: tfloat, tdouble, etc.
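A minimal schema.xml sketch of such a field (the type and field definitions here are illustrative, not taken from the poster's schema):

```xml
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8"
           omitNorms="true" positionIncrementGap="0"/>
<field name="MSRP" type="tdouble" indexed="true" stored="true"/>
```

With MSRP indexed as a trie type, the range [16001.00 TO 32000.00] is interpreted numerically rather than lexicographically.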


  


Re: Dynamically boost search scores

2011-03-15 Thread Ahmet Arslan
> the page you recommended and came up
> with:
> 
> http://localhost:8983/solr/search/?q=dog&fl=boost_score,genus,species,score&rows=15&bf=%22ord%28sum%28boost_score,1%29%29
> ^10%22
> 
> But appeared to have no effect. The results were in the
> same order as they
> were when I left off the bf parameter. So what am I doing
> incorrectly?

bf belongs to DisMaxParams. 

Instead try this  q={!boost b=boost_score}dog

You can use any valid FunctionQuery as a b parameter.

http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html

You may want to use parameter referencing also. Where q is hardcoded in 
solrconfig.xml, something like:

q={!boost b=$bb v=$qq}&qq=dog&bb=%22ord%28sum%28boost_score,1%29%29

http://wiki.apache.org/solr/LocalParams



  


Re: problem using dataimporthandler

2011-03-15 Thread sivaram
Regarding the DIH problem: I'm encountering "content not allowed in prolog"
only when I'm deploying solr on tomcat. I'm using the same data-config.xml
in the solr example through jetty and it works fine and I can index the
data. Please let me know what should be changed while using tomcat. 

Thanks,
Ram.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/problem-using-dataimporthandler-tp495415p2683596.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: keeping data consistent between Database and Solr

2011-03-15 Thread onlinespend...@gmail.com
That's pretty interesting to use the autoincrementing document ID as a way
to keep track of what has not been indexed in Solr.  And you overwrite this
document ID even when you modify an existing document.  Very cool.  I
suppose the number can even rotate back to 0, as long as you handle that.

I am thinking of using a timestamp to achieve a similar thing. All documents
that have been accessed after the last Solr index need to be added to the
Solr index.  In fact, each name-value pair in Cassandra has a timestamp
associated with it, so I'm curious if I could simply use this.

I'm curious how you handle the delta-imports. Do you have some routine that
periodically checks for updates to your MySQL database via the document ID?
Which language do you use for that?

Thanks,
Ben

On Tue, Mar 15, 2011 at 9:12 AM, Shawn Heisey  wrote:

> On 3/14/2011 9:38 PM, onlinespend...@gmail.com wrote:
>
>> But my main question is, how do I guarantee that data between my Cassandra
>> database and Solr index are consistent and up-to-date?
>>
>
> Our MySQL database has two unique indexes.  One is a document ID,
> implemented in MySQL as an autoincrement integer and in Solr as a long.  The
> other is what we call a tag id, implemented in MySQL as a varchar and Solr
> as a single lowercased token and serving as Solr's uniqueKey.  We have an
> update trigger on the database that updates the document ID whenever the
> database document is updated.
>
> We have a homegrown build system for Solr.  In a nutshell, it keeps track
> of the newest document ID in the Solr Index.  If the DIH delta-import fails,
> it doesn't update the stored ID, which means that on the next run, it will
> try and index those documents again.  Changes to the entries in the database
> are automatically picked up because the document ID is newer, but the tag id
> doesn't change, so the document in Solr is overwritten.
>
> Things are actually more complex than I've written, because our index is
> distributed.  Hopefully it can give you some ideas for yours.
>
> Shawn
>
>


Re: Dynamically boost search scores

2011-03-15 Thread Brian Lamb
Thank you for the advice. I looked at the page you recommended and came up
with:

http://localhost:8983/solr/search/?q=dog&fl=boost_score,genus,species,score&rows=15&bf=%22ord%28sum%28boost_score,1%29%29
^10%22

But appeared to have no effect. The results were in the same order as they
were when I left off the bf parameter. So what am I doing incorrectly?

Thanks,

Brian Lamb

On Mon, Mar 14, 2011 at 11:45 AM, Markus Jelsma
wrote:

> See boosting documents by function query. This way you can use document's
> boost_score field to affect the final score.
>
> http://wiki.apache.org/solr/FunctionQuery
>
> On Monday 14 March 2011 16:40:42 Brian Lamb wrote:
> > Hi all,
> >
> > I have a field in my schema called boost_score. I would like to set it up
> > so that if I pass in a certain flag, each document score is boosted by
> the
> > number in boost_score.
> >
> > For example if I use:
> >
> > http://localhost/solr/search/?q=dog
> >
> > I would get search results like normal. But if I use:
> >
> > http://localhost/solr/search?q=dog&boost=true
> >
> > The score of each document would be boosted by the number in the field
> > boost_score.
> >
> > Unfortunately, I have no idea how to implement this actually but I'm
> hoping
> > that's where you all can come in.
> >
> > Thanks,
> >
> > Brian Lamb
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>


Re: Solr Query

2011-03-15 Thread Geert-Jan Brits
> But it returns all resuts with MSRP = 1 and doesnt consider 2nd query at
all.

I believe you mean: 'it returns all results with RetailPriceCodeID = 1 while
ignoring the 2nd query?'

If so, please check that your default operator is set to AND in your schema
config.
Other than that, your syntax seems correct.

Hth,
Geert-Jan


2011/3/15 Vishal Patel 

> I am a bit new for Solr.
>
> I am running below query in query browser admin interface
>
> +RetailPriceCodeID:1 +MSRP:[16001.00 TO 32000.00]
>
> I think it should return only results with RetailPriceCode = 1 ad MSRP
> between 16001 and 32000.
>
> But it returns all resuts with MSRP = 1 and doesnt consider 2nd query at
> all.
>
> Am i doing something wrong here? Please help
>


Re: Sorting 0 values last

2011-03-15 Thread Gora Mohanty
On Tue, Mar 15, 2011 at 9:04 PM, Yonik Seeley
 wrote:
> On Tue, Mar 15, 2011 at 10:35 AM, MOuli  wrote:
>> I want to sort ASC on a price field, but some of the docs got a 0 (not
>> NULL)
>> value. Now I want these docs to be at the end when I sort the price field
>> ascending. Is it possible?
>
> In 3.1 and trunk (4.0-dev), you could sort by a function query that
> maps values of 0 to something very large.
[...]

Not sure how you are indexing, but in addition to the above
suggestion by Yonik, one could ignore 0's at indexing time,
i.e., ensure that 0 values for that field are not indexed, and
use sortMissingLast.
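A sketch of that approach in schema.xml (type and field names here are assumptions):

```xml
<fieldType name="sfloat" class="solr.SortableFloatField"
           sortMissingLast="true" omitNorms="true"/>
<field name="price" type="sfloat" indexed="true" stored="true"/>
```

Then index no value at all for zero prices, and sort=price asc will place the documents without a price after all the real values.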

Regards,
Gora


Solr Query

2011-03-15 Thread Vishal Patel
I am a bit new for Solr.

I am running below query in query browser admin interface

+RetailPriceCodeID:1 +MSRP:[16001.00 TO 32000.00]

I think it should return only results with RetailPriceCode = 1 and MSRP
between 16001 and 32000.

But it returns all results with MSRP = 1 and doesn't consider the 2nd query at
all.

Am i doing something wrong here? Please help


Sorting on multiValued fields via function query

2011-03-15 Thread harish.agarwal
Hello,
I believe the most recent builds of Solr have started explicitly throwing an
error around sorting on multiValued fields.  I'd actually been sorting on
multiValued fields for some time without any problems before this, not sure
how Solr was able to handle this in the past...

In any case, I'd like to be able to sort on multiValued fields via a
function query, but keep getting the following error:
can not use FieldCache on multivalued field

I've tried using the function 'sum', 'max', and 'min' with the same result.  
Is there any way to sort on the maximum value, for instance, of a
multiValued field?

Thanks,
-Harish

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-on-multiValued-fields-via-function-query-tp2681833p2681833.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr performance issue

2011-03-15 Thread Shawn Heisey
The host is dual quad-core, each Xen VM has been given two CPUs.  Not 
counting dom0, two of the hosts have 10/8 CPUs allocated, two of them 
have 8/8.  The dom0 VM is also allocated two CPUs.


I'm not really sure how that works out when it comes to Java running on 
the VM, but if at all possible, it is likely that Xen would try and keep 
both VM cpus on the same physical CPU and the VM's memory allocation on 
the same NUMA node.  If that's the case, it would meet what you've 
stated as the recommendation for incremental mode.


Shawn


On 3/15/2011 9:10 AM, Markus Jelsma wrote:

CMS is very good for multicore CPU's. Use incremental mode only when you have
a single CPU with only one or two cores.




Faceting help

2011-03-15 Thread McGibbney, Lewis John
Hello list,

I'm trying to use facets via widgets within Ajax-Solr. I have tried the wiki 
for general help on configuring facets and constraints, and also attended the 
recent Lucidworks webinar on faceted search. Can anyone please direct me to 
some reading on how to formally configure facets for searching?

Currently my facets are configured as follows

  'facet.field': [ 'topics', 'organisations', 'exchanges', 'countryCodes' ],
  'facet.limit': 20,
  'facet.mincount': 1,
  'f.topics.facet.limit': 50,
  'f.countryCodes.facet.limit': -1,
  'facet.date': 'date',
  'facet.date.start': '1987-02-26T00:00:00.000Z/DAY',
  'facet.date.end': '1987-10-20T00:00:00.000Z/DAY+1DAY',
  'facet.date.gap': '+1DAY',
  'json.nl': 'map'

However I wish to change the fields to contain some constraints such as

Topics < field
  Legislation < constraint
  Guidance/Policies < constraint
  Customer Service information/complaints procedure < constraint
  financial information < constraint
  etc etc

Source < field
  html < constraint
  pdf < constraint
  email < constraint
  etc etc

Date < field
   < constraint

Basically I need resources to understand how to implement the above instead of 
the example I currently have.
Some guidance would be great
Thank you kindly

Lewis

Glasgow Caledonian University is a registered Scottish charity, number SC021474

Winner: Times Higher Education’s Widening Participation Initiative of the Year 
2009 and Herald Society’s Education Initiative of the Year 2009.
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html

Winner: Times Higher Education’s Outstanding Support for Early Career 
Researchers of the Year 2010, GCU as a lead with Universities Scotland partners.
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html


Re: Sorting 0 values last

2011-03-15 Thread Yonik Seeley
On Tue, Mar 15, 2011 at 10:35 AM, MOuli  wrote:
> I want to sort ASC on a price field, but some of the docs got a 0 (not NULL)
> value. Now I want these docs to be at the end when I sort the price field
> ascending. Is it possible?

In 3.1 and trunk (4.0-dev), you could sort by a function query that
maps values of 0 to something very large.

sort=map(price,0,0,99) asc

This should map anything with a price between 0 and 0 to 99 for
the purposes of sorting.

http://wiki.apache.org/solr/FunctionQuery

-Yonik
http://lucidimagination.com


Sorting 0 values last

2011-03-15 Thread MOuli
Hi @everyone.

I want to sort ASC on a price field, but some of the docs got a 0 (not NULL)
value. Now I want these docs to be at the end when I sort the price field
ascending. Is it possible?

Thanks in advance.

MOuli

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Sorting-0-values-last-tp2681612p2681612.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr performance issue

2011-03-15 Thread Markus Jelsma
CMS is very good for multicore CPUs. Use incremental mode only when you have 
a single CPU with only one or two cores.
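Putting that advice together, a CMS setup for a multi-core box might look like this (heap sizes here are illustrative, not a recommendation):

```shell
java -Xms1g -Xmx1g \
     -XX:+UseConcMarkSweepGC \
     -XX:+UseParNewGC \
     -jar start.jar
```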

On Tuesday 15 March 2011 16:03:38 Shawn Heisey wrote:
> My solr+jetty+java6 install seems to work well with these GC options.
> It's a dual processor environment:
> 
> -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
> 
> I've never had a real problem with memory, so I've not done any kind of
> auditing.  I probably should, but time is a limited resource.
> 
> Shawn
> 
> On 3/14/2011 2:29 PM, Markus Jelsma wrote:
> > That depends on your GC settings and generation sizes. And, instead of
> > UseParallelGC you'd better use UseParNewGC in combination with CMS.
> > 
> > See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html
> > 
> >> It's actually, as I understand it, expected JVM behavior to see the heap
> >> rise to close to it's limit before it gets GC'd, that's how Java GC
> >> works.  Whether that should happen every 20 seconds or what, I don't
> >> nkow.
> >> 
> >> Another option is setting better JVM garbage collection arguments, so GC
> >> doesn't "stop the world" so often. I have had good luck with my Solr
> >> using this:  -XX:+UseParallelGC

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Solr performance issue

2011-03-15 Thread Shawn Heisey
My solr+jetty+java6 install seems to work well with these GC options.  
It's a dual processor environment:


-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode

I've never had a real problem with memory, so I've not done any kind of 
auditing.  I probably should, but time is a limited resource.


Shawn


On 3/14/2011 2:29 PM, Markus Jelsma wrote:

That depends on your GC settings and generation sizes. And, instead of
UseParallelGC you'd better use UseParNewGC in combination with CMS.

See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html

It's actually, as I understand it, expected JVM behavior to see the heap
rise to close to its limit before it gets GC'd; that's how Java GC
works.  Whether that should happen every 20 seconds or what, I don't know.

Another option is setting better JVM garbage collection arguments, so GC
doesn't "stop the world" so often. I have had good luck with my Solr
using this:  -XX:+UseParallelGC




Re: Dismax: field not returned unless in sort clause?

2011-03-15 Thread Ahmet Arslan


--- On Tue, 3/15/11, mrw  wrote:

> From: mrw 
> Subject: Dismax:  field not returned unless in sort clause?
> To: solr-user@lucene.apache.org
> Date: Tuesday, March 15, 2011, 3:21 PM
> We have a "D" field (string, indexed,
> stored, not required) that is returned
> * when we search with the standard request handler
> * when we search with dismax request handler _and the field
> is specified in
> the sort parameter_
> 
> but is not returned when using the dismax handler and the
> field is not
> specified in the sort param.
> 
> IOW, if I do the following query (no sort param), I get all
> the expected
> results, but the D field never comes back...
> 
> &q=&q.alt=*:*&defType=dismax&tie=0.1&mm=1&qf=A,B,C&start=0&rows=300&fl=D
> 
> ...but if I add "D" to the sort param, the D field comes
> back on every
> single record
> 
> &q=&q.alt=*:*&defType=dismax&tie=0.1&mm=1&qf=A,B,C&start=0&rows=300&fl=D&sort=D%20asc
> &q=&q.alt=*:*&defType=dismax&tie=0.1&mm=1&qf=A,B,C&start=0&rows=300&fl=D&sort=D%20desc
> 
> If I omit the fl param, I see that all of our other fields
> appear to be
> returned on every result without any need to specify them
> in the sort param.  
> 
> Obviously, I cannot hard-code the sort order around the D
> field.  :)

Can you try separating the field names in the qf parameter with spaces instead of commas?

q.alt=*:*&defType=dismax&tie=0.1&mm=1&qf=A B C&start=0&rows=300&fl=D





Solrj (1.4.1) Performance related query

2011-03-15 Thread rahul
Hi,

I am using Solrj as a Solr client in my project.

While searching for a few words, it seems Solrj takes more time to send the
response, e.g. 8 - 12 sec. While searching for most other words, Solrj seems
to take much less time.

For eg, if I post a search url in browser, it shows the QTime in
milliseconds only.

http://serverName/solr/mydata/select?q=computing&qt=myhandler&fq=category:1

But if I query the same using Solrj from my project like below, it takes a
long time (8 - 12 sec) to produce the same results. Hence, I suspect that
Solrj itself is taking that long to produce the results.

SolrServer server = new CommonsHttpSolrServer(url);
SolrQuery query = new SolrQuery("computing");
query.setParam("qt", "myhandler");
query.setFilterQueries("category:1");
query.setHighlight(false);
QueryResponse rsp = server.query( query );

I have tried both POST and GET methods, but both are taking much time.

Any idea why Solrj takes such a long time for particular words? It returns
around 40 docs as a search result. I have even commented out highlighting
for that.

And any way to speed it up.

Note: I am using Tomcat and set heap size as around 1024 mb. And I am using
Solr 1.4.1 version.

Thanks, 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrj-1-4-1-Performance-related-query-tp2681488p2681488.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: keeping data consistent between Database and Solr

2011-03-15 Thread Tim Gilbert
I use Solr + MySQL with data coming from several DIH-type "loaders" that
I have written to move data from many different databases into my "BI"
solution.  I don't use DIH because I am not simply replicating the data,
but I am moving/merging/processing the incoming data during the loading.

For me, I have an Aspect (aspectj) which wraps my Data Access Object and
every time a "persist" is called (I am using hibernate), I update Solr
with the same data an instant later using @Around advice.  This handles
nearly every event during the day.  I have a simple "retry" procedure on
my Solrj add/commit on network error in hopes that it will eventually
work.

In case of error I rebuild the solr index from scratch each night by
recreating it based on the data in MySQL.  That takes about 10 minutes
and I run it at night.  This allows for me to have "eventual
consistency" for any issues that cropped up during the day. 

Obviously the size of my database (< 2 million records) makes this
approach manageable.  YMMV.

Tim

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Tuesday, March 15, 2011 9:13 AM
To: solr-user@lucene.apache.org
Subject: Re: keeping data consistent between Database and Solr

On 3/14/2011 9:38 PM, onlinespend...@gmail.com wrote:
> But my main question is, how do I guarantee that data between my
Cassandra
> database and Solr index are consistent and up-to-date?

Our MySQL database has two unique indexes.  One is a document ID, 
implemented in MySQL as an autoincrement integer and in Solr as a long.

The other is what we call a tag id, implemented in MySQL as a varchar 
and Solr as a single lowercased token and serving as Solr's uniqueKey.  
We have an update trigger on the database that updates the document ID 
whenever the database document is updated.

We have a homegrown build system for Solr.  In a nutshell, it keeps 
track of the newest document ID in the Solr Index.  If the DIH 
delta-import fails, it doesn't update the stored ID, which means that on

the next run, it will try and index those documents again.  Changes to 
the entries in the database are automatically picked up because the 
document ID is newer, but the tag id doesn't change, so the document in 
Solr is overwritten.

Things are actually more complex than I've written, because our index is

distributed.  Hopefully it can give you some ideas for yours.

Shawn



Dismax: field not returned unless in sort clause?

2011-03-15 Thread mrw
We have a "D" field (string, indexed, stored, not required) that is returned
* when we search with the standard request handler
* when we search with dismax request handler _and the field is specified in
the sort parameter_

but is not returned when using the dismax handler and the field is not
specified in the sort param.

IOW, if I do the following query (no sort param), I get all the expected
results, but the D field never comes back...

&q=&q.alt=*:*&defType=dismax&tie=0.1&mm=1&qf=A,B,C&start=0&rows=300&fl=D

...but if I add "D" to the sort param, the D field comes back on every
single record

&q=&q.alt=*:*&defType=dismax&tie=0.1&mm=1&qf=A,B,C&start=0&rows=300&fl=D&sort=D%20asc
&q=&q.alt=*:*&defType=dismax&tie=0.1&mm=1&qf=A,B,C&start=0&rows=300&fl=D&sort=D%20desc

If I omit the fl param, I see that all of our other fields appear to be
returned on every result without any need to specify them in the sort param.  

Obviously, I cannot hard-code the sort order around the D field.  :)

Any ideas?   


Thanks!


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dismax-field-not-returned-unless-in-sort-clause-tp2681447p2681447.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solrj Performance check.

2011-03-15 Thread rahul
Hi,

I am using Solrj as a Solr client in my project.

While searching for a few words, it seems Solrj takes more time to send the
response, e.g. 8 - 12 sec. While searching for most other words, Solrj seems
to take much less time.

For eg, if I post a search url in browser, it shows the QTime in
milliseconds only.

http://serverName/solr/mydata/select?q=computing&qt=myhandler&fq=category:1

But if I query the same using Solrj from my project like below, it takes a
long time (8 - 12 sec) to produce the same results. Hence, I suspect that
Solrj itself is taking that long to produce the results.

SolrServer server = new CommonsHttpSolrServer(url);
SolrQuery query = new SolrQuery("computing");
query.setParam("qt", "myhandler");
query.setFilterQueries("category:1");
query.setHighlight(false);
QueryResponse rsp = server.query( query );

I have tried both POST and GET methods, but both are taking much time.

Any idea why Solrj takes such a long time for particular words? It returns
around 40 docs as a search result. I have even commented out highlighting
for that.

And any way to speed it up.

Note: I am using Tomcat and set heap size as around 1024 mb.

Thanks, 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrj-Performance-check-tp2681444p2681444.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: keeping data consistent between Database and Solr

2011-03-15 Thread Shawn Heisey

On 3/14/2011 9:38 PM, onlinespend...@gmail.com wrote:

But my main question is, how do I guarantee that data between my Cassandra
database and Solr index are consistent and up-to-date?


Our MySQL database has two unique indexes.  One is a document ID, 
implemented in MySQL as an autoincrement integer and in Solr as a long.  
The other is what we call a tag id, implemented in MySQL as a varchar 
and Solr as a single lowercased token and serving as Solr's uniqueKey.  
We have an update trigger on the database that updates the document ID 
whenever the database document is updated.


We have a homegrown build system for Solr.  In a nutshell, it keeps 
track of the newest document ID in the Solr Index.  If the DIH 
delta-import fails, it doesn't update the stored ID, which means that on 
the next run, it will try and index those documents again.  Changes to 
the entries in the database are automatically picked up because the 
document ID is newer, but the tag id doesn't change, so the document in 
Solr is overwritten.
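A sketch of a DIH delta entity implementing such a scheme — the table, column, and request-parameter names here are assumptions, not the actual config:

```xml
<entity name="doc" pk="tag_id"
        query="SELECT did, tag_id, title FROM docs"
        deltaQuery="SELECT tag_id FROM docs
                    WHERE did > ${dataimporter.request.lastDid}"
        deltaImportQuery="SELECT did, tag_id, title FROM docs
                          WHERE tag_id = '${dataimporter.delta.tag_id}'"/>
```

The build system would pass the stored newest document ID as the lastDid request parameter, and only update the stored ID after the delta-import succeeds.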


Things are actually more complex than I've written, because our index is 
distributed.  Hopefully it can give you some ideas for yours.


Shawn



Re: Solrj performance bottleneck

2011-03-15 Thread Yonik Seeley
On Tue, Mar 15, 2011 at 8:12 AM, rahul  wrote:
> I am using Solrj as a Solr client in my project.
>
> While searching, for a few words, it seems Solrj takes more time to send
> response, for eg (8 - 12 sec). While searching most of the other words it
> seems Solrj take less amount of time only.
>
> For eg, if I post a search url in browser, it shows the QTime in
> milliseconds only.

The QTime does not measure the time to stream back the response (which
includes loading stored fields).
Since the response is streamed, it's not possible to include this
time.  The difference normally isn't that
large unless you have a network issue, a client that is taking a long
time to read the response,
or a very large index and not enough free RAM for the OS to cache all the files.

Check the solr logs and make sure that equivalent queries are being received.
The QTime is also logged.

-Yonik
http://lucidimagination.com


Solrj performance bottleneck

2011-03-15 Thread rahul
Hi,

I am using Solrj as a Solr client in my project.

While searching for a few words, it seems Solrj takes more time to send the
response, e.g. 8 - 12 sec. While searching for most other words, Solrj seems
to take much less time.

For eg, if I post a search url in browser, it shows the QTime in
milliseconds only.

http://serverName/solr/mydata/select?q=computing&qt=myhandler&fq=category:1

But if I query the same using Solrj from my project like below, it takes a
long time (8 - 12 sec) to produce the same results. Hence, I suspect that
Solrj itself is taking that long to produce the results.

SolrServer server = new CommonsHttpSolrServer(url);
SolrQuery query = new SolrQuery("computing");
query.setParam("qt", "myhandler");
query.setFilterQueries("category:1");
query.setHighlight(false);
QueryResponse rsp = server.query( query );

I have tried both POST and GET methods, but both are taking much time.

Any idea why Solrj takes such a long time for particular words? It returns
around 40 docs as a search result. I have even commented out highlighting
for that.

And any way to speed it up.

Note: I am using Tomcat and set heap size as around 1024 mb.

Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrj-performance-bottleneck-tp2681294p2681294.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: background merge hit exception

2011-03-15 Thread Isha Garg

On Tuesday 15 March 2011 01:31 PM, Anurag wrote:

Do you mean that earlier it was doing indexing well then all of sudden you
started getting this exception?

-
Kumar Anurag

--
View this message in context: 
http://lucene.472066.n3.nabble.com/background-merge-hit-exception-tp2680625p2680979.html
Sent from the Solr - User mailing list archive at Nabble.com.
   
Yeah, earlier it was fine, but I think the problem starts occurring as the
index size grows.
I also have a doubt related to index size: isn't it too large compared to
the number of documents?


Re: keeping data consistent between Database and Solr

2011-03-15 Thread onlinespend...@gmail.com
Solandra is great for adding better scalability and NRT to Solr, but
it pretty much just stores the index in Cassandra and insulates that
from the user. It doesn't solve the problem of allowing quick and
direct retrieval of data that need not be searched. I could certainly
just use a Solr search query to "directly" access a single document,
but that has overhead and would not be as efficient as directly
accessing a database. With potentially tens of thousands of
simultaneous direct data accesses, I'd rather not put this burden on
Solr and would prefer to use it only for search as it was intended,
while simple data retrieval could come from a better equipped
database.

But my question of consistency applies to all databases and Solr. i
would imagine most people maintain separate MySQL and Solr databases.

On Tuesday, March 15, 2011, Bill Bell  wrote:
> Look at Solandra. Solr + Cassandra.
>
> On 3/14/11 9:38 PM, "onlinespend...@gmail.com" 
> wrote:
>
>>Like many people, Solr is not my primary data store. Not all of my data
>>need
>>be searchable and for simple and fast retrieval I store it in a database
>>(Cassandra in my case).  Actually I don't have this all built up yet, but
>>my
>>intention is that whenever new data is entered that it be added to my
>>Cassandra database and simultaneously added to the Solr index (either by
>>queuing up recent data before a commit or some other means; any
>>suggestions
>>on this front?).
>>
>>But my main question is, how do I guarantee that data between my Cassandra
>>database and Solr index are consistent and up-to-date?  What if I write
>>the
>>data to Cassandra and then a failure occurs during the commit to the Solr
>>index?  I would need to be aware what data failed to commit and make sure
>>that a re-attempt is made.  Obviously inconsistency for a short duration
>>is
>>inevitable when using two different databases (Cassandra and Solr), but I
>>certainly don't want a failure to create perpetual inconsistency.  I'm
>>curious what sort of mechanisms people are using to ensure consistency
>>between their database (MySQL, Cassandra, etc.) and Solr.
>>
>>Thank you,
>>Ben
>
>
>


Re: Solr query POST and not in GET

2011-03-15 Thread Upayavira
Please do not cross-post between lists - yours seems like a user
query to me, so I'm answering it here.

As to your question - Solr does not select the request method -
you do. I've just tested it and Solr happily accepts a query via
a POST request.

However, you'd probably do well to look at other ways to
structure your query if you're hitting the URL length limit.

Upayavira

On Tue, 15 Mar 2011 12:23 +0100, "Gastone Penzo"
 wrote:

  Hi,

is possible to change Solr sending query method from get to post?

because my query has a lot of OR..OR..OR and the log says to me
Request URI too large

Where can i change it??

thanx







--
Gastone Penzo

www.solr-italia.it
The first italian blog about SOLR
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source



Re: Solr query POST and not in GET

2011-03-15 Thread Geert-Jan Brits
Yes it's possible.
Assuming you're using SolrJ as a client library:

set:
QueryRequest req = new QueryRequest();
req.setMethod(METHOD.POST);

Any other client-library should have a similar method.
hth,
Geert-Jan


2011/3/15 Gastone Penzo 

> Hi,
> is possible to change Solr sending query method from get to post?
> because my query has a lot of OR..OR..OR and the log says to me Request URI
> too large
> Where can i change it??
> thanx
>
>
>
>
> --
> Gastone Penzo
>
> www.solr-italia.it
> The first italian blog about SOLR
>


Solr query POST and not in GET

2011-03-15 Thread Gastone Penzo
Hi,
Is it possible to change the method Solr queries are sent with from GET to POST?
My query has a lot of OR..OR..OR clauses and the log tells me "Request URI
too large".
Where can I change it?
thanx




-- 
Gastone Penzo

www.solr-italia.it
The first italian blog about SOLR


Re: Query on facet field's count

2011-03-15 Thread William Bell
My patch is for 4.0 trunk.

On Fri, Mar 11, 2011 at 10:05 PM, rajini maski  wrote:
> Thanks Bill Bell. This query works after applying the patch you referred
> to, is that right? Can you please let me know how I need to update the current
> war (apache solr 1.4.1) file with this new patch? Thanks a lot.
>
> Thanks,
> Rajani
>
> On Sat, Mar 12, 2011 at 8:56 AM, Bill Bell  wrote:
>
>> http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=StudyID&facet.mincount=1&facet.limit=-1&f.StudyID.facet.namedistinct=1
>>
>> Would do what you want I believe...
>>
>>
>>
>> On 3/11/11 8:51 AM, "Bill Bell"  wrote:
>>
>> >There is my patch to do that. SOLR-2242
>> >
>> >Bill Bell
>> >Sent from mobile
>> >
>> >
>> >On Mar 11, 2011, at 1:34 AM, rajini maski  wrote:
>> >
>> >> Query on facet field results...
>> >>
>> >>
>> >>       When I run a facet query on some field say : facet=on &
>> >> facet.field=StudyID I get list of distinct StudyID list with the count
>> >>that
>> >> tells that how many times did this study occur in the search query.
>> >>But I
>> >> also needed the count of these distinct StudyID list.. Any solr query
>> >>to get
>> >> count of it..
>> >>
>> >>
>> >>
>> >> Example:
>> >>
>> >>
>> >>
>> >> <lst name="facet_fields">
>> >>   <lst name="StudyID">
>> >>     <int name="105">135164</int>
>> >>     <int name="179">79820</int>
>> >>     <int name="107">70815</int>
>> >>     <int name="120">37076</int>
>> >>     <int name="134">35276</int>
>> >>   </lst>
>> >> </lst>
>> >>
>> >>
>> >>
>> >> I wanted the count attribute that shall return the count of number of
>> >> different studyID occurred .. In above example  it could be  : Count = 5
>> >> (105,179,107,120,134)
>> >>
>> >>
>> >>
>> >> <lst name="facet_fields">
>> >>   <lst name="StudyID" count="5">
>> >>     <int name="105">135164</int>
>> >>     <int name="179">79820</int>
>> >>     <int name="107">70815</int>
>> >>     <int name="120">37076</int>
>> >>     <int name="134">35276</int>
>> >>   </lst>
>> >> </lst>
>>
>>
>>
>


Re: background merge hit exception

2011-03-15 Thread Anurag
Do you mean that earlier it was indexing fine, and then all of a sudden you
started getting this exception? 

-
Kumar Anurag

--
View this message in context: 
http://lucene.472066.n3.nabble.com/background-merge-hit-exception-tp2680625p2680979.html
Sent from the Solr - User mailing list archive at Nabble.com.