Re: solr query result not read the latest xml file

2010-08-09 Thread Ahmet Arslan
> hi everyone,
> 
> I do these steps every time the new xml file created (for
> example
> cat_978.xml has just been created):
> 1. delete the index
> (AUC_CAT:978)
> 2. commit the new cat_978.xml (java -jar post.jar
> cat_978.xml)
> 3. restart the java (stop and java -jar start.jar)
> 
> if I'm not done those steps then the query result showed in
> the browser
> still using the old value (cat_978.xml - no changes at all)
> instead of
> reading the new cat_978.xml
> 
> what I want to ask, is there a way so I don't need to
> restart the java since
> it consume too much resources and time?

You don't need to delete the old document. Solr replaces it automatically, assuming 
both have the same unique key. 

HTTP caching is probably causing the problem when you test with a browser. You can 
disable it in the solrconfig.xml file.
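
One way to check whether browser caching is the culprit is to repeat the query from a small script instead of the browser. The sketch below only builds such a request. The host, port, and handler path are assumptions for a default example setup, not details from the original post. A client like this sends no If-None-Match or If-Modified-Since header, so it cannot be served a stale copy from a browser cache.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical Solr address; adjust to your own installation.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_uncached_query(query, rows=10):
    """Build a GET request that asks caches along the way not to reuse a stored copy."""
    url = SOLR_SELECT + "?" + urlencode({"q": query, "rows": rows})
    return Request(url, headers={"Cache-Control": "no-cache", "Pragma": "no-cache"})


req = build_uncached_query("AUC_CAT:978")
print(req.full_url)
# To actually send it: urllib.request.urlopen(req).read()
```

If the script shows the new values while the browser still shows the old ones, the index is fine and the stale result comes from caching.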


  


Re: Problems to clustering on tomcat

2010-08-09 Thread Otis Gospodnetic
Claudio,

It sounds like the word "Cluster" there is adding confusion.  
ClusteringComponent has to do with search results clustering.  What you seem to 
be after is creation of a Solr cluster.

You'll find good pointers here: 
http://search-lucene.com/?q=master+slave&fc_project=Solr&fc_type=wiki

Perhaps this is the best place to start: 
http://wiki.apache.org/solr/SolrReplication

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Claudio Devecchi 
> To: solr-user@lucene.apache.org
> Sent: Mon, August 9, 2010 7:07:54 PM
> Subject: Problems to clustering on tomcat
> 
> Hi everybody,
> 
> I need to do some tests in my solr instalation, previously  I configured my
> application on a single node, and now I need to make some  tests on a cluster
> configuration.
> I followed the steps on "http://wiki.apache.org/solr/ClusteringComponent"
> and when a startup the  example system everything is ok, but when I try to
> run it on tomcat I receive  the error bellow, somebody have an idea?
> 
> SEVERE: Could not start SOLR.  Check solr/home property
> org.apache.solr.common.SolrException: Error loading  class
> 'org.apache.solr.handler.clustering.ClusteringComponent'
> 
> -- 
> Claudio Devecchi
> flickr.com/cdevecchi
> 


Re: randomness - percent share

2010-08-09 Thread Lance Norskog
There is a Random value type documented in the standard schema.xml.
I'm not sure if it reseeds each time. You can use this in a function
query along with your percentage bias fields.

On Mon, Aug 9, 2010 at 2:44 PM, Satish Kumar wrote:
> Hi,
>
> We have some identical records in our data set (e.g. what is swine flu?
> written by two different authors). When user searches for "What is swine
> flu?", we want the result by author1 appear as the first result for x% of
> the queries and result by author2 for y% of the queries (where x and y
> should be configurable). I am wondering if I can use the percentShare value
> (25, 40, 60, etc.) stored per record as an element in controlling the score,
> yet generating randomness-- if record1 share is 75% and record share is 25%,
> on an average record1 should appear 75 times and record2 25 times in 100
> search queries; if not exactly 75 and 25, something in that range should be
> fine too.
>
> Any ideas on implementing this feature?
>
>
> Thanks much!
>
> Satish
>



-- 
Lance Norskog
goks...@gmail.com


Re: enhancing auto complete

2010-08-09 Thread Bhavnik Gajjar
Thanks Avlesh for sharing the info. Will try it!

In the meantime, I also found another solution: 
http://metaoptimize.com/qa/questions/17/stemming-problems-when-writing-search-auto-complete

Kind regards.

On 8/4/2010 9:13 PM, Avlesh Singh wrote:
> I preferred to answer this question privately earlier. But I have received
> innumerable requests to unveil the architecture. For the benefit of all, I
> am posting it here (after hiding as much info as I should, in my company's
> interest).
>
> The context: Auto-suggest feature on http://askme.in
>
> *Solr setup*: Underneath are some of the salient features -
>
> 1. TermsComponent is NOT used.
> 2. The index is made up of 4 fields of the following types -
> "autocomplete_full", "autocomplete_token", "string" and "text".
> 3. "autocomplete_full" uses KeywordTokenizerFactory and
> EdgeNGramFilterFactory. "autocomplete_token" uses 
> WhitespaceTokenizerFactory
> and EdgeNGramFilterFactory. Both of these are Solr text fields with 
> standard
> filters like LowerCaseFilterFactory etc applied during querying and
> indexing.
> 4. Standard DataImportHandler and a bunch of sql procedures are used to
> "derive" all suggestable phrases from the system and index them in the 
> above
> mentioned fields.
>
> *Controller setup*: The controller (to handle suggest queries) is a typical
> JAVA servlet using Solr as its backend (connecting via solrj). Based on the
> incoming query string, a lucene query is created. It is BooleanQuery
> comprising of TermQuery across all the above mentioned fields. The boost
> factor to each of these term queries would determine (to an extent) what
> kind of matches do you prefer to show up first. JSON is used as the data
> exchange format.
>
> *Frontend setup*: It is a home grown JS to address some specific use cases
> of the project in question. One simple exercise with Firebug will spill all
> the beans. However, I strongly recommend using jQuery to build (and extend)
> the UI component.
>
> Any help beyond this is available, but off the list.
>
> Cheers
> Avlesh
> @avlesh  | http://webklipper.com
>
> On Tue, Aug 3, 2010 at 10:04 AM, Bhavnik Gajjar<
> bhavnik.gaj...@gatewaynintec.com>  wrote:
>
>
>>   Whoops!
>>
>> table still not looks ok :(
>>
>> trying to send once again
>>
>>
>> loremLorem ipsum dolor sit amet
>>  Hieyed ddi lorem ipsum dolor
>>  test lorem ipsume
>>  test xyz lorem ipslili
>>
>> lorem ipLorem ipsum dolor sit amet
>>  Hieyed ddi lorem ipsum dolor
>>  test lorem ipsume
>>  test xyz lorem ipslili
>>
>> lorem ipsltest xyz lorem ipslili
>>
>> On 8/3/2010 10:00 AM, Bhavnik Gajjar wrote:
>>
>> Avlesh,
>>
>> Thanks for responding
>>
>> The table mentioned below looks like,
>>
>> lorem   Lorem ipsum dolor sit amet
>>   Hieyed ddi lorem ipsum
>> dolor
>>   test lorem ipsume
>>   test xyz lorem ipslili
>>
>> lorem ip   Lorem ipsum dolor sit amet
>>   Hieyed ddi lorem ipsum
>> dolor
>>   test lorem ipsume
>>   test xyz lorem ipslili
>>
>> lorem ipsl test xyz lorem ipslili
>>
>>
>> Yes, [http://askme.in] looks good!
>>
>> I would like to know its designs/solr configurations etc.. Can you
>> please provide me detailed views of it?
>>
>> In [http://askme.in], there is one thing to be noted. Search text like,
>> [business c] populates [Business Centre] which looks OK but, [Consultant
>> Business] looks bit odd. But, in general the pointer you suggested is
>> great to start with.
>>
>> On 8/2/2010 8:39 PM, Avlesh Singh wrote:
>>
>>
>>   From whatever I could read in your broken table of sample use cases, I 
>> think
>>
>>
>>   you are looking for something similar to what has been done here 
>> -http://askme.in; if this is what you are looking do let me know.
>>
>> Cheers
>> Avlesh
>> @avlesh     | 
>> http://webklipper.com
>>
>> On Mon, Aug 2, 2010 at 8:09 PM, Bhavnik 
>> Gajjar   wrote:
>>
>>
>>
>>
>>   Hi,
>>
>> I'm looking for a solution related to auto complete feature for one
>> application.
>>
>> Below is a list of texts from which auto complete results would be
>> populated.
>>
>> Lorem ipsum dolor sit amet
>> tincidunt ut laoreet
>> dolore eu feugiat nulla facilisis at vero eros et
>> te feugait nulla facilisi
>> Claritas est etiam processus
>> anteposuerit litterarum formas humanitatis
>> fiant sollemnes in futurum
>> Hieyed ddi lorem ipsum dolor
>> test lorem ipsume
>> test xyz l

solr query result not read the latest xml file

2010-08-09 Thread e8en

hi everyone,

I do these steps every time a new XML file is created (for example,
cat_978.xml has just been created):
1. delete from the index (AUC_CAT:978)
2. commit the new cat_978.xml (java -jar post.jar cat_978.xml)
3. restart Java (stop, then java -jar start.jar)

If I don't do those steps, the query result shown in the browser
still uses the old values (cat_978.xml - no changes at all) instead of
reading the new cat_978.xml.

What I want to ask: is there a way to avoid restarting Java, since
it consumes too much time and resources?

thanks in advance,
Eben
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-query-result-not-read-the-latest-xml-file-tp1066785p1066785.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to support "implicit trailing wildcards"

2010-08-09 Thread yandong yao
Hi Bastian,

Sorry for not making it clear. I also want an exact match to score higher than
a wildcard match. That is, when searching for 'mount', documents with 'mount'
should score higher than documents with 'mountain', while 'mount*' seems
to treat 'mount' and 'mountain' the same.

Besides, I also want the query to be processed by the analyzer, while according to
http://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F,
Wildcard, Prefix, and Fuzzy queries are not passed through the Analyzer. The
rationale is that if I search for 'mounted', I also want documents with 'mount'
to match.

So it seems the built-in wildcard search cannot satisfy my requirements, if I
understand correctly.
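
One common workaround for the ranking part is to combine a boosted exact clause with a prefix clause, so a document containing the exact term matches both clauses. The sketch below only builds the query string. The field name "text" and the boost value are assumptions, and it is meant for a single plain word, not for input with query-syntax characters. The prefix part is lowercased by hand because, as noted above, prefix queries bypass the analyzer.

```python
def exact_over_prefix(term, field="text", boost=2.0):
    """Return a Lucene-syntax query: boosted exact term OR lowercased prefix match."""
    prefix = term.lower()  # prefix queries are not analyzed, so lowercase here
    return f"{field}:{term}^{boost:g} OR {field}:{prefix}*"


print(exact_over_prefix("mount"))
```

The exact clause still goes through the field's analyzer, so stemming configured there would still apply to it. Whether the resulting ranking is good enough depends on the scoring of the prefix clause in your Solr version, so treat this as something to try out, not a guaranteed fix.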

Thanks very much!


2010/8/9 Bastian Spitzer 

> Wildcard-Search is already built in, just use:
>
> ?q=umoun*
> ?q=mounta*
>
> -Original Message-
> From: yandong yao [mailto:yydz...@gmail.com]
> Sent: Monday, August 9, 2010 15:57
> To: solr-user@lucene.apache.org
> Subject: how to support "implicit trailing wildcards"
>
> Hi everyone,
>
>
> How to support 'implicit trailing wildcard *' using Solr, eg: using Google
> to search 'umoun', 'umount' will be matched , search 'mounta', 'mountain'
> will be matched.
>
> From my point of view, there are several ways, both with disadvantages:
>
> 1) Using EdgeNGramFilterFactory, thus 'umount' will be indexed with 'u',
> 'um', 'umo', 'umou', 'umoun', 'umount'. The disadvantages are: a) the index
> size increases dramatically, b) will matches even has no relationship, such
> as such 'mount' will match 'mountain' also.
>
> 2) Using two pass searching: first pass searches term dictionary through
> TermsComponent using given keyword, then using the first matched term from
> term dictionary to search again. eg: when user enter 'umoun', TermsComponent
> will match 'umount', then use 'umount' to search. The disadvantage are: a)
> need to parse query string so that could recognize meta keywords such as
> 'AND', 'OR', '+', '-', '"' (this makes more complex as I am using PHP
> client), b) The returned hit counts is not for original search string, thus
> will influence other components such as auto-suggest component based on user
> search history and hit counts.
>
> 3) Write custom SearchComponent, while have no idea where/how to start
> with.
>
> Is there any other way in Solr to do this, any feedback/suggestion are
> welcome!
>
> Thanks very much in advance!
>


Re: DIH and multivariable fields problems

2010-08-09 Thread harrysmith

This is increasingly looking like a bug. To recap, I am trying to use
the DIH to import multivalued dynamic fields, using a variable to name
that field.

Upon further testing, the multivalued import works fine with a
static/constant name, but only keeps the first record when naming the field
dynamically. See below for relevant snips.

From schema.xml:


From data-config.xml:








Produces the following, note that there are 3 records that should be
returned and are correctly done, with the field name being a constant.

- 
- 
  9892962 
- 
  record 1 
  record 2 
  record 3 
  Polygraph Newsletter Title 
  
- 
  Polygraph Newsletter Title 
  
  
  

===

Now, changing the field name to a variable..., note only the first record is
retained for the 'Relation_s' field -- there should be 3 records.

 
becomes
 

produces the following:
- 
- 
- 
  record 1 
  
- 
  Polygraph Newsletter Title 
  
  9892962 
- 
  Polygraph Newsletter Title 
  
  
  

Only the first record is retained. There was also another post (which
received no replies) in the archive that reported the same issue. The DIH
debug logs do show 3 records correctly being returned, so somehow these are
not getting added.



-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-and-multivariable-fields-problems-tp1032893p1065244.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Facet Fields - ID vs. Display Value

2010-08-09 Thread dc tech
I think it depends on what you need:
1) Simple, unique category - use the display value as the facet
2) Categories that may be duplicates from a display perspective (e.g. authors):
store display#id in the facet field but show only the display part
3) Internationalization requirements - store the ID but have the UI pull and
display the translated labels
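
Option 2 can be sketched as a pair of small helpers on the application side. The function names are made up for illustration, and the sketch assumes the ID itself never contains a '#' character.

```python
from typing import Tuple


def encode_facet(display: str, ident: str) -> str:
    """Combine the display label and the ID into one facet value."""
    return f"{display}#{ident}"


def decode_facet(value: str) -> Tuple[str, str]:
    """Split a facet value at the last '#', so labels may contain '#' themselves."""
    display, _, ident = value.rpartition("#")
    return display, ident


label, ident = decode_facet(encode_facet("John Smith", "42"))
print(label, ident)
```

The UI shows only the label, while the full value (label plus ID) is sent back as the filter, which keeps two authors with the same name apart.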

On 8/9/10, Frank A  wrote:
> What I meant (which I realize now wasn't very clear) was if I have
> something like categoryID and categorylabel - is the normal practice
> to define categoryID as the facet field and then have the UI layer
> display the label?  Or would it be normal to directly use
> categorylabel as the facet field?
>
>
>
> On Mon, Aug 9, 2010 at 6:01 PM, Otis Gospodnetic wrote:
>> Hi Frank,
>>
>> I'm not sure what you mean by that.
>> If the question is about what should be shown in the UI, it should be
>> something
>> pretty and human-readable, such as the original facet string value,
>> assuming it
>> was nice and clean.
>>
>> Otis
>> 
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Lucene ecosystem search :: http://search-lucene.com/
>>
>>
>>
>> - Original Message 
>>> From: Frank A 
>>> To: solr-user@lucene.apache.org
>>> Sent: Mon, August 9, 2010 5:19:57 PM
>>> Subject: Facet Fields - ID vs. Display Value
>>>
>>> Is there a general best practice on whether facet fields should be on
>>> "IDs"  or "Display values"?
>>>
>>> -Frank
>>>
>>
>

-- 
Sent from my mobile device


RE: Re: Facet Fields - ID vs. Display Value

2010-08-09 Thread Markus Jelsma
Well, you can do both, of course, but there's no need for additional code if you 
get it for free. I'd prefer, as most would I assume, to use the label as the facet 
field.
 
-Original message-
From: Frank A 
Sent: Tue 10-08-2010 01:11
To: solr-user@lucene.apache.org; 
Subject: Re: Facet Fields - ID vs. Display Value

What I meant (which I realize now wasn't very clear) was if I have
something like categoryID and categorylabel - is the normal practice
to define categoryID as the facet field and then have the UI layer
display the label?  Or would it be normal to directly use
categorylabel as the facet field?



On Mon, Aug 9, 2010 at 6:01 PM, Otis Gospodnetic wrote:
> Hi Frank,
>
> I'm not sure what you mean by that.
> If the question is about what should be shown in the UI, it should be 
> something
> pretty and human-readable, such as the original facet string value, assuming 
> it
> was nice and clean.
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
>> From: Frank A 
>> To: solr-user@lucene.apache.org
>> Sent: Mon, August 9, 2010 5:19:57 PM
>> Subject: Facet Fields - ID vs. Display Value
>>
>> Is there a general best practice on whether facet fields should be on
>> "IDs"  or "Display values"?
>>
>> -Frank
>>
>


Re: Facet Fields - ID vs. Display Value

2010-08-09 Thread Frank A
What I meant (which I realize now wasn't very clear) was if I have
something like categoryID and categorylabel - is the normal practice
to define categoryID as the facet field and then have the UI layer
display the label?  Or would it be normal to directly use
categorylabel as the facet field?



On Mon, Aug 9, 2010 at 6:01 PM, Otis Gospodnetic wrote:
> Hi Frank,
>
> I'm not sure what you mean by that.
> If the question is about what should be shown in the UI, it should be 
> something
> pretty and human-readable, such as the original facet string value, assuming 
> it
> was nice and clean.
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
>> From: Frank A 
>> To: solr-user@lucene.apache.org
>> Sent: Mon, August 9, 2010 5:19:57 PM
>> Subject: Facet Fields - ID vs. Display Value
>>
>> Is there a general best practice on whether facet fields should be on
>> "IDs"  or "Display values"?
>>
>> -Frank
>>
>


Problems to clustering on tomcat

2010-08-09 Thread Claudio Devecchi
Hi everybody,

I need to do some tests on my Solr installation. Previously I configured my
application on a single node, and now I need to run some tests on a cluster
configuration.
I followed the steps on "http://wiki.apache.org/solr/ClusteringComponent"
and when I start up the example system everything is OK, but when I try to
run it on Tomcat I receive the error below. Does anybody have an idea?

SEVERE: Could not start SOLR. Check solr/home property
org.apache.solr.common.SolrException: Error loading class
'org.apache.solr.handler.clustering.ClusteringComponent'

-- 
Claudio Devecchi
flickr.com/cdevecchi


RE: hl.usePhraseHighlighter

2010-08-09 Thread Ahmet Arslan
> I used text type and found the following in schema.xml. I
> don't know which ones I should remove. 
> ***

You should remove  from both index and query time.


  


Re: Facet Fields - ID vs. Display Value

2010-08-09 Thread Otis Gospodnetic
Hi Frank,

I'm not sure what you mean by that.
If the question is about what should be shown in the UI, it should be something 
pretty and human-readable, such as the original facet string value, assuming it 
was nice and clean.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Frank A 
> To: solr-user@lucene.apache.org
> Sent: Mon, August 9, 2010 5:19:57 PM
> Subject: Facet Fields - ID vs. Display Value
> 
> Is there a general best practice on whether facet fields should be on
> "IDs"  or "Display values"?
> 
> -Frank
> 


randomness - percent share

2010-08-09 Thread Satish Kumar
Hi,

We have some identical records in our data set (e.g. "what is swine flu?"
written by two different authors). When a user searches for "What is swine
flu?", we want the result by author1 to appear as the first result for x% of
the queries and the result by author2 for y% of the queries (where x and y
should be configurable). I am wondering if I can use the percentShare value
(25, 40, 60, etc.) stored per record as an element in controlling the score,
while still generating randomness: if record1's share is 75% and record2's share
is 25%, on average record1 should appear first 75 times and record2 25 times in 100
search queries; if not exactly 75 and 25, something in that range should be
fine too.
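
To make the expected behaviour concrete, here is a small simulation of the selection rule itself, written outside Solr. It is only a sketch of the requirement, not a Solr feature: the record names and shares are the ones from the example above, and the function name is made up.

```python
import random
from collections import Counter


def pick_first(shares, rng):
    """Pick which record is shown first, with probability proportional to its share."""
    ids = list(shares)
    weights = [shares[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the simulation is repeatable
shares = {"record1": 75, "record2": 25}
counts = Counter(pick_first(shares, rng) for _ in range(10_000))
print(counts)
```

Over many queries the counts settle close to the configured shares, which is the "something in that range" behaviour described above.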

Any ideas on implementing this feature?


Thanks much!

Satish


RE: hl.usePhraseHighlighter

2010-08-09 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your help!

I used text type and found the following in schema.xml. I don't know which ones 
I should remove. 
***

  







  
  







  

***

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Monday, August 09, 2010 4:32 PM
To: solr-user@lucene.apache.org
Subject: Re: hl.usePhraseHighlighter


> I am trying to do exactly match. For
> example, I hope only get study highlighted if I search
> "study", not others (studies, studied and so on).

This has nothing to do with highlighting and its parameters. 
You need to remove stem filter factory (porter, snowball) from your analyzer 
chain. Re-start solr and re-index is also necessary.


  


Facet Fields - ID vs. Display Value

2010-08-09 Thread Frank A
Is there a general best practice on whether facet fields should be on
"IDs" or "Display values"?

-Frank


Re: anti-words - exact match

2010-08-09 Thread Satish Kumar
Thanks Jon.

My initial thought was exactly like yours. My preference was to implement
this requirement completely at the Solr level so that different applications
won't have to implement this logic. However, I am not sure how to shingle-ize the
input query and use that in a filter query with a NOT operator at the Solr
layer. The other option, as you suggested, is to shingle-ize the input query in
the application layer -- this is doable, but means adding logic in the
application layer.

For now I am settling on the solution below:

- each anti-word (which can be multiple words) will be stored as a separate token.
The input record will contain the different anti-words separated by
commas. solr.PatternTokenizerFactory will be used to split on commas and
create tokens

- the list of anti-words is stored in memory in the application layer, and
anti-words are extracted from the user-entered query (e.g. if the user enters
'I have swollen foot' and 'swollen foot' is an anti-word, 'swollen foot' is
extracted)

- a filter query with a NOT operator on the anti-word field is sent to Solr
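
The application-side part of the steps above can be sketched roughly as follows. The field name anti_words and the helper names are assumptions for illustration, and the sketch assumes the field's query analyzer keeps a quoted phrase as a single token, matching how the anti-words were indexed.

```python
import re


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def find_anti_words(query, anti_words):
    """Return the anti-words that occur in the query as a contiguous word sequence."""
    padded = f" {normalize(query)} "
    return [w for w in anti_words if f" {normalize(w)} " in padded]


def build_filter_query(matches, field="anti_words"):
    """Build an fq value that excludes records carrying any matched anti-word."""
    return " ".join(f'-{field}:"{normalize(m)}"' for m in matches)


known = ["swollen foot", "headache"]
hits = find_anti_words("I have swollen foot", known)
print(build_filter_query(hits))
```

With this matching, 'I have swollen foot' picks up 'swollen foot' but 'My foot is swollen' does not, which is the exact-match behaviour asked for. When nothing matches, the returned string is empty and no fq should be sent.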


Thanks much!

Satish

> This is tricky. You could try doing something with the ShingleFilter (
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory)
> at _query time_ to turn the users query:
>
> "i have a swollen foot" into:
> "i", "i have", "i have a", "i have a swollen",  "have", "have a", "have
> a swollen"... etc.
>
> I _think_ you can get the ShingleFilter factory to do that.
>
> But now you only want to exclude if one of those shingles matches the
> ENTIRE "anti-word". So maybe index as non-tokenized, so each of those
> shingles will somehow only match on the complete thing.  You'd want to
> normalize spacing and punctuation.
>
> But then you need to turn that into a _negated_ element of your query.
> Perhaps by using an fq with a NOT/"-" in it? And a query which 'matches'
> (causing 'not' behavior) if _any_ of the shingles match.
>
> I have no idea if it's actually possible to put these things together in
> that way. A non-tokenized field? Which still has it's queries shingle-ized
> at query time? And then works as a negated query, matching for negation if
> any of the shingles match?  Not really sure how to put that together in your
> solrconfig.xml and/or application logic if needed. You could try.
>

Yup, I didn't know how to shingle-ize the input query and use that as input
in a filter query.


> Another option would be doing the query-time 'shingling' in your app, and
> then it's a somewhat more normal Solr query. &fq= -"shingle one" -"shingle
> two" -"shingle three" etc.  Or put em in seperate fq's depending on how you
> want to use your filter cache. Still searching on a non-tokenized field, and
> still normalizing on white-space and punctuation at both index time and
> (using same normalization logic but in your application logic this time)
> query time.  I think that might work.
>
> So I'm not really sure, but maybe that gives you some ideas.
>
> Jonathan
>
>
>
>
> Satish Kumar wrote:
>
>> Hi,
>>
>> We have a requirement to NOT display search results if user query contains
>> terms that are in our anti-words field. For example, if user query is "I
>> have swollen foot" and if some records in our index have "swollen foot" in
>> anti-words field, we don't want to display those records. How do I go
>> about
>> implementing this?
>>
>> NOTE 1: anti-words field can contain multiple values. Each value can be a
>> one or multiple words (e.g. "swollen foot", "headache", etc. )
>>
>> NOTE 2: the match must be exact. If anti-words field contains "swollen
>> foot"
>> and if user query is "I have swollen foot", record must be excluded. If
>> user
>> query is "My foot is swollen", the record should not be excluded.
>>
>> Any pointers is greatly appreciated!
>>
>>
>> Thanks,
>> Satish
>>
>>
>>
>


RE: It seems like using a wildcard causes lowercase filter to not do the lowercasing?

2010-08-09 Thread Robert Petersen
Aha, I overlooked that.  Thank you.

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Monday, August 09, 2010 1:28 PM
To: solr-user@lucene.apache.org
Subject: Re: It seems like using a wildcard causes lowercase filter to not do 
the lowercasing?

> I have a field with lowercase filter
> on search and index sides, and
> searching in this field works fine with uppercase or
> lowercase terms,
> except if I wildcard!  So searching for 'gps' or 'GPS'
> returns the same
> result set, but searching for 'gps*' returns results as
> expected and
> searching for 'GPS*' returns nothing.  It seems the
> asterisk blocks the
> lower case filter operation and then no matches occur
> because the index
> is all lowercased.

"Unlike other types of Lucene queries, Wildcard, Prefix, and Fuzzy queries are 
not passed through the Analyzer" [1]

[1]http://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F


  


Re: hl.usePhraseHighlighter

2010-08-09 Thread Ahmet Arslan

> I am trying to do exactly match. For
> example, I hope only get study highlighted if I search
> "study", not others (studies, studied and so on).

This has nothing to do with highlighting and its parameters. 
You need to remove the stem filter factory (porter, snowball) from your analyzer 
chain. Restarting Solr and re-indexing are also necessary.


  


Re: It seems like using a wildcard causes lowercase filter to not do the lowercasing?

2010-08-09 Thread Ahmet Arslan
> I have a field with lowercase filter
> on search and index sides, and
> searching in this field works fine with uppercase or
> lowercase terms,
> except if I wildcard!  So searching for 'gps' or 'GPS'
> returns the same
> result set, but searching for 'gps*' returns results as
> expected and
> searching for 'GPS*' returns nothing.  It seems the
> asterisk blocks the
> lower case filter operation and then no matches occur
> because the index
> is all lowercased.

"Unlike other types of Lucene queries, Wildcard, Prefix, and Fuzzy queries are 
not passed through the Analyzer" [1]

[1]http://wiki.apache.org/lucene-java/LuceneFAQ#Are_Wildcard.2C_Prefix.2C_and_Fuzzy_queries_case_sensitive.3F
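
Since the lowercase filter is not applied to wildcard terms, one workaround is to lowercase those terms in the client before sending the query. The following is a rough sketch, not a full query parser: it treats any whitespace-separated token containing * or ? as a wildcard term and leaves a leading field name untouched. Quoted phrases and escaped characters are not handled.

```python
import re

# Any whitespace-delimited token that contains * or ? counts as a wildcard term here.
_WILDCARD_TERM = re.compile(r"\S*[*?]\S*")


def _lower_term(match):
    """Lowercase the term part only, keeping an optional 'field:' prefix as typed."""
    field, sep, term = match.group(0).rpartition(":")
    return field + sep + term.lower()


def lowercase_wildcards(query):
    """Lowercase wildcard terms so they can match a lowercased index."""
    return _WILDCARD_TERM.sub(_lower_term, query)


print(lowercase_wildcards("GPS*"))
```

Non-wildcard terms are left alone because they still pass through the analyzer and get lowercased there.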





[ANN] Free technical webinar: Mastering the Lucene Index: Wednesday, August 11, 2010 11:00 AM PST / 2:00 PM EST / 20:00 CET

2010-08-09 Thread Mark Miller
Hey all - apologies for the quick cross post - just to let you know,
Andrzej is giving a free webinar this Wednesday. His presentations are always
fantastic, so check it out:

Lucid Imagination Presents a free technical webinar:  Mastering the
Lucene Index
Wednesday, August 11, 2010 11:00 AM PST / 2:00 PM EST / 20:00 CET

Sign up here:
http://www.eventsvc.com/lucidimagination/081110?trk-AP

Lucene/Solr index implementation is critical to the performance of your
search application and the quality of your results -- and not just at
indexing time. If you're developing applications in Lucene/Solr, your
index will reward care and attention -- adding power to your running
search application -- all the more so as you inevitably increase the
scope of your query traffic and the dimensions of your data.

Join Andrzej Bialecki, Lucene Committer and inventor of the Luke index
utility, for an advanced workshop on cutting edge techniques for keeping
your Lucene/Solr index at its peak potential. Andrzej will discuss and
present essential strategies for index post-processing, including:
* Single-pass index splitting -- reshaping indexes for flexible deployment
* Index pruning, filtering and multi-tiered search, or how to serve
indexes (mostly) from RAM
* Bit-wise search -- or how to find the best bit-wise matches - and
applications in text fingerprinting

About the presenter: Andrzej Bialecki is a committer of the Apache
Lucene project, a Lucene PMC member, and chairman of the Apache Nutch
project. He is also the author of Luke, the Lucene Index Toolbox.
Andrzej participates in many commercial projects that use Lucene, Solr,
Nutch and Hadoop to implement enterprise and vertical search.

Sign up here:
http://www.eventsvc.com/lucidimagination/081110?trk-AP


It seems like using a wildcard causes lowercase filter to not do the lowercasing?

2010-08-09 Thread Robert Petersen
I have a field with lowercase filter on search and index sides, and
searching in this field works fine with uppercase or lowercase terms,
except if I wildcard!  So searching for 'gps' or 'GPS' returns the same
result set, but searching for 'gps*' returns results as expected and
searching for 'GPS*' returns nothing.  It seems the asterisk blocks the
lower case filter operation and then no matches occur because the index
is all lowercased.

 

This is a very simple index with very simple docs, and the field is
defined like this in the schema:

 



 

 



  







  



 



Re: how to query a string using solr URL in the browser

2010-08-09 Thread Gora Mohanty
On Mon, 9 Aug 2010 05:31:36 -0700 (PDT)
e8en  wrote:

> 
> I forgot something,
> 
> when I enter this:
> http://172.11.18.120:9000/search/select/?q=text:bracket&q.op=AND&start=0&rows=1000
> 
> the result will show all ITEM_ID that contain 'bracket' word in
> both or one of ITEM_DESCR_SHORT or ITEM_TITLE
[...]

Please read up on the excellent Solr documentation on the Wiki, and
understand how the query parameter, 'q', in the URL works. Besides
the solution that Otis has already provided, if you want a query of
"text:bracket" to search for the word "bracket" only in the fields
AUC_DESCR_SHORT and AUC_TITLE, you have to copy *only* those fields
into the default search field (named "text" by default).

From the description of what you observe, it seems that the default
search field, "text", in schema.xml is having only ITEM_DESCR_SHORT
and ITEM_TITLE copied into it. Please post your schema.xml file
here for further clarifications.

Regards,
Gora


hl.usePhraseHighlighter

2010-08-09 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
I am trying to do exact matching. For example, I hope to get only "study" highlighted 
if I search for "study", not other forms (studies, studied, and so on).

I didn't find any function for it in SolrQuery. I added the following to 
solrconfig.xml:
true.

Unfortunately, I couldn't get it to work. 

Please help me out.

Thanks so much,
Xiaohui 


Re: can't use strdist in sorting either?

2010-08-09 Thread solr-user

issue resolved

I should have read the documentation with more care; "Calculate the distance
between two strings"

my city field was a tokenized text field so changing it to string type got
things working

sorry all
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/can-t-use-strdist-as-functionquery-tp1023390p1058059.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: can't use strdist in sorting either?

2010-08-09 Thread solr-user

finally figured out that I can simply escape the quotation marks in the query
URL using backslashes to use strdist as a functionquery (sorry all, that
should have been a no-brainer)

http://10.0.11.54:8994/solr/select?q=(*:*)^0%20_val_:"strdist(\"phoenix\",city,edit)"&fl=score,*&sort=score%20desc

However, sorting by the score in this query doesn't work (i.e. the same problem as
when sorting by the strdist function: results don't change when I go from asc to
desc or vice versa).
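
Building the escaped URL by hand is error-prone, so here is a sketch that assembles the same kind of query programmatically. The base URL is a placeholder and the function name is made up. The query text mirrors the one in the URL above.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def strdist_query_url(base, target, field, measure="edit"):
    """Build a select URL whose q embeds strdist() as a _val_ function query."""
    # Inner quotes are backslash-escaped because the function sits inside a quoted value.
    func = f'strdist(\\"{target}\\",{field},{measure})'
    q = f'(*:*)^0 _val_:"{func}"'
    params = {"q": q, "fl": "score,*", "sort": "score desc"}
    return base + "?" + urlencode(params, quote_via=quote)


url = strdist_query_url("http://localhost:8983/solr/select", "phoenix", "city")
print(url)
# Decode again to check that the q parameter survived the encoding unchanged.
print(parse_qs(urlsplit(url).query)["q"][0])
```

This only takes care of the escaping. It does not address the sorting problem described above.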

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/can-t-use-strdist-as-functionquery-tp1023390p1057056.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to support "implicit trailing wildcards"

2010-08-09 Thread Bastian Spitzer
Wildcard-Search is already built in, just use:

?q=umoun*
?q=mounta*

-Original Message-
From: yandong yao [mailto:yydz...@gmail.com] 
Sent: Monday, August 9, 2010 15:57
To: solr-user@lucene.apache.org
Subject: how to support "implicit trailing wildcards"

Hi everyone,


How can I support an 'implicit trailing wildcard *' using Solr? For example, when 
using Google to search for 'umoun', 'umount' will be matched; search for 'mounta' 
and 'mountain' will be matched.

From my point of view, there are several ways, all with disadvantages:

1) Using EdgeNGramFilterFactory, so that 'umount' is indexed as 'u', 'um', 
'umo', 'umou', 'umoun', 'umount'. The disadvantages are: a) the index size 
increases dramatically, b) it will match even unrelated terms, for example 
'mount' will also match 'mountain'.

2) Using two-pass searching: the first pass searches the term dictionary through 
TermsComponent using the given keyword, then the first matched term from the term 
dictionary is used to search again. E.g., when the user enters 'umoun', 
TermsComponent will match 'umount', then 'umount' is used to search. The 
disadvantages are: a) I need to parse the query string to recognize meta keywords 
such as 'AND', 'OR', '+', '-', '"' (this is more complex as I am using a PHP 
client), b) the returned hit count is not for the original search string, and thus 
will influence other components such as an auto-suggest component based on user 
search history and hit counts.

3) Writing a custom SearchComponent, though I have no idea where/how to start.

Is there any other way in Solr to do this, any feedback/suggestion are welcome!

Thanks very much in advance!


Re: Programmatic Access to Solr schema?

2010-08-09 Thread Erik Hatcher
Are you looking to get access to a remote schema?   You can pull  
schema.xml via HTTP using a URL like:


  

If you're accessing the schema from inside a custom Solr component  
then the IndexSchema API (which you can get to from pretty much any  
context you're in) is the way to go.


Erik


On Aug 9, 2010, at 3:43 AM, Aparna Chaudhary wrote:


Hi,

I need access to the Solr schema definition (schema.xml) file to perform some
validations while constructing the document. I see there is a
class IndexSchema which provides information about the field metadata.

How do I get a handle to this class?

Aparna Chaudhary
http://blog.aparnachaudhary.net
Twitter: @aparnachaudhary




how to support "implicit trailing wildcards"

2010-08-09 Thread yandong yao
Hi everyone,


How can I support an 'implicit trailing wildcard *' in Solr? For example, on
Google a search for 'umoun' matches 'umount', and a search for 'mounta'
matches 'mountain'.

From my point of view, there are several ways, all with disadvantages:

1) Use EdgeNGramFilterFactory, so that 'umount' is indexed as 'u',
'um', 'umo', 'umou', 'umoun', 'umount'. The disadvantages are: a) the index
size increases dramatically, and b) it matches terms that are unrelated, for
example 'mount' will also match 'mountain'.

2) Use two-pass searching: the first pass looks up the term dictionary through
TermsComponent with the given keyword, and the second pass searches again with
the first matched term. For example, when the user enters 'umoun',
TermsComponent matches 'umount', and 'umount' is then used for the search. The
disadvantages are: a) the query string has to be parsed to recognize meta
keywords such as 'AND', 'OR', '+', '-', '"' (which is more complex because I
am using a PHP client), and b) the returned hit count is not for the original
search string, which affects other components such as an auto-suggest
component based on user search history and hit counts.

3) Write a custom SearchComponent, but I have no idea where or how to start.

Is there any other way to do this in Solr? Any feedback or suggestions are
welcome!

Thanks very much in advance!


Re: how to create a custom type in Solr

2010-08-09 Thread Thomas Joiner
I'd love to see your code on this, however what I've really been wondering
is the following: When did AbstractSubTypeFieldType get added?  It isn't in
1.4.1 (as far as I can tell that's the latest one that is bundled on their
site).  So, do I just need to grab it from subversion, and build it?  And if
so, is there a particular revision that I should go with?  Or should I just
pull trunk and use that, and last of all, is trunk stable enough to be used
in production?

Regards,
Thomas

On Mon, Aug 9, 2010 at 8:38 AM, Mark Allan  wrote:

> On 9 Aug 2010, at 1:01 pm, Otis Gospodnetic wrote:
>
>  Mark,
>>
>> A good way to get your changes/improvements into Solr is by putting them
>> in
>> JIRA.  Please see http://wiki.apache.org/solr/HowToContribute
>>
>> Thanks!
>> Otis
>>
>
>
> Hi Otis,
>
> For the class which requires only minor modifications, I tested it to
> ensure it doesn't break existing compatibility/functionality, and then I
> created an issue in JIRA and uploaded a patch:
>https://issues.apache.org/jira/browse/SOLR-1986
>
> I then posted a message about it to the list and got the following
> responses.
>
> On 7 Jul 2010, at 6:24 pm, Yonik Seeley wrote:
>
>> On Wed, Jul 7, 2010 at 8:15 AM, Grant Ingersoll 
>> wrote:
>>
>>> Originally, I had intended that it was just for one Field Sub Type,
>>> thinking that if we ever wanted multiple sub types, that a new, separate
>>> class would be needed
>>>
>>
>> Right - this was my original thinking too.  AbstractSubTypeFieldType is
>> only a convenience class to create compound types... people can do it other
>> ways.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>
>
> When I replied to ask if that meant the changes wouldn't be included, I got
> no response. As there's been no activity in JIRA, I didn't bother putting
> any of my other changes into JIRA as they all relied on that one.
> Mark
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>


Re: how to create a custom type in Solr

2010-08-09 Thread Mark Allan

On 9 Aug 2010, at 1:01 pm, Otis Gospodnetic wrote:


Mark,

A good way to get your changes/improvements into Solr is by putting them in
JIRA.  Please see http://wiki.apache.org/solr/HowToContribute

Thanks!
Otis



Hi Otis,

For the class which requires only minor modifications, I tested it to  
ensure it doesn't break existing compatibility/functionality, and then  
I created an issue in JIRA and uploaded a patch:

https://issues.apache.org/jira/browse/SOLR-1986

I then posted a message about it to the list and got the following  
responses.


On 7 Jul 2010, at 6:24 pm, Yonik Seeley wrote:
On Wed, Jul 7, 2010 at 8:15 AM, Grant Ingersoll wrote:
Originally, I had intended that it was just for one Field Sub Type,
thinking that if we ever wanted multiple sub types, that a new,
separate class would be needed

Right - this was my original thinking too.  AbstractSubTypeFieldType
is only a convenience class to create compound types... people can
do it other ways.


-Yonik
http://www.lucidimagination.com



When I replied to ask if that meant the changes wouldn't be included,  
I got no response. As there's been no activity in JIRA, I didn't  
bother putting any of my other changes into JIRA as they all relied on  
that one.

Mark


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: wildcards in solr synonyms file

2010-08-09 Thread dotriz

Does the synonyms file support regular expressions?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/wildcards-in-solr-synonyms-file-tp1053691p1055822.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to query a string using solr URL in the browser

2010-08-09 Thread e8en

I forgot something,

when I enter this:
http://172.11.18.120:9000/search/select/?q=text:bracket&q.op=AND&start=0&rows=1000

the result will show every ITEM_ID that contains the word 'bracket' in either
or both of ITEM_DESCR_SHORT and ITEM_TITLE

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-query-a-string-using-solr-URL-in-the-browser-tp1052434p1055353.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to query a string using solr URL in the browser

2010-08-09 Thread e8en

Hi Otis,
your answer shed some light :)

the input must be:
http://172.11.18.120:9000/search/select/?q=text:bracket&q.op=AND&start=0&rows=1000

how can I turn 'AUC_DESCR_SHORT:bracket' and 'AUC_TITLE:bracket' into
'text:bracket'?
is there any solution?

thanks,
Eben
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-query-a-string-using-solr-URL-in-the-browser-tp1052434p1055270.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to query a string using solr URL in the browser

2010-08-09 Thread Otis Gospodnetic
Eben,

Your URL was:

http://172.11.18.120:9000/search/select/?q=text:bracket&q.op=AND&start=0&rows=1000




change text:bracket to:

AUC_DESCR_SHORT:bracket AND AUC_TITLE:bracket

This will need to be URL-encoded, of course.
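
A minimal sketch of the URL-encoding step, using Python's standard library. The host, path and parameters are taken from the URL quoted above; the function name build_select_url is hypothetical.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Host and path copied from the URL in this thread.
BASE_URL = "http://172.11.18.120:9000/search/select/"


def build_select_url(word: str) -> str:
    """Build a select URL that requires the word in both fields."""
    query = f"AUC_DESCR_SHORT:{word} AND AUC_TITLE:{word}"
    params = {"q": query, "q.op": "AND", "start": 0, "rows": 1000}
    # urlencode escapes the spaces and colons inside the q value.
    return BASE_URL + "?" + urlencode(params)


url = build_select_url("bracket")
print(url)
# Decoding the query string again recovers the original q value.
print(parse_qs(urlsplit(url).query)["q"][0])
```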

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: e8en 
> To: solr-user@lucene.apache.org
> Sent: Mon, August 9, 2010 3:43:50 AM
> Subject: how to query a string using solr URL in the browser
> 
> 
> hi everyone,
> this is my solr query link and the result that showed in my  browser where
> ITEM_CAT = 1191
> 
>http://172.11.18.120:9000/search/select/?q=ITEM_CAT:1191&q.op=AND&start=0&rows=1000
>0
> 
> 1
> 1.0
> 27017
> Bracket Ceiling untuk semua merk  projector,
> panjang 60-90 cm  Bahan Besi Cat Hitam = 325rb Bahan  Sta
> name="ITEM_HTML_DIR_NL">/aksesoris-batere-dan-tripod/update-bracket-projector-dan-lcd-plasma-tv-607136.html
>>
> 607136
> Nego
> 7
> 270/27017/bracket_lcd_plasma_3a-1274291780.JPG
> 2010-05-19 17:56:45
> [UPDATE] BRACKET Projector dan LCD/PLASMA  TV
> 1
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 28
> 
> 
> what I want to ask  is how to search a word for example "bracket" by changing
> the url into this  (see the bold):
>http://172.11.18.120:9000/search/select/?q=text:bracket&q.op=AND&start=0&rows=1000
>0
> 
> so  it will resulted all ITEM_ID that has AUC_DESCR_SHORT &  AUC_TITLE
> containing "bracket" word
> 
> really really appreciate your  help
> thanks in advance
> 
> cheers,
> Eben
> -- 
> View this message in  context: 
>http://lucene.472066.n3.nabble.com/how-to-query-a-string-using-solr-URL-in-the-browser-tp1052434p1052434.html
>
> Sent  from the Solr - User mailing list archive at Nabble.com.
> 


Re: how to create a custom type in Solr

2010-08-09 Thread Otis Gospodnetic
Mark,

A good way to get your changes/improvements into Solr is by putting them in 
JIRA.  Please see http://wiki.apache.org/solr/HowToContribute

Thanks!
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Mark Allan 
> To: solr-user@lucene.apache.org
> Sent: Mon, August 9, 2010 4:04:29 AM
> Subject: Re: how to create a custom type in Solr
> 
> I see you have the exact same requirements I did and have also hit the same
> problem I did a month-or-so ago.  I ended up writing a custom field type based
> on solr.schema.PointType and making some very minor modifications to one of the
> Solr classes (AbstractSubTypeFieldType) to allow the field a certain degree of
> repeatability without affecting existing functionality.
> 
> I offered my modifications to the list but no-one seemed interested.  Let me
> know if you want the code and a walkthrough of what it does.
> 
> Mark
> 
> On 7 Aug  2010, at 3:36 am, Lance Norskog wrote:
> 
> > Use OR between multiple  ranges.
> > 
> > On Fri, Aug 6, 2010 at 8:52 AM, Thomas Joiner wrote:
> >> This will work for a single range.  However, I may need to support multiple
> >> ranges, is there a way to do that?
> >> 
> >> On Fri, Aug 6, 2010 at 10:49 AM, Jan Høydahl / Cominvent  <
> >> jan@cominvent.com>  wrote:
> >> 
> >>> Your use case can be solved by splitting the  range into two int's:
> >>> 
> >>> Document: {title: My  document, from: 8000, to: 9000}
> >>> Query: q=title:"My" AND (from:[*  TO 8500] AND to:[8500 TO *])
> >>> 
> >>> --
> >>>  Jan Høydahl, search solution architect
> >>> Cominvent AS -  www.cominvent.com
> >>> Training in Europe -  www.solrtraining.com
> >>> 
> >>> On 6. aug. 2010, at 17.02,  Thomas Joiner wrote:
> >>> 
> >>>> I need to have a field that supports ranges...for instance, you specify a
> >>>> range of 8000 to 9000 and if you search for 8500, it will hit.  However,
> >>>> when googling, I really couldn't find any resources on how to create your
> >>>> own field type in Solr.
> >>>>
> >>>> But from what I was able to find, the AbstractSubTypeFieldType class seems
> >>>> like a good starting point for the type that I want to make, however that
> >>>> isn't in the current version of Solr that I am using (1.4.1).  So I guess my
> >>>> question is: is Solr 3.0 ready for production?  If so, how do I get it?  Do I
> >>>> just need to checkout the code from svn and build it myself?  If so should I
> >>>> just check out the latest, or is there a particular branch that I should go
> >>>> with that is reliable?  If I switch to 3.0, will I need to reindex my data,
> >>>> or has the data format not changed?
> >>>>
> >>>> And if 3.0 isn't ready for production, what would you suggest I do?  Is the
> >>>> AbstractSubTypeFieldType such that I can backport it and use it with 1.4.1,
> >>>> or does it use specific features of 3.0 that I would have to backport as
> >>>> well, in which case it would become a horribly convoluted mess where I would
> >>>> be better off just going with 3.0.  And I guess this comes back to help on
> >>>> finding resources about implementing custom types...it would just be more
> >>>> complicated if I couldn't use the AbstractSubTypeFieldType.
> >>>>
> >>>> (This is my first time posting to a mailing list, so if I have violated
> >>>> horribly some etiquette of mailing lists, please tell me).
> >>>>
> >>>> Regards,
> >>>> Thomas
> 
> 
> --The University of Edinburgh is  a charitable body, registered in
> Scotland, with registration number  SC005336.
> 
>


Re: wildcards in solr synonyms file

2010-08-09 Thread Otis Gospodnetic
Riz,

The synonyms file doesn't support wildcards.  You'll need to list all numbers 
explicitly.
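
A minimal sketch of generating such an explicit entry instead of typing it by hand. It assumes whole percentages from 1% to 100% are enough and that the field's analyzer keeps tokens such as "10%" intact (many tokenizers drop the % sign, so that needs checking). The function name is hypothetical.

```python
def discount_synonym_line(max_percent: int = 100) -> str:
    """Return one synonyms-file line mapping 'discount' to 1% .. max_percent%."""
    values = ", ".join(f"{n}%" for n in range(1, max_percent + 1))
    return f"discount => {values}"


line = discount_synonym_line(100)
# Show only the start of the line; the full line lists all 100 values.
print(line[:40] + " ...")
```

The resulting line would replace the "discount => *%" entry in the synonyms file.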

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: M.Rizwan 
> To: solr-user@lucene.apache.org
> Sent: Mon, August 9, 2010 5:46:02 AM
> Subject: wildcards in solr synonyms file
> 
> Hi,
> 
> I want to write an entry in solr synonyms file so that when i search  for
> word discount, solr return all records which have [10]% kind of words  in
> title.
> So for example, i have a document
> cars - 10% off
> and I  search for word "discount", this document should be returned aswell.
> 
> In  synonyms file, I have written
> discount => *%
> 
> but it didn't work.  Any idea how to do this?
> 
> --
> Riz
> 


Re: solr single threaded?

2010-08-09 Thread Otis Gospodnetic
Andy,

A single non-distributed search request uses a single thread both in Lucene 
and Solr.  This is kind of normal, so I wouldn't worry about it if I were you.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Andy 
> To: solr-user@lucene.apache.org
> Sent: Mon, August 9, 2010 7:53:27 AM
> Subject: Re: solr single threaded?
> 
> Otis,
> 
> Thanks. In that case what does it mean that "Lucene search is  single 
>threaded"? How is that different from the Solr  behavior?
> 
> Andy
> 
> --- On Mon, 8/9/10, Otis Gospodnetic   wrote:
> 
> > From: Otis Gospodnetic 
> >  Subject: Re: solr single threaded?
> > To: solr-user@lucene.apache.org
> >  Date: Monday, August 9, 2010, 1:28 AM
> > Andy,
> > 
> > Short  answer: No, Solr will use multiple CPU cores and/or
> > multiple CPUs if  they 
> > are present.
> > 
> > A single non-distributed search  request runs in a single
> > thread, but Solr (and 
> > the servlet  container you put it in) can handle a number of
> > such threads in 
> >  parallel.
> > 
> > Otis
> > 
> > Sematext :: http://sematext.com/ :: Solr -  Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> > 
> > 
> > 
> > - Original  Message 
> > > From: Andy 
> > > To: solr-user@lucene.apache.org
> >  > Sent: Mon, August 9, 2010 1:19:58 AM
> > > Subject: solr single  threaded?
> > > 
> > > I read that Lucene search is single  threaded. Does
> > that mean Solr search is 
> > >also  single  threaded?
> > > 
> > > What does it mean - that there are no  concurrent 
> > searches & all searches are 
> > >serialized? Can  Solr take advantages of multiple 
> > CPUs?
> > > 
> > >  Thanks.
> > > 
> > > 
> > >   
> > > 
> > 
> 
> 
> 
> 


Re: solr single threaded?

2010-08-09 Thread Andy
Otis,

Thanks. In that case what does it mean that "Lucene search is single threaded"? 
How is that different from the Solr behavior?

Andy

--- On Mon, 8/9/10, Otis Gospodnetic  wrote:

> From: Otis Gospodnetic 
> Subject: Re: solr single threaded?
> To: solr-user@lucene.apache.org
> Date: Monday, August 9, 2010, 1:28 AM
> Andy,
> 
> Short answer: No, Solr will use multiple CPU cores and/or
> multiple CPUs if they 
> are present.
> 
> A single non-distributed search request runs in a single
> thread, but Solr (and 
> the servlet container you put it in) can handle a number of
> such threads in 
> parallel.
> 
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 
> 
> - Original Message 
> > From: Andy 
> > To: solr-user@lucene.apache.org
> > Sent: Mon, August 9, 2010 1:19:58 AM
> > Subject: solr single threaded?
> > 
> > I read that Lucene search is single threaded. Does
> that mean Solr search is 
> >also  single threaded?
> > 
> > What does it mean - that there are no concurrent 
> searches & all searches are 
> >serialized? Can Solr take advantages of multiple 
> CPUs?
> > 
> > Thanks.
> > 
> > 
> >       
> > 
> 





Re: how to query a string using solr URL in the browser

2010-08-09 Thread e8en

someone please help me, I'm running out of time :(
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-query-a-string-using-solr-URL-in-the-browser-tp1052434p1054327.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How do i update some document when i use sharding indexs?

2010-08-09 Thread Geert-Jan Brits
Just to be completely clear: the program that splits your index in 20 shards
should employ this algo as well.


2010/8/9 Geert-Jan Brits 

> I'm not sure if Solr has some built-in support for sharding functions, but
> you should generally use some hashing algorithm to split the indices and use
> the same hash algorithm to locate which shard contains a document.
> http://en.wikipedia.org/wiki/Hash_function
>
> Without employing any domain knowledge (of documents you possibly want to
> group together on a single shard for performance) you could build a very
> simple (crude) hash function by md5-hashing the unique keys of your
> documents, taking the first 3 hex chars (should be precise enough, so load is
> pretty much balanced), calculating a number from the chars (256 * first char +
> 16 * 2nd char + 3rd char), and taking that number modulo 20. That should give
> you a number in [0,20) which is the shard index.
>
> use the same algorithm to determine which shard contains the document that
> you want to change.
>
> Geert-Jan
>
>
> 2010/8/9 lu.rongbin 
>
>
>>My index has 76 million documents, I split it to 20 indexs because the
>> size of index is 33G. I deploy 20 shards for search response performence
>> on
>> ec2's 20 instances.But when i wan't to update some doc, it means i must
>> traversal each index , and find the document is in which shard index, and
>> update the doc? It's crazy! How can i do?
>>thanks.
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/How-do-i-update-some-document-when-i-use-sharding-indexs-tp1053509p1053509.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>
>


Re: How do i update some document when i use sharding indexs?

2010-08-09 Thread Geert-Jan Brits
I'm not sure if Solr has some built-in support for sharding functions, but
you should generally use some hashing algorithm to split the indices and use
the same hash algorithm to locate which shard contains a document.
http://en.wikipedia.org/wiki/Hash_function

Without employing any domain knowledge (of documents you possibly want to
group together on a single shard for performance) you could build a very
simple (crude) hash function by md5-hashing the unique keys of your
documents, taking the first 3 hex chars (should be precise enough, so load is
pretty much balanced), calculating a number from the chars (256 * first char +
16 * 2nd char + 3rd char), and taking that number modulo 20. That should give
you a number in [0,20) which is the shard index.

use the same algorithm to determine which shard contains the document that
you want to change.
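
A minimal sketch of the crude hash function described above, in Python. It assumes the unique key is a string and that "chars" means the hexadecimal characters of the md5 digest; the names shard_for and NUM_SHARDS are hypothetical.

```python
import hashlib

NUM_SHARDS = 20  # number of shards in this thread


def shard_for(unique_key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a document's unique key to a shard index in [0, num_shards)."""
    digest = hashlib.md5(unique_key.encode("utf-8")).hexdigest()
    # Interpret the first three hex characters as digits 0..15.
    first, second, third = (int(c, 16) for c in digest[:3])
    number = 256 * first + 16 * second + third  # value in 0..4095
    return number % num_shards


# The same key always maps to the same shard, for indexing and for updates.
print(shard_for("some-document-id"))
```

Since 4096 is not a multiple of 20, shards 0 to 15 receive very slightly more keys than shards 16 to 19, which fits the "crude" label above.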

Geert-Jan


2010/8/9 lu.rongbin 

>
>My index has 76 million documents, I split it to 20 indexs because the
> size of index is 33G. I deploy 20 shards for search response performence on
> ec2's 20 instances.But when i wan't to update some doc, it means i must
> traversal each index , and find the document is in which shard index, and
> update the doc? It's crazy! How can i do?
>thanks.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-do-i-update-some-document-when-i-use-sharding-indexs-tp1053509p1053509.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr 1.4.1 and 3x: Grouping of query changes results

2010-08-09 Thread gwk

On 8/9/2010 12:01 AM, David Benson wrote:

I'm seeing what I believe to be a logic error in the processing of a query.

Returns document 1234 as expected:
id:1234 AND -indexid:1 AND -indexid:2 AND -indexid:3

Does not return document as expected:
id:1234 AND (-indexid:1 AND -indexid:2) AND -indexid:3

Has anyone else experienced this? The exact placement of the parens isn't key, 
just adding a level of nesting changes the query results.

Thanks,

David
   


Hi,

I could be wrong but I think this has to do with Solr's lack of support 
for purely negative queries, try the following and see if it behaves 
correctly:


id:1234 AND (*:* AND -indexid:1 AND -indexid:2) AND -indexid:3
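
A minimal sketch of building such a query string programmatically, so that every group of purely negative clauses gets the match-all prefix. It assumes the field values need no escaping; the helper name exclude_group is hypothetical.

```python
def exclude_group(field: str, values) -> str:
    """Return a parenthesised group that starts from *:* and excludes each value."""
    negatives = " AND ".join(f"-{field}:{v}" for v in values)
    return f"(*:* AND {negatives})"


query = f"id:1234 AND {exclude_group('indexid', [1, 2])} AND -indexid:3"
print(query)
```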

Regards,

gwk


wildcards in solr synonyms file

2010-08-09 Thread M.Rizwan
Hi,

I want to write an entry in the Solr synonyms file so that when I search for
the word discount, Solr returns all records which have [10]% kind of words in
the title.
So for example, if I have a document
cars - 10% off
and I search for the word "discount", this document should be returned as well.

In the synonyms file, I have written
discount => *%

but it didn't work. Any idea how to do this?

--
Riz


How do i update some document when i use sharding indexs?

2010-08-09 Thread lu.rongbin

My index has 76 million documents. I split it into 20 indexes because the
index is 33 GB in size. I deployed the 20 shards on 20 EC2 instances to improve
search response time. But when I want to update a document, does that mean I must
traverse every index to find which shard holds the document, and then
update it? That's crazy! What can I do?
thanks.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-do-i-update-some-document-when-i-use-sharding-indexs-tp1053509p1053509.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to create a custom type in Solr

2010-08-09 Thread Mark Allan
I see you have the exact same requirements I did and have also hit the
same problem I did a month-or-so ago.  I ended up writing a custom
field type based on solr.schema.PointType and making some very minor
modifications to one of the Solr classes (AbstractSubTypeFieldType) to
allow the field a certain degree of repeatability without affecting
existing functionality.


I offered my modifications to the list but no-one seemed interested.   
Let me know if you want the code and a walkthrough of what it does.


Mark

On 7 Aug 2010, at 3:36 am, Lance Norskog wrote:


Use OR between multiple ranges.

On Fri, Aug 6, 2010 at 8:52 AM, Thomas Joiner wrote:
This will work for a single range.  However, I may need to support multiple
ranges, is there a way to do that?

On Fri, Aug 6, 2010 at 10:49 AM, Jan Høydahl / Cominvent <
jan@cominvent.com> wrote:


Your use case can be solved by splitting the range into two int's:

Document: {title: My document, from: 8000, to: 9000}
Query: q=title:"My" AND (from:[* TO 8500] AND to:[8500 TO *])

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 6. aug. 2010, at 17.02, Thomas Joiner wrote:

I need to have a field that supports ranges...for instance, you specify a
range of 8000 to 9000 and if you search for 8500, it will hit.  However,
when googling, I really couldn't find any resources on how to create your
own field type in Solr.

But from what I was able to find, the AbstractSubTypeFieldType class seems
like a good starting point for the type that I want to make, however that
isn't in the current version of Solr that I am using (1.4.1).  So I guess my
question is: is Solr 3.0 ready for production?  If so, how do I get it?  Do I
just need to checkout the code from svn and build it myself?  If so should I
just check out the latest, or is there a particular branch that I should go
with that is reliable?  If I switch to 3.0, will I need to reindex my data,
or has the data format not changed?

And if 3.0 isn't ready for production, what would you suggest I do?  Is the
AbstractSubTypeFieldType such that I can backport it and use it with 1.4.1,
or does it use specific features of 3.0 that I would have to backport as
well, in which case it would become a horribly convoluted mess where I would
be better off just going with 3.0.  And I guess this comes back to help on
finding resources about implementing custom types...it would just be more
complicated if I couldn't use the AbstractSubTypeFieldType.

(This is my first time posting to a mailing list, so if I have violated
horribly some etiquette of mailing lists, please tell me).

Regards,
Thomas



--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: Indexing fieldvalues with dashes and spaces

2010-08-09 Thread PeterKerk

Hi Erick,

OK, it's clearer now. I do indeed have the whitespace tokenizer:


  






  



I have a value "Beach & Sea", which is a theme for a location. Because of
the whitespace tokenizer, it gets split up into 2 facet values:
 "Beach",2,
 "Sea",2],
(see below)

Of course those individual values are NOT correct facet names, because it
should be "Beach & Sea".
But if I REMOVE the whitespace tokenizer, Solr throws an error saying that a
fieldtype should always have a tokenizer.
Which tokenizer do I need in order to get the correct facet
name?
(I've been checking this page
btw: http://lucene.apache.org/solr/api/org/apache/solr/analysis/package-summary.html)


"facet_counts":{
  "facet_queries":{},
  "facet_fields":{
"themes":[
 "Gemeentehuis",2,
 "Beach",2,
 "Sea",2],
"province":[
 "gelderland",1,
 "utrecht",1,
 "zuidholland",1],
"services":[
 "exclusiev",2,
 "fotoreportag",2,
 "hur",2,
 "liv",1,
 "muziek",1]},
  "facet_dates":{}}}



-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1052554.html
Sent from the Solr - User mailing list archive at Nabble.com.


how to query a string using solr URL in the browser

2010-08-09 Thread e8en

hi everyone,
this is my Solr query URL and the result shown in my browser for
ITEM_CAT = 1191

http://172.11.18.120:9000/search/select/?q=ITEM_CAT:1191&q.op=AND&start=0&rows=1000

1
1.0
27017
Bracket Ceiling untuk semua merk projector,
panjang 60-90 cm  Bahan Besi Cat Hitam = 325rb Bahan Sta
/aksesoris-batere-dan-tripod/update-bracket-projector-dan-lcd-plasma-tv-607136.html
607136
Nego
7
270/27017/bracket_lcd_plasma_3a-1274291780.JPG
2010-05-19 17:56:45
[UPDATE] BRACKET Projector dan LCD/PLASMA TV
1
0
0
0
0
0
0
0
28


what I want to ask is how to search for a word, for example "bracket", by changing
the url into this (see the bold):
http://172.11.18.120:9000/search/select/?q=text:bracket&q.op=AND&start=0&rows=1000

so that it returns every ITEM_ID whose AUC_DESCR_SHORT & AUC_TITLE
contain the word "bracket"

really really appreciate your help
thanks in advance

cheers,
Eben
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-query-a-string-using-solr-URL-in-the-browser-tp1052434p1052434.html
Sent from the Solr - User mailing list archive at Nabble.com.


Programmatic Access to Solr schema?

2010-08-09 Thread Aparna Chaudhary
Hi,

I need access to the Solr schema definition (schema.xml) file to perform some
validations while constructing the document. I see there is a
class IndexSchema which provides information about the field metadata.

How do I get a handle to this class?

Aparna Chaudhary
http://blog.aparnachaudhary.net
Twitter: @aparnachaudhary