Re: indexing Chinese language

2009-06-04 Thread Fer-Bj

We are trying Solr 1.3 with the Paoding Chinese Analyzer, and after reindexing
the index size went from 1.5 GB to 2.7 GB.

Is that expected behavior?

Is there any switch or trick to avoid the index file size more than doubling?

Koji Sekiguchi-2 wrote:
 
 CharFilter can normalize (convert) traditional Chinese to simplified
 Chinese or vice versa,
 if you define mapping.txt. Here is a sample of Chinese character
 normalization:
 
 https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
 
 See SOLR-822 for the details:
 
 https://issues.apache.org/jira/browse/SOLR-822
 
 Koji
 
 
 revathy arun wrote:
 Hi,

 When I index Chinese content using the Chinese tokenizer and analyzer in
 Solr 1.3, some of the Chinese text files are getting indexed but others
 are not.

 Since Chinese has many different language subtypes, such as standard
 Chinese, simplified Chinese, etc., which of these does the Chinese
 tokenizer support, and is there any method to detect the type of Chinese
 language from the file?

 Rgds

   
 
 
 

-- 
View this message in context: 
http://www.nabble.com/indexing-Chienese-langage-tp22033302p23864358.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Which caches should use the solr.FastLRUCache

2009-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
FastLRUCache is designed to be lock-free, so it is well suited for
caches which are hit several times in a request. I guess there is no
harm in using FastLRUCache across all the caches.
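
For reference, switching a cache implementation is just a matter of changing
the class attribute on the cache declaration in solrconfig.xml; a minimal
sketch (the sizes here are illustrative, not recommendations):

  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="128"/>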

On Thu, Jun 4, 2009 at 3:22 AM, Robert Purdy rdpu...@gmail.com wrote:

 Hey there,

 Anyone got any advice on which caches (filterCache, queryResultCache,
 documentCache, fieldValueCache) should be implemented using the
 solr.FastLRUCache in Solr 1.4, and what are the pros & cons
 vs. the solr.LRUCache?

 Thanks Robert.
 --
 View this message in context: 
 http://www.nabble.com/Which-caches-should-use-the-solr.FastLRUCache-tp23860182p23860182.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Token filter on multivalue field

2009-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
Isn't it better to use an UpdateProcessor for this?
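
For context, a rough sketch of what such an update processor could look like,
assuming the Solr 1.4-era UpdateRequestProcessor API; the factory name and the
"tags" field are illustrative. Because it operates on the whole input document,
it de-duplicates across all values of the multivalued field before analysis
ever runs:

  import java.io.IOException;
  import java.util.Collection;
  import java.util.LinkedHashSet;

  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.request.SolrQueryResponse;
  import org.apache.solr.update.AddUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;
  import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

  public class DedupeFieldValuesProcessorFactory extends UpdateRequestProcessorFactory {
      @Override
      public UpdateRequestProcessor getInstance(SolrQueryRequest req,
              SolrQueryResponse rsp, UpdateRequestProcessor next) {
          return new UpdateRequestProcessor(next) {
              @Override
              public void processAdd(AddUpdateCommand cmd) throws IOException {
                  SolrInputDocument doc = cmd.getSolrInputDocument();
                  Collection<Object> values = doc.getFieldValues("tags");
                  if (values != null) {
                      // Keep only the first occurrence of each value,
                      // preserving the original order of appearance.
                      Collection<Object> unique = new LinkedHashSet<Object>(values);
                      doc.removeField("tags");
                      for (Object v : unique) {
                          doc.addField("tags", v);
                      }
                  }
                  super.processAdd(cmd);
              }
          };
      }
  }

The factory would then be referenced from an updateRequestProcessorChain in
solrconfig.xml.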

On Thu, Jun 4, 2009 at 1:52 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:

 Hello,

 It's ugly, but the first thing that came to mind was ThreadLocal.

  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: David Giffin da...@giffin.org
 To: solr-user@lucene.apache.org
 Sent: Wednesday, June 3, 2009 1:57:42 PM
 Subject: Token filter on multivalue field

 Hi There,

 I'm working on a unique token filter, to eliminate duplicates on a
 multivalue field. My filter works properly for a single value field.
 It seems that a new TokenFilter is created for each value in the
 multivalue field. I need to maintain an array of used tokens across
 all of the values in the multivalue field. Is there a good way to do
 this? Here is my current code:

 import java.io.IOException;
 import java.util.ArrayList;

 import org.apache.lucene.analysis.Token;
 import org.apache.lucene.analysis.TokenFilter;
 import org.apache.lucene.analysis.TokenStream;

 public class UniqueTokenFilter extends TokenFilter {

     // Term texts already emitted; consulted to drop duplicates.
     private ArrayList<String> words;

     public UniqueTokenFilter(TokenStream input) {
         super(input);
         this.words = new ArrayList<String>();
     }

     @Override
     public final Token next(Token in) throws IOException {
         // Skip any token whose term text has already been seen.
         for (Token token = input.next(in); token != null; token = input.next(in)) {
             if (!words.contains(token.term())) {
                 words.add(token.term());
                 return token;
             }
         }
         return null;
     }
 }

 Thanks,
 David





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


RE: Strange behaviour with copyField

2009-06-04 Thread Radha C.

What is the defaultOperator set in your schema.xml? Are you sure that it
matches on au and not author?

-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org] 
Sent: Thursday, June 04, 2009 2:53 AM
To: solr-user@lucene.apache.org
Subject: Re: Strange behaviour with copyField


On Jun 3, 2009, at 5:09 AM, James Grant wrote:

 I've been hitting my head against a wall all morning trying to  
 figure this out and haven't managed to get anywhere and wondered if  
 anybody here can help.

 I have defined a field type

   <fieldType name="text_au" class="solr.TextField"
              positionIncrementGap="100">
     <analyzer>
       <tokenizer class="solr.LowerCaseTokenizerFactory"/>
     </analyzer>
   </fieldType>

 I have two fields

 <field name="au" type="text_au" indexed="true" stored="true"
        required="false" multiValued="true"/>
 <field name="author" type="text_au" indexed="true" stored="false"
        multiValued="true"/>

I don't see the difference, as they are the same FieldType for each  
field, text_au.  Is this a typo or am I missing something?



 and a copyField line

 <copyField source="au" dest="author"/>

 The idea is to allow searching for authors, so a search for
 author:(Hobbs A.U.) will match the au field value "Hobbs A. U."
 (notice the space).
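
(For reference: LowerCaseTokenizer splits on non-letter characters and
lowercases, so, assuming no other filters are involved, both forms should
analyze to the same tokens:

  "Hobbs A. U."  ->  hobbs / a / u
  "Hobbs A.U."   ->  hobbs / a / u

which makes the mismatch described below surprising.)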

What would lower casing do for handling the space?



 However, the query au:(Hobbs A.U.) matches and the query
 author:(Hobbs A.U.) does not.

 Any ideas?


How are you indexing?

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search



Re: Field Compression

2009-06-04 Thread Fer-Bj

Is it correct to assume that using field compression will cause performance
issues if we decide to allow search over this field?

ie:

 <field name="id" type="sint" indexed="true" stored="true"
        required="true"/>
 <field name="title" type="text" indexed="true" stored="true"
        omitNorms="true"/>
 <field name="file_location" type="string" indexed="false"
        stored="true"/>
 <field name="body" type="text" indexed="true" stored="false"
        omitNorms="true"/>

if I decide to add compressed=true to the body field... and I allow
search on body... would that be a problem?
At the same time: what if I add compressed=true, but I never search on
this field?
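
For reference, compression in Solr 1.x is declared per field in schema.xml;
a hedged sketch, noting that compressed only affects the *stored* value (so
it only makes sense on a stored field), and treating the compressThreshold
attribute from the quoted discussion below as an assumption:

  <field name="body" type="text" indexed="true" stored="true"
         compressed="true" compressThreshold="200" omitNorms="true"/>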
  

Stu Hood-3 wrote:
 
 I just finished watching this talk about a column-store RDBMS, which has a
 long section on column compression. Specifically, it talks about the gains
 from compressing similar data together, and how lazily decompressing data
 only when it must be processed is great for memory/CPU cache usage.
 
 http://youtube.com/watch?v=yrLd-3lnZ58
 
 While interesting, it's not relevant to Lucene's stored field storage. On
 the other hand, it did get me thinking about stored field compression and
 lazy field loading.
 
 Can anyone give me some pointers about compressThreshold values that would
 be worth experimenting with? Our stored fields are often between 20 and
 300 characters, and we're willing to spend more time indexing if it will
 make searching less IO bound.
 
 Thanks,
 
 Stu Hood
 Architecture Software Developer
 Mailtrust, a Rackspace Company
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Field-Compression-tp15258669p23865558.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: spell checking

2009-06-04 Thread Michael Ludwig

Yao Ge schrieb:


Maybe we should call this alternative search terms or
suggested search terms instead of spell checking. It is
misleading, as there is no right or wrong in spelling; there
are only popular (term frequency?) alternatives.


I had exactly the same difficulty in understanding the concept
because of the name given to the feature, which usually denotes
just what it says, i.e. a spellchecker, which is driven by an
authoritative dictionary and a set of rules, as integrated in
word processors, in order to ensure orthography.

What we have here is quite different from a spellchecker.

IMHO, a name conveying the actual meaning, along the lines of
suggest, would make more sense.

Michael Ludwig


HashDocSet's maxSize and loadFactor

2009-06-04 Thread Marc Sturlese

Hey there, I am trying to optimize the setup of HashDocSet.
Have read the documentation here:
http://wiki.apache.org/solr/SolrPerformanceFactors#head-2de2e9a6f806ab8a3afbd73f1d99ece48e27b3ab
But can't exactly understand it.
Does it mean that the maxSize should be 0.005 x NumberDocsOfMyIndex, or that
maxSize should be approximately the same as the number of docs in my index?
And... what's the loadFactor?

Thanks in advance
-- 
View this message in context: 
http://www.nabble.com/HashDocSet%27s-maxSize-and-loadFactor-tp23868434p23868434.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: indexing Chinese language

2009-06-04 Thread Erick Erickson
Hmmm, are you quite sure that you emptied the index first and didn't just
add all the documents a second time to the index?

Also, when you say the index almost doubled, were you looking only
at the size of the *directory*? SOLR might have been holding a copy
of the old index open while you built a new one...

Best
Erick

On Thu, Jun 4, 2009 at 2:20 AM, Fer-Bj fernando.b...@gmail.com wrote:


  We are trying Solr 1.3 with the Paoding Chinese Analyzer, and after
  reindexing the index size went from 1.5 GB to 2.7 GB.

  Is that expected behavior?

  Is there any switch or trick to avoid the index file size more than
  doubling?

 Koji Sekiguchi-2 wrote:
 
  CharFilter can normalize (convert) traditional Chinese to simplified
  Chinese or vice versa,
  if you define mapping.txt. Here is a sample of Chinese character
  normalization:
 
 
 https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
 
  See SOLR-822 for the details:
 
  https://issues.apache.org/jira/browse/SOLR-822
 
  Koji
 
 
  revathy arun wrote:
  Hi,
 
  When I index Chinese content using the Chinese tokenizer and analyzer in
  Solr 1.3, some of the Chinese text files are getting indexed but others
  are not.

  Since Chinese has many different language subtypes, such as standard
  Chinese, simplified Chinese, etc., which of these does the Chinese
  tokenizer support, and is there any method to detect the type of Chinese
  language from the file?
 
  Rgds
 
 
 
 
 

 --
 View this message in context:
 http://www.nabble.com/indexing-Chienese-langage-tp22033302p23864358.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Field Compression

2009-06-04 Thread Erick Erickson
Warning: This is from a Lucene perspective...
I don't think it matters. I'm pretty sure that COMPRESS only applies to
*storing* the data, not putting the tokens in the index
(the latter is what's searched)...

It *will* cause performance issues if you load that field for a large
number of documents on a particular search. I know Lucene itself
has lazy field loading that helps in this case, but I don't know how
to persuade SOLR to use it (it may even lazy-load automatically).
But this is separate from searching...

Best
er...@nottoomuchhelpbutimtrying.

On Thu, Jun 4, 2009 at 4:07 AM, Fer-Bj fernando.b...@gmail.com wrote:


 Is it correct to assume that using field compression will cause performance
 issues if we decide to allow search over this field?

 ie:

  <field name="id" type="sint" indexed="true" stored="true"
         required="true"/>
  <field name="title" type="text" indexed="true" stored="true"
         omitNorms="true"/>
  <field name="file_location" type="string" indexed="false"
         stored="true"/>
  <field name="body" type="text" indexed="true" stored="false"
         omitNorms="true"/>

  if I decide to add compressed=true to the body field... and I allow
  search on body... would that be a problem?
  At the same time: what if I add compressed=true, but I never search on
  this field?


 Stu Hood-3 wrote:
 
  I just finished watching this talk about a column-store RDBMS, which has
 a
  long section on column compression. Specifically, it talks about the
 gains
  from compressing similar data together, and how lazily decompressing data
  only when it must be processed is great for memory/CPU cache usage.
 
  http://youtube.com/watch?v=yrLd-3lnZ58
 
  While interesting, it's not relevant to Lucene's stored field storage. On
  the other hand, it did get me thinking about stored field compression and
  lazy field loading.
 
  Can anyone give me some pointers about compressThreshold values that
 would
  be worth experimenting with? Our stored fields are often between 20 and
  300 characters, and we're willing to spend more time indexing if it will
  make searching less IO bound.
 
  Thanks,
 
  Stu Hood
  Architecture Software Developer
  Mailtrust, a Rackspace Company
 
 
 

 --
 View this message in context:
 http://www.nabble.com/Field-Compression-tp15258669p23865558.html
 Sent from the Solr - User mailing list archive at Nabble.com.




SpellCheckComponent: queryAnalyzerFieldType

2009-06-04 Thread Michael Ludwig

Shalin Shekhar Mangar wrote:

| If you use spellcheck.q parameter for specifying
| the spelling query, then the field's analyzer will
| be used [...] If you use the q parameter, then the
| SpellingQueryConverter is used.

http://markmail.org/message/k35r7qmpatjvllsc - message
http://markmail.org/thread/gypvpfnsd5sggkpx  - whole thread

Is it correct to say that when I intend to always use
the spellcheck.q parameter I do not need to specify a
queryAnalyzerFieldType in my spellcheck searchComponent,
which I define in solrconfig.xml?

Given the limitations of the SpellingQueryConverter laid
out in the thread referred to above, it seems you want to
use the spellcheck.q parameter for anything but what can
be encoded in ASCII. Is that true?

Michael Ludwig


Re: spell checking

2009-06-04 Thread Walter Underwood
query suggest --wunder

On 6/4/09 1:25 AM, Michael Ludwig m...@as-guides.com wrote:

 Yao Ge schrieb:
 
  Maybe we should call this alternative search terms or
  suggested search terms instead of spell checking. It is
  misleading, as there is no right or wrong in spelling; there
  are only popular (term frequency?) alternatives.
 
 I had exactly the same difficulty in understanding the concept
 because of the name given to the feature, which usually denotes
 just what it says, i.e. a spellchecker, which is driven by an
 authoritative dictionary and a set of rules, as integrated in
 word processors, in order to ensure orthography.
 
 What we have here is quite different from a spellchecker.
 
 IMHO, a name conveying the actual meaning, along the lines of
 suggest, would make more sense.
 
 Michael Ludwig



Re: Questions regarding IT search solution

2009-06-04 Thread Silent Surfer
Hi,
Any help/pointers on the following message would really help me.
Thanks, Surfer

--- On Tue, 6/2/09, Silent Surfer silentsurfe...@yahoo.com wrote:

From: Silent Surfer silentsurfe...@yahoo.com
Subject: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Tuesday, June 2, 2009, 5:45 PM

Hi,
I am new to the Lucene forum and this is my first question. I need a
clarification from you.

Requirement:
1. Build an IT search tool for logs similar to that of Splunk (only w.r.t.
   searching logs, not reporting, graphs, etc.) using Solr/Lucene. The log
   files are mainly server logs like JBoss and custom application server
   logs (may or may not be log4j logs), and the file sizes can go
   potentially up to 100 MB.
2. The logs are spread across multiple servers (25 to 30 servers).
3. Capability to do search almost in realtime.
4. Support distributed search.

Our search criterion can be based on a keyword or timestamp or IP address,
etc.
Can anyone throw some light on whether Solr/Lucene is the right solution
for this?
Appreciate any quick help in this regard.
Thanks, Surfer

  


  


Re: Which caches should use the solr.FastLRUCache

2009-06-04 Thread Yonik Seeley
2009/6/4 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com:
 FastLRUCache is designed to be lock free so it is well suited for
 caches which are hit several times in a request. I guess there is no
 harm in using FastLRUCache across all the caches.

Gets are cheaper, but evictions are more expensive.  If the cache hit
rate is low, the old synchronized cache may be faster, unless you have
a ton of CPUs... not sure where the crossover point is though.

-Yonik
http://www.lucidimagination.com


Re: Questions regarding IT search solution

2009-06-04 Thread Walter Underwood
Why build one? Don't those already exist?

Personally, I'd start with Hadoop instead of Solr. Putting logs in a
search index is guaranteed to not scale. People were already trying
different approaches ten years ago.

wunder

On 6/4/09 8:41 AM, Silent Surfer silentsurfe...@yahoo.com wrote:

 Hi,
 Any help/pointers on the following message would really help me..
 Thanks,Surfer
 
 --- On Tue, 6/2/09, Silent Surfer silentsurfe...@yahoo.com wrote:
 
 From: Silent Surfer silentsurfe...@yahoo.com
 Subject: Questions regarding IT search solution
 To: solr-user@lucene.apache.org
 Date: Tuesday, June 2, 2009, 5:45 PM
 
 Hi,
 I am new to the Lucene forum and this is my first question. I need a
 clarification from you.

 Requirement:
 1. Build an IT search tool for logs similar to that of Splunk (only w.r.t.
    searching logs, not reporting, graphs, etc.) using Solr/Lucene. The log
    files are mainly server logs like JBoss and custom application server
    logs (may or may not be log4j logs), and the file sizes can go
    potentially up to 100 MB.
 2. The logs are spread across multiple servers (25 to 30 servers).
 3. Capability to do search almost in realtime.
 4. Support distributed search.

 Our search criterion can be based on a keyword or timestamp or IP address,
 etc.
 Can anyone throw some light on whether Solr/Lucene is the right solution
 for this?
 Appreciate any quick help in this regard.
 Thanks, Surfer
 
   
 
 



Faceting on text fields

2009-06-04 Thread Yao Ge

I am indexing a database with over 1 million rows. Two of the fields contain
unstructured text, but the size of each field is limited (256 characters).

I came up with an idea to visualize the text fields as a text cloud by
turning the two text fields into facets. The font weight and size of each
facet value (word) is derived from the facet counts. I used a simpler field
type so that there is no stemming of these facet values:

  <fieldType name="word" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="false"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="0" generateNumberParts="0" catenateWords="1"
              catenateNumbers="1" catenateAll="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

The facet query is considerably slower compared to other facets from
structured database fields (with highly repeated values). What I found
interesting is that even after I constrained search results to just a few
hundred hits using other facets, these text facets are still very slow.

I understand that text fields are not good candidates for faceting, as they
can contain a very large number of unique values. However, why is it still
slow after my matching documents are reduced to hundreds? Is it because the
whole filter is cached (regardless of the matching docs) and I don't have
enough filterCache size to fit the whole list?

The following is my filterCache setting:

  <filterCache class="solr.LRUCache" size="5120" initialSize="512"
               autowarmCount="128"/>

Lastly, what I really want is to give the user a chance to visualize and
filter on the top relevant words in the free-text fields. Are there
alternatives to the facet-field approach? Term vectors? I can do client-side
processing based on the top N (say 100) hits, but that is my last option.
-- 
View this message in context: 
http://www.nabble.com/Faceting-on-text-fields-tp23872891p23872891.html
Sent from the Solr - User mailing list archive at Nabble.com.



statistics about word distances in solr

2009-06-04 Thread Jens Fischer
Hi,

 

I was wondering if there's an option to return statistics about distances
from the query terms to the most frequent terms in the result documents.

At present I return the most frequent terms using facet search, which
returns, for each word in the result documents, the number of occurrences
(within the results).

The additional information I'm looking for is the average distance between
these terms and my search term.
So let's say I have two docs:

  the house is red
  I live in a red house

The search for "house" should also return the info:

  the: 1
  is: 1
  red: 1.5
  I: 5
  live: 4

and so on...

 

 

As I wasn't able to find such a function, I thought about two solutions to
the problem:

 

1) Use facet search and implement a different facet.method which calculates
the average distance of a word to the given search term.

Solr doesn't seem to provide an interface to implement a different method,
so I think this solution would be a bit dodgy and would lead to problems
with the next Solr upgrade.

 

2) Use the TermVectorComponent, which returns the position of each word
within a document; I could calculate the distance based on this data in the
application.

But the TermVectorComponent returns information per document, which means I
would need to return all documents of the result set, which is probably not
recommended.
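
For what it's worth, a sketch of what solution 2 would need, assuming Solr
1.4's TermVectorComponent is registered on the request handler: the field
must be indexed with term vectors and positions,

  <field name="body" type="text" indexed="true" stored="true"
         termVectors="true" termPositions="true"/>

and a request like

  /select?q=house&tv=true&tv.positions=true

would then return per-document term positions that the application can diff
against the query term's positions.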

 

 

My questions are:

a) Did I miss a function of Solr that already does what I'm looking for?

b) Is solution 2) feasible even if I always have to return all docs of the
result set (the content doesn't need to be returned though, just the
statistics)?

c) Are there interfaces to amend facet search in the way I described which
I might have missed?

 

 

 

Thanks

Jens



Re: Field Compression

2009-06-04 Thread Grant Ingersoll


On Jun 4, 2009, at 6:42 AM, Erick Erickson wrote:



It *will* cause performance issues if you load that field for a large
number of documents on a particular search. I know Lucene itself
has lazy field loading that helps in this case, but I don't know how
to persuade SOLR to use it (it may even lazy-load automatically).
But this is separate from searching...


Lazy loading is an option configured in the solrconfig.xml
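
For reference, from the stock solrconfig.xml:

  <enableLazyFieldLoading>true</enableLazyFieldLoading>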




Re: SpellCheckComponent: queryAnalyzerFieldType

2009-06-04 Thread Shalin Shekhar Mangar
On Thu, Jun 4, 2009 at 7:24 PM, Michael Ludwig m...@as-guides.com wrote:

 Shalin Shekhar Mangar wrote:

 | If you use spellcheck.q parameter for specifying
 | the spelling query, then the field's analyzer will
 | be used [...] If you use the q parameter, then the
 | SpellingQueryConverter is used.

 http://markmail.org/message/k35r7qmpatjvllsc - message
 http://markmail.org/thread/gypvpfnsd5sggkpx  - whole thread

 Is it correct to say that when I intend to always use
 the spellcheck.q parameter I do not need to specify a
 queryAnalyzerFieldType in my spellcheck searchComponent,
 which I define in solrconfig.xml?


Yes, that is correct.

Even if a queryAnalyzerFieldType is not specified and your query uses q,
then WhitespaceTokenizer is used by default.
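
For example, assuming a handler with the spellcheck component attached (like
the sample config's /spell), a request that bypasses the query converter by
passing the raw terms in spellcheck.q might look like:

  /spell?q=title:hauses&spellcheck=true&spellcheck.q=hauses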


 Given the limitations of the SpellingQueryConverter laid
 out in the thread referred to above, it seems you want to
 use the spellcheck.q parameter for anything but what can
 be encoded in ASCII. Is that true?


Umm, no actually. SpellingQueryConverter was written for a very simple
use-case dealing with ASCII only. But there is no reason why we cannot
extend it to cover the full UTF-8 set.

I'm sorry I forgot to follow-up on the old thread where you and Jonathan
posted a regex that should work. Can you please open an issue and if
possible, give a patch?

-- 
Regards,
Shalin Shekhar Mangar.


Re: Faceting on text fields

2009-06-04 Thread Yonik Seeley
Are you using Solr 1.3?
You might want to try the latest 1.4 test build - faceting has changed a lot.
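
(In particular, 1.4 exposes the faceting strategy per request via the
facet.method parameter; e.g., a request like

  facet=true&facet.field=my_text_field&facet.method=fc

uses the new field-cache approach, which tends to do much better than the
filter-per-term method on high-cardinality fields. The field name here is
illustrative.)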

-Yonik
http://www.lucidimagination.com

On Thu, Jun 4, 2009 at 12:01 PM, Yao Ge yao...@gmail.com wrote:

 I am indexing a database with over 1 million rows. Two of the fields
 contain unstructured text, but the size of each field is limited (256
 characters).

 I came up with an idea to visualize the text fields as a text cloud by
 turning the two text fields into facets. The font weight and size of each
 facet value (word) is derived from the facet counts. I used a simpler
 field type so that there is no stemming of these facet values:

   <fieldType name="word" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
               ignoreCase="true" expand="false"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
               words="stopwords.txt"/>
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="0" generateNumberParts="0" catenateWords="1"
               catenateNumbers="1" catenateAll="0"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
   </fieldType>

 The facet query is considerably slower compared to other facets from
 structured database fields (with highly repeated values). What I found
 interesting is that even after I constrained search results to just a few
 hundred hits using other facets, these text facets are still very slow.

 I understand that text fields are not good candidates for faceting, as
 they can contain a very large number of unique values. However, why is it
 still slow after my matching documents are reduced to hundreds? Is it
 because the whole filter is cached (regardless of the matching docs) and I
 don't have enough filterCache size to fit the whole list?

 The following is my filterCache setting:

   <filterCache class="solr.LRUCache" size="5120" initialSize="512"
                autowarmCount="128"/>

 Lastly, what I really want is to give the user a chance to visualize and
 filter on the top relevant words in the free-text fields. Are there
 alternatives to the facet-field approach? Term vectors? I can do
 client-side processing based on the top N (say 100) hits, but that is my
 last option.
 --
 View this message in context: 
 http://www.nabble.com/Faceting-on-text-fields-tp23872891p23872891.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Which caches should use the solr.FastLRUCache

2009-06-04 Thread Robert Purdy

Thanks for the good information :) Well, I haven't had any evictions in any
of the caches in years, but the hit ratio is 0.51 in queryResultCache, 0.77
in documentCache, 1.00 in the fieldValueCache, and 0.99 in the filterCache.
So in your opinion, should the documentCache and queryResultCache use the
old way on a single-CPU quad-core machine?

Also, right now I have all caches using the solr.FastLRUCache (tried with
both cleanupThread = false and true), and I have noticed some queries that
are taking 53 ms on a freshly warmed new searcher (when nothing else is
querying the slave), but when the slave is busy the same query, which
should be using the caches, is sometimes taking 8 secs. Any thoughts?

Thanks Robert.


Yonik Seeley-2 wrote:
 
 2009/6/4 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com:
 FastLRUCache is designed to be lock free so it is well suited for
 caches which are hit several times in a request. I guess there is no
 harm in using FastLRUCache across all the caches.
 
 Gets are cheaper, but evictions are more expensive.  If the cache hit
 rate is low, the old synchronized cache may be faster, unless you have
 a ton of CPUs... not sure where the crossover point is though.
 
 -Yonik
 http://www.lucidimagination.com
 
 

-- 
View this message in context: 
http://www.nabble.com/Which-caches-should-use-the-solr.FastLRUCache-tp23860182p23874898.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: HashDocSet's maxSize and loadFactor

2009-06-04 Thread Yonik Seeley
On Thu, Jun 4, 2009 at 7:52 AM, Marc Sturlese marc.sturl...@gmail.com wrote:
 Hey there, I am trying to optimize the setup of HashDocSet.

Be aware that in the latest versions of Solr 1.4, HashDocSet is no
longer used by Solr.
https://issues.apache.org/jira/browse/SOLR-1169

 Have read the documentation here:
 http://wiki.apache.org/solr/SolrPerformanceFactors#head-2de2e9a6f806ab8a3afbd73f1d99ece48e27b3ab
 But can't exactly understand it.
 Does it mean that the maxSize should be 0.005 x NumberDocsOfMyIndex, or
 that maxSize should be approximately the same as the number of docs in my
 index?

The former.

 And... what's the loadFactor?

loadFactor: size of the hash table compared to the number of elements stored.
http://en.wikipedia.org/wiki/Hash_table
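
(For reference, the setting under discussion is the solrconfig.xml element

  <HashDocSet maxSize="3000" loadFactor="0.75"/>

so, per the 0.005 guideline above, an index with one million documents would
suggest a maxSize of roughly 5000.)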

-Yonik
http://www.lucidimagination.com


Index Comma Separated numbers

2009-06-04 Thread Jianbin Dai

Hi, one of the fields to be indexed is price, which is comma separated,
e.g., 12,034.00. How can I index it as a number?
I am using DIH to pull the data. Thanks.
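
One hedged sketch using DIH's RegexTransformer (assuming Solr 1.4, whose
replaceWith attribute does a regex replace on the source column) to strip
the thousands separators before the value reaches a numeric field; the
entity, query, and column names here are illustrative:

  <entity name="item" transformer="RegexTransformer"
          query="select price as price_raw from items">
    <!-- "12,034.00" -> "12034.00", mapped onto a numeric field (e.g. sfloat) -->
    <field column="price" sourceColName="price_raw" regex="," replaceWith=""/>
  </entity>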


  



Re: Faceting on text fields

2009-06-04 Thread Yao Ge

Yes. I am using 1.3. When is 1.4 due for release?


Yonik Seeley-2 wrote:
 
 Are you using Solr 1.3?
 You might want to try the latest 1.4 test build - faceting has changed a
 lot.
 
 -Yonik
 http://www.lucidimagination.com
 
 On Thu, Jun 4, 2009 at 12:01 PM, Yao Ge yao...@gmail.com wrote:

 I am indexing a database with over 1 million rows. Two of the fields
 contain unstructured text, but the size of each field is limited (256
 characters).

 I came up with an idea to visualize the text fields as a text cloud by
 turning the two text fields into facets. The font weight and size of each
 facet value (word) is derived from the facet counts. I used a simpler
 field type so that there is no stemming of these facet values:

   <fieldType name="word" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
               ignoreCase="true" expand="false"/>
       <filter class="solr.StopFilterFactory" ignoreCase="true"
               words="stopwords.txt"/>
       <filter class="solr.WordDelimiterFilterFactory"
               generateWordParts="0" generateNumberParts="0" catenateWords="1"
               catenateNumbers="1" catenateAll="0"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     </analyzer>
   </fieldType>

 The facet query is considerably slower compared to other facets from
 structured database fields (with highly repeated values). What I found
 interesting is that even after I constrained search results to just a few
 hundred hits using other facets, these text facets are still very slow.

 I understand that text fields are not good candidates for faceting, as
 they can contain a very large number of unique values. However, why is it
 still slow after my matching documents are reduced to hundreds? Is it
 because the whole filter is cached (regardless of the matching docs) and I
 don't have enough filterCache size to fit the whole list?

 The following is my filterCache setting:

   <filterCache class="solr.LRUCache" size="5120" initialSize="512"
                autowarmCount="128"/>

 Lastly, what I really want is to give the user a chance to visualize and
 filter on the top relevant words in the free-text fields. Are there
 alternatives to the facet-field approach? Term vectors? I can do
 client-side processing based on the top N (say 100) hits, but that is my
 last option.
 --
 View this message in context:
 http://www.nabble.com/Faceting-on-text-fields-tp23872891p23872891.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://www.nabble.com/Faceting-on-text-fields-tp23872891p23876051.html
Sent from the Solr - User mailing list archive at Nabble.com.



How to disable posting updates from a remote server

2009-06-04 Thread ashokc

Hi,

I find that I am freely able to post to my production SOLR server, from any
other host that can run the post command. So somebody can wipe out the whole
index by posting a delete query. Is there a way SOLR can be configured so
that it will take updates ONLY from the server on which it is running?
Thanks - ashok
-- 
View this message in context: 
http://www.nabble.com/How-to-disable-posting-updates-from-a-remote-server-tp23876170p23876170.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to disable posting updates from a remote server

2009-06-04 Thread Eric Pugh
Take a look at the security section in the wiki; you could do this with
firewall rules or password access.

On Thursday, June 4, 2009, ashokc ash...@qualcomm.com wrote:

 Hi,

 I find that I am freely able to post to my production SOLR server, from any
 other host that can run the post command. So somebody can wipe out the whole
 index by posting a delete query. Is there a way SOLR can be configured so
 that it will take updates ONLY from the server on which it is running?
 Thanks - ashok
 --
 View this message in context: 
 http://www.nabble.com/How-to-disable-posting-updates-from-a-remote-server-tp23876170p23876170.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Customizing results

2009-06-04 Thread Otis Gospodnetic

Hello,

If you know what language the user specified (or is associated with), then
you just have to ensure the fl URL parameter contains that field (and any
other fields you want returned). So if the language/locale is de_de, then
make sure the request has fl=location_de_de,another_field,another_field,
and not, for example, location_it_it.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Manepalli, Kalyan kalyan.manepa...@orbitz.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Thursday, June 4, 2009 12:36:30 PM
 Subject: Customizing results
 
 Hi,
 I am trying to customize the response that I receive from Solr. In the
 index I have multiple fields that contain the same data in different
 languages.
 At query time the client specifies the language. Based on this param, I
 want to return the value, copied into a different field.
 Eg:
 Lubang, Filippinerne
 Lubang, Philippinen
 Lubang, Philippines
 Lubang, Filipinas
 
 If the user specifies language as de_de, then I want to return the result as
 Lubang, Philippinen
 
 What is the most optimal way of doing this?
 Any suggestions on this will be helpful
 
 Thanks,
 Kalyan Manepalli



Re: Questions regarding IT search solution

2009-06-04 Thread Alexandre Rafalovitch
I would also be interested to know what other solutions already exist.

Splunk's advantage is that it does extraction of the fields with
advanced searching functionality (it has lexers/parsers for multiple
content types). I believe that's the Solr function desired in the
original posting. At the time they came out (2004), I was not aware of
any good open source solutions to do what they did. And I would have
loved one, as I was analyzing multi-gigabyte logs.

Hadoop might be a way to process the files, but what would do the
indexing and searching?

Regards,
Alex.

On Thu, Jun 4, 2009 at 11:56 AM, Walter Underwood wunderw...@netflix.com wrote:
 Why build one? Don't those already exist?

 Personally, I'd start with Hadoop instead of Solr. Putting logs in a
 search index is guaranteed to not scale. People were already trying
 different approaches ten years ago.

 wunder

 On 6/4/09 8:41 AM, Silent Surfer silentsurfe...@yahoo.com wrote:

 Hi,
 Any help/pointers on the following message would really help me..
 Thanks,Surfer

 --- On Tue, 6/2/09, Silent Surfer silentsurfe...@yahoo.com wrote:

 From: Silent Surfer silentsurfe...@yahoo.com
 Subject: Questions regarding IT search solution
 To: solr-user@lucene.apache.org
 Date: Tuesday, June 2, 2009, 5:45 PM

 Hi,
 I am new to the Lucene forum and this is my first question. I need a
 clarification from you.

 Requirement:
 1. Build an IT search tool for logs similar to that of Splunk (only w.r.t.
    searching logs, not reporting, graphs, etc.) using Solr/Lucene. The log
    files are mainly server logs like JBoss and custom application server
    logs (may or may not be log4j logs), and the file sizes can go
    potentially up to 100 MB.
 2. The logs are spread across multiple servers (25 to 30 servers).
 3. Capability to do search almost in realtime.
 4. Support distributed search.

 Our search criterion can be based on a keyword or timestamp or IP address,
 etc.
 Can anyone throw some light on whether Solr/Lucene is the right solution
 for this?
 Appreciate any quick help in this regard.
 Thanks, Surfer


Re: Is there Downside to a huge synonyms file?

2009-06-04 Thread Yonik Seeley
On Tue, Jun 2, 2009 at 11:28 PM, anuvenk anuvenkat...@hotmail.com wrote:
 I'm using query time synonyms.

These don't currently work if the synonyms expand to more than one
option, and those options have a different number of words.

-Yonik
http://www.lucidimagination.com


Re: indexing Chinese language

2009-06-04 Thread Otis Gospodnetic

I can't tell what that analyzer does, but I'm guessing it uses n-grams?
Maybe consider trying https://issues.apache.org/jira/browse/LUCENE-1629 instead?

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Fer-Bj fernando.b...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, June 4, 2009 2:20:03 AM
 Subject: Re: indexing Chinese language
 
 
 We are trying Solr 1.3 with the Paoding Chinese Analyzer, and after
 reindexing the index size went from 1.5 GB to 2.7 GB.

 Is that expected behavior?

 Is there any switch or trick to avoid the index file size more than
 doubling?
 
 Koji Sekiguchi-2 wrote:
  
  CharFilter can normalize (convert) traditional Chinese to simplified
  Chinese or vice versa,
  if you define mapping.txt. Here is a sample of Chinese character
  normalization:
  
  
 https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
  
  See SOLR-822 for the details:
  
  https://issues.apache.org/jira/browse/SOLR-822
  
  Koji
  
  
  revathy arun wrote:
  Hi,
 
  When I index Chinese content using the Chinese tokenizer and analyzer in
  Solr 1.3, some of the Chinese text files are getting indexed but others
  are not.

  Since Chinese has many different language subtypes, such as standard
  Chinese, simplified Chinese, etc., which of these does the Chinese
  tokenizer support, and is there any method to detect the type of Chinese
  language from the file?
 
  Rgds
 
   
  
  
  
 
 -- 
 View this message in context: 
 http://www.nabble.com/indexing-Chienese-langage-tp22033302p23864358.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Customizing results

2009-06-04 Thread Otis Gospodnetic

Aha, so you really want to rename the field at response time?  I wonder if this 
is something that could be done with (or should be added to) response writers.  
That's where I'd go look first.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Manepalli, Kalyan kalyan.manepa...@orbitz.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Thursday, June 4, 2009 5:30:40 PM
 Subject: RE: Customizing results
 
 Otis,
 With that solution, the client has to accept all types of location fields
 (location_de_de, location_it_it). I want to copy the result into a
 location field, so that the client can just accept location.
 
 Thanks,
 Kalyan Manepalli
 -Original Message-
 From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] 
 Sent: Thursday, June 04, 2009 4:16 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Customizing results
 
 
 Hello,
 
 If you know what language the user specified (or is associated with), then
 you just have to ensure the fl URL parameter contains that field (and any
 other fields you want returned). So if the language/locale is de_de, then
 make sure the request has fl=location_de_de,another_field,another_field,
 and not, for example, location_it_it.
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
 - Original Message 
  From: Manepalli, Kalyan 
  To: solr-user@lucene.apache.org 
  Sent: Thursday, June 4, 2009 12:36:30 PM
  Subject: Customizing results
 
  Hi,
  I am trying to customize the response that I receive from Solr. In the
  index I have multiple fields that contain the same data in different
  languages.
  At query time the client specifies the language. Based on this param, I
  want to return the value, copied into a different field.
  Eg:
  Lubang, Filippinerne
  Lubang, Philippinen
  Lubang, Philippines
  Lubang, Filipinas
 
  If the user specifies language as de_de, then I want to return the result as
  Lubang, Philippinen
 
  What is the most optimal way of doing this?
  Any suggestions on this will be helpful
 
  Thanks,
  Kalyan Manepalli



Re: Questions regarding IT search solution

2009-06-04 Thread Silent Surfer
Hi,
As Alex correctly pointed out, my main intention is to figure out whether
Solr/Lucene offers functionality to replicate what Splunk is doing in terms
of building indexes etc. for enabling search capabilities.
We evaluated Splunk, but it is not a very cost-effective solution for us,
as we may have logs running into a few GBs per day with around 25 to 30
servers running, and Splunk's licensing model is based on the size of logs
per day; on top of that, the license is valid for only 1 year.
With this background, any further inputs on this are greatly appreciated.
Thanks, Surfer

--- On Thu, 6/4/09, Alexandre Rafalovitch arafa...@gmail.com wrote:

From: Alexandre Rafalovitch arafa...@gmail.com
Subject: Re: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Thursday, June 4, 2009, 9:27 PM

I would also be interested to know what other solutions already exist.

Splunk's advantage is that it does extraction of the fields with
advanced searching functionality (it has lexers/parsers for multiple
content types). I believe that's the Solr function desired in the
original posting. At the time they came out (2004), I was not aware of
any good open source solutions to do what they did. And I would have
loved one, as I was analyzing multi-gigabyte logs.

Hadoop might be a way to process the files, but what would do the
indexing and searching?

Regards,
    Alex.

On Thu, Jun 4, 2009 at 11:56 AM, Walter Underwood wunderw...@netflix.com wrote:
 Why build one? Don't those already exist?

 Personally, I'd start with Hadoop instead of Solr. Putting logs in a
 search index is guaranteed to not scale. People were already trying
 different approaches ten years ago.

 wunder

 On 6/4/09 8:41 AM, Silent Surfer silentsurfe...@yahoo.com wrote:

 Hi,
 Any help/pointers on the following message would really help me..
 Thanks,Surfer

 --- On Tue, 6/2/09, Silent Surfer silentsurfe...@yahoo.com wrote:

 From: Silent Surfer silentsurfe...@yahoo.com
 Subject: Questions regarding IT search solution
 To: solr-user@lucene.apache.org
 Date: Tuesday, June 2, 2009, 5:45 PM

 Hi,
 I am new to the Lucene forum and this is my first question. I need a
 clarification from you.

 Requirement:
 1. Build an IT search tool for logs similar to that of Splunk (only w.r.t.
    searching logs, not reporting, graphs, etc.) using Solr/Lucene. The log
    files are mainly server logs like JBoss and custom application server
    logs (may or may not be log4j logs), and the file sizes can go
    potentially up to 100 MB.
 2. The logs are spread across multiple servers (25 to 30 servers).
 3. Capability to do search almost in realtime.
 4. Support distributed search.

 Our search criterion can be based on a keyword or timestamp or IP address,
 etc.
 Can anyone throw some light on whether Solr/Lucene is the right solution
 for this?
 Appreciate any quick help in this regard.
 Thanks, Surfer



  

Re: Questions regarding IT search solution

2009-06-04 Thread Otis Gospodnetic

My guess is Solr/Lucene would work.  Not sure how well/fast, but it would, esp. 
if you avoid range queries (or use tdate), and esp. if you shard/segment 
indices smartly, so that at query time you send (or distribute if you have to) 
the query to only those shards that have the data (if your query is for a 
limited time period).
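
For the distribution part, Solr's distributed search (1.3+) is driven by a
shards parameter on the query, so time-based shards can be picked per
request; a sketch with hypothetical hosts:

  /select?q=error&shards=logs1:8983/solr,logs2:8983/solr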

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Silent Surfer silentsurfe...@yahoo.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, June 4, 2009 5:52:21 PM
 Subject: Re: Questions regarding IT search solution
 
 Hi,
 As Alex correctly pointed out, my main intention is to figure out whether
 Solr/Lucene offers functionality to replicate what Splunk is doing in
 terms of building indexes etc. for enabling search capabilities.
 We evaluated Splunk, but it is not a very cost-effective solution for us,
 as we may have logs running into a few GBs per day with around 25 to 30
 servers running, and Splunk's licensing model is based on the size of
 logs per day; on top of that, the license is valid for only 1 year.
 With this background, any further inputs on this are greatly appreciated.
 Thanks, Surfer
 
 --- On Thu, 6/4/09, Alexandre Rafalovitch wrote:
 
 From: Alexandre Rafalovitch 
 Subject: Re: Questions regarding IT search solution
 To: solr-user@lucene.apache.org
 Date: Thursday, June 4, 2009, 9:27 PM
 
 I would also be interested to know what other solutions already exist.

 Splunk's advantage is that it does extraction of the fields with
 advanced searching functionality (it has lexers/parsers for multiple
 content types). I believe that's the Solr function desired in the
 original posting. At the time they came out (2004), I was not aware of
 any good open source solutions to do what they did. And I would have
 loved one, as I was analyzing multi-gigabyte logs.
 
 Hadoop might be a way to process the files, but what would do the
 indexing and searching?
 
 Regards,
 Alex.
 
 On Thu, Jun 4, 2009 at 11:56 AM, Walter Underwood wrote:
  Why build one? Don't those already exist?
 
  Personally, I'd start with Hadoop instead of Solr. Putting logs in a
  search index is guaranteed to not scale. People were already trying
  different approaches ten years ago.
 
  wunder
 
  On 6/4/09 8:41 AM, Silent Surfer wrote:
 
  Hi,
  Any help/pointers on the following message would really help me..
  Thanks,Surfer
 
  --- On Tue, 6/2/09, Silent Surfer wrote:
 
  From: Silent Surfer 
  Subject: Questions regarding IT search solution
  To: solr-user@lucene.apache.org
  Date: Tuesday, June 2, 2009, 5:45 PM
 
  Hi,
  I am new to the Lucene forum and this is my first question. I need a
  clarification from you.

  Requirement:
  1. Build an IT search tool for logs similar to that of Splunk (only
     w.r.t. searching logs, not reporting, graphs, etc.) using Solr/Lucene.
     The log files are mainly server logs like JBoss and custom application
     server logs (may or may not be log4j logs), and the file sizes can go
     potentially up to 100 MB.
  2. The logs are spread across multiple servers (25 to 30 servers).
  3. Capability to do search almost in realtime.
  4. Support distributed search.

  Our search criterion can be based on a keyword or timestamp or IP
  address, etc.
  Can anyone throw some light on whether Solr/Lucene is the right solution
  for this?
  Appreciate any quick help in this regard.
  Thanks, Surfer



Determining Search Query Category

2009-06-04 Thread ram_sj

Hi,

I have more than 20 categories for my search application. I'm interested in
finding the category of the query entered by the user dynamically, instead
of asking the user to filter the results through a long list of categories.

It's a general question, not specific to Solr; any suggestion about how to
approach this problem will be helpful.

Thanks
Ram 
-- 
View this message in context: 
http://www.nabble.com/Determining-Search-Query-Category-tp23878965p23878965.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: how to do exact search with solrj

2009-06-04 Thread Jianbin Dai

I still have a problem with exact matching.

query.setQuery("title:\"hello the world\"");

This will return all docs with a title containing "hello the world"; i.e.,
"hello the world, Jack" will also be matched. What I want is exactly "hello
the world". Setting this field to string instead of text doesn't work well
either, because I want something like "Hello, The World" to be matched as
well.
Any idea? Thanks.
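
A common approach to this (a hedged sketch, not something proposed in this
thread): index the title into a second field whose type keeps the whole value
as a single token but normalizes case and punctuation, e.g. (assuming
PatternReplaceFilterFactory, available in recent Solr versions):

  <fieldType name="string_lc" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory"
              pattern="[^a-z0-9 ]" replacement="" replace="all"/>
    </analyzer>
  </fieldType>

With that, "Hello, The World" and "hello the world" normalize to the same
single token, so a query on that field matches only whole-title equality.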


 --- On Sat, 5/30/09, Avlesh Singh avl...@gmail.com wrote:

  From: Avlesh Singh avl...@gmail.com
  Subject: Re: how to do exact search with solrj
  To: solr-user@lucene.apache.org
  Date: Saturday, May 30, 2009, 11:45 PM

  You need exact match for all the three tokens?
  If yes, try query.setQuery("title:\"hello the world\"");

  Cheers
  Avlesh

  On Sun, May 31, 2009 at 12:12 PM, Jianbin Dai djian...@yahoo.com wrote:

   I tried, but it seems it's not working right.

   --- On Sat, 5/30/09, Avlesh Singh avl...@gmail.com wrote:

    From: Avlesh Singh avl...@gmail.com
    Subject: Re: how to do exact search with solrj
    To: solr-user@lucene.apache.org
    Date: Saturday, May 30, 2009, 10:56 PM

    query.setQuery("title:hello the world") is what you need.

    Cheers
    Avlesh

    On Sun, May 31, 2009 at 6:23 AM, Jianbin Dai djian...@yahoo.com wrote:

     Hi,

     I want to search "hello the world" in the title field using solrj.
     I set the query filter:
       query.addFilterQuery("title");
       query.setQuery("hello the world");

     but it returns non-exact match results as well.

     I know one way to do it is to set the title field to string instead
     of text. But is there any other way I can do it? If I do the search
     through the web interface (Solr Admin) with title:"hello the world",
     it returns exact matches.

     Thanks.

     JB





Re: indexing Chinese language

2009-06-04 Thread Fer-Bj

What we usually do to reindex is:

1. Stop Solr.
2. rm -r data  (that is, to remove everything in /opt/solr/data/)
3. mkdir data
4. Start Solr.
5. Start the reindex. With this we're sure we are not keeping old copies
   of the index.

To check the index size we do:
cd data
du -sh



Otis Gospodnetic wrote:
 
 
 I can't tell what that analyzer does, but I'm guessing it uses n-grams?
 Maybe consider trying https://issues.apache.org/jira/browse/LUCENE-1629
 instead?
 
  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
 - Original Message 
 From: Fer-Bj fernando.b...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, June 4, 2009 2:20:03 AM
 Subject: Re: indexing Chinese language
 
 
 We are trying Solr 1.3 with the Paoding Chinese Analyzer, and after
 reindexing the index size went from 1.5 GB to 2.7 GB.

 Is that expected behavior?

 Is there any switch or trick to avoid the index file size more than
 doubling?
 
 Koji Sekiguchi-2 wrote:
  
  CharFilter can normalize (convert) traditional Chinese to simplified
  Chinese or vice versa,
  if you define mapping.txt. Here is a sample of Chinese character
  normalization:
  
  
 https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
  
  See SOLR-822 for the details:
  
  https://issues.apache.org/jira/browse/SOLR-822
  
  Koji
  
  
  revathy arun wrote:
  Hi,
 
  When I index Chinese content using the Chinese tokenizer and analyzer in
  Solr 1.3, some of the Chinese text files are getting indexed but others
  are not.

  Since Chinese has many different language subtypes, such as standard
  Chinese, simplified Chinese, etc., which of these does the Chinese
  tokenizer support, and is there any method to detect the type of Chinese
  language from the file?
 
  Rgds
 
   
  
  
  
 
 -- 
 View this message in context: 
 http://www.nabble.com/indexing-Chienese-langage-tp22033302p23864358.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 

-- 
View this message in context: 
http://www.nabble.com/indexing-Chienese-langage-tp22033302p23879730.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Field Compression

2009-06-04 Thread Fer-Bj

Here is what we have:

For all the documents we have a field called small_body, which is a
60-chars-max text field where we store the abstract for each article.

We have about 8,000,000 documents indexed, and usually we display this
small_body on our listing pages.

For each listing page we load 50 documents at a time; that is to say, we
need to display this small_body that we want to compress every time.

I'll probably compress this field, run a one-week test to see the
outcome, and roll it back eventually.

Last question: what's the best way to determine the compress threshold?

Grant Ingersoll-6 wrote:
 
 
 On Jun 4, 2009, at 6:42 AM, Erick Erickson wrote:
 

 It *will* cause performance issues if you load that field for a large
 number of documents on a particular search. I know Lucene itself
 has lazy field loading that helps in this case, but I don't know how
 to persuade SOLR to use it (it may even lazy-load automatically).
 But this is separate from searching...
 
 Lazy loading is an option configured in the solrconfig.xml
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Field-Compression-tp15258669p23879859.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Questions regarding IT search solution

2009-06-04 Thread silentsurfer77
Hi,
It is encouraging to know that a Solr/Lucene solution may work.
Can anyone using Solr/Lucene for such a scenario confirm that the solution
is used and working fine? That would be really helpful, as I started
looking into the Solr/Lucene solution only a couple of days back and it
might be difficult to be 100% confident before proposing the solution
approach in the next couple of days.
Thanks, Surfer

--- On Thu, 6/4/09, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

From: Otis Gospodnetic otis_gospodne...@yahoo.com
Subject: Re: Questions regarding IT search solution
To: solr-user@lucene.apache.org
Date: Thursday, June 4, 2009, 10:26 PM


My guess is Solr/Lucene would work.  Not sure how well/fast, but it would, esp. 
if you avoid range queries (or use tdate), and esp. if you shard/segment 
indices smartly, so that at query time you send (or distribute if you have to) 
the query to only those shards that have the data (if your query is for a 
limited time period).

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Silent Surfer silentsurfe...@yahoo.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, June 4, 2009 5:52:21 PM
 Subject: Re: Questions regarding IT search solution
 
 Hi,
 As Alex correctly pointed out, my main intention is to figure out whether
 Solr/Lucene offers functionality to replicate what Splunk is doing in
 terms of building indexes etc. for enabling search capabilities.
 We evaluated Splunk, but it is not a very cost-effective solution for us,
 as we may have logs running into a few GBs per day with around 25 to 30
 servers running, and Splunk's licensing model is based on the size of
 logs per day; on top of that, the license is valid for only 1 year.
 With this background, any further inputs on this are greatly appreciated.
 Thanks, Surfer
 
 --- On Thu, 6/4/09, Alexandre Rafalovitch wrote:
 
 From: Alexandre Rafalovitch 
 Subject: Re: Questions regarding IT search solution
 To: solr-user@lucene.apache.org
 Date: Thursday, June 4, 2009, 9:27 PM
 
 I would also be interested to know what other solutions already exist.

 Splunk's advantage is that it does extraction of the fields with
 advanced searching functionality (it has lexers/parsers for multiple
 content types). I believe that's the Solr function desired in the
 original posting. At the time they came out (2004), I was not aware of
 any good open source solutions to do what they did. And I would have
 loved one, as I was analyzing multi-gigabyte logs.
 
 Hadoop might be a way to process the files, but what would do the
 indexing and searching?
 
 Regards,
     Alex.
 
 On Thu, Jun 4, 2009 at 11:56 AM, Walter Underwood wrote:
  Why build one? Don't those already exist?
 
  Personally, I'd start with Hadoop instead of Solr. Putting
 logs in a
  search index is guaranteed to not scale. People were already trying
  different approaches ten years ago.
 
  wunder
 
  On 6/4/09 8:41 AM, Silent Surfer wrote:
 
  Hi,
  Any help/pointers on the following message would really help me..
  Thanks,Surfer
 
  --- On Tue, 6/2/09, Silent Surfer wrote:
 
  From: Silent Surfer 
  Subject: Questions regarding IT search solution
  To: solr-user@lucene.apache.org
  Date: Tuesday, June 2, 2009, 5:45 PM
 
   Hi,
   I am new to the Lucene forum and this is my first question. I need a
   clarification from you.
   Requirements:
   1. Build an IT search tool for logs similar to Splunk (only with respect
   to searching logs, not reporting, graphs etc.) using solr/lucene. The log
   files are mainly server logs like JBoss and custom application server
   logs (may or may not be log4j logs), and the file size can potentially go
   up to 100 MB.
   2. The logs are spread across multiple servers (25 to 30 servers).
   3. Capability to do search almost in real time.
   4. Support distributed search.
  
   Our search criteria can be based on a keyword or timestamp or IP address
   etc.
   Can anyone throw some light on whether solr/lucene is the right solution
   for this? Appreciate any quick help in this regard.
   Thanks, Surfer




  

Re: Questions regarding IT search solution

2009-06-04 Thread Jeff Hammerbacher
Hey,

Your system sounds similar to the work done by Stu Hood at Rackspace in their
Mailtrust unit. See
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
for more details and inspiration.

Regards,
Jeff

On Thu, Jun 4, 2009 at 4:58 PM, silentsurfe...@yahoo.com wrote:

 Hi,
 It is encouraging to know that a solr/lucene solution may work.
 Can anyone using solr/lucene for such a scenario confirm that the solution
 is in use and working fine? That would be really helpful, as I only started
 looking into the solr/lucene solution a couple of days ago, and it might be
 difficult to be 100% confident before proposing the solution approach in
 the next couple of days.
 Thanks, Surfer

 --- On Thu, 6/4/09, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

 From: Otis Gospodnetic otis_gospodne...@yahoo.com
 Subject: Re: Questions regarding IT search solution
 To:
  solr-user@lucene.apache.org
 Date: Thursday, June 4, 2009, 10:26 PM


 My guess is Solr/Lucene would work.  Not sure how well/fast, but it would,
 esp. if you avoid range queries (or use tdate), and esp. if you
 shard/segment indices smartly, so that at query time you send (or distribute
 if you have to) the query to only those shards that have the data (if your
 query is for a limited time period).

  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
  From: Silent Surfer silentsurfe...@yahoo.com
  To: solr-user@lucene.apache.org
  Sent: Thursday, June 4, 2009 5:52:21 PM
  Subject: Re:
  Questions regarding IT search solution
 
  Hi,
  As Alex correctly pointed out, my main intention is to figure out whether
  Solr/lucene offers the functionality to replicate what Splunk does in
  terms of building indexes etc. for enabling search capabilities.
  We evaluated Splunk, but it is not a very cost-effective solution for us,
  as we may have logs running into a few GBs per day across around 25-30
  servers, and Splunk's licensing model is based on the size of logs per
  day; on top of that, the license is valid for only one year.
  With this background, any further inputs on this are greatly appreciated.
  Thanks, Surfer
 
  --- On Thu, 6/4/09, Alexandre Rafalovitch wrote:
 
  From: Alexandre Rafalovitch
  Subject: Re: Questions regarding IT search solution
  To: solr-user@lucene.apache.org
  Date: Thursday, June 4, 2009, 9:27 PM
 
  I would also be interested to know what other solutions already exist.
 
  Splunk's advantage is that it does extraction of the fields with
  advanced searching functionality (it has lexers/parsers for multiple
  content types). I believe that's the Solr function desired in the
  original posting. At the time they came out (2004), I was not aware of
  any good open source solutions that did what they did. And I would have
  loved one, as I was analyzing multi-gigabyte logs.
 
  Hadoop might be a way to process the files, but what would do the
  indexing and searching?
 
  Regards,
  Alex.
 
  On Thu, Jun 4, 2009 at 11:56 AM, Walter Underwood wrote:
   Why build one? Don't those already exist?
  
   Personally, I'd start with Hadoop instead of Solr. Putting
  logs in a
   search index is guaranteed to not scale. People were already trying
   different approaches ten years ago.
  
   wunder
  
   On 6/4/09 8:41 AM, Silent Surfer wrote:
  
   Hi,
   Any help/pointers on the following message would really help me..
   Thanks,Surfer
  
   --- On Tue, 6/2/09, Silent Surfer wrote:
  
   From: Silent Surfer
   Subject: Questions regarding IT search solution
   To: solr-user@lucene.apache.org
   Date: Tuesday, June 2, 2009, 5:45 PM
  
   Hi,
   I am new to the Lucene forum and this is my first question. I need a
   clarification from you.
   Requirements:
   1. Build an IT search tool for logs similar to Splunk (only with respect
   to searching logs, not reporting, graphs etc.) using solr/lucene. The
   log files are mainly server logs like JBoss and custom application
   server logs (may or may not be log4j logs), and the file size can
   potentially go up to 100 MB.
   2. The logs are spread across multiple servers (25 to 30 servers).
   3. Capability to do search almost in real time.
   4. Support distributed search.
  
   Our search criteria can be based on a keyword or timestamp or IP address
   etc.
   Can anyone throw some light on whether solr/lucene is the right solution
   for this? Appreciate any quick help in this regard.
   Thanks, Surfer








using UpdateRequestProcessor from a custom analyzer

2009-06-04 Thread Kir4

Is it possible to create a custom analyzer (index time) that uses an
UpdateRequestProcessor to add new fields to posts, based on the tokens
generated by the other analyzers that have run before my custom analyzer?
The content of said fields must differ from post to post, based on the
tokens extracted from each one.
Thank you very much for any answer/suggestion you can give me!

G.

-- 
View this message in context: 
http://www.nabble.com/using-UpdateRequestProcessor-from-a-custom-analyzer-tp23880160p23880160.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Determining Search Query Category

2009-06-04 Thread Otis Gospodnetic

Ram,

Typical queries are short, so they are hard to categorize using statistical 
approaches.  Maybe categorization of queries would work with a custom set of 
rules applied to queries?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: ram_sj rpachaiyap...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, June 4, 2009 6:26:33 PM
 Subject: Determining Search Query Category
 
 
 Hi,
 
 I have more than 20 categories for my search application. I'm interested in
 finding the category of query entered by user dynamically instead of asking
 the user to filter the results through long list of categories. 
 
 Its a general question, its not specific to solr though, any suggestion
 about how to approach this problem will be helpful.
 
 Thanks
 Ram 
 -- 
 View this message in context: 
 http://www.nabble.com/Determining-Search-Query-Category-tp23878965p23878965.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: how to do exact search with solrj

2009-06-04 Thread Otis Gospodnetic

I don't think there is anything ready to be used in Solr (though it would be
easy to add), but if you index your values with custom beginning-of-string and
end-of-string anchor characters, you'll be able to get your exact matching
working.

For example, convert hello the world to $hello the world$ before indexing
(and make sure you use the string type or KeywordTokenizer -- things that won't
remove any characters).  Then search for $hello the world$.  This will not
match $hello the world, Jack$.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Jianbin Dai djian...@yahoo.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, June 4, 2009 6:42:39 PM
 Subject: Re: how to do exact search with solrj
 
 
 I still have a problem with exact matching.
 
 query.setQuery(title:\hello the world\);
 
 This will return all docs with title containing hello the world, i.e.,
 hello the world, Jack will also be matched. What I want is exactly hello 
 the 
 world. Setting this field to string instead of text doesn't work well 
 either, 
 because I want something like Hello, The World to be matched as well.
 Any idea? Thanks.
 
 
  --- On Sat, 5/30/09, Avlesh Singh 
  wrote:
  
   From: Avlesh Singh 
   Subject: Re: how to do exact search with solrj
   To: solr-user@lucene.apache.org
   Date: Saturday, May 30, 2009, 11:45 PM
   You need exact match for all the
   three tokens?
   If yes, try query.setQuery(title:\hello the
  world\);
   
   Cheers
   Avlesh
   
   On Sun, May 31, 2009 at 12:12 PM, Jianbin Dai 
   wrote:
   
   
I tried, but seems it's not working right.
   
--- On Sat, 5/30/09, Avlesh Singh 
   wrote:
   
 From: Avlesh Singh 
 Subject: Re: how to do exact search with
  solrj
 To: solr-user@lucene.apache.org
 Date: Saturday, May 30, 2009, 10:56 PM
 query.setQuery(title:hello the
 world) is what you need.

 Cheers
 Avlesh

 On Sun, May 31, 2009 at 6:23 AM, Jianbin
  Dai
   
 wrote:

 
  Hi,
 
  I want to search hello the world in
  the
   title
 field using solrj. I set
  the query filter
  query.addFilterQuery(title);
  query.setQuery(hello the world);
 
  but it returns not exact match results
  as
   well.
 
  I know one way to do it is to set
  title
   field to
 string instead of text.
  But is there any way i can do it? If I
  do
   the search
 through web interface
  Solr Admin by title:hello the world,
  it
   returns
 exact matches.
 
  Thanks.
 
  JB
 
 
 
 
 

   
   
   
   
   
   
  
  

  



Re: how to do exact search with solrj

2009-06-04 Thread Otis Gospodnetic

I re-read your original request.  Here is the recipe that should work:

* Define a new field type that:
  - uses KeywordTokenizer
  - uses LowerCaseFilter

* Make your field be of the above type.

* Use those begin/end anchor characters at index and search time.


I believe that should work.  Please try it and let us know.
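
For reference, a minimal schema.xml sketch of that recipe (the type and field
names here are made up for illustration):

  <fieldType name="string_exact_ci" class="solr.TextField">
    <analyzer>
      <!-- keeps the entire field value as a single token -->
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <!-- makes matching case-insensitive -->
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="title_exact" type="string_exact_ci" indexed="true"
         stored="false"/>

You would then index the value as $hello the world$ and query
title_exact:"$hello the world$"; $hello the world, Jack$ will not match.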

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Otis Gospodnetic otis_gospodne...@yahoo.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, June 4, 2009 8:47:50 PM
 Subject: Re: how to do exact search with solrj
 
 
  I don't think there is anything ready to be used in Solr (though it would
  be easy to add), but if you index your values with custom
  beginning-of-string and end-of-string anchor characters, you'll be able to
  get your exact matching working.
  
  For example, convert hello the world to $hello the world$ before indexing
  (and make sure you use the string type or KeywordTokenizer -- things that
  won't remove any characters).  Then search for $hello the world$.  This
  will not match $hello the world, Jack$.
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
 - Original Message 
  From: Jianbin Dai 
  To: solr-user@lucene.apache.org
  Sent: Thursday, June 4, 2009 6:42:39 PM
  Subject: Re: how to do exact search with solrj
  
  
  I still have a problem with exact matching.
  
  query.setQuery(title:\hello the world\);
  
  This will return all docs with title containing hello the world, i.e.,
  hello the world, Jack will also be matched. What I want is exactly hello 
 the 
  world. Setting this field to string instead of text doesn't work well 
  either, 
 
  because I want something like Hello, The World to be matched as well.
  Any idea? Thanks.
  
  
   --- On Sat, 5/30/09, Avlesh Singh 
   wrote:
   
From: Avlesh Singh 
 Subject: Re: how to do exact search with solrj
To: solr-user@lucene.apache.org
Date: Saturday, May 30, 2009, 11:45 PM
You need exact match for all the
three tokens?
If yes, try query.setQuery(title:\hello the
   world\);

Cheers
Avlesh

On Sun, May 31, 2009 at 12:12 PM, Jianbin Dai 
wrote:


 I tried, but seems it's not working right.

 --- On Sat, 5/30/09, Avlesh Singh 
wrote:

  From: Avlesh Singh 
  Subject: Re: how to do exact search with
   solrj
  To: solr-user@lucene.apache.org
  Date: Saturday, May 30, 2009, 10:56 PM
  query.setQuery(title:hello the
  world) is what you need.
 
  Cheers
  Avlesh
 
  On Sun, May 31, 2009 at 6:23 AM, Jianbin
   Dai

  wrote:
 
  
   Hi,
  
   I want to search hello the world in
   the
title
  field using solrj. I set
   the query filter
   query.addFilterQuery(title);
   query.setQuery(hello the world);
  
   but it returns not exact match results
   as
well.
  
   I know one way to do it is to set
   title
field to
  string instead of text.
   But is there any way i can do it? If I
   do
the search
  through web interface
   Solr Admin by title:hello the world,
   it
returns
  exact matches.
  
   Thanks.
  
   JB
  
  
  
  
  
 






   
   

   



Re: indexing Chienese langage

2009-06-04 Thread James liu
First: you don't have to restart Solr. You can build a new index to replace
the old data and then tell Solr to switch to the new searcher; the shell
scripts that ship with Solr can help with this.

Second: you don't have to restart Solr; just keep the id the same. For
example, with an old document id:1,title:hi and a new document
id:1,title:welcome, indexing the new data deletes the old document and
inserts the new one, like a replace, but it uses more time and resources.
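
A minimal sketch of the XML update message for that overwrite, using the
field names from the example above:

  <add>
    <doc>
      <field name="id">1</field>
      <field name="title">welcome</field>
    </doc>
  </add>

Assuming id is the schema's uniqueKey, posting this to the update handler
replaces the earlier id:1 document in the index.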

You can find the number of indexed documents on the Solr admin page.


On Fri, Jun 5, 2009 at 7:42 AM, Fer-Bj fernando.b...@gmail.com wrote:


 What we usually do to reindex is:

 1. stop solr
 2. rm -rf data  (that is, to remove everything in /opt/solr/data/)
 3. mkdir data
 4. start solr
 5. start reindexing.  With this we're sure about not having old copies of
 the index.

 To check the index size we do:
 cd data
 du -sh



 Otis Gospodnetic wrote:
 
 
  I can't tell what that analyzer does, but I'm guessing it uses n-grams?
  Maybe consider trying https://issues.apache.org/jira/browse/LUCENE-1629
  instead?
 
   Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: Fer-Bj fernando.b...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Thursday, June 4, 2009 2:20:03 AM
  Subject: Re: indexing Chienese langage
 
 
  We are trying SOLR 1.3 with Paoding Chinese Analyzer , and after
  reindexing
  the index size went from 1.5 Gb to 2.7 Gb.
 
  Is that some expected behavior ?
 
  Is there any switch or trick to avoid having a double + index file size?
 
  Koji Sekiguchi-2 wrote:
  
   CharFilter can normalize (convert) traditional chinese to simplified
   chinese or vice versa,
   if you define mapping.txt. Here is the sample of Chinese character
   normalization:
  
  
 
 https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
  
   See SOLR-822 for the detail:
  
   https://issues.apache.org/jira/browse/SOLR-822
  
   Koji
  
  
   revathy arun wrote:
   Hi,
  
   When I index chinese content using chinese tokenizer and analyzer in
  solr
   1.3 ,some of the chinese text files are getting indexed but others
 are
   not.
  
   Since chinese has got many different language subtypes as in standard
   chinese,simplified chinese etc which of these does the chinese
  tokenizer
   support and is there any method to find the type of  chiense language
   from
   the file?
  
   Rgds
  
  
  
  
  
 
  --
  View this message in context:
 
 http://www.nabble.com/indexing-Chienese-langage-tp22033302p23864358.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 

 --
 View this message in context:
 http://www.nabble.com/indexing-Chienese-langage-tp22033302p23879730.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
regards
j.L ( I live in Shanghai, China)


Re: indexing Chienese langage

2009-06-04 Thread James liu
On Mon, Feb 16, 2009 at 4:30 PM, revathy arun revas...@gmail.com wrote:

 Hi,

 When I index chinese content using chinese tokenizer and analyzer in solr
 1.3 ,some of the chinese text files are getting indexed but others are not.


Are you sure your analyzer handles it well?

If you're not sure, you can use the analysis link on the Solr admin page to
check it.



 Since chinese has got many different language subtypes as in standard
 chinese,simplified chinese etc which of these does the chinese tokenizer
 support and is there any method to find the type of  chiense language  from
 the file?

 Rgds




-- 
regards
j.L ( I live in Shanghai, China)


Re: Which caches should use the solr.FastLRUCache

2009-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Jun 4, 2009 at 11:29 PM, Robert Purdy rdpu...@gmail.com wrote:

 Thanks for the Good information :) Well I haven't had any evictions in any of
 the caches in years, but the hit ratio is 0.51 in queryResultCache, 0.77 in
 documentCache, 1.00 in the fieldValueCache, and 0.99 in the filterCache. So
 in your opinion should the documentCache and queryResultCache use the old
 way on a single CPU quad core machine?

 Also right now I have all caches using the solr.FastLRUCache (tried with
 both the cleanupThread = false or true) and I have noticed some queries that
 are taking 53 ms on a freshly warmed new searcher (when nothing else is
 querying the slave), but when the slave is busy the same query, that should
 be using the caches, is sometimes taking 8 secs? Any thoughts?

This overhead may not be because of the cache itself. Some queries are
definitely missing the cache, and those are likely to take time. If
cleanupThread=true, evictions happen in a separate thread and should not add
time to requests.
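
For reference, switching a cache implementation is a class swap in
solrconfig.xml; a sketch with purely illustrative sizes (cleanupThread moves
eviction work off the request thread):

  <filterCache class="solr.FastLRUCache"
               size="16384"
               initialSize="4096"
               autowarmCount="4096"
               cleanupThread="true"/>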

 Thanks Robert.


 Yonik Seeley-2 wrote:

 2009/6/4 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com:
 FastLRUCache is designed to be lock free so it is well suited for
 caches which are hit several times in a request. I guess there is no
 harm in using FastLRUCache across all the caches.

 Gets are cheaper, but evictions are more expensive.  If the cache hit
 rate is low, the old synchronized cache may be faster, unless you have
 a ton of CPUs... not sure where the crossover point is though.

 -Yonik
 http://www.lucidimagination.com



 --
 View this message in context: 
 http://www.nabble.com/Which-caches-should-use-the-solr.FastLRUCache-tp23860182p23874898.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Determining Search Query Category

2009-06-04 Thread Avlesh Singh
If you haven't already given this a thought, you may want to try out an
auto-complete feature, suggesting those categories upfront.

Cheers
Avlesh

On Fri, Jun 5, 2009 at 3:56 AM, ram_sj rpachaiyap...@gmail.com wrote:


 Hi,

 I have more than 20 categories for my search application. I'm interested in
 finding the category of query entered by user dynamically instead of asking
 the user to filter the results through long list of categories.

 Its a general question, its not specific to solr though, any suggestion
 about how to approach this problem will be helpful.

 Thanks
 Ram
 --
 View this message in context:
 http://www.nabble.com/Determining-Search-Query-Category-tp23878965p23878965.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Index Comma Separated numbers

2009-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
Did you try the NumberFormatTransformer?
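
A data-config.xml sketch of how that might look (the entity and column names
here are made up; check that NumberFormatTransformer is available in your
DataImportHandler version):

  <entity name="item" transformer="NumberFormatTransformer"
          query="select * from item">
    <!-- parses locale-formatted values such as 12,034.00 into a number -->
    <field column="price" formatStyle="number"/>
  </entity>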

On Fri, Jun 5, 2009 at 12:09 AM, Jianbin Dai djian...@yahoo.com wrote:

 Hi, One of the fields to be indexed is price, which is comma-separated,
 e.g., 12,034.00.  How can I index it as a number?
 I am using DIH to pull the data. Thanks.








-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Determining Search Query Category

2009-06-04 Thread Walter Underwood
Can you analyze the logs to see which categories people choose for
each query? When there are enough queries and a clear preference,
you can highlight that choice.

wunder

On 6/4/09 9:21 PM, Avlesh Singh avl...@gmail.com wrote:

 If you haven't already given this a thought, you may want to try out an
 auto-complete feature, suggesting those categories upfront.
 
 Cheers
 Avlesh
 
 On Fri, Jun 5, 2009 at 3:56 AM, ram_sj rpachaiyap...@gmail.com wrote:
 
 
 Hi,
 
 I have more than 20 categories for my search application. I'm interested in
 finding the category of query entered by user dynamically instead of asking
 the user to filter the results through long list of categories.
 
 Its a general question, its not specific to solr though, any suggestion
 about how to approach this problem will be helpful.
 
 Thanks
 Ram
 --
 View this message in context:
 http://www.nabble.com/Determining-Search-Query-Category-tp23878965p23878965.h
 tml
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Customizing results

2009-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
How are you accessing Solr? SolrJ?

does this help?
https://issues.apache.org/jira/browse/SOLR-1129

On Fri, Jun 5, 2009 at 3:00 AM, Manepalli, Kalyan
kalyan.manepa...@orbitz.com wrote:
 Otis,
        With that solution, the client has to accept all type location fields 
 (location_de_de, location_it_it). I want to copy the result into location 
 field, so that client can just accept location.

 Thanks,
 Kalyan Manepalli
 -Original Message-
 From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
 Sent: Thursday, June 04, 2009 4:16 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Customizing results


 Hello,

 If you know what language the user specified (or is associated with), then 
 you just have to ensure the fl URL parameter contain that field (and any 
 other fields you want returned).  So if the language/locale is de_de, then 
 make sure the request has fl=location_de_de,another_field,another_field, and 
 not, for example location_it_it

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: Manepalli, Kalyan kalyan.manepa...@orbitz.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Thursday, June 4, 2009 12:36:30 PM
 Subject: Customizing results

 Hi,
             I am trying to customize the response that I receive from Solr. 
 In
 the index I have multiple fields that contain the same data in different
 language.
 At the query time client specifies the language. Based on this param, I want 
 to
 return the value, copied into a different field.
 Eg:
 Lubang, Filippinerne
 Lubang, Philippinen
 Lubang, Philippines
 Lubang, Filipinas

 If the user specifies language as de_de, then I want to return the result as
 Lubang, Philippinen

 What is the most optimal way of doing this?
 Any suggestions on this will be helpful

 Thanks,
 Kalyan Manepalli





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Customizing results

2009-06-04 Thread Avlesh Singh
Nice suggestion Noble!
If you are using SolrJ, then this particular binding can be an answer to
your question.

Cheers
Avlesh

2009/6/5 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 How are you accessing Solr? SolrJ?

 does this help?
 https://issues.apache.org/jira/browse/SOLR-1129

 On Fri, Jun 5, 2009 at 3:00 AM, Manepalli, Kalyan
 kalyan.manepa...@orbitz.com wrote:
  Otis,
 With that solution, the client has to accept all type location
 fields (location_de_de, location_it_it). I want to copy the result into
 location field, so that client can just accept location.
 
  Thanks,
  Kalyan Manepalli
  -Original Message-
  From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
  Sent: Thursday, June 04, 2009 4:16 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Customizing results
 
 
  Hello,
 
  If you know what language the user specified (or is associated with),
 then you just have to ensure the fl URL parameter contain that field (and
 any other fields you want returned).  So if the language/locale is de_de,
 then make sure the request has
 fl=location_de_de,another_field,another_field, and not, for example
 location_it_it
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: Manepalli, Kalyan kalyan.manepa...@orbitz.com
  To: solr-user@lucene.apache.org solr-user@lucene.apache.org
  Sent: Thursday, June 4, 2009 12:36:30 PM
  Subject: Customizing results
 
  Hi,
  I am trying to customize the response that I receive from
 Solr. In
  the index I have multiple fields that contain the same data in different
  language.
  At the query time client specifies the language. Based on this param, I
 want to
  return the value, copied into a different field.
  Eg:
  Lubang, Filippinerne
  Lubang, Philippinen
  Lubang, Philippines
  Lubang, Filipinas
 
  If the user specifies language as de_de, then I want to return the
 result as
  Lubang, Philippinen
 
  What is the most optimal way of doing this?
  Any suggestions on this will be helpful
 
  Thanks,
  Kalyan Manepalli
 
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com



Re: how to do exact search with solrj

2009-06-04 Thread Avlesh Singh
And the field should be of type text, right Otis?
Does one still need those anchors if the type is string with the filters
you suggested?

Cheers
Avlesh

On Fri, Jun 5, 2009 at 6:35 AM, Otis Gospodnetic otis_gospodne...@yahoo.com
 wrote:


 I re-read your original request.  Here is the recipe that should work:

 * Define a new field type that:
   - uses KeywordTokenizer
   - uses LowerCaseFilter

 * Make your field be of the above type.

 * Use those begin/end anchor characters at index and search time.


 I believe that should work.  Please try it and let us know.

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
  From: Otis Gospodnetic otis_gospodne...@yahoo.com
  To: solr-user@lucene.apache.org
  Sent: Thursday, June 4, 2009 8:47:50 PM
  Subject: Re: how to do exact search with solrj
 
 
  I don't think there is anything ready to be used in Solr (though it would
  be easy to add), but if you index your values with custom
  beginning-of-string and end-of-string anchor characters, you'll be able
  to get your exact matching working.
 
  For example, convert hello the world to $hello the world$ before
  indexing (and make sure you use the string type or KeywordTokenizer --
  things that won't remove any characters).  Then search for
  $hello the world$.  This will not match $hello the world, Jack$.
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
   From: Jianbin Dai
   To: solr-user@lucene.apache.org
   Sent: Thursday, June 4, 2009 6:42:39 PM
   Subject: Re: how to do exact search with solrj
  
  
   I still have a problem with exact matching.
  
   query.setQuery(title:\hello the world\);
  
   This will return all docs with title containing hello the world,
 i.e.,
   hello the world, Jack will also be matched. What I want is exactly
 hello
  the
   world. Setting this field to string instead of text doesn't work well
 either,
 
   because I want something like Hello, The World to be matched as well.
   Any idea? Thanks.
  
  
--- On Sat, 5/30/09, Avlesh Singh
wrote:
   
 From: Avlesh Singh
 Subject: Re: how to do exact search with solrj
 To: solr-user@lucene.apache.org
 Date: Saturday, May 30, 2009, 11:45 PM
 You need exact match for all the
 three tokens?
 If yes, try query.setQuery(title:\hello the
world\);

 Cheers
 Avlesh

 On Sun, May 31, 2009 at 12:12 PM, Jianbin Dai
 wrote:

 
  I tried, but seems it's not working right.
 
  --- On Sat, 5/30/09, Avlesh Singh
 wrote:
 
   From: Avlesh Singh
 Subject: Re: how to do exact search with
solrj
   To: solr-user@lucene.apache.org
   Date: Saturday, May 30, 2009, 10:56 PM
   query.setQuery(title:hello the
   world) is what you need.
  
   Cheers
   Avlesh
  
   On Sun, May 31, 2009 at 6:23 AM, Jianbin
Dai

   wrote:
  
   
Hi,
   
I want to search hello the world in
the
 title
   field using solrj. I set
the query filter
query.addFilterQuery(title);
query.setQuery(hello the world);
   
but it returns not exact match results
as
 well.
   
I know one way to do it is to set
title
 field to
   string instead of text.
But is there any way i can do it? If I
do
 the search
   through web interface
Solr Admin by title:hello the world,
it
 returns
   exact matches.
   
Thanks.
   
JB
   
   
   
   
   
  
 
 
 
 
 

   
   
   
   




Re: Customizing results

2009-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
Hi Otis,

Is it a good idea to provide an aliasing feature for Solr, similar to
SQL's 'as'?

In SQL we can do:

select location_da_dk as location

Solr could have:

fl.alias=location_da_dk:location

--Noble




On Fri, Jun 5, 2009 at 3:10 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:

 Aha, so you really want to rename the field at response time?  I wonder if 
 this is something that could be done with (or should be added to) response 
 writers.  That's where I'd go look first.

  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: Manepalli, Kalyan kalyan.manepa...@orbitz.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Thursday, June 4, 2009 5:30:40 PM
 Subject: RE: Customizing results

 Otis,
     With that solution, the client has to accept all type location fields
 (location_de_de, location_it_it). I want to copy the result into location
 field, so that client can just accept location.

 Thanks,
 Kalyan Manepalli
 -Original Message-
 From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
 Sent: Thursday, June 04, 2009 4:16 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Customizing results


 Hello,

 If you know what language the user specified (or is associated with), then 
 you
 just have to ensure the fl URL parameter contain that field (and any other
 fields you want returned).  So if the language/locale is de_de, then make 
 sure
 the request has fl=location_de_de,another_field,another_field, and not, for
 example location_it_it

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
  From: Manepalli, Kalyan
  To: solr-user@lucene.apache.org
  Sent: Thursday, June 4, 2009 12:36:30 PM
  Subject: Customizing results
 
  Hi,
              I am trying to customize the response that I receive from 
  Solr. In
  the index I have multiple fields that contain the same data in different
  language.
  At the query time client specifies the language. Based on this param, I 
  want
 to
  return the value, copied into a different field.
  Eg:
  Lubang, Filippinerne
  Lubang, Philippinen
  Lubang, Philippines
  Lubang, Filipinas
 
  If the user specifies language as de_de, then I want to return the result 
  as
  Lubang, Philippinen
 
  What is the most optimal way of doing this?
  Any suggestions on this will be helpful
 
  Thanks,
  Kalyan Manepalli





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Customizing results

2009-06-04 Thread Avlesh Singh
Generally a good idea, but be prepared to entertain requests asking you to
support querying with those aliases as well. I mean, when you talk about
something similar to aliases in SQL, those aliases can be used in SQL
scripts in the where clause too.

Cheers
Avlesh

2009/6/5 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 Hi Otis,

 Is it a good idea to provide an aliasing feature for Solr, similar to
 SQL's 'as'?

 in SQL we can do

 select location_da_dk as location

 Solr may have

 fl.alias=location_da_dk:location

 --Noble




 On Fri, Jun 5, 2009 at 3:10 AM, Otis Gospodnetic
 otis_gospodne...@yahoo.com wrote:
 
  Aha, so you really want to rename the field at response time?  I wonder
 if this is something that could be done with (or should be added to)
 response writers.  That's where I'd go look first.
 
   Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: Manepalli, Kalyan kalyan.manepa...@orbitz.com
  To: solr-user@lucene.apache.org solr-user@lucene.apache.org
  Sent: Thursday, June 4, 2009 5:30:40 PM
  Subject: RE: Customizing results
 
  Otis,
  With that solution, the client has to accept all type location
 fields
  (location_de_de, location_it_it). I want to copy the result into
 location
  field, so that client can just accept location.
 
  Thanks,
  Kalyan Manepalli
  -Original Message-
  From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
  Sent: Thursday, June 04, 2009 4:16 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Customizing results
 
 
  Hello,
 
  If you know what language the user specified (or is associated with),
 then you
  just have to ensure the fl URL parameter contain that field (and any
 other
  fields you want returned).  So if the language/locale is de_de, then
 make sure
  the request has fl=location_de_de,another_field,another_field, and not,
 for
  example location_it_it
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
   From: Manepalli, Kalyan
   To: solr-user@lucene.apache.org
   Sent: Thursday, June 4, 2009 12:36:30 PM
   Subject: Customizing results
  
   Hi,
   I am trying to customize the response that I receive from
 Solr. In
   the index I have multiple fields that contain the same data in
 different
   language.
   At the query time client specifies the language. Based on this param,
 I want
  to
   return the value, copied into a different field.
   Eg:
   Lubang, Filippinerne
   Lubang, Philippinen
   Lubang, Philippines
   Lubang, Filipinas
  
   If the user specifies language as de_de, then I want to return the
 result as
   Lubang, Philippinen
  
   What is the most optimal way of doing this?
   Any suggestions on this will be helpful
  
   Thanks,
   Kalyan Manepalli
 
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com



Re: Customizing results

2009-06-04 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Fri, Jun 5, 2009 at 10:20 AM, Avlesh Singh avl...@gmail.com wrote:
 Generally a good idea, but be prepared to entertain requests asking you to
 support querying with those aliases as well. I mean, when you talk about
 something similar to aliases in SQL, those aliases can be used in SQL
 scripts in the where clause too.
I guess that can be a separate issue.
But this can be implemented as a post-processing step, at the
ResponseWriter level.

The current problem is that there are too many response writers, and we
would have to change and test them all.

 Cheers
 Avlesh

 2009/6/5 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 Hi Otis,

 Is it a good idea to provide an aliasing feature for Solr, similar to
 SQL's 'as'?

 in SQL we can do

 select location_da_dk as location

 Solr may have

 fl.alias=location_da_dk:location

 --Noble




 On Fri, Jun 5, 2009 at 3:10 AM, Otis Gospodnetic
 otis_gospodne...@yahoo.com wrote:
 
  Aha, so you really want to rename the field at response time?  I wonder
 if this is something that could be done with (or should be added to)
 response writers.  That's where I'd go look first.
 
   Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: Manepalli, Kalyan kalyan.manepa...@orbitz.com
  To: solr-user@lucene.apache.org solr-user@lucene.apache.org
  Sent: Thursday, June 4, 2009 5:30:40 PM
  Subject: RE: Customizing results
 
  Otis,
      With that solution, the client has to accept all type location
 fields
  (location_de_de, location_it_it). I want to copy the result into
 location
  field, so that client can just accept location.
 
  Thanks,
  Kalyan Manepalli
  -Original Message-
  From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
  Sent: Thursday, June 04, 2009 4:16 PM
  To: solr-user@lucene.apache.org
  Subject: Re: Customizing results
 
 
  Hello,
 
  If you know what language the user specified (or is associated with),
 then you
  just have to ensure the fl URL parameter contain that field (and any
 other
  fields you want returned).  So if the language/locale is de_de, then
 make sure
  the request has fl=location_de_de,another_field,another_field, and not,
 for
  example location_it_it
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
   From: Manepalli, Kalyan
   To: solr-user@lucene.apache.org
   Sent: Thursday, June 4, 2009 12:36:30 PM
   Subject: Customizing results
  
   Hi,
               I am trying to customize the response that I receive from
 Solr. In
   the index I have multiple fields that contain the same data in
 different
   language.
   At the query time client specifies the language. Based on this param,
 I want
  to
   return the value, copied into a different field.
   Eg:
   Lubang, Filippinerne
   Lubang, Philippinen
   Lubang, Philippines
   Lubang, Filipinas
  
   If the user specifies language as de_de, then I want to return the
 result as
   Lubang, Philippinen
  
   What is the most optimal way of doing this?
   Any suggestions on this will be helpful
  
   Thanks,
   Kalyan Manepalli
 
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com