Re: LIMIT on number of OR in fq

2013-06-09 Thread Aloke Ghoshal
True, the container's request header size limit must be the reason then.
Try:
http://serverfault.com/questions/136249/how-do-we-increase-the-maximum-allowed-http-get-query-length-in-jetty
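Another way to sidestep the GET length limit, if the client talks to Solr through SolrJ, is to send the query as a POST so the long filter never ends up in the URL at all. A rough sketch (the URL, field name, and values here are only illustrative):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class LongFilterQuery {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("*:*");
        // a very long filter query that would overflow a GET URL
        q.addFilterQuery("locations:(5000 OR 15000 OR 75100)");
        // sending the request as a POST keeps the parameters out of the URL entirely
        QueryResponse rsp = server.query(q, SolrRequest.METHOD.POST);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}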



On Sun, Jun 9, 2013 at 11:04 PM, Jack Krupansky wrote:

> Maybe it is hitting some kind of container limit on URL length, like more
> than 2048?
>
> Add &debugQuery=true to your query and see what query is both received and
> parsed and generated.
>
> Also, if the default query operator is set to or, fq={! q.op=OR}..., then
> you can drop the " OR " operators for a shorter query string.
>
> That said, as with most features of Lucene and Solr, the #1 rule is: Use
> them in moderation. A few dozen IDs are fine. A hundred immediately raising
> suspicion - what are you really trying to do? 200?! 250??!! Over 300?!!
> 1,000?!?! 5,000?!?! I mean, do you really need to do all of this on a
> single "query"? If you find yourself saying "Yes", go back to the drawing
> board and think a lot more carefully what your data model is. I mean, the
> application data model is supposed to simplify queries. Your case does not
> seem simple at all.
>
> Tell us what you are really trying to do with this extreme filter query.
> The fact that you stumbled into an apparent problem should just be a wakeup
> call that you need to reconsider your basic design assumptions.
>
> -- Jack Krupansky
>
> -Original Message- From: Kamal Palei
> Sent: Sunday, June 09, 2013 9:07 AM
> To: solr-user@lucene.apache.org
> Subject: LIMIT on number of OR in fq
>
>
> Dear All
> I am using below syntax to check for a particular field.
> &fq=locations:(5000 OR 1 OR 15000 OR 2 OR 75100)
> With this I get the expected result properly.
>
> In a particular situations the number of ORs are more (looks around 280)
> something as below.
>
> &fq=pref_work_locations:(5000 OR 1 OR 15000 OR 2 OR 75100 OR 125300
> OR 25300 OR 141100 OR 100700 OR 50300 OR 132100 OR 25000 OR 25100 OR 25200
> OR 25400 OR 25500 OR 25600 OR 25700 OR 25800 OR 25900 OR 26000 OR 26100 OR
> 26200 OR 26300 OR 26400 OR 26500 OR 3 OR 30100 OR 35000 OR 35100 OR
> 35200 OR 35300 OR 35400 OR 35500 OR 35600 OR 35700 OR 35800 OR 4 OR
> 45000 OR 45100 OR 45200 OR 45300 OR 45400 OR 45500 OR 5 OR 50100 OR
> 50200 OR 55000 OR 55100 OR 55200 OR 55300 OR 55400 OR 55500 OR 55600 OR
> 55700 OR 6 OR 60100 OR 60200 OR 60300 OR 60400 OR 60500 OR 65000 OR
> 65100 OR 65200 OR 7 OR 70100 OR 70200 OR 70300 OR 70400 OR 75000 OR
> 75200 OR 75300 OR 75400 OR 75500 OR 75600 OR 75700 OR 75800 OR 75900 OR
> 76000 OR 76100 OR 76200 OR 76300 OR 76400 OR 8 OR 80100 OR 80200 OR
> 80300 OR 80400 OR 80500 OR 85000 OR 85100 OR 85200 OR 85300 OR 85400 OR
> 85500 OR 85600 OR 85700 OR 85800 OR 85900 OR 86000 OR 86100 OR 86200 OR
> 9 OR 90100 OR 90200 OR 90300 OR 90400 OR 90500 OR 90600 OR 90700 OR
> 90800 OR 90900 OR 91000 OR 91100 OR 91200 OR 91300 OR 91400 OR 91500 OR
> 91600 OR 91700 OR 91800 OR 91900 OR 92000 OR 92100 OR 92200 OR 92300 OR
> 92400 OR 92500 OR 92600 OR 92700 OR 92800 OR 92900 OR 95000 OR 95100 OR
> 10 OR 100100 OR 105000 OR 105100 OR 105200 OR 105300 OR 105400 OR
> 105500 OR 105600 OR 105700 OR 105800 OR 105900 OR 106000 OR 106100 OR
> 106200 OR 11 OR 110100 OR 115000 OR 115100 OR 115200 OR 115300 OR
> 115400 OR 115500 OR 12 OR 120100 OR 120200 OR 120300 OR 120400 OR
> 120500 OR 120600 OR 120700 OR 120800 OR 120900 OR 121000 OR 121100 OR
> 125000 OR 125100 OR 125200 OR 125400 OR 125500 OR 125600 OR 125700 OR
> 125800 OR 125900 OR 126000 OR 126100 OR 13 OR 130100 OR 130200 OR
> 130300 OR 130400 OR 130500 OR 130600 OR 130700 OR 130800 OR 130900 OR
> 131000 OR 131100 OR 131200 OR 131300 OR 131400 OR 131500 OR 131600 OR
> 131700 OR 131800 OR 131900 OR 132000 OR 132200 OR 132300 OR 132400 OR
> 132500 OR 135000 OR 135100 OR 14 OR 140100 OR 140200 OR 140300 OR
> 140400 OR 140500 OR 140600 OR 140700 OR 140800 OR 140900 OR 141000 OR
> 141200 OR 141300 OR 141400 OR 141500 OR 141600 OR 141700 OR 141800 OR
> 141900 OR 142000 OR 142100 OR 145000 OR 15 OR 155000 OR 16 OR
> 165000 OR 17 OR 175000 OR 18 OR 185000 OR 19 OR 195000 OR
> 20 OR 205000 OR 21 OR 215000 OR 22 OR 225000 OR 23 OR
> 235000 OR 24 OR 245000 OR 25 OR 255000 OR 26 OR 265000 OR
> 27 OR 275000 OR 28 OR 285000 OR 29 OR 295000 OR 30 OR
> 305000 OR 31 OR 315000 OR 32 OR 325000 OR 33 OR 335000 OR
> 34 OR 345000 OR 35 OR 355000 OR 36 OR 365000 OR 37 OR
> 375000 OR 38 OR 385000 OR 39)
>
>
> When we have such a high number of ORs, it gives me 0 records, whereas I
> expected all possible records.
>
> So I am wondering, is there any limit for ORs in one fq filter.
>
> I know I need to go for something like, &fq=locations:[min , max] format,
> but that may not be possible always.., or probably we need to modify a
> bigger piece of code. So just as a temporary solution, is there anyother
> way I can follow?
>
> Bes

Re: [blogpost] Memory is overrated, use SSDs

2013-06-09 Thread Sourajit Basak
Hopefully I will be able to post results shortly on 2P4C performance.

~ Sourajit


On Mon, Jun 10, 2013 at 2:20 AM, Toke Eskildsen wrote:

> Sourajit Basak [sourajit.ba...@gmail.com]:
> > Does more processors with less cores or less processors with more cores
> > i.e. which of 4P2C or 2P4C has best cost per query ?
>
> I have not tested that, so everything I say is (somewhat qualified)
> guesswork.
>
> Assuming a NUMA architecture, my guess is that 2P4C would be superior to
> 4P2C. Solr utilizes both disk caching and explicit caching on the JVM heap
> making memory access quite heavy; the less processors, the higher the
> chance that the memory will be controlled by the processor running the
> given search thread. I am by no means a NUMA expert, but it seems that
> requests for memory controlled by another processor takes about twice as
> long as local memory.
>
> Our machine is a NUMA dual processor and if I can find the time, I would
> love to perform some tests on how that part influences query time. If would
> be interesting to lock usage to 2 cores on each processor vs. 4 cores on
> the same processor. The tricky part is to ensure that RAM is fully
> controlled by the single processor in the second test, including the disk
> cache.
>
> Regards,
> Toke Eskildsen


Re: solr facet query on multiple search term

2013-06-09 Thread vrparekh
Thanks Erick,

yes, the example URL I provided is a bit confusing, sorry for that.

The actual requirement is to get day-wise counts for multiple search
terms.

If we use q=(firstterm OR
secondterm)&facet.query=firstterm&facet.query=secondTerm, it will provide
the total record count for each search term, but not day-wise
(facet.range will combine the results of both).

I need something like below (just a sample),


  
 
   
 
[sample XML response stripped by the mail archive: it showed one facet.range
block per search term with day-wise counts, e.g. 10551 and 20802 for the
first term and 100 and 5 for the second]





--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-facet-query-on-multiple-search-term-tp4068856p4069259.html
Sent from the Solr - User mailing list archive at Nabble.com.
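For the day-wise, per-term counts asked about above, one approach is to run a range facet once per term, filtered to that term. A rough SolrJ sketch (the timestamp field, date window, and core URL are assumptions, not taken from the original setup):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PerTermDailyCounts {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        for (String term : new String[] {"firstterm", "secondterm"}) {
            SolrQuery q = new SolrQuery("*:*");
            q.addFilterQuery("text:" + term);      // restrict counts to this term only
            q.setFacet(true);
            q.set("facet.range", "timestamp");     // assumed date field
            q.set("facet.range.start", "NOW/DAY-7DAYS");
            q.set("facet.range.end", "NOW/DAY+1DAY");
            q.set("facet.range.gap", "+1DAY");
            q.setRows(0);                          // counts only, no documents
            QueryResponse rsp = server.query(q);
            System.out.println(term + ": " + rsp.getFacetRanges());
        }
    }
}

With a small number of terms, one request per term is the simplest way to keep the day-wise buckets separate.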


Re: Configuring lucene to suggest the indexed string for all the searches of the substring of the indexed string

2013-06-09 Thread Prathik Puthran
Hi,

@Walter
I'm trying to implement the following feature for the user:
the user types in any "substring" of the strings in the dictionary (i.e. the
indexed strings), and the Solr Suggester should return all the strings in the
dictionary which have the input string as a substring.

Thanks,
Prathik
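One query-time way to approximate the feature described above, without the Suggester component, is an infix wildcard query - just a sketch, assuming a field indexed as a single lowercased token (e.g. KeywordTokenizer plus a lowercase filter), and note that a double wildcard can be slow on a large dictionary:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class SubstringLookup {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/dictionary");
        // matches every entry containing "bour", e.g. "Jason Bourne",
        // assuming the field holds one lowercased token per entry
        SolrQuery q = new SolrQuery("name_exact:*bour*");
        System.out.println(server.query(q).getResults());
    }
}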



On Fri, Jun 7, 2013 at 4:01 AM, Otis Gospodnetic  wrote:

> Hi
>
> Ngrams *will* do this for you.
>
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Jun 6, 2013 7:53 AM, "Prathik Puthran" 
> wrote:
>
> > Basically I want the Suggester to return for "Jason Bourne" as suggestion
> > for ".*Bour.*" regex.
> >
> > Thanks,
> > Prathik
> >
> >
> > On Thu, Jun 6, 2013 at 12:52 PM, Prathik Puthran <
> > prathik.puthra...@gmail.com> wrote:
> >
> > > This works even now i.e. when I search for "Jas" it suggests "Jason
> > > Bourne". What I want is when I search for "Bour" or "ason" (any
> > substring)
> > > it should suggest me "Jason Bourne" .
> > >
> > >
> > > On Thu, Jun 6, 2013 at 12:34 PM, Upayavira  wrote:
> > >
> > >> Can you se the ShingleFilterFactory? It is ngrams for terms rather
> than
> > >> characters. If you limited it to two term ngrams, when the user
> presses
> > >> space after their first word, you could do a suggested query against
> > >> your two term ngram field, which would suggest Jason Bourne, Jason
> > >> Statham, etc then you press space after "Jason".
> > >>
> > >> Upayavira
> > >>
> > >> On Thu, Jun 6, 2013, at 07:25 AM, Prathik Puthran wrote:
> > >> > My use case is I want to search for any substring of the indexed
> > string
> > >> > and
> > >> > the Suggester should suggest the indexed string. What can I do to
> make
> > >> > this
> > >> > work?
> > >> >
> > >> > Thanks,
> > >> > Prathik
> > >> >
> > >> >
> > >> > On Thu, Jun 6, 2013 at 2:05 AM, Mikhail Khludnev
> > >> >  > >> > > wrote:
> > >> >
> > >> > > Please excuse my misunderstanding, but I always wonder why this
> > index
> > >> time
> > >> > > processing is suggested usually. from my POV is the case for
> > >> query-time
> > >> > > processing i.e. PrefixQuery aka wildcard query Jason* .
> > >> > > Ultra-fast term retrieval also provided by TermsComponent.
> > >> > >
> > >> > >
> > >> > > On Wed, Jun 5, 2013 at 8:09 PM, Jack Krupansky <
> > >> j...@basetechnology.com
> > >> > > >wrote:
> > >> > >
> > >> > > > ngrams?
> > >> > > >
> > >> > > > See:
> > >> > > > http://lucene.apache.org/core/**4_3_0/analyzers-common/org/**
> > >> > > > apache/lucene/analysis/ngram/**NGramFilterFactory.html<
> > >> > >
> > >>
> >
> http://lucene.apache.org/core/4_3_0/analyzers-common/org/apache/lucene/analysis/ngram/NGramFilterFactory.html
> > >> > > >
> > >> > > >
> > >> > > > -- Jack Krupansky
> > >> > > >
> > >> > > > -Original Message- From: Prathik Puthran
> > >> > > > Sent: Wednesday, June 05, 2013 11:59 AM
> > >> > > > To: solr-user@lucene.apache.org
> > >> > > > Subject: Configuring lucene to suggest the indexed string for
> all
> > >> the
> > >> > > > searches of the substring of the indexed string
> > >> > > >
> > >> > > >
> > >> > > > Hi,
> > >> > > >
> > >> > > > Is it possible to configure solr to suggest the indexed string
> for
> > >> all
> > >> > > the
> > >> > > > searches of the substring of the string?
> > >> > > >
> > >> > > > Thanks,
> > >> > > > Prathik
> > >> > > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > --
> > >> > > Sincerely yours
> > >> > > Mikhail Khludnev
> > >> > > Principal Engineer,
> > >> > > Grid Dynamics
> > >> > >
> > >> > > 
> > >> > >  
> > >> > >
> > >>
> > >
> > >
> >
>


Re: Search for misspelled words in corpus

2013-06-09 Thread కామేశ్వర రావు భైరవభట్ల
Hi Upayavira,

The word I am searching for is "fight". Terms like "figth", "figh" are
spelling mistakes of fight. So I would like to find them. "sight" is
obviously not a spelling mistake of "fight". Even if it was a typo, I don't
really want to match "sight" with "fight".

regards,
Kamesh

On Sun, Jun 9, 2013 at 10:49 PM, Upayavira  wrote:

> You haven't stated why figh is correct and sight isn't. Is it because
> the first letter is different?
>
> Upayavira
>
> On Wed, Jun 5, 2013, at 02:10 PM, కామేశ్వర రావు భైరవభట్ల wrote:
> > Hi,
> >
> > I have a problem where our text corpus on which we need to do search
> > contains many misspelled words. Same word could also be misspelled in
> > several different ways. It could also have documents that have correct
> > spellings However, the search term that we give in query would always be
> > correct spelling. Now when we search on a term, we would like to get all
> > the documents that contain both correct and misspelled forms of the
> > search
> > term.
> > We tried fuzzy search, but it doesn't work as per our expectations. It
> > returns any close match, not specifically misspelled words. For example,
> > if
> > I'm searching for a word like "fight", I would like to return the
> > documents
> > that have words like "figth" and "feight", not documents with words like
> > "sight" and "light".
> > Is there any suggested approach for doing this?
> >
> > regards,
> > Kamesh
>


Re: Search for misspelled words in corpus

2013-06-09 Thread కామేశ్వర రావు భైరవభట్ల
Thanks everyone for the replies. I too had the same idea of a
pre-processing step. So, I first analyzed the corpus using a dictionary,
got all the misspelled words, and created a separate index with those words
in Solr. Now, when I search for a given query word, I first search for the
exact match in the original index (created out of the text) and then do a
fuzzy search on the index of misspelled words. This way it gives more
accurate results. However, there is still an issue with some proper nouns
(for example, "Angie" showing up as a misspelled word and getting matched
with a word like "Anger" in the fuzzy search). But I think the precision is
good enough for us.
I wanted to confirm that there is no other in-built way in Solr to do this.

regards,
Kamesh
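For reference, the two-step lookup described above might look roughly like this in SolrJ (the core names and field names are made up; ~2 is the fuzzy edit distance):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TwoStepSearch {
    public static void main(String[] args) throws Exception {
        HttpSolrServer corpus = new HttpSolrServer("http://localhost:8983/solr/corpus");
        HttpSolrServer typos  = new HttpSolrServer("http://localhost:8983/solr/misspellings");

        // step 1: exact match against the main text index
        QueryResponse exact = corpus.query(new SolrQuery("text:fight"));

        // step 2: fuzzy match (edit distance <= 2) against the index of known misspellings
        QueryResponse fuzzy = typos.query(new SolrQuery("word:fight~2"));

        System.out.println("exact hits: " + exact.getResults().getNumFound());
        System.out.println("misspelled variants: " + fuzzy.getResults().getNumFound());
    }
}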

On Sun, Jun 9, 2013 at 10:40 PM, Jagdish Nomula wrote:

> ngrams will definitely increase the index. But the increase in size might
> not be super high as the total possible set of dictionary size is 26^3 and
> we are just storing docs list with each ngram.
>
> Another variation of the above ideas would be to add a pre-processing step,
> where-in you analyze the input corpus to explore the words which can be
> mis-spelt. You can use any of the word based LSH algorithms to do this and
> then index selectlively.
>
> This is a theoretical answer. You would have to cherry pick
> solutions/approaches for your use case.
>
> Thanks,
>
>
>
>
> On Sat, Jun 8, 2013 at 11:49 PM, Otis Gospodnetic <
> otis.gospodne...@gmail.com> wrote:
>
> > Hm, I was purposely avoiding mentioning ngrams because just ngramming
> > all indexed tokens would balloon the index My assumption was that
> > only *some* words are misspelled, in which case it may be better not
> > to ngram all tokens
> >
> > Otis
> > --
> > Solr & ElasticSearch Support
> > http://sematext.com/
> >
> >
> >
> >
> >
> > On Sun, Jun 9, 2013 at 2:30 AM, Jagdish Nomula 
> > wrote:
> > > Another theoretical answer for this question is ngrams approach. You
> can
> > > index the word and its trigrams. Query the index, by the string as well
> > as
> > > its trigrams, with a % match search. You than pass the exhaustive
> > resultset
> > > through a more expensive scoring such as Smith Waterman.
> > >
> > > Thanks,
> > >
> > > Jagdish
> > >
> > >
> > > On Sat, Jun 8, 2013 at 11:03 PM, Shashi Kant 
> > wrote:
> > >
> > >> n-grams might help, followed by a edit distance metric such as
> > Jaro-Winkler
> > >> or Smith-Waterman-Gotoh to further filter out.
> > >>
> > >>
> > >> On Sun, Jun 9, 2013 at 1:59 AM, Otis Gospodnetic <
> > >> otis.gospodne...@gmail.com
> > >> > wrote:
> > >>
> > >> > Interesting problem.  The first thing that comes to mind is to do
> > >> > "word expansion" during indexing.  Kind of like synonym expansion,
> but
> > >> > maybe a bit more dynamic. If you can have a dictionary of correctly
> > >> > spelled words, then for each token emitted by the tokenizer you
> could
> > >> > look up the dictionary and expand the token to all other words that
> > >> > are similar/close enough.  This would not be super fast, and you'd
> > >> > likely have to add some custom heuristic for figuring out what
> > >> > "similar/close enough" means, but it might work.
> > >> >
> > >> > I'd love to hear other ideas...
> > >> >
> > >> > Otis
> > >> > --
> > >> > Solr & ElasticSearch Support
> > >> > http://sematext.com/
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On Wed, Jun 5, 2013 at 9:10 AM, కామేశ్వర రావు భైరవభట్ల
> > >> >  wrote:
> > >> > > Hi,
> > >> > >
> > >> > > I have a problem where our text corpus on which we need to do
> search
> > >> > > contains many misspelled words. Same word could also be misspelled
> > in
> > >> > > several different ways. It could also have documents that have
> > correct
> > >> > > spellings However, the search term that we give in query would
> > always
> > >> be
> > >> > > correct spelling. Now when we search on a term, we would like to
> get
> > >> all
> > >> > > the documents that contain both correct and misspelled forms of
> the
> > >> > search
> > >> > > term.
> > >> > > We tried fuzzy search, but it doesn't work as per our
> expectations.
> > It
> > >> > > returns any close match, not specifically misspelled words. For
> > >> example,
> > >> > if
> > >> > > I'm searching for a word like "fight", I would like to return the
> > >> > documents
> > >> > > that have words like "figth" and "feight", not documents with
> words
> > >> like
> > >> > > "sight" and "light".
> > >> > > Is there any suggested approach for doing this?
> > >> > >
> > >> > > regards,
> > >> > > Kamesh
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > > ***Jagdish Nomula*
> > > Sr. Manager Search
> > > Simply Hired, Inc.
> > > 370 San Aleso Ave., Ste 200
> > > Sunnyvale, CA 94085
> > >
> > > office - 408.400.4700
> > > cell - 408.431.2916
> > > email - jagd...@simplyhired.com 
> > >
> > > www.simplyhired.com
> >
>
>
>
> --
> ***Jagdish Nomula*
> Sr. Manager Search
> Simply Hired, Inc.
> 370 San Aleso Av

Solr 4.3 - Schema Parsing Failed: Invalid field property: compressed

2013-06-09 Thread Uomesh
Hi,

I am getting the error below after upgrading to Solr 4.3. Is the compressed
attribute no longer supported in Solr 4.3, or is it a bug in 4.3?

org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Schema Parsing Failed: Invalid field property: compressed

Thanks,
Umesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-3-Schema-Parsing-Failed-Invalid-field-property-compressed-tp4069254.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: OPENNLP problems

2013-06-09 Thread Lance Norskog

Found the problem. Please see:
https://issues.apache.org/jira/browse/LUCENE-2899?focusedCommentId=13679293&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13679293
On 06/09/2013 04:38 PM, Patrick Mi wrote:

Hi Lance,

I updated the src from 4.x and applied the latest patch LUCENE-2899-x.patch
uploaded on 6th June but still had the same problem.


Regards,
Patrick

-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Thursday, 6 June 2013 5:16 p.m.
To: solr-user@lucene.apache.org
Subject: Re: OPENNLP problems

Patrick-
I found the problem with multiple documents. The problem was that the
API for the life cycle of a Tokenizer changed, and I only noticed part
of the change. You can now upload multiple documents in one post, and
the OpenNLPTokenizer will process each document.

You're right, the example on the wiki is wrong. The FilterPayloadsFilter
default is to remove the given payloads, and needs keepPayloads="true"
to retain them.

The fixed patch is up as LUCENE-2899-x.patch. Again, thanks for trying it.

Lance

https://issues.apache.org/jira/browse/LUCENE-2899

On 05/28/2013 10:08 PM, Patrick Mi wrote:

Hi there,

Checked out branch_4x and applied the latest patch
LUCENE-2899-current.patch however I ran into 2 problems

Followed the wiki page instruction and set up a field with this type

aiming

to keep nouns and verbs and do a facet on the field
==


  
  
  
  

  
==

Struggled to get that going until I put the extra parameter
keepPayloads="true" in as below.
   

Question: am I doing the right thing? Is this a mistake on wiki

Second problem:

Posted the document xml one by one to the solr and the result was what I
expected.



1
check in the hotel


However if I put multiple documents into the same xml file and post it in
one go only the first document gets processed( only 'check' and 'hotel'

were

showing in the facet result.)
   



1
check in the hotel


2
removes the payloads


3
retains only nouns and verbs 



Same problem when updated the data using csv upload.

Is that a bug or something I did wrong?

Thanks in advance!

Regards,
Patrick








Re: OPENNLP problems

2013-06-09 Thread Lance Norskog

text_opennlp has the right behavior.
text_opennlp_pos does what you describe.
I'll look some more.

On 06/09/2013 04:38 PM, Patrick Mi wrote:

Hi Lance,

I updated the src from 4.x and applied the latest patch LUCENE-2899-x.patch
uploaded on 6th June but still had the same problem.


Regards,
Patrick

-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Thursday, 6 June 2013 5:16 p.m.
To: solr-user@lucene.apache.org
Subject: Re: OPENNLP problems

Patrick-
I found the problem with multiple documents. The problem was that the
API for the life cycle of a Tokenizer changed, and I only noticed part
of the change. You can now upload multiple documents in one post, and
the OpenNLPTokenizer will process each document.

You're right, the example on the wiki is wrong. The FilterPayloadsFilter
default is to remove the given payloads, and needs keepPayloads="true"
to retain them.

The fixed patch is up as LUCENE-2899-x.patch. Again, thanks for trying it.

Lance

https://issues.apache.org/jira/browse/LUCENE-2899

On 05/28/2013 10:08 PM, Patrick Mi wrote:

Hi there,

Checked out branch_4x and applied the latest patch
LUCENE-2899-current.patch however I ran into 2 problems

Followed the wiki page instruction and set up a field with this type

aiming

to keep nouns and verbs and do a facet on the field
==


  
  
  
  

  
==

Struggled to get that going until I put the extra parameter
keepPayloads="true" in as below.
   

Question: am I doing the right thing? Is this a mistake on wiki

Second problem:

Posted the document xml one by one to the solr and the result was what I
expected.



1
check in the hotel


However if I put multiple documents into the same xml file and post it in
one go only the first document gets processed( only 'check' and 'hotel'

were

showing in the facet result.)
   



1
check in the hotel


2
removes the payloads


3
retains only nouns and verbs 



Same problem when updated the data using csv upload.

Is that a bug or something I did wrong?

Thanks in advance!

Regards,
Patrick








Re: Get Statistics With CloudSolrServer?

2013-06-09 Thread Mark Miller

On Jun 9, 2013, at 7:52 PM, Furkan KAMACI  wrote:

> There is a statistics section at the admin page that gives information such as:
> 
> Last Modified, Num Docs, Max Doc, etc. How can I get that kind of
> information using CloudSolrServer with SolrJ?

There is an admin request handler that exposes them as one option: the 
/admin/mbeans admin request handler - you can use solrj to hit that handler.

- Mark
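A minimal SolrJ sketch of that approach (the ZooKeeper address and collection name are assumptions; the exact contents of the returned NamedList depend on the configured handlers and version):

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

public class MBeansStats {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("localhost:2181");
        server.setDefaultCollection("collection1");    // assumed collection name

        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("stats", "true");                   // include the statistics section
        QueryRequest req = new QueryRequest(params);
        req.setPath("/admin/mbeans");                  // per-core admin handler

        NamedList<Object> mbeans = server.request(req);
        System.out.println(mbeans);                    // searcher stats include numDocs, maxDoc, etc.
    }
}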



Get Statistics With CloudSolrServer?

2013-06-09 Thread Furkan KAMACI
There is a statistics section at the admin page that gives information such as:

Last Modified, Num Docs, Max Doc, etc. How can I get that kind of
information using CloudSolrServer with SolrJ?


Re: Boosting based on value of field

2013-06-09 Thread Otis Gospodnetic
Index time boosting should be a bit faster, but not as flexible. Probably
better to go for query time boosting first.

Otis
Solr & ElasticSearch Support
http://sematext.com/
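A rough SolrJ sketch of the query-time option with edismax (the core URL and query fields are assumptions; the source values and boosts are the ones from the question):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class QueryTimeBoost {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("some search terms");
        q.set("defType", "edismax");
        q.set("qf", "title description");              // assumed query fields
        // boost matches from preferred sources at query time
        q.set("bq", "source:google^10 source:bing^5");
        System.out.println(server.query(q).getResults().getNumFound());
    }
}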
On Jun 9, 2013 5:46 AM, "Spadez"  wrote:

> Hi,
>
> By the looks of it I have a few options with regards to boosting. I was
> wondering from a performance point of view am I better to set the boost of
> certain results on import via the DIH or instead is it better to set the
> boost when doing queries, by adding it to the default queries?
>
> I have a "source" value and I want to boost it in the relevancy if it has a
> certain value, say for example:
>
> source=google then boost 10
> source=bing then boost 5
>
> Thanks for any help you can give!
>
> James
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Boosting-based-on-value-of-field-tp4069157.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


RE: OPENNLP problems

2013-06-09 Thread Patrick Mi
Hi Lance,

I updated the src from 4.x and applied the latest patch LUCENE-2899-x.patch
uploaded on 6th June but still had the same problem.


Regards,
Patrick

-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com] 
Sent: Thursday, 6 June 2013 5:16 p.m.
To: solr-user@lucene.apache.org
Subject: Re: OPENNLP problems

Patrick-
I found the problem with multiple documents. The problem was that the 
API for the life cycle of a Tokenizer changed, and I only noticed part 
of the change. You can now upload multiple documents in one post, and 
the OpenNLPTokenizer will process each document.

You're right, the example on the wiki is wrong. The FilterPayloadsFilter 
default is to remove the given payloads, and needs keepPayloads="true" 
to retain them.

The fixed patch is up as LUCENE-2899-x.patch. Again, thanks for trying it.

Lance

https://issues.apache.org/jira/browse/LUCENE-2899

On 05/28/2013 10:08 PM, Patrick Mi wrote:
> Hi there,
>
> Checked out branch_4x and applied the latest patch
> LUCENE-2899-current.patch however I ran into 2 problems
>
> Followed the wiki page instruction and set up a field with this type
aiming
> to keep nouns and verbs and do a facet on the field
> ==
>  positionIncrementGap="100">
>
>   tokenizerModel="opennlp/en-token.bin"/>
>   posTaggerModel="opennlp/en-pos-maxent.bin"/>
>   payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>
>  
>
>  
> ==
>
> Struggled to get that going until I put the extra parameter
> keepPayloads="true" in as below.
>payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>
>
> Question: am I doing the right thing? Is this a mistake on wiki
>
> Second problem:
>
> Posted the document xml one by one to the solr and the result was what I
> expected.
>
> 
> 
>1
>check in the hotel
> 
>
> However if I put multiple documents into the same xml file and post it in
> one go only the first document gets processed( only 'check' and 'hotel'
were
> showing in the facet result.)
>   
> 
> 
>1
>check in the hotel
> 
> 
>2
>removes the payloads
> 
> 
>3
>retains only nouns and verbs 
> 
> 
>
> Same problem when updated the data using csv upload.
>
> Is that a bug or something I did wrong?
>
> Thanks in advance!
>
> Regards,
> Patrick
>
>




Re: Why clusterstate.json says active for a killed Solr Node?

2013-06-09 Thread Furkan KAMACI
Here is my code to check the state of a node:

!liveNodes.contains(replica.getNodeName()) ? ZkStateReader.DOWN :
replica.get(ZkStateReader.STATE_PROP).toString()
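Expanded into a full loop, a sketch of that check might look like this (the ZooKeeper address and collection name are assumptions); it only trusts the clusterstate.json entry when the replica's node is also listed as live:

import java.util.Set;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.Replica;
import org.apache.solr.common.cloud.Slice;
import org.apache.solr.common.cloud.ZkStateReader;

public class EffectiveReplicaState {
    public static void main(String[] args) throws Exception {
        CloudSolrServer solrServer = new CloudSolrServer("localhost:2181");
        solrServer.connect();

        ClusterState clusterState = solrServer.getZkStateReader().getClusterState();
        Set<String> liveNodes = clusterState.getLiveNodes();

        for (Slice slice : clusterState.getSlices("collection1")) {
            for (Replica replica : slice.getReplicas()) {
                // clusterstate.json keeps the last known state; trust it only for live nodes
                String state = liveNodes.contains(replica.getNodeName())
                        ? replica.getStr(ZkStateReader.STATE_PROP)
                        : ZkStateReader.DOWN;
                System.out.println(replica.getNodeName() + " -> " + state);
            }
        }
    }
}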


2013/6/10 Mark Miller 

> You currently kind of have to look at both if you want to know the true
> state.
>
> An active state means that shard is up to date and online serving - as
> long as it's live node is also up.
>
> - Mark
>
> On Jun 9, 2013, at 6:18 PM, Furkan KAMACI  wrote:
>
> > Is it enough just look at only live nodes(if not: could you tell me is
> there any example code part at Solr source code)? By the way what does
> active means for clusterstate.json?
> >
> > 2013/6/10 Mark Miller 
> > The true current state is the live nodes info combined with the
> > clusterstate.json. If a node is not live, whatever is in
> clusterstate.json
> > is simply it's last state, not the current one.
> >
> > - Mark
> >
> >
> > On Sun, Jun 9, 2013 at 4:40 PM, Furkan KAMACI  >wrote:
> >
> > > I want to get cluster state of my SolrCloud and this is my method:
> > >
> > > private final CloudSolrServer solrServer;
> > >
> > > public SolrCloudServerFactory(String zkHost) throws
> MalformedURLException {
> > >  this.solrServer = new CloudSolrServer(zkHost);
> > >  solrServer.connect();
> > > }
> > >
> > > and I get what I want from solrServer variable. However I killed a
> running
> > > Solr node from my cluster and I see that clusterstate.json at still
> shows
> > > it as active and my solrserver variable says same thing too. I have
> killed
> > > that start.jar but why it is still active? How can I understand it is
> down?
> > > I think that Solr admin page doesn't look at clusterstate.json?
> > >
> >
> >
> >
> > --
> > - Mark
> >
>
>


Re: Why clusterstate.json says active for a killed Solr Node?

2013-06-09 Thread Mark Miller
You currently kind of have to look at both if you want to know the true state.

An active state means that shard is up to date and online serving - as long as 
it's live node is also up.

- Mark

On Jun 9, 2013, at 6:18 PM, Furkan KAMACI  wrote:

> Is it enough just look at only live nodes(if not: could you tell me is there 
> any example code part at Solr source code)? By the way what does active means 
> for clusterstate.json?
> 
> 2013/6/10 Mark Miller 
> The true current state is the live nodes info combined with the
> clusterstate.json. If a node is not live, whatever is in clusterstate.json
> is simply it's last state, not the current one.
> 
> - Mark
> 
> 
> On Sun, Jun 9, 2013 at 4:40 PM, Furkan KAMACI wrote:
> 
> > I want to get cluster state of my SolrCloud and this is my method:
> >
> > private final CloudSolrServer solrServer;
> >
> > public SolrCloudServerFactory(String zkHost) throws MalformedURLException {
> >  this.solrServer = new CloudSolrServer(zkHost);
> >  solrServer.connect();
> > }
> >
> > and I get what I want from solrServer variable. However I killed a running
> > Solr node from my cluster and I see that clusterstate.json at still shows
> > it as active and my solrserver variable says same thing too. I have killed
> > that start.jar but why it is still active? How can I understand it is down?
> > I think that Solr admin page doesn't look at clusterstate.json?
> >
> 
> 
> 
> --
> - Mark
> 



Re: Why clusterstate.json says active for a killed Solr Node?

2013-06-09 Thread Furkan KAMACI
Is it enough to look at only the live nodes (if not, could you point me to
an example in the Solr source code)? By the way, what does
"active" mean in clusterstate.json?

2013/6/10 Mark Miller 

> The true current state is the live nodes info combined with the
> clusterstate.json. If a node is not live, whatever is in clusterstate.json
> is simply it's last state, not the current one.
>
> - Mark
>
>
> On Sun, Jun 9, 2013 at 4:40 PM, Furkan KAMACI  >wrote:
>
> > I want to get cluster state of my SolrCloud and this is my method:
> >
> > private final CloudSolrServer solrServer;
> >
> > public SolrCloudServerFactory(String zkHost) throws
> MalformedURLException {
> >  this.solrServer = new CloudSolrServer(zkHost);
> >  solrServer.connect();
> > }
> >
> > and I get what I want from solrServer variable. However I killed a
> running
> > Solr node from my cluster and I see that clusterstate.json at still shows
> > it as active and my solrserver variable says same thing too. I have
> killed
> > that start.jar but why it is still active? How can I understand it is
> down?
> > I think that Solr admin page doesn't look at clusterstate.json?
> >
>
>
>
> --
> - Mark
>


Re: Why clusterstate.json says active for a killed Solr Node?

2013-06-09 Thread Mark Miller
The true current state is the live nodes info combined with the
clusterstate.json. If a node is not live, whatever is in clusterstate.json
is simply it's last state, not the current one.

- Mark


On Sun, Jun 9, 2013 at 4:40 PM, Furkan KAMACI wrote:

> I want to get cluster state of my SolrCloud and this is my method:
>
> private final CloudSolrServer solrServer;
>
> public SolrCloudServerFactory(String zkHost) throws MalformedURLException {
>  this.solrServer = new CloudSolrServer(zkHost);
>  solrServer.connect();
> }
>
> and I get what I want from solrServer variable. However I killed a running
> Solr node from my cluster and I see that clusterstate.json at still shows
> it as active and my solrserver variable says same thing too. I have killed
> that start.jar but why it is still active? How can I understand it is down?
> I think that Solr admin page doesn't look at clusterstate.json?
>



-- 
- Mark


Re: LotsOfCores feature

2013-06-09 Thread Jack Krupansky
You're right - ZK is simply managing the shared config information for the 
cluster and has no part in query or transactions between the actual nodes, 
except as it depends on shared config information (e.g., what the shards are 
and where the nodes are.)


Somewhere in there I was simply making the point that ZK manages 1MB-size 
blobs of data, so a database of the status of millions of Solr cores would 
be beyond what can readily be managed by ZK.


-- Jack Krupansky

-Original Message- 
From: Upayavira

Sent: Sunday, June 09, 2013 4:31 PM
To: solr-user@lucene.apache.org
Subject: Re: LotsOfCores feature



On Fri, Jun 7, 2013, at 02:59 PM, Jack Krupansky wrote:

AFAICT, SolrCloud addresses the use case of distributed update for a
relatively smaller number of collections (dozens?) that have a relatively
larger number of rows - billions over a modest to moderate number of
nodes
(a handful to a dozen or dozens). So, maybe dozens of collections (some
people still call these "cores") that distribute hundreds of millions if
not
billions of rows over dozens (or potentially low hundreds) of nodes.
Technically, ZK was designed for thousands of nodes, but I don't think
that
was for the use case of distributed query that constantly fans out to all
shards.


Not sure I get what you're saying here. ZK was designed for thousands of
nodes, and the way it works is by making sure that each node has an
active cache of all relevant data within it so they don't need to poll
ZK for the data. Therefore, as far as ZK is concerned it is irrelevant
how many hosts are involved in any particular transaction - the node
that is handling the distribution consults its cache of the list of
active nodes, decides which one to hit, and off it goes, no interaction
with ZK required.

Or am I missing something?

Upayavira 



RE: [blogpost] Memory is overrated, use SSDs

2013-06-09 Thread Toke Eskildsen
Sourajit Basak [sourajit.ba...@gmail.com]:
> Does more processors with less cores or less processors with more cores
> i.e. which of 4P2C or 2P4C has best cost per query ?

I have not tested that, so everything I say is (somewhat qualified) guesswork.

Assuming a NUMA architecture, my guess is that 2P4C would be superior to 4P2C. 
Solr utilizes both disk caching and explicit caching on the JVM heap making 
memory access quite heavy; the less processors, the higher the chance that the 
memory will be controlled by the processor running the given search thread. I 
am by no means a NUMA expert, but it seems that requests for memory controlled 
by another processor takes about twice as long as local memory.

Our machine is a NUMA dual processor and if I can find the time, I would love 
to perform some tests on how that part influences query time. If would be 
interesting to lock usage to 2 cores on each processor vs. 4 cores on the same 
processor. The tricky part is to ensure that RAM is fully controlled by the 
single processor in the second test, including the disk cache.

Regards,
Toke Eskildsen

Why clusterstate.json says active for a killed Solr Node?

2013-06-09 Thread Furkan KAMACI
I want to get cluster state of my SolrCloud and this is my method:

private final CloudSolrServer solrServer;

public SolrCloudServerFactory(String zkHost) throws MalformedURLException {
 this.solrServer = new CloudSolrServer(zkHost);
 solrServer.connect();
}

and I get what I want from the solrServer variable. However, I killed a running
Solr node in my cluster and I see that clusterstate.json still shows
it as active, and my solrServer variable says the same thing. I have killed
that start.jar, so why is it still shown as active? How can I tell it is down?
I think the Solr admin page doesn't look at clusterstate.json?


Re: LotsOfCores feature

2013-06-09 Thread Upayavira


On Fri, Jun 7, 2013, at 02:59 PM, Jack Krupansky wrote:
> AFAICT, SolrCloud addresses the use case of distributed update for a 
> relatively smaller number of collections (dozens?) that have a relatively 
> larger number of rows - billions over a modest to moderate number of
> nodes 
> (a handful to a dozen or dozens). So, maybe dozens of collections (some 
> people still call these "cores") that distribute hundreds of millions if
> not 
> billions of rows over dozens (or potentially low hundreds) of nodes. 
> Technically, ZK was designed for thousands of nodes, but I don't think
> that 
> was for the use case of distributed query that constantly fans out to all 
> shards.

Not sure I get what you're saying here. ZK was designed for thousands of
nodes, and the way it works is by making sure that each node has an
active cache of all relevant data within it so they don't need to poll
ZK for the data. Therefore, as far as ZK is concerned it is irrelevant
how many hosts are involved in any particular transaction - the node
that is handling the distribution consults its cache of the list of
active nodes, decides which one to hit, and off it goes, no interaction
with ZK required.

Or am I missing something?

Upayavira


Re: LotsOfCores feature

2013-06-09 Thread Aleksey
Thanks Paul. Just a little clarification:

You mention that you migrate data using built-in replication, but if
you map and route users yourself, doesn't that mean that you also need
to manage replication yourself? Your routing logic needs to be aware
of how to map both replicas for each user, and if one hosts goes down,
then it needs to distribute traffic that it was receiving over other
hosts. Same thing for adding more hosts.
I did a couple of quick searches and found mostly older wikis that say
solr replication will change in the future. Would you be able to point
me to the right one?


-

On Fri, Jun 7, 2013 at 8:34 PM, Noble Paul നോബിള്‍  नोब्ळ्
 wrote:
> We set it up like this
> + individual solr instances are setup
> + external mapping/routing to allocate users to instances. This information
> can be stored in an external data store
> + all cores are created as transient and loadonstart as false
> + cores come online on demand
> + as and when users data get bigger (or hosts are hot)they are migrated
> between less hit hosts using in built replication
>
> Keep in mind we had the schema for all users. Currently there is no way to
> upload a new schema to solr.
> On Jun 8, 2013 1:15 AM, "Aleksey"  wrote:
>
>> > Aleksey: What would you say is the average core size for your use case -
>> > thousands or millions of rows? And how sharded would each of your
>> > collections be, if at all?
>>
>> Average core/collection size wouldn't even be thousands, hundreds more
>> like. And the largest would be half a million or so but that's a
>> pathological case. I don't need sharding and queries than fan out to
>> different machines. If fact I'd like to avoid that so I don't have to
>> collate the results.
>>
>>
>> > The Wiki page was built not for Cloud Solr.
>> >
>> > We have done such a deployment where less than a tenth of cores were
>> active
>> > at any given point in time. though there were tens of million indices
>> they
>> > were split among a large no:of hosts.
>> >
>> > If you don't insist of Cloud deployment it is possible. I'm not sure if
>> it
>> > is possible with cloud
>>
>> By Cloud you mean specifically SolrCloud? I don't have to have it if I
>> can do without it. Bottom line is I want a bunch of small cores to be
>> distributed over a fleet, each core completely fitting on one server.
>> Would you be willing to provide a little more details on your setup?
>> In particular, how are you managing the cores?
>> How do you route requests to proper server?
>> If you scale the fleet up and down, does reshuffling of the cores
>> happen automatically or is it an involved manual process?
>>
>> Thanks,
>>
>> Aleksey
>>


Re: LIMIT on number of OR in fq

2013-06-09 Thread Jack Krupansky
Maybe it is hitting some kind of container limit on URL length, like more 
than 2048?


Add &debugQuery=true to your query and see what query is both received and 
parsed and generated.


Also, if the default query operator is set to or, fq={! q.op=OR}..., then 
you can drop the " OR " operators for a shorter query string.
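For illustration, a SolrJ sketch combining those two suggestions (the URL, field name, and ID values here are only examples, not the data from the query below):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ShortFqExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        SolrQuery q = new SolrQuery("*:*");
        // with q.op=OR as a local param, the explicit " OR " tokens can be dropped
        q.addFilterQuery("{!q.op=OR}locations:(5000 15000 75100 125300 25300)");
        q.set("debugQuery", "true");   // echoes back the received and parsed query
        System.out.println(server.query(q).getDebugMap());
    }
}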


That said, as with most features of Lucene and Solr, the #1 rule is: Use 
them in moderation. A few dozen IDs are fine. A hundred immediately raises 
suspicion - what are you really trying to do? 200?! 250??!! Over 300?!! 
1,000?!?! 5,000?!?! I mean, do you really need to do all of this on a single 
"query"? If you find yourself saying "Yes", go back to the drawing board and 
think a lot more carefully about what your data model is. I mean, the application 
data model is supposed to simplify queries. Your case does not seem simple 
at all.


Tell us what you are really trying to do with this extreme filter query. The 
fact that you stumbled into an apparent problem should just be a wakeup call 
that you need to reconsider your basic design assumptions.


-- Jack Krupansky

-Original Message- 
From: Kamal Palei

Sent: Sunday, June 09, 2013 9:07 AM
To: solr-user@lucene.apache.org
Subject: LIMIT on number of OR in fq

Dear All
I am using below syntax to check for a particular field.
&fq=locations:(5000 OR 1 OR 15000 OR 2 OR 75100)
With this I get the expected result properly.

In particular situations the number of ORs is larger (around 280),
something like below.

&fq=pref_work_locations:(5000 OR 1 OR 15000 OR 2 OR 75100 OR 125300
OR 25300 OR 141100 OR 100700 OR 50300 OR 132100 OR 25000 OR 25100 OR 25200
OR 25400 OR 25500 OR 25600 OR 25700 OR 25800 OR 25900 OR 26000 OR 26100 OR
26200 OR 26300 OR 26400 OR 26500 OR 3 OR 30100 OR 35000 OR 35100 OR
35200 OR 35300 OR 35400 OR 35500 OR 35600 OR 35700 OR 35800 OR 4 OR
45000 OR 45100 OR 45200 OR 45300 OR 45400 OR 45500 OR 5 OR 50100 OR
50200 OR 55000 OR 55100 OR 55200 OR 55300 OR 55400 OR 55500 OR 55600 OR
55700 OR 6 OR 60100 OR 60200 OR 60300 OR 60400 OR 60500 OR 65000 OR
65100 OR 65200 OR 7 OR 70100 OR 70200 OR 70300 OR 70400 OR 75000 OR
75200 OR 75300 OR 75400 OR 75500 OR 75600 OR 75700 OR 75800 OR 75900 OR
76000 OR 76100 OR 76200 OR 76300 OR 76400 OR 8 OR 80100 OR 80200 OR
80300 OR 80400 OR 80500 OR 85000 OR 85100 OR 85200 OR 85300 OR 85400 OR
85500 OR 85600 OR 85700 OR 85800 OR 85900 OR 86000 OR 86100 OR 86200 OR
9 OR 90100 OR 90200 OR 90300 OR 90400 OR 90500 OR 90600 OR 90700 OR
90800 OR 90900 OR 91000 OR 91100 OR 91200 OR 91300 OR 91400 OR 91500 OR
91600 OR 91700 OR 91800 OR 91900 OR 92000 OR 92100 OR 92200 OR 92300 OR
92400 OR 92500 OR 92600 OR 92700 OR 92800 OR 92900 OR 95000 OR 95100 OR
10 OR 100100 OR 105000 OR 105100 OR 105200 OR 105300 OR 105400 OR
105500 OR 105600 OR 105700 OR 105800 OR 105900 OR 106000 OR 106100 OR
106200 OR 11 OR 110100 OR 115000 OR 115100 OR 115200 OR 115300 OR
115400 OR 115500 OR 12 OR 120100 OR 120200 OR 120300 OR 120400 OR
120500 OR 120600 OR 120700 OR 120800 OR 120900 OR 121000 OR 121100 OR
125000 OR 125100 OR 125200 OR 125400 OR 125500 OR 125600 OR 125700 OR
125800 OR 125900 OR 126000 OR 126100 OR 13 OR 130100 OR 130200 OR
130300 OR 130400 OR 130500 OR 130600 OR 130700 OR 130800 OR 130900 OR
131000 OR 131100 OR 131200 OR 131300 OR 131400 OR 131500 OR 131600 OR
131700 OR 131800 OR 131900 OR 132000 OR 132200 OR 132300 OR 132400 OR
132500 OR 135000 OR 135100 OR 14 OR 140100 OR 140200 OR 140300 OR
140400 OR 140500 OR 140600 OR 140700 OR 140800 OR 140900 OR 141000 OR
141200 OR 141300 OR 141400 OR 141500 OR 141600 OR 141700 OR 141800 OR
141900 OR 142000 OR 142100 OR 145000 OR 15 OR 155000 OR 16 OR
165000 OR 17 OR 175000 OR 18 OR 185000 OR 19 OR 195000 OR
20 OR 205000 OR 21 OR 215000 OR 22 OR 225000 OR 23 OR
235000 OR 24 OR 245000 OR 25 OR 255000 OR 26 OR 265000 OR
27 OR 275000 OR 28 OR 285000 OR 29 OR 295000 OR 30 OR
305000 OR 31 OR 315000 OR 32 OR 325000 OR 33 OR 335000 OR
34 OR 345000 OR 35 OR 355000 OR 36 OR 365000 OR 37 OR
375000 OR 38 OR 385000 OR 39)


When we have such a high number of ORs, it gives me 0 records, whereas I
expected all possible records.

So I am wondering: is there any limit on the number of ORs in one fq filter?

I know I should go for something like the &fq=locations:[min TO max] format,
but that may not always be possible, or it would probably require modifying a
bigger piece of code. So, just as a temporary solution, is there any other
way I can follow?

Best Regards
Kamal 



Re: Search for misspelled words in corpus

2013-06-09 Thread Upayavira
You haven't stated why figh is correct and sight isn't. Is it because
the first letter is different?

Upayavira

On Wed, Jun 5, 2013, at 02:10 PM, కామేశ్వర రావు భైరవభట్ల wrote:
> Hi,
> 
> I have a problem where our text corpus on which we need to do search
> contains many misspelled words. Same word could also be misspelled in
> several different ways. It could also have documents that have correct
> spellings However, the search term that we give in query would always be
> correct spelling. Now when we search on a term, we would like to get all
> the documents that contain both correct and misspelled forms of the
> search
> term.
> We tried fuzzy search, but it doesn't work as per our expectations. It
> returns any close match, not specifically misspelled words. For example,
> if
> I'm searching for a word like "fight", I would like to return the
> documents
> that have words like "figth" and "feight", not documents with words like
> "sight" and "light".
> Is there any suggested approach for doing this?
> 
> regards,
> Kamesh


Re: Search for misspelled words in corpus

2013-06-09 Thread Jagdish Nomula
Ngrams will definitely increase the index. But the increase in size might
not be super high, as the total possible set of trigrams is only 26^3
(about 17,500) and we are just storing a doc list with each ngram.

Another variation of the above ideas would be to add a pre-processing step,
wherein you analyze the input corpus to find the words which may be
misspelled. You can use any of the word-based LSH algorithms to do this and
then index selectively.

This is a theoretical answer. You would have to cherry pick
solutions/approaches for your use case.

Thanks,
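As a rough, Solr-independent illustration of the trigram-plus-rescoring idea (plain Java, a sketch only):

import java.util.HashSet;
import java.util.Set;

public class TrigramOverlap {
    // character trigrams of a lowercased word, e.g. "fight" -> {fig, igh, ght}
    static Set<String> trigrams(String word) {
        Set<String> grams = new HashSet<String>();
        String w = word.toLowerCase();
        for (int i = 0; i + 3 <= w.length(); i++) {
            grams.add(w.substring(i, i + 3));
        }
        return grams;
    }

    // fraction of the query's trigrams that also appear in the candidate
    static double overlap(String query, String candidate) {
        Set<String> q = trigrams(query);
        Set<String> c = trigrams(candidate);
        if (q.isEmpty()) return 0.0;
        int hits = 0;
        for (String g : q) {
            if (c.contains(g)) hits++;
        }
        return (double) hits / q.size();
    }

    public static void main(String[] args) {
        System.out.println(overlap("fight", "figth"));   // ~0.33: only "fig" survives the transposition
        System.out.println(overlap("fight", "sight"));   // ~0.67: shares "igh" and "ght"
    }
}

Note that in this example the transposed misspelling "figth" actually shares fewer trigrams with "fight" than the unrelated word "sight" does, which is exactly why a second, more expensive pass (Smith-Waterman or another edit distance) over the candidate set is suggested above.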




On Sat, Jun 8, 2013 at 11:49 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hm, I was purposely avoiding mentioning ngrams because just ngramming
> all indexed tokens would balloon the index My assumption was that
> only *some* words are misspelled, in which case it may be better not
> to ngram all tokens
>
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
>
>
>
>
>
> On Sun, Jun 9, 2013 at 2:30 AM, Jagdish Nomula 
> wrote:
> > Another theoretical answer for this question is ngrams approach. You can
> > index the word and its trigrams. Query the index, by the string as well
> as
> > its trigrams, with a % match search. You than pass the exhaustive
> resultset
> > through a more expensive scoring such as Smith Waterman.
> >
> > Thanks,
> >
> > Jagdish
> >
> >
> > On Sat, Jun 8, 2013 at 11:03 PM, Shashi Kant 
> wrote:
> >
> >> n-grams might help, followed by a edit distance metric such as
> Jaro-Winkler
> >> or Smith-Waterman-Gotoh to further filter out.
> >>
> >>
> >> On Sun, Jun 9, 2013 at 1:59 AM, Otis Gospodnetic <
> >> otis.gospodne...@gmail.com
> >> > wrote:
> >>
> >> > Interesting problem.  The first thing that comes to mind is to do
> >> > "word expansion" during indexing.  Kind of like synonym expansion, but
> >> > maybe a bit more dynamic. If you can have a dictionary of correctly
> >> > spelled words, then for each token emitted by the tokenizer you could
> >> > look up the dictionary and expand the token to all other words that
> >> > are similar/close enough.  This would not be super fast, and you'd
> >> > likely have to add some custom heuristic for figuring out what
> >> > "similar/close enough" means, but it might work.
> >> >
> >> > I'd love to hear other ideas...
> >> >
> >> > Otis
> >> > --
> >> > Solr & ElasticSearch Support
> >> > http://sematext.com/
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Wed, Jun 5, 2013 at 9:10 AM, కామేశ్వర రావు భైరవభట్ల
> >> >  wrote:
> >> > > Hi,
> >> > >
> >> > > I have a problem where our text corpus on which we need to do search
> >> > > contains many misspelled words. Same word could also be misspelled
> in
> >> > > several different ways. It could also have documents that have
> correct
> >> > > spellings However, the search term that we give in query would
> always
> >> be
> >> > > correct spelling. Now when we search on a term, we would like to get
> >> all
> >> > > the documents that contain both correct and misspelled forms of the
> >> > search
> >> > > term.
> >> > > We tried fuzzy search, but it doesn't work as per our expectations.
> It
> >> > > returns any close match, not specifically misspelled words. For
> >> example,
> >> > if
> >> > > I'm searching for a word like "fight", I would like to return the
> >> > documents
> >> > > that have words like "figth" and "feight", not documents with words
> >> like
> >> > > "sight" and "light".
> >> > > Is there any suggested approach for doing this?
> >> > >
> >> > > regards,
> >> > > Kamesh
> >> >
> >>
> >
> >
> >
> > --
> > ***Jagdish Nomula*
> > Sr. Manager Search
> > Simply Hired, Inc.
> > 370 San Aleso Ave., Ste 200
> > Sunnyvale, CA 94085
> >
> > office - 408.400.4700
> > cell - 408.431.2916
> > email - jagd...@simplyhired.com 
> >
> > www.simplyhired.com
>



-- 
***Jagdish Nomula*
Sr. Manager Search
Simply Hired, Inc.
370 San Aleso Ave., Ste 200
Sunnyvale, CA 94085

office - 408.400.4700
cell - 408.431.2916
email - jagd...@simplyhired.com 

www.simplyhired.com


Nutch installation

2013-06-09 Thread Andrea Lanzoni
Hi everyone, I am a newcomer to Nutch and Solr and, after studying the
literature available on the web, I tried to install them.

I have not been able to follow the few instructions on the Apache wiki site.
I then turned to YouTube and found a video on how to install Nutch and
Solr on Windows 7.


I followed the steps of the video by installing:
- Tomcat
- Java jdk 7
- Cygwin, Nutch 1.6 and Solr 4

Everything apparently went smoothly and I ended up with folders into which,
following the video, I copied the files:

C:\cygwin\home\apache-nutch-1.6-bin
and
C:\cygwin\home\solr-4.2.0\solr-4.2.0

Whilst Java jdk7 is presently in: C:\glassfish3\jdk7

Problems arose when the video tutorial moved on to an MS-DOS-style
configuration: although I kept diligently following the video tutorial's
steps, the outcome was always different and I couldn't get through due to
continuous error output.


I apologize for my dumbness but I couldn't find how to manage it. If 
somebody has a clear and detailed step by step pattern to follow for 
installing Nutch and Solr I would be very grateful.

Thanks in advance.
Andrea Lanzoni



Re: LIMIT on number of OR in fq

2013-06-09 Thread Aloke Ghoshal
Hi Kamal,

You might have to increase the value of maxBooleanClauses in solrconfig.xml
(http://wiki.apache.org/solr/SolrConfigXml). The default value 1024 should
have been fine for 280 search terms.

Though not directly relevant to your query (an OR query), take a look at
this for an explanation:
http://solr.pl/en/2011/12/19/do-i-have-to-look-for-maxbooleanclauses-when-using-filters/

Regards,
Aloke


On Sun, Jun 9, 2013 at 6:37 PM, Kamal Palei  wrote:

> Dear All
> I am using below syntax to check for a particular field.
> &fq=locations:(5000 OR 1 OR 15000 OR 2 OR 75100)
> With this I get the expected result properly.
>
> In a particular situations the number of ORs are more (looks around 280)
> something as below.
>
> &fq=pref_work_locations:(5000 OR 1 OR 15000 OR 2 OR 75100 OR 125300
> OR 25300 OR 141100 OR 100700 OR 50300 OR 132100 OR 25000 OR 25100 OR 25200
> OR 25400 OR 25500 OR 25600 OR 25700 OR 25800 OR 25900 OR 26000 OR 26100 OR
> 26200 OR 26300 OR 26400 OR 26500 OR 3 OR 30100 OR 35000 OR 35100 OR
> 35200 OR 35300 OR 35400 OR 35500 OR 35600 OR 35700 OR 35800 OR 4 OR
> 45000 OR 45100 OR 45200 OR 45300 OR 45400 OR 45500 OR 5 OR 50100 OR
> 50200 OR 55000 OR 55100 OR 55200 OR 55300 OR 55400 OR 55500 OR 55600 OR
> 55700 OR 6 OR 60100 OR 60200 OR 60300 OR 60400 OR 60500 OR 65000 OR
> 65100 OR 65200 OR 7 OR 70100 OR 70200 OR 70300 OR 70400 OR 75000 OR
> 75200 OR 75300 OR 75400 OR 75500 OR 75600 OR 75700 OR 75800 OR 75900 OR
> 76000 OR 76100 OR 76200 OR 76300 OR 76400 OR 8 OR 80100 OR 80200 OR
> 80300 OR 80400 OR 80500 OR 85000 OR 85100 OR 85200 OR 85300 OR 85400 OR
> 85500 OR 85600 OR 85700 OR 85800 OR 85900 OR 86000 OR 86100 OR 86200 OR
> 9 OR 90100 OR 90200 OR 90300 OR 90400 OR 90500 OR 90600 OR 90700 OR
> 90800 OR 90900 OR 91000 OR 91100 OR 91200 OR 91300 OR 91400 OR 91500 OR
> 91600 OR 91700 OR 91800 OR 91900 OR 92000 OR 92100 OR 92200 OR 92300 OR
> 92400 OR 92500 OR 92600 OR 92700 OR 92800 OR 92900 OR 95000 OR 95100 OR
> 10 OR 100100 OR 105000 OR 105100 OR 105200 OR 105300 OR 105400 OR
> 105500 OR 105600 OR 105700 OR 105800 OR 105900 OR 106000 OR 106100 OR
> 106200 OR 11 OR 110100 OR 115000 OR 115100 OR 115200 OR 115300 OR
> 115400 OR 115500 OR 12 OR 120100 OR 120200 OR 120300 OR 120400 OR
> 120500 OR 120600 OR 120700 OR 120800 OR 120900 OR 121000 OR 121100 OR
> 125000 OR 125100 OR 125200 OR 125400 OR 125500 OR 125600 OR 125700 OR
> 125800 OR 125900 OR 126000 OR 126100 OR 13 OR 130100 OR 130200 OR
> 130300 OR 130400 OR 130500 OR 130600 OR 130700 OR 130800 OR 130900 OR
> 131000 OR 131100 OR 131200 OR 131300 OR 131400 OR 131500 OR 131600 OR
> 131700 OR 131800 OR 131900 OR 132000 OR 132200 OR 132300 OR 132400 OR
> 132500 OR 135000 OR 135100 OR 14 OR 140100 OR 140200 OR 140300 OR
> 140400 OR 140500 OR 140600 OR 140700 OR 140800 OR 140900 OR 141000 OR
> 141200 OR 141300 OR 141400 OR 141500 OR 141600 OR 141700 OR 141800 OR
> 141900 OR 142000 OR 142100 OR 145000 OR 15 OR 155000 OR 16 OR
> 165000 OR 17 OR 175000 OR 18 OR 185000 OR 19 OR 195000 OR
> 20 OR 205000 OR 21 OR 215000 OR 22 OR 225000 OR 23 OR
> 235000 OR 24 OR 245000 OR 25 OR 255000 OR 26 OR 265000 OR
> 27 OR 275000 OR 28 OR 285000 OR 29 OR 295000 OR 30 OR
> 305000 OR 31 OR 315000 OR 32 OR 325000 OR 33 OR 335000 OR
> 34 OR 345000 OR 35 OR 355000 OR 36 OR 365000 OR 37 OR
> 375000 OR 38 OR 385000 OR 39)
>
>
> When we have such a high number of ORs, it gives me 0 records, whereas I
> expected all possible records.
>
> So I am wondering, is there any limit for ORs in one fq filter.
>
> I know I need to go for something like, &fq=locations:[min , max] format,
> but that may not be possible always.., or probably we need to modify a
> bigger piece of code. So just as a temporary solution, is there anyother
> way I can follow?
>
> Best Regards
> Kamal
>


Re: Note on The Book

2013-06-09 Thread Jack Krupansky
Point taken. Although initially the focus is on one big e-book - to make 
searching easier, with zero chance of printing that as one paper book, the 
intent is to go multi-volume for the print edition down the road a little 
bit.


-- Jack Krupansky

-Original Message- 
From: Otis Gospodnetic

Sent: Sunday, June 09, 2013 2:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Note on The Book

It's 2013 and people suffer from ADD.  Break it up into a la carte
chapter books.

Otis
--
Solr & ElasticSearch Support
http://sematext.com/





On Wed, May 29, 2013 at 6:23 PM, Jack Krupansky  
wrote:

Markus,

Okay, more pages it is!

-- Jack Krupansky

-Original Message- From: Markus Jelsma
Sent: Wednesday, May 29, 2013 5:35 PM

To: solr-user@lucene.apache.org
Subject: RE: Note on The Book

Jack,

I'd prefer tons of information instead of a meager 300 page book that 
leaves
a lot of questions. I'm looking forward to a paperback or hardcover book 
and

price doesn't really matter, it is going to be worth it anyway.

Thanks,
Markus



-Original message-


From:Jack Krupansky 
Sent: Wed 29-May-2013 15:10
To: solr-user@lucene.apache.org
Subject: Re: Note on The Book

Erick, your point is well taken. Although my primary interest/skill is to
produce a solid foundation reference (including tons of examples), the
real
goal is to then build on top of that foundation.

While I focus on the hard-core material - which really does include some
narrative and lots of examples in addition to tons of "mere" reference, 
my

co-author, Ryan Tabora, will focus almost exclusively on... narrative and
diagrams.

And when I say reference, I also mean lots of examples. Even as the
hard-core reference stabilizes, the examples will continue to grow ("like
weeds!").

Once we get the current, existing, under-review, chapters packaged into
the
new book and available for purchase and download (maybe Lulu, not 
decided)

-
available, in a couple of weeks, it will be updated approximately every
other week, both with additional reference material, and additional
narrative and diagrams.

One of our priorities (after we get through Stage 0 of the next few 
weeks)

is to in fact start giving each of the long Deep Dive Chapters enough
narrative lead to basically say exactly that - why you should care.

A longer-term priority is to improve the balance of narrative and hard-core
reference. Yeah, that will be a lot of pages. It already is. We were at 907
pages and I was about to drop in another 166 pages on update handlers when
O'Reilly threw up their hands and pulled the plug. I was estimating 1200
pages at that stage. And I'll probably have another 60-80 pages on update
request processors within a week or so. With more to come. That did include
a lot of hard-core material and example code for Lucene, which won't be in
the new Solr-only book. By focusing on an e-book, the raw page count alone
becomes moot. We haven't given up on print - the intent is eventually to
have multiple volumes (4-8 or so, maybe more), both as cheaper e-books ($3
to $5 each) and slimmer print volumes for people who don't need everything
in print.

In fact, we will likely offer the revamped initial chapters of the book as a
standalone introduction to Solr - a narrative introduction ("why should you
care about Solr"), basic concepts of Lucene and Solr (and why you should
care!), a brief tutorial walkthrough of the major feature areas of Solr, and
a case study. The intent would be both an e-book and a slim print volume (75
pages?).

Another priority (beyond Stage 0) is to develop a detailed roadmap diagram
of Solr and how applications can use Solr, and then use that to show how
each of the Deep Dive sections (heavy reference, but gradually adding more
narrative over time) fits into that bigger picture.

We will probably be very open to requests - what people really wish a book
would actually do for them. The only request we won't be open to is to do it
all in only 300 pages.

-- Jack Krupansky

-Original Message- From: Erick Erickson
Sent: Wednesday, May 29, 2013 7:19 AM
To: solr-user@lucene.apache.org
Subject: Re: Note on The Book

FWIW, picking up on Alexandre's point. One of my continual
frustrations with virtually _all_
technical books is they become endless pages of details without ever
mentioning why
the hell I should care. Unfortunately, explaining use-cases for
everything would only make
the book about 10,000 pages long. Siiigh.

I guess you can take this as a vote for narrative

Erick

On Tue, May 28, 2013 at 4:53 PM, Jack Krupansky 
wrote:
> We'll have a blog for the book. We hope to have a first
> raw/rough/partial/draft published as an e-book in maybe 10 days to 2
> weeks.
> As soon as we get that process under control, we'll start the blog. 
> I'll

> keep your email on file and keep you posted.
>
> -- Jack Krupansky
>
> -Original Message- From: Swati Swoboda
> Sent: Tuesday, May 28, 2013 1:36 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Note on The Book

LIMIT on number of OR in fq

2013-06-09 Thread Kamal Palei
Dear All
I am using the syntax below to check for a particular field.
&fq=locations:(5000 OR 1 OR 15000 OR 2 OR 75100)
With this I get the expected result properly.

In particular situations the number of ORs is larger (around 280),
something like below.

&fq=pref_work_locations:(5000 OR 1 OR 15000 OR 2 OR 75100 OR 125300
OR 25300 OR 141100 OR 100700 OR 50300 OR 132100 OR 25000 OR 25100 OR 25200
OR 25400 OR 25500 OR 25600 OR 25700 OR 25800 OR 25900 OR 26000 OR 26100 OR
26200 OR 26300 OR 26400 OR 26500 OR 3 OR 30100 OR 35000 OR 35100 OR
35200 OR 35300 OR 35400 OR 35500 OR 35600 OR 35700 OR 35800 OR 4 OR
45000 OR 45100 OR 45200 OR 45300 OR 45400 OR 45500 OR 5 OR 50100 OR
50200 OR 55000 OR 55100 OR 55200 OR 55300 OR 55400 OR 55500 OR 55600 OR
55700 OR 6 OR 60100 OR 60200 OR 60300 OR 60400 OR 60500 OR 65000 OR
65100 OR 65200 OR 7 OR 70100 OR 70200 OR 70300 OR 70400 OR 75000 OR
75200 OR 75300 OR 75400 OR 75500 OR 75600 OR 75700 OR 75800 OR 75900 OR
76000 OR 76100 OR 76200 OR 76300 OR 76400 OR 8 OR 80100 OR 80200 OR
80300 OR 80400 OR 80500 OR 85000 OR 85100 OR 85200 OR 85300 OR 85400 OR
85500 OR 85600 OR 85700 OR 85800 OR 85900 OR 86000 OR 86100 OR 86200 OR
9 OR 90100 OR 90200 OR 90300 OR 90400 OR 90500 OR 90600 OR 90700 OR
90800 OR 90900 OR 91000 OR 91100 OR 91200 OR 91300 OR 91400 OR 91500 OR
91600 OR 91700 OR 91800 OR 91900 OR 92000 OR 92100 OR 92200 OR 92300 OR
92400 OR 92500 OR 92600 OR 92700 OR 92800 OR 92900 OR 95000 OR 95100 OR
10 OR 100100 OR 105000 OR 105100 OR 105200 OR 105300 OR 105400 OR
105500 OR 105600 OR 105700 OR 105800 OR 105900 OR 106000 OR 106100 OR
106200 OR 11 OR 110100 OR 115000 OR 115100 OR 115200 OR 115300 OR
115400 OR 115500 OR 12 OR 120100 OR 120200 OR 120300 OR 120400 OR
120500 OR 120600 OR 120700 OR 120800 OR 120900 OR 121000 OR 121100 OR
125000 OR 125100 OR 125200 OR 125400 OR 125500 OR 125600 OR 125700 OR
125800 OR 125900 OR 126000 OR 126100 OR 13 OR 130100 OR 130200 OR
130300 OR 130400 OR 130500 OR 130600 OR 130700 OR 130800 OR 130900 OR
131000 OR 131100 OR 131200 OR 131300 OR 131400 OR 131500 OR 131600 OR
131700 OR 131800 OR 131900 OR 132000 OR 132200 OR 132300 OR 132400 OR
132500 OR 135000 OR 135100 OR 14 OR 140100 OR 140200 OR 140300 OR
140400 OR 140500 OR 140600 OR 140700 OR 140800 OR 140900 OR 141000 OR
141200 OR 141300 OR 141400 OR 141500 OR 141600 OR 141700 OR 141800 OR
141900 OR 142000 OR 142100 OR 145000 OR 15 OR 155000 OR 16 OR
165000 OR 17 OR 175000 OR 18 OR 185000 OR 19 OR 195000 OR
20 OR 205000 OR 21 OR 215000 OR 22 OR 225000 OR 23 OR
235000 OR 24 OR 245000 OR 25 OR 255000 OR 26 OR 265000 OR
27 OR 275000 OR 28 OR 285000 OR 29 OR 295000 OR 30 OR
305000 OR 31 OR 315000 OR 32 OR 325000 OR 33 OR 335000 OR
34 OR 345000 OR 35 OR 355000 OR 36 OR 365000 OR 37 OR
375000 OR 38 OR 385000 OR 39)


When we have such a high number of ORs, it gives me 0 records, whereas I
expected all possible records.

So I am wondering: is there any limit on the number of ORs in one fq filter?

I know I should move to something like the &fq=locations:[min TO max] range
format, but that may not always be possible, or it would mean modifying a
bigger piece of code. So, just as a temporary solution, is there any other
way I can follow?
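
One possible workaround (a sketch only, assuming a local Solr at
localhost:8983 and a core named collection1, with just a few placeholder
ids) might be to send the parameters in a POST body, so the long fq never
appears on the URL at all:

import requests

# Build the long filter query; only a few placeholder ids are shown here.
location_ids = ["5000", "15000", "75100"]
fq = "pref_work_locations:(" + " OR ".join(location_ids) + ")"

# POSTing form-encoded parameters keeps them out of the URL, so the request
# is not subject to the servlet container's URL/header length limit.
resp = requests.post(
    "http://localhost:8983/solr/collection1/select",  # assumed host and core
    data={"q": "*:*", "fq": fq, "wt": "json"},
)
print(resp.json()["response"]["numFound"])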

Best Regards
Kamal


Re: Help required with fq syntax

2013-06-09 Thread Kamal Palei
Hi Otis
Your suggestion worked fine.
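
For reference, the request that worked looks roughly like the sketch below
(the host and core name are placeholders):

import requests

# q matches everything; the fq then drops any document whose multi-valued
# blocked_company_ids field contains the value 5.
params = {
    "q": "*:*",
    "fq": "-blocked_company_ids:5",
    "wt": "json",
}
resp = requests.get("http://localhost:8983/solr/collection1/select",
                    params=params)  # assumed host and core
for doc in resp.json()["response"]["docs"]:
    print(doc.get("id"))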

Thanks
kamal


On Sun, Jun 9, 2013 at 7:58 AM, Kamal Palei  wrote:

> Though the syntax looks fine, I get all the records. As per the example
> given above, I get all the documents, meaning filtering did not work. I am
> curious to know whether my indexing went fine or not. I will check and
> revert back.
>
>
> On Sun, Jun 9, 2013 at 7:21 AM, Otis Gospodnetic <
> otis.gospodne...@gmail.com> wrote:
>
>> Try:
>>
>> ...&q=*:*&fq=-blocked_company_ids:5
>>
>> Otis
>> --
>> Solr & ElasticSearch Support
>> http://sematext.com/
>>
>>
>>
>>
>>
>> On Sat, Jun 8, 2013 at 9:37 PM, Kamal Palei 
>> wrote:
>> > Dear All
>> > I have a multi-valued field blocked_company_ids in index.
>> >
>> > You can think like
>> >
>> > 1. document1 , blocked_company_ids: 1, 5, 7
>> > 2. document2 , blocked_company_ids: 2, 6, 7
>> > 3. document3 , blocked_company_ids: 4, 5, 6
>> >
>> > and so on .
>> >
>> > I want to retrieve all the documents where blocked_company_ids does not
>> > contain one particular company id, say 5.
>> >
>> > So my search result should give me only document2, as document1 and
>> > document3 both contain 5.
>> >
>> > To achieve this, what does the fq syntax look like? Is it something
>> > like below:
>> >
>> > &fq=blocked_company_ids:-5
>> >
>> > I tried the above syntax, but it gives me 0 records.
>> >
>> > Can somebody help me with the syntax please, and point me to where all
>> > the syntax details are given?
>> >
>> > Thanks
>> > Kamal
>> > Net Cloud Systems
>>
>
>


Boosting based on value of field

2013-06-09 Thread Spadez
Hi,

By the looks of it I have a few options with regard to boosting. From a
performance point of view, am I better off setting the boost of certain
results on import via the DIH, or is it better to set the boost at query
time by adding it to the default queries?

I have a "source" field and I want to boost a document's relevancy if that
field has a certain value, say for example:

source=google then boost 10
source=bing then boost 5
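
To make the query-time option concrete, what I have in mind is roughly the
sketch below (using the edismax bq parameter as one way to express it; the
host, core name, and example user query are placeholders):

import requests

# Query-time boosting: documents whose "source" field matches get their
# score raised by the given factor, without re-importing or reindexing.
params = {
    "defType": "edismax",
    "q": "example search terms",                   # placeholder user query
    "bq": ["source:google^10", "source:bing^5"],   # one boost clause per value
    "wt": "json",
}
resp = requests.get("http://localhost:8983/solr/collection1/select",
                    params=params)  # assumed host and core
print(resp.json()["response"]["numFound"])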

Thanks for any help you can give!

James



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boosting-based-on-value-of-field-tp4069157.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: [blogpost] Memory is overrated, use SSDs

2013-06-09 Thread Sourajit Basak
@Erick,
Your revelation on SSDs is very valuable.
Do you have any idea about the following?

Do more processors with fewer cores, or fewer processors with more cores,
give the best cost per query - i.e., which of 4P2C or 2P4C?

~ Sourajit


On Fri, Jun 7, 2013 at 4:45 PM, Erick Erickson wrote:

> Thanks for this, hard data is always welcome!
>
> Another blog post for my reference list!
>
> Erick
>
> On Fri, Jun 7, 2013 at 2:59 AM, Toke Eskildsen 
> wrote:
> > On Fri, 2013-06-07 at 07:15 +0200, Andy wrote:
> >> One question I have is did you precondition the SSD (
> >> http://www.sandforce.com/userfiles/file/downloads/FMS2009_F2A_Smith.pdf)? SSD
> >> performance tends to take a very deep dive once all blocks are
> >> written at least once and the garbage collector kicks in.
> >
> > Not explicitly so. The machine is our test server with the SSDs in RAID
> > 0 with - to my knowledge - no TRIM support. They are 2½ years old, have
> > had a fair amount of data written, and have been 3/4 full most of the
> > time. At one point in time we experimented with 10M+ relatively small
> > files and a couple of 40GB databases, so the drives are definitely not
> > in pristine condition.
> >
> > Anyway, as Solr searching is heavy on tiny random reads, I suspect that
> > search performance will be largely unaffected by SSD fragmentation. It
> > would be interesting to examine, but for now I cannot prioritize another
> > large performance test.
> >
> >
> > Thank you for your input. I will update the blog post accordingly,
> > Toke Eskildsen, State and University Library, Denmark
> >
>


Re: does solr support query time only stopwords?

2013-06-09 Thread Upayavira
Can you give examples? Show your field type config and the search terms you
used.

Also, did you reindex after changing your field type? The index will have
been written using the analyser that was active at the time of indexing,
so maybe your index still contains stop words.
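
For comparison, a query-time-only stopwords setup usually looks something
like the sketch below (the type name and filter chain are an illustration
only, not your actual config):

<fieldType name="text_query_stop" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- no StopFilterFactory here, so every term is indexed -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

With something like that, queries drop the stop words at query time while
the index keeps them, which is what the Luke handler needs for looking at
high-frequency terms.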

Upayavira

On Sun, Jun 9, 2013, at 08:09 AM, jchen2000 wrote:
> Nope. I only searched with individual stop words.  Very strange to me
> 
> 
> Otis Gospodnetic-5 wrote
> > Maybe returned hits match other query terms.
> > 
> > Otis
> > Solr & ElasticSearch Support
> > http://sematext.com/
> > On Jun 8, 2013 6:34 PM, "jchen2000" <jchen200@> wrote:
> > 
> >> I wanted to analyze high frequency terms using Solr's Luke request
> >> handler
> >> and keep updating the stopwords file for new queries from time to time.
> >> Obviously I have to index all terms whether they belong to the stopwords
> >> list or not.
> >>
> >> So I configured the query analyzer stopwords list but disabled the index
> >> analyzer stopwords list. However, it seems like the query now returns all
> >> records containing stopwords.
> >>
> >> Does anybody have an idea why this would happen?
> >>
> >> ps. I am using Datastax Enterprise 3.0.2 and the solr version is 4.0
> >>
> >>
> >>
> >> --
> >> View this message in context:
> >> http://lucene.472066.n3.nabble.com/does-solr-support-query-time-only-stopwords-tp4069087.html
> >> Sent from the Solr - User mailing list archive at Nabble.com.
> >>
> 
> 
> 
> 
> 
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/does-solr-support-query-time-only-stopwords-tp4069087p4069143.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: does solr support query time only stopwords?

2013-06-09 Thread jchen2000
Nope. I only searched with individual stop words.  Very strange to me


Otis Gospodnetic-5 wrote
> Maybe returned hits match other query terms.
> 
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
> On Jun 8, 2013 6:34 PM, "jchen2000" <jchen200@> wrote:
> 
>> I wanted to analyze high frequency terms using Solr's Luke request
>> handler
>> and keep updating the stopwords file for new queries from time to time.
>> Obviously I have to index all terms whether they belong to the stopwords
>> list or not.
>>
>> So I configured the query analyzer stopwords list but disabled the index
>> analyzer stopwords list. However, it seems like the query now returns all
>> records containing stopwords.
>>
>> Does anybody have an idea why this would happen?
>>
>> ps. I am using Datastax Enterprise 3.0.2 and the solr version is 4.0
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/does-solr-support-query-time-only-stopwords-tp4069087.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>





--
View this message in context: 
http://lucene.472066.n3.nabble.com/does-solr-support-query-time-only-stopwords-tp4069087p4069143.html
Sent from the Solr - User mailing list archive at Nabble.com.