Re: Question about MoreLikeThis query with solrj

2012-10-10 Thread Dominique Bejean

Hi,

Are you using a correct stopword file for the French language? It is 
very important for the MLT component to work well.

You should also take a look at this document.
http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/

MLT support in SolrJ is an old story. Maybe this can help as well.
https://issues.apache.org/jira/browse/SOLR-1085
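
For reference, here is a minimal SolrJ sketch of querying the standalone /mlt 
handler by seed document id instead of posting the raw text; the core URL, the 
id and the field name are only assumptions for illustration:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.MoreLikeThisParams;

HttpSolrServer server = new HttpSolrServer("http://localhost:8080/solr/mycore");
SolrQuery q = new SolrQuery("id:dc1");           // seed document, not the raw text
q.setQueryType("/" + MoreLikeThisParams.MLT);    // dispatch to the /mlt handler
q.set(MoreLikeThisParams.MATCH_INCLUDE, false);  // drop the seed doc from the results
q.set(MoreLikeThisParams.MIN_TERM_FREQ, 1);
q.set(MoreLikeThisParams.MIN_DOC_FREQ, 1);
q.set(MoreLikeThisParams.SIMILARITY_FIELDS, "content_mlt");
QueryResponse rsp = server.query(q);
System.out.println(rsp.getResults());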

Regards

--
Dominique
www.eolya.fr
www.crawl-anywhere.com
www.mysolrserver.com




On 02/10/12 18:14, G.Long wrote:

Hi :)

I'm using Solr 3.6.1 and I'm trying to use the similarity features of 
Lucene/Solr to compare texts.


The content of my documents is in French, so I defined a field like:

<field name="content_mlt" type="text_fr" indexed="true" stored="true"/>


(it uses the default text_fr fieldType provided with the default 
schema.xml file)


I'm using the following method to query my index:

SolrQuery sQuery = new SolrQuery();
sQuery.setQueryType("/" + MoreLikeThisParams.MLT);
sQuery.set(MoreLikeThisParams.MATCH_INCLUDE, false);
sQuery.set(MoreLikeThisParams.MIN_DOC_FREQ, 1);
sQuery.set(MoreLikeThisParams.MIN_TERM_FREQ, 1);
sQuery.set(MoreLikeThisParams.MAX_QUERY_TERMS, 50);
sQuery.set(MoreLikeThisParams.SIMILARITY_FIELDS, field);
sQuery.set("fl", "*,id,score");
sQuery.setRows(5);
sQuery.setQuery("content_mlt:"/the content to find/");

QueryResponse rsp = server.query(sQuery);
return rsp.getResults();

The problem is that the returned results and the associated scores 
look strange to me.


I indexed the three following texts :

sample 1 :
"Le 1° de l'article 81 du CGI exige que les allocations pour frais 
soient utilisées conformément à leur objet
pour être affranchies de l'impôt. Lorsque la réalité du versement des 
allocations est établie,
le bénéficiaire doit cependant être en mesure de justifier de leur 
utilisation";


sample 2:
"Le premier alinéa du 1° de l'article 81 du CGI prévoit que les 
rémunérations des journalistes,
rédacteurs, photographes, directeurs de journaux et critiques 
dramatiques et musicaux
perçues ès qualités constituent des allocations pour frais d'emploi 
affranchies d'impôt

à concurrence de 7 650 EUR.";

sample 3:
"Par ailleurs, lorsque leur montant est fixé par voie législative, les 
allocations
pour frais prévues au 1° de l'article 81 du CGI sont toujours réputées 
utilisées
conformément à leur objet et ne peuvent donner lieu à aucune 
vérification de la part de l'administration.
Il s'agit d'une présomption irréfragable, qui ne peut donc pas être 
renversée par la preuve contraire qui
serait apportée par l'administration d'une utilisation non conforme à 
son objet de l'allocation concernée.
Pour que le deuxième alinéa du 1° de l'article 81 du CGI s'applique, 
deux conditions doivent être réunies
simultanément : - la nature d'allocation spéciale inhérente à la 
fonction ou à l'emploi résulte directement de la loi ;

- son montant est fixé par la loi";

I tried to query the index by passing the first sample as the content 
to query, and the result is the following:

MLT result: id: dc3 - score: 0.114195324 (corresponds to sample 3)
MLT result: id: dc2 - score: 0.035233106 (corresponds to sample 2)

The results don't even contain the first sample, although it is 
exactly the same text as the one put into the query :/


Any idea of why I get these results?
Maybe the query parameters are incorrect or there is something to 
change in the solr config?


Thanks :)

Gary









Re: Solr - Make Exact Search on Field with Fuzzy Query

2012-10-10 Thread meghana
Hi Erickson,

Thanks for your valuable reply. 

Actually, we had tried storing just one field and highlighting on that
field all the time, whether we search on it or not.

That sometimes causes an issue: if I search for the term 'hospitality' and use
the stemmed field for highlighting, it returns highlights on both 'hospital'
and 'hospitality', whereas it should highlight only 'hospitality' since I am
doing an exact term search. Can you suggest anything on this, i.e. a way to
eliminate this issue while highlighting on the original field (which has
stemming applied)?

The other solutions sound really good, but as you said they are hard to
implement, and at this point we want to use built-in solutions if possible.

Please suggest how we can eliminate the highlighting issue explained above.

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Make-Exact-Search-on-Field-with-Fuzzy-Query-tp4012888p4013067.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: segment number during optimize of index

2012-10-10 Thread Lance Norskog
Study index merging. This is awesome.
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

Jame - opening lots of segments is not a problem. A major performance factor 
you will find is 'Large Pages'. This is an operating-system strategy for 
managing servers with tens of gigabytes of memory. Without it, all large 
programs run much more slowly than they could. It is not a Solr or JVM issue.


- Original Message -
| From: "jun Wang" 
| To: solr-user@lucene.apache.org
| Sent: Wednesday, October 10, 2012 6:36:09 PM
| Subject: Re: segment number during optimize of index
| 
| I have an other question, does the number of segment affect speed for
| update index?
| 
| 2012/10/10 jame vaalet 
| 
| > Guys,
| > thanks for all the inputs, I was continuing my research to know
| > more about
| > segments in Lucene. Below are my conclusion, please correct me if
| > am wrong.
| >
| >1. Segments are independent sub-indexes in seperate file, while
| >indexing
| >its better to create new segment as it doesnt have to modify an
| >existing
| >file. where as while searching, smaller the segment the better
| >it is
| > since
| >you open x (not exactly x but xn a value proportional to x)
| >physical
| > files
| >to search if you have got x segments in the index.
| >2. since lucene has memory map concept, for each file/segment in
| >index a
| >new m-map file is created and mapped to the physcial file in
| >disk. Can
| >someone explain or correct this in detail, i am sure there are
| >lot many
| >people wondering how m-map works while you merge or optimze
| >index
| > segments.
| >
| >
| >
| > On 6 October 2012 07:41, Otis Gospodnetic
| >  >wrote:
| >
| > > If I were you and not knowing all your details...
| > >
| > > I would optimize indices that are static (not being modified) and
| > > would optimize down to 1 segment.
| > > I would do it when search traffic is low.
| > >
| > > Otis
| > > --
| > > Search Analytics -
| > > http://sematext.com/search-analytics/index.html
| > > Performance Monitoring - http://sematext.com/spm/index.html
| > >
| > >
| > > On Fri, Oct 5, 2012 at 4:27 PM, jame vaalet
| > > 
| > wrote:
| > > > Hi Eric,
| > > > I  am in a major dilemma with my index now. I have got 8 cores
| > > > each
| > > around
| > > > 300 GB in size and half of them are deleted documents in it and
| > > > above
| > > that
| > > > each has got around 100 segments as well. Do i issue a
| > > > expungeDelete
| > and
| > > > allow the merge policy to take care of the segments or optimize
| > > > them
| > into
| > > > single segment. Search performance is not at par compared to
| > > > usual solr
| > > > speed.
| > > > If i have to optimize what segment number should i choose? my
| > > > RAM size
| > > > around 120 GB and JVM heap is around 45 GB (oldGen being 30
| > > > GB). Pleas
| > > > advice !
| > > >
| > > > thanks.
| > > >
| > > >
| > > > On 6 October 2012 00:00, Erick Erickson
| > > > 
| > wrote:
| > > >
| > > >> because eventually you'd run out of file handles. Imagine a
| > > >> long-running server with 100,000 segments. Totally
| > > >> unmanageable.
| > > >>
| > > >> I think shawn was emphasizing that RAM requirements don't
| > > >> depend on the number of segments. There are other
| > > >> resources that file consume however.
| > > >>
| > > >> Best
| > > >> Erick
| > > >>
| > > >> On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet
| > > >> 
| > > wrote:
| > > >> > hi Shawn,
| > > >> > thanks for the detailed explanation.
| > > >> > I have got one doubt, you said it doesn matter how many
| > > >> > segments
| > index
| > > >> have
| > > >> > but then why does solr has this merge policy which merges
| > > >> > segments
| > > >> > frequently?  why can it leave the segments as it is rather
| > > >> > than
| > > merging
| > > >> > smaller one's into bigger one?
| > > >> >
| > > >> > thanks
| > > >> > .
| > > >> >
| > > >> > On 5 October 2012 05:46, Shawn Heisey 
| > > >> > wrote:
| > > >> >
| > > >> >> On 10/4/2012 3:22 PM, jame vaalet wrote:
| > > >> >>
| > > >> >>> so imagine i have merged the 150 Gb index into single
| > > >> >>> segment,
| > this
| > > >> would
| > > >> >>> make a single segment of 150 GB in memory. When new docs
| > > >> >>> are
| > > indexed it
| > > >> >>> wouldn't alter this 150 Gb index unless i update or delete
| > > >> >>> the
| > older
| > > >> docs,
| > > >> >>> right? will 150 Gb single segment have problem with memory
| > swapping
| > > at
| > > >> OS
| > > >> >>> level?
| > > >> >>>
| > > >> >>
| > > >> >> Supplement to my previous reply:  the real memory mentioned
| > > >> >> in the
| > > last
| > > >> >> paragraph does not include the memory that the OS uses to
| > > >> >> cache
| > disk
| > > >> >> access.  If more memory is needed and all the free memory
| > > >> >> is being
| > > used
| > > >> by
| > > >> >> the disk cache, the OS will throw away part of the disk
| > > >> >> cache (a
| > > >

Re: Using additional dictionary with DirectSolrSpellChecker

2012-10-10 Thread Lance Norskog
Hapax legomena (terms with DF of 1) are very often typos. You can automatically 
build a stopword file from these. If you want to be picky, you can use only 
words with a very small distance from words with much larger DF.
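
A rough SolrJ sketch of harvesting those DF==1 terms with the TermsComponent; 
the field name, core URL and output file are assumptions, and on a big index 
you would page with terms.lower instead of pulling everything in one request:

import java.io.PrintWriter;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.TermsResponse;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
SolrQuery q = new SolrQuery();
q.setQueryType("/terms");      // assumes the /terms handler from the example config
q.setTerms(true);
q.addTermsField("text");       // field to scan
q.setTermsLimit(-1);           // -1 = no cap on the number of terms returned
q.setTermsMinCount(1);
q.setTermsMaxCount(1);         // only terms whose docFreq is exactly 1
TermsResponse terms = server.query(q).getTermsResponse();
PrintWriter out = new PrintWriter("hapax-stopwords.txt", "UTF-8");
for (TermsResponse.Term t : terms.getTerms("text")) {
    out.println(t.getTerm());
}
out.close();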

- Original Message -
| From: "Robert Muir" 
| To: solr-user@lucene.apache.org
| Sent: Wednesday, October 10, 2012 5:40:23 PM
| Subject: Re: Using additional dictionary with DirectSolrSpellChecker
| 
| On Wed, Oct 10, 2012 at 9:02 AM, O. Klein  wrote:
| > I don't want to tweak the threshold. For majority of cases it works
| > fine.
| >
| > It's for cases where term has low frequency but is spelled
| > correctly.
| >
| > If you lower the threshold you would also get incorrect spelled
| > terms as
| > suggestions.
| >
| 
| Yeah there is no real magic here when the corpus contains typos. this
| existing docFreq heuristic was just borrowed from the old index-based
| spellchecker.
| 
| I do wonder if using # of occurrences (totalTermFreq) instead of # of
| documents with the term (docFreq) would improve the heuristic.
| 
| In all cases I think if you want to also integrate a dictionary or
| something, it seems like this could somehow be done with the
| File-based spellchecker?
| 


Re: Auto Correction?

2012-10-10 Thread deniz
So other than commercial solutions, it seems like I need to have a plugin,
right? I couldn't find any open source solutions yet...



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-Correction-tp4012666p4013044.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query foreign language "synonyms" / words of equivalent meaning?

2012-10-10 Thread Lance Norskog
I want an update processor that runs Translation Party.

http://translationparty.com/

http://downloadsquad.switched.com/2009/08/14/translation-party-achieves-hilarious-results-using-google-transl/

- Original Message -
| From: "SUJIT PAL" 
| To: solr-user@lucene.apache.org
| Sent: Wednesday, October 10, 2012 2:51:37 PM
| Subject: Re: Query foreign language "synonyms" / words of equivalent meaning?
| 
| Hi,
| 
| We are using google translate to do something like what you
| (onlinespending) want to do, so maybe it will help.
| 
| During indexing, we store the searchable fields from documents into a
| fields named _en, _fr, _es, etc. So assuming we capture title and
| body from each document, the fields are (title_en, body_en),
| (title_fr, body_fr), etc, with their own analyzer chains. These
| documents come from a controlled source (ie not the web), so we know
| the language they are authored in.
| 
| During searching, a custom component intercepts the client language
| and the query. The query is sent to google translate for language
| detection. The largest amount of docs in the corpus is english, so
| if the detected language is either english or the client language,
| then we call google translate again to find the translated query in
| the other (english or client) language. Another custom component
| constructs an OR query between the two languages one component of
| which is aimed at the _en field set and the other aimed at the _xx
| (client language) field set.
| 
| -sujit
| 
| On Oct 9, 2012, at 11:24 PM, Bernd Fehling wrote:
| 
| > 
| > As far as I know, there is no built-in functionality for language
| > translation.
| > I would propose to write one, but there are many many pitfalls.
| > If you want to translate from one language to another you might
| > have to
| > know the "starting" language. Otherwise you get problems with
| > translation.
| > 
| > Not (german) -> distress (english), affliction (english)
| > 
| > - you might have words in one language which are stopwords in other
| > language "not"
| > - you don't have a one to one mapping, it's more like "1 to n+x"
| >  toilette (french) -> bathroom, rest room / restroom, powder room
| > 
| > This are just two points which jump into my mind but there are tons
| > of pitfalls.
| > 
| > We use the solution of a multilingual thesaurus as synonym
| > dictionary.
| > http://en.wikipedia.org/wiki/Eurovoc
| > It holds translations of 22 official languages of the European
| > Union.
| > 
| > So a search for "europäischer währungsfonds" gives also results
| > with
| > "european monetary fund", "fonds monétaire européen", ...
| > 
| > Regards
| > Bernd
| > 
| > 
| > 
| > Am 10.10.2012 04:54, schrieb onlinespend...@gmail.com:
| >> Hi,
| >> 
| >> English is going to be the predominant language used in my
| >> documents, but
| >> there may be a spattering of words in other languages (such as
| >> Spanish or
| >> French). What I'd like is to initiate a query for something like
| >> "bathroom"
| >> for example and for Solr to return documents that not only contain
| >> "bathroom" but also "baño" (Spanish). And the same goes when
| >> searching for "
| >> baño". I'd like Solr to return documents that contain either
| >> "bathroom" or "
| >> baño".
| >> 
| >> One possibility is to pre-translate all indexed documents to a
| >> common
| >> language, in this case English. And if someone were to search
| >> using a
| >> foreign word, I'd need to translate that to English before issuing
| >> a query
| >> to Solr. This appears to be problematic, since I'd have to know
| >> whether the
| >> indexed words and the query are even in a foreign language, which
| >> is not
| >> trivial.
| >> 
| >> Another possibility is to pre-build a list of foreign word
| >> synonyms. So baño
| >> would be listed as a synonym for bathroom. But I'd need to include
| >> other
| >> languages (such as toilette in French) and other words. This
| >> requires that
| >> I know in advance all possible words I'd need to include foreign
| >> language
| >> versions of (not to mention needing to know which languages to
| >> include).
| >> This isn't trivial either.
| >> 
| >> I'm assuming there's no built-in functionality that supports the
| >> foreign
| >> language translation on the fly, so what do people propose?
| >> 
| >> Thanks!
| >> 
| > 
| > --
| > *
| > Bernd FehlingUniversitätsbibliothek Bielefeld
| > Dipl.-Inform. (FH)LibTec - Bibliothekstechnologie
| > Universitätsstr. 25 und Wissensmanagement
| > 33615 Bielefeld
| > Tel. +49 521 106-4060   bernd.fehling(at)uni-bielefeld.de
| > 
| > BASE - Bielefeld Academic Search Engine - www.base-search.net
| > *
| 
| 


Re: segment number during optimize of index

2012-10-10 Thread jun Wang
I have an other question, does the number of segment affect speed for
update index?

2012/10/10 jame vaalet 

> Guys,
> thanks for all the inputs, I was continuing my research to know more about
> segments in Lucene. Below are my conclusions, please correct me if I am wrong.
>
>1. Segments are independent sub-indexes in separate files; while indexing
>it's better to create a new segment as it doesn't have to modify an existing
>file, whereas while searching, the fewer the segments the better it is,
> since
>you open x (not exactly x but a value proportional to x) physical
> files
>to search if you have got x segments in the index.
>2. Since Lucene has a memory-map concept, for each file/segment in the index a
>new m-map file is created and mapped to the physical file on disk. Can
>someone explain or correct this in detail; I am sure there are lots of
>people wondering how m-map works while you merge or optimize index
> segments.
>
>
>
> On 6 October 2012 07:41, Otis Gospodnetic  >wrote:
>
> > If I were you and not knowing all your details...
> >
> > I would optimize indices that are static (not being modified) and
> > would optimize down to 1 segment.
> > I would do it when search traffic is low.
> >
> > Otis
> > --
> > Search Analytics - http://sematext.com/search-analytics/index.html
> > Performance Monitoring - http://sematext.com/spm/index.html
> >
> >
> > On Fri, Oct 5, 2012 at 4:27 PM, jame vaalet 
> wrote:
> > > Hi Eric,
> > > I  am in a major dilemma with my index now. I have got 8 cores each
> > around
> > > 300 GB in size and half of them are deleted documents in it and above
> > that
> > > each has got around 100 segments as well. Do i issue a expungeDelete
> and
> > > allow the merge policy to take care of the segments or optimize them
> into
> > > single segment. Search performance is not at par compared to usual solr
> > > speed.
> > > If i have to optimize what segment number should i choose? my RAM size
> > > around 120 GB and JVM heap is around 45 GB (oldGen being 30 GB). Pleas
> > > advice !
> > >
> > > thanks.
> > >
> > >
> > > On 6 October 2012 00:00, Erick Erickson 
> wrote:
> > >
> > >> because eventually you'd run out of file handles. Imagine a
> > >> long-running server with 100,000 segments. Totally
> > >> unmanageable.
> > >>
> > >> I think shawn was emphasizing that RAM requirements don't
> > >> depend on the number of segments. There are other
> > >> resources that file consume however.
> > >>
> > >> Best
> > >> Erick
> > >>
> > >> On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet 
> > wrote:
> > >> > hi Shawn,
> > >> > thanks for the detailed explanation.
> > >> > I have got one doubt, you said it doesn matter how many segments
> index
> > >> have
> > >> > but then why does solr has this merge policy which merges segments
> > >> > frequently?  why can it leave the segments as it is rather than
> > merging
> > >> > smaller one's into bigger one?
> > >> >
> > >> > thanks
> > >> > .
> > >> >
> > >> > On 5 October 2012 05:46, Shawn Heisey  wrote:
> > >> >
> > >> >> On 10/4/2012 3:22 PM, jame vaalet wrote:
> > >> >>
> > >> >>> so imagine i have merged the 150 Gb index into single segment,
> this
> > >> would
> > >> >>> make a single segment of 150 GB in memory. When new docs are
> > indexed it
> > >> >>> wouldn't alter this 150 Gb index unless i update or delete the
> older
> > >> docs,
> > >> >>> right? will 150 Gb single segment have problem with memory
> swapping
> > at
> > >> OS
> > >> >>> level?
> > >> >>>
> > >> >>
> > >> >> Supplement to my previous reply:  the real memory mentioned in the
> > last
> > >> >> paragraph does not include the memory that the OS uses to cache
> disk
> > >> >> access.  If more memory is needed and all the free memory is being
> > used
> > >> by
> > >> >> the disk cache, the OS will throw away part of the disk cache (a
> > >> >> near-instantaneous operation that should never involve disk I/O)
> and
> > >> give
> > >> >> that memory to the application that requests it.
> > >> >>
> > >> >> Here's a very good breakdown of how memory gets used with
> > MMapDirectory
> > >> in
> > >> >> Solr.  It's applicable to any program that uses memory mapping, not
> > just
> > >> >> Solr:
> > >> >>
> > >> >>
> > http://java.dzone.com/**articles/use-lucene%E2%80%99s-**mmapdirectory<
> > >> http://java.dzone.com/articles/use-lucene%E2%80%99s-mmapdirectory>
> > >> >>
> > >> >> Thanks,
> > >> >> Shawn
> > >> >>
> > >> >>
> > >> >
> > >> >
> > >> > --
> > >> >
> > >> > -JAME
> > >>
> > >
> > >
> > >
> > > --
> > >
> > > -JAME
> >
>
>
>
> --
>
> -JAME
>



-- 
from Jun Wang


Re: [ANN] new SolrMeter release

2012-10-10 Thread Lance Norskog
Cool! 

Who made the logo? It's nice.

- Original Message -
| From: "Tomás Fernández Löbbe" 
| To: solr-user@lucene.apache.org
| Sent: Wednesday, October 10, 2012 3:57:32 PM
| Subject: [ANN] new SolrMeter release
| 
| Hi everyone, I'm pleased to announce that SolrMeter 0.3.0 was
| released
| today.
| 
| To see the issues resolved for this version go to:
| 
http://code.google.com/p/solrmeter/issues/list?can=1&q=Milestone%3DRelease-0.3.0+status%3AResolved
| 
| To download the last version:
| http://code.google.com/p/solrmeter/downloads/list
| 
| Thanks,
| 
| Tomás
| 
| http://code.google.com/p/solrmeter/
| 


Re: Using additional dictionary with DirectSolrSpellChecker

2012-10-10 Thread Robert Muir
On Wed, Oct 10, 2012 at 9:02 AM, O. Klein  wrote:
> I don't want to tweak the threshold. For majority of cases it works fine.
>
> It's for cases where term has low frequency but is spelled correctly.
>
> If you lower the threshold you would also get incorrect spelled terms as
> suggestions.
>

Yeah there is no real magic here when the corpus contains typos. this
existing docFreq heuristic was just borrowed from the old index-based
spellchecker.

I do wonder if using # of occurrences (totalTermFreq) instead of # of
documents with the term (docFreq) would improve the heuristic.

In all cases I think if you want to also integrate a dictionary or
something, it seems like this could somehow be done with the
File-based spellchecker?


Re: Wild card searching - well sort of

2012-10-10 Thread Erick Erickson
Have you looked at WordDelimiterFilterFactory that was mentioned
earlier? Try a fieldType in the admin/analysis page that has
WDFF as part of the analysis chain. It would do exactly what you've
described so far.

WDFF splits the input up as tokens on non-alphanum characters,
alpha/num transitions and case transitions (you can configure these).
Then searching will match these split-out tokens.

Best
Erick

On Wed, Oct 10, 2012 at 10:28 AM, Kissue Kissue  wrote:
> It is really not fixed. It could also be *-*-BAAN or BAAN-CAN20-*. In each
> I just want only the fixed character(s) to match; the * can match any
> character.
>
>
> On Wed, Oct 10, 2012 at 2:05 PM, Toke Eskildsen 
> wrote:
>
>> On Wed, 2012-10-10 at 14:15 +0200, Kissue Kissue wrote:
>> > I have added the string: *-BAAN-* to the index to a field called pattern
>> > which is a string type. Now i want to be able to search for A100-BAAN-C20
>> > or ZA20-BAAN-300 and have Solr return *-BAAN-*.
>>
>> That sounds a lot like the problem presented in the thread
>> "Indexing wildcard patterns":
>> http://web.archiveorange.com/archive/v/AAfXfcuIJY9BQJL3mjty
>>
>> The short answer is no, Solr does not support this in the general form.
>> But maybe you can make it work anyway. In your example, the two queries
>> A100-BAAN-C20 and ZA20-BAAN-300 share the form
>> [4 random characters]-[4 significant characters]-[3 random characters]
>> so a little bit of pre-processing would rewrite that to
>> *-[4 significant characters]-*
>> which would match *-BAAN-*
>>
>> If you describe the patterns and common elements to your indexed terms
>> and to your queries, we might come up with something.
>>
>>
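
For what it's worth, a minimal Java sketch of the pre-processing Toke describes,
assuming the query always has three dash-separated parts as in the examples:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String raw = "A100-BAAN-C20";
// keep only the middle (significant) part, wildcard the rest
Matcher m = Pattern.compile("^[^-]+-([^-]+)-[^-]+$").matcher(raw);
String rewritten = m.matches() ? "*-" + m.group(1) + "-*" : raw;
System.out.println(rewritten);   // prints *-BAAN-*, which matches the indexed pattern exactly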


Re: add shard to index

2012-10-10 Thread Upayavira
That is what is being discussed already. The thing is, at present, Solr
requires an even distribution of documents across shards, so you can't
just add another shard, assign it to a hash range, and be done with it.

The reason is down to the scoring mechanism used - TF/IDF (term
frequency/inverse document frequency). The IDF portion says "how many
times does this term appear in the whole index?" If there are only two
documents in the index, then the IDF will be very different from when
there are 2 million docs, resulting in different scores for equivalent
documents based upon which shard they are in.
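
To make that concrete, here is a tiny sketch using the DefaultSimilarity idf
formula, idf = 1 + ln(numDocs / (docFreq + 1)); the shard sizes are made up:

// a term that appears in one document on each shard
double idfSmallShard = 1 + Math.log(2.0 / (1 + 1));        // shard with 2 docs        -> 1.0
double idfBigShard   = 1 + Math.log(2000000.0 / (1 + 1));  // shard with 2 million docs -> ~14.8
System.out.println(idfSmallShard + " vs " + idfBigShard);
// the same term weighs roughly 15x more on the big shard, skewing scores across shards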

Currently, the only solution to this is to distribute your documents
evenly, which would mean, if you have four shards and you create a
fifth, you'd need to send 1/4 of your documents from each shard to the
new shard, which is not really a trivial task.

I believe the JIRA ticket covering this was mentioned earlier in this
thread.

Upayavira

On Mon, Oct 8, 2012, at 04:33 PM, Radim Kolar wrote:
> Do it as it is done in the Cassandra database. Adding a new node and
> redistributing data can be done in a live system without problems. It looks
> like this:
> 
> Every Cassandra node has a key range assigned. Instead of assigning keys
> to nodes like hash(key) mod nodes, every node owns its own portion of the
> hash keyspace. The portions do not need to be the same; some nodes can have a
> larger portion of the keyspace than others.
> 
> Say the hash function's maximum possible value is 12:
> 
> shard1 - 1-4
> shard2 - 5-8
> shard3 - 9-12
> 
> Now let's add a new shard. In Cassandra, adding a new shard by default cuts
> an existing one in half, so you will have:
> shard1 - 1-2
> shard2 - 3-4
> shard3 - 5-8
> shard4 - 9-12
> 
> See? You only needed to move documents from the old shard1. Usually you are
> adding more than 1 shard during a reorganization, so you do not need to
> rebalance the cluster by moving every node to a different position in the hash
> keyspace that much.
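
A tiny Java sketch of that range-based routing, using the 1-12 keyspace and the
post-split ranges from the example above (purely illustrative):

import java.util.LinkedHashMap;
import java.util.Map;

public class RangeRouter {
    // upper bound (inclusive) of each shard's hash range, in ascending order
    private final Map<String, Integer> upperBounds = new LinkedHashMap<String, Integer>();

    public RangeRouter() {
        upperBounds.put("shard1", 2);   // 1-2 (kept by the old shard1)
        upperBounds.put("shard2", 4);   // 3-4 (the half carved out of the old shard1)
        upperBounds.put("shard3", 8);   // 5-8
        upperBounds.put("shard4", 12);  // 9-12
    }

    public String shardFor(int hash) {
        for (Map.Entry<String, Integer> e : upperBounds.entrySet()) {
            if (hash <= e.getValue()) {
                return e.getKey();
            }
        }
        throw new IllegalArgumentException("hash outside the keyspace: " + hash);
    }
}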


[ANN] new SolrMeter release

2012-10-10 Thread Tomás Fernández Löbbe
Hi everyone, I'm pleased to announce that SolrMeter 0.3.0 was released
today.

To see the issues resolved for this version go to:
http://code.google.com/p/solrmeter/issues/list?can=1&q=Milestone%3DRelease-0.3.0+status%3AResolved

To download the last version:
http://code.google.com/p/solrmeter/downloads/list

Thanks,

Tomás

http://code.google.com/p/solrmeter/


unsuscribe

2012-10-10 Thread zMk Bnc

unsuscribe

Re: Query foreign language "synonyms" / words of equivalent meaning?

2012-10-10 Thread SUJIT PAL
Hi,

We are using google translate to do something like what you (onlinespending) 
want to do, so maybe it will help.

During indexing, we store the searchable fields from documents into a fields 
named _en, _fr, _es, etc. So assuming we capture title and body from each 
document, the fields are (title_en, body_en), (title_fr, body_fr), etc, with 
their own analyzer chains. These documents come from a controlled source (ie 
not the web), so we know the language they are authored in.

During searching, a custom component intercepts the client language and the 
query. The query is sent to google translate for language detection. The 
largest amount of docs in the corpus is english, so if the detected language is 
either english or the client language, then we call google translate again to 
find the translated query in the other (english or client) language. Another 
custom component constructs an OR query between the two languages, one component 
of which is aimed at the _en field set and the other aimed at the _xx (client 
language) field set.
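
For illustration, a rough sketch of that OR-query construction; the field names,
the detected language code and the escaping are assumptions:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

String userQuery  = "bathroom";
String translated = "baño";   // e.g. what the translation service returned
String clientLang = "es";
String q = "(title_en:" + ClientUtils.escapeQueryChars(userQuery)
         + " OR body_en:" + ClientUtils.escapeQueryChars(userQuery) + ")"
         + " OR (title_" + clientLang + ":" + ClientUtils.escapeQueryChars(translated)
         + " OR body_" + clientLang + ":" + ClientUtils.escapeQueryChars(translated) + ")";
SolrQuery query = new SolrQuery(q);   // hand this to the usual search call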

-sujit

On Oct 9, 2012, at 11:24 PM, Bernd Fehling wrote:

> 
> As far as I know, there is no built-in functionality for language translation.
> I would propose to write one, but there are many many pitfalls.
> If you want to translate from one language to another you might have to
> know the "starting" language. Otherwise you get problems with translation.
> 
> Not (german) -> distress (english), affliction (english)
> 
> - you might have words in one language which are stopwords in other language 
> "not"
> - you don't have a one to one mapping, it's more like "1 to n+x"
>  toilette (french) -> bathroom, rest room / restroom, powder room
> 
> These are just two points that jump to my mind, but there are tons of 
> pitfalls.
> 
> We use the solution of a multilingual thesaurus as synonym dictionary.
> http://en.wikipedia.org/wiki/Eurovoc
> It holds translations of 22 official languages of the European Union.
> 
> So a search for "europäischer währungsfonds" gives also results with
> "european monetary fund", "fonds monétaire européen", ...
> 
> Regards
> Bernd
> 
> 
> 
> Am 10.10.2012 04:54, schrieb onlinespend...@gmail.com:
>> Hi,
>> 
>> English is going to be the predominant language used in my documents, but
>> there may be a spattering of words in other languages (such as Spanish or
>> French). What I'd like is to initiate a query for something like "bathroom"
>> for example and for Solr to return documents that not only contain
>> "bathroom" but also "baño" (Spanish). And the same goes when searching for "
>> baño". I'd like Solr to return documents that contain either "bathroom" or "
>> baño".
>> 
>> One possibility is to pre-translate all indexed documents to a common
>> language, in this case English. And if someone were to search using a
>> foreign word, I'd need to translate that to English before issuing a query
>> to Solr. This appears to be problematic, since I'd have to know whether the
>> indexed words and the query are even in a foreign language, which is not
>> trivial.
>> 
>> Another possibility is to pre-build a list of foreign word synonyms. So baño
>> would be listed as a synonym for bathroom. But I'd need to include other
>> languages (such as toilette in French) and other words. This requires that
>> I know in advance all possible words I'd need to include foreign language
>> versions of (not to mention needing to know which languages to include).
>> This isn't trivial either.
>> 
>> I'm assuming there's no built-in functionality that supports the foreign
>> language translation on the fly, so what do people propose?
>> 
>> Thanks!
>> 
> 
> -- 
> *
> Bernd FehlingUniversitätsbibliothek Bielefeld
> Dipl.-Inform. (FH)LibTec - Bibliothekstechnologie
> Universitätsstr. 25 und Wissensmanagement
> 33615 Bielefeld
> Tel. +49 521 106-4060   bernd.fehling(at)uni-bielefeld.de
> 
> BASE - Bielefeld Academic Search Engine - www.base-search.net
> *



RE: Faceted search question (Tokenizing)

2012-10-10 Thread Petersen, Robert
What do you want the results to be, persons?  And the facets should be 
interests or subinterests?  Why are there two layers of interests anyway?  Can 
there my many subinterests under one interest?  Is one of those two a name of 
the interest which would look nice as a facet?

Anyway, have you read these pages yet?  These should get you started in the 
right direction.
http://wiki.apache.org/solr/SolrFacetingOverview
http://wiki.apache.org/solr/HierarchicalFaceting
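
As a starting point, a minimal SolrJ faceting sketch along the lines of those pages; 
the flattened "interest_path" field (one "Interest/Subinterest" value per entry in a 
multivalued string field) is only an assumption about how you might index it:

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery q = new SolrQuery("*:*");
q.setRows(0);                        // we only want the facet counts here
q.setFacet(true);
q.addFacetField("interest_path");    // e.g. values like "Interest 1/Subinterest 1"
q.setFacetMinCount(1);
q.setFacetPrefix("Interest 1/");     // drill down into one interest's subinterests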

Hope that helps,
Robi

-Original Message-
From: Grapes [mailto:mkloub...@gmail.com] 
Sent: Wednesday, October 10, 2012 8:52 AM
To: solr-user@lucene.apache.org
Subject: Faceted search question (Tokenizing)

Hey There, 

We have the following data structure: 


- Person
-- Interest 1
--- Subinterest 1
--- Subinterest 1 Description
--- Subinterest 1 ID
-- Interest 2
--- Subinterest 2
--- Subinterest 2 Description
--- Subinterest 2 ID
. 
-- Interest 99
--- Subinterest 99
--- Subinterest 99 Description
--- Subinterest 99 ID 

Interest, Subinterest, Subinterest Description and Subinterest ID are all 
multivalued fields. A person can have any number of subinterests, descriptions 
and IDs. 

How could we facet/search this based on this data structure? Right now we have 
tokenized everything in a separate multivalued column in the following fashion: 


|Interest='Interest 1',Subinterest='Subinterest 1',Subinterest='Another
Subinterest 1',Description='Interest 1 Description',ID='Interest 1 ID'| 
|Interest='Interest 2',Description='Interest 2 Description',ID='Interest 
|2
ID'| 

I have a feeling like this is a wrong approach to this problem.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Faceted-search-question-Tokenizing-tp4012948.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Why is SolrDispatchFilter using 90% of the Time?

2012-10-10 Thread Yonik Seeley
> When I look at the distribution of the Response-time I notice
> 'SolrDispatchFilter.doFilter()' is taking up 90% of the time.

That's pretty much the top-level entry point to Solr (from the servlet
container), so it's normal.

-Yonik
http://lucidworks.com


Re: SolrJ 4.0 Beta maxConnectionsPerHost

2012-10-10 Thread Briggs Thompson
Thanks for the heads up. I just tested this and you are right. I am making
a call to "addBeans" and it succeeds without any issue even when the server
is down. That sucks.

A big part of this process is reliant on knowing exactly what has made it
into the index and what has not, so this is a difficult problem to solve when
you can't catch exceptions. I was thinking I could execute a ping request
first to determine if the Solr server is still operational, but that
doesn't help if the updateRequestHandler fails.
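
For what it's worth, a minimal sketch of that pre-flight ping (the URL is an
assumption); it only tells you the server was reachable at that moment, not that
a later add succeeded:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.SolrPingResponse;

SolrServer server = new HttpSolrServer("http://localhost:8983/solr/core1");
try {
    SolrPingResponse ping = server.ping();
    System.out.println("ping status: " + ping.getStatus());   // 0 means OK
} catch (Exception e) {
    // server unreachable (or the ping handler failed) - skip this indexing round
    e.printStackTrace();
}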

On Wed, Oct 10, 2012 at 1:48 PM, Shawn Heisey  wrote:

> On 10/9/2012 3:02 PM, Briggs Thompson wrote:
>
>> *Otis* - jstack is a great suggestion, thanks! The problem didn't happen
>>
>> this morning but next time it does I will certainly get the dump to see
>> exactly where the app is swimming around. I haven't used
>> StreamingUpdateSolrServer
>> but I will see if that makes a difference. Are there any major drawbacks
>> of
>> going this route?
>>
>
> One caveat -- when using the Streaming/Concurrent object, your application
> will not be notified when there is a problem indexing. I've been told there
> is a way to override a method in the object to allow trapping errors, but I
> have not seen sample code and haven't figured out how to do it.  I've filed
> an issue and a patch to fix this.  It's received some comments, but so far
> nobody has decided to commit it.
>
> https://issues.apache.org/**jira/browse/SOLR-3284
>
> Thanks,
> Shawn
>
>


RE: anyone have any clues about this exception

2012-10-10 Thread Petersen, Robert
You could be right.  Going back in the logs, I noticed it used to happen less 
frequently and always towards the end of an optimize operation.  It is probably 
my indexer timing out waiting for updates to occur during optimizes.  The 
errors grew recently due to my upping the indexer threadcount to 22 threads, so 
there's a lot more timeouts occurring now.  Also our index has grown to double 
the old size so the optimize operation has started taking a lot longer, also 
contributing to what I'm seeing.   I have just changed my optimize frequency 
from three times a day to one time a day after reading the following:

Here they are talking about completely deprecating the optimize command in the 
next version of solr…
https://issues.apache.org/jira/browse/SOLR-3141


-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Wednesday, October 10, 2012 11:10 AM
To: solr-user@lucene.apache.org
Subject: Re: anyone have any clues about this exception

Something timed out, the other end closed the connection. This end tried to 
write to closed pipe and died, something tried to catch that exception and 
write its own and died even worse? Just making it up really, but sounds good 
(plus a 3-year Java tech-support hunch).

If it happens often enough, see if you can run WireShark on that machine's 
network interface and catch the whole network conversation in action. Often, 
there is enough clues there by looking at tcp packets and/or stuff transmitted. 
WireShark is a power-tool, so takes a little while the first time, but the 
learning will pay for itself over and over again.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. 
Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Oct 10, 2012 at 11:31 PM, Petersen, Robert  wrote:
> Tomcat localhost log (not the catalina log) for my  solr 3.6.1 (master) 
> instance contains lots of these exceptions but solr itself seems to be doing 
> fine... any ideas?  I'm not seeing these exceptions being logged on my slave 
> servers btw, just the master where we do our indexing only.
>
>
>
> Oct 9, 2012 5:34:11 PM org.apache.catalina.core.StandardWrapperValve 
> invoke
> SEVERE: Servlet.service() for servlet default threw exception 
> java.lang.IllegalStateException
> at 
> org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:389)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:291)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> at 
> com.googlecode.psiprobe.Tomcat60AgentValve.invoke(Tomcat60AgentValve.java:30)
> at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
> at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
> at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
> at 
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
> at java.lang.Thread.run(Unknown Source)



Re: PointType doc reindex issue

2012-10-10 Thread Chris Hostetter
: I have a weird problem, Whenever I read the doc from solr and
: then index the same doc that already exists in the index (aka
: reindexing) I get the following error. Can somebody tell me what I am
: doing wrong. I use solr 3.6 and the definition of the field is given
: below

When you use the LatLonType field type you get synthetic "*_coordinate" 
fields automatically constructed under the covers from each of your fields 
that use a "latlon" fieldType.  Because you have configured the 
"*_coordinate" fields to be "stored", they are included in the response 
when you request the doc.

This means that unless you explicitly remove those synthetically 
constructed values before "reindexing", they will still be there in 
addition to the new (possibly redundant) synthetic values created while 
indexing.

This is why the "*_coordinate" dynamicField in the Solr example schema.xml 
is marked 'stored="false"', so that this field doesn't come back in the 
response -- it's not meant for end users.
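
For the code in this thread, a small sketch of stripping those synthetic values
before resubmitting; the field names follow the error message above, and a loop
over a "*_coordinate" suffix check would be more general:

import org.apache.solr.client.solrj.util.ClientUtils;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

// doc is the SolrDocument fetched from the query response
SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
iDoc.removeField("geolocation_0_coordinate");
iDoc.removeField("geolocation_1_coordinate");
// re-add iDoc as usual; LatLonType regenerates the *_coordinate values itself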


: 
: 
: 
: Exception in thread "main"
: org.apache.solr.client.solrj.SolrServerException: Server at
: http://testsolr:8080/solr/mycore returned non ok status:400,
: message:ERROR: [doc=1182684] multiple values encountered for non
: multiValued field geolocation_0_coordinate: [39.017608, 39.017608]
:   at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:328)
:   at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
:   at com.wpost.search.indexing.MyTest.main(MyTest.java:31)
: 
: 
: The data in the index looks as follows
: 
: 39.017608,-77.375239
: 
:  39.017608
:  39.017608
: 
: 
: -77.375239
: -77.375239
: 
: 
: Thanks
: 
: Ravi Kiran Bhaskar
: 

-Hoss


Re: PriorityQueue:initialize consistently showing up as hot spot while profiling

2012-10-10 Thread Aaron Daubman
Hi Mikhail,

On Fri, Oct 5, 2012 at 7:15 AM, Mikhail Khludnev
 wrote:
> okay. huge rows value is no.1 way to kill Lucene. It's not possible,
> absolutely. You need to rethink logic of your component. Check Solr's
> FieldCollapsing code, IIRC it makes second search to achieve similar goal.
> Also check PostFilter and DelegatingCollector classes, their approach can
> also be handy for your task.

This sounds like it could be a much saner way to handle the task,
however, I'm not sure what I should be looking at for the
'FieldCollapsing code' you mention - can you point me to a class?

Also, is there anything more you can say about PostFilter and
DelegatingCollector classes - I reviewed them but it was not obvious
to me what they were doing that would allow me to reduce the large
rows param we use to ensure all relevant docs are included in the
grouping and limiting occurs at the group level, rather than
pre-grouping...

Thanks again,
  Aaron


Re: SolrJ 4.0 Beta maxConnectionsPerHost

2012-10-10 Thread Shawn Heisey

On 10/9/2012 3:02 PM, Briggs Thompson wrote:

*Otis* - jstack is a great suggestion, thanks! The problem didn't happen
this morning but next time it does I will certainly get the dump to see
exactly where the app is swimming around. I haven't used
StreamingUpdateSolrServer
but I will see if that makes a difference. Are there any major drawbacks of
going this route?


One caveat -- when using the Streaming/Concurrent object, your 
application will not be notified when there is a problem indexing. I've 
been told there is a way to override a method in the object to allow 
trapping errors, but I have not seen sample code and haven't figured out 
how to do it.  I've filed an issue and a patch to fix this.  It's 
received some comments, but so far nobody has decided to commit it.


https://issues.apache.org/jira/browse/SOLR-3284
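
A rough sketch of that override against the 4.0 ConcurrentUpdateSolrServer (treat 
the method and constructor details as from memory; whether this catches every 
failure mode is exactly what SOLR-3284 is about):

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;

ConcurrentUpdateSolrServer server =
    new ConcurrentUpdateSolrServer("http://localhost:8983/solr/core1", 50, 4) {
        @Override
        public void handleError(Throwable ex) {
            super.handleError(ex);   // keep the default logging
            // record the failure somewhere the indexer can see, so it can retry
            System.err.println("background update failed: " + ex);
        }
    };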

Thanks,
Shawn



Re: Why is SolrDispatchFilter using 90% of the Time?

2012-10-10 Thread Aaron Daubman
Hi Stijn,

I have occasionally been seeing similar behavior when profiling one of
our Solr  3.6.1 servers using the similar AppDynamics product. Did you
ever hunt down what was causing this for you or get more info? (I
haven't been able to rule out truncated or filtered call-graphs that
don't show some very-large number of sub-threshold calls... but that
seems implausible)

Always looking to learn more,
 Aaron

---original message---
Hi,

I'm working with a recent NightlyBuild of Solr and I'm doing some serious
ZooKeeper testing.
I've NewRelic monitoring enabled on my solr machines.

When I look at the distribution of the Response-time I notice
'SolrDispatchFilter.doFilter()' is taking up 90% of the time.
The other 10% is used by SolrSeacher and the QueryComponent.

+ Can anyone explain me why SolrDispatchFilter is consuming so much time?
++ Can I do something to lower this number?
 ( After all SolrDispatchFilter must Dispatch each time to the standard
searcher. )

Stijn Vanhoorelbeke


Re: anyone have any clues about this exception

2012-10-10 Thread Alexandre Rafalovitch
Something timed out, the other end closed the connection. This end
tried to write to closed pipe and died, something tried to catch that
exception and write its own and died even worse? Just making it up
really, but sounds good (plus a 3-year Java tech-support hunch).

If it happens often enough, see if you can run WireShark on that
machine's network interface and catch the whole network conversation
in action. Often, there is enough clues there by looking at tcp
packets and/or stuff transmitted. WireShark is a power-tool, so takes
a little while the first time, but the learning will pay for itself
over and over again.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Oct 10, 2012 at 11:31 PM, Petersen, Robert  wrote:
> Tomcat localhost log (not the catalina log) for my  solr 3.6.1 (master) 
> instance contains lots of these exceptions but solr itself seems to be doing 
> fine... any ideas?  I'm not seeing these exceptions being logged on my slave 
> servers btw, just the master where we do our indexing only.
>
>
>
> Oct 9, 2012 5:34:11 PM org.apache.catalina.core.StandardWrapperValve invoke
> SEVERE: Servlet.service() for servlet default threw exception
> java.lang.IllegalStateException
> at 
> org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:389)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:291)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
> at 
> com.googlecode.psiprobe.Tomcat60AgentValve.invoke(Tomcat60AgentValve.java:30)
> at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
> at 
> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
> at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
> at 
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
> at java.lang.Thread.run(Unknown Source)


anyone have any clues about this exception

2012-10-10 Thread Petersen, Robert
Tomcat localhost log (not the catalina log) for my  solr 3.6.1 (master) 
instance contains lots of these exceptions but solr itself seems to be doing 
fine... any ideas?  I'm not seeing these exceptions being logged on my slave 
servers btw, just the master where we do our indexing only.



Oct 9, 2012 5:34:11 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet default threw exception
java.lang.IllegalStateException
at 
org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407)
at 
org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:389)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:291)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at 
com.googlecode.psiprobe.Tomcat60AgentValve.invoke(Tomcat60AgentValve.java:30)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
at java.lang.Thread.run(Unknown Source)


Creating a new Collection through API

2012-10-10 Thread Markus Mirsberger

Hi,

what is the best way to create a new Collection through the API so that I get 
its own config folder with schema.xml and solrconfig.xml inside the 
created Core?


When I just create a Collection, only the data folder will be created 
but the config folder with schema.xml and solrconfig.xml will be used 
from another Collection. Even when I add the config folder later, I have 
to reload the core on every server to get the changes :(


Do I have to create a default Core somewhere, copy it inside my solr 
folder, rename it and then add this as a Collection or is there a better 
way to do this?



Thanks,
Markus




Re: Problem with delete by query in Solr 4.0 beta

2012-10-10 Thread Ahmet Arslan

> Do you have a "_version_" field in
> your schema. I believe SOLR 4.0
> Beta requires that field.

Probably he is hitting this https://issues.apache.org/jira/browse/SOLR-3432


Re: PointType doc reindex issue

2012-10-10 Thread Ravi Solr
I am using DirectXmlRequest to index XML. This is just a test case as
my client would be sending me a SOLR compliant XML. so I was trying to
simulate it by reading a doc from an exiting core and reindexing it.

HttpSolrServer server = new
HttpSolrServer("http://testsolr:8080/solr/mycore";);
QueryResponse resp = server.query(new 
SolrQuery("contentid:(1184911
OR 1182684)"));
SolrDocumentList list = resp.getResults();
if(list != null && !list.isEmpty()) {
for(SolrDocument doc : list) {
SolrInputDocument iDoc = 
ClientUtils.toSolrInputDocument(doc);  
String contentid = (String) 
iDoc.getFieldValue("egcontentid");
String name = (String) 
iDoc.getFieldValue("name");
iDoc.setField("name", DigestUtils.md5Hex(name));

String xml = ClientUtils.toXML(iDoc);   

DirectXmlRequest up = new 
DirectXmlRequest("/update",
""+xml+"");
server.request(up);
server.commit();

System.out.println("Updated name in contentid - 
" + contentid);

}
}

Ravi Kiran

On Wed, Oct 10, 2012 at 1:02 PM, Gopal Patwa  wrote:
> Instead addfield method use setfield
> On Oct 10, 2012 9:54 AM, "Ravi Solr"  wrote:
>
>> Gopal I did in fact test the same and it worked when I delete ted the
>> geolocation_0_coordinate and geolocation_1_coordinate. But that seems
>> weird, so I was thinking if there is something else I need to do to
>> avoid doing this awkward workaround.
>>
>> Ravi Kiran Bhaskar
>>
>> On Wed, Oct 10, 2012 at 12:36 PM, Gopal Patwa 
>> wrote:
>> > You need remove field after read solr doc,  when u add new field it will
>> > add to list,  so when u try to commit the update field,  it will be multi
>> > value and in your schema it is single value
>> > On Oct 10, 2012 9:26 AM, "Ravi Solr"  wrote:
>> >
>> >> Hello,
>> >> I have a weird problem, Whenever I read the doc from solr and
>> >> then index the same doc that already exists in the index (aka
>> >> reindexing) I get the following error. Can somebody tell me what I am
>> >> doing wrong. I use solr 3.6 and the definition of the field is given
>> >> below
>> >>
>> >> > >> subFieldSuffix="_coordinate"/>
>> >> > >> stored="true"/>
>> >>
>> >> Exception in thread "main"
>> >> org.apache.solr.client.solrj.SolrServerException: Server at
>> >> http://testsolr:8080/solr/mycore returned non ok status:400,
>> >> message:ERROR: [doc=1182684] multiple values encountered for non
>> >> multiValued field geolocation_0_coordinate: [39.017608, 39.017608]
>> >> at
>> >>
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:328)
>> >> at
>> >>
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
>> >> at com.wpost.search.indexing.MyTest.main(MyTest.java:31)
>> >>
>> >>
>> >> The data in the index looks as follows
>> >>
>> >> 39.017608,-77.375239
>> >> 
>> >>  39.017608
>> >>  39.017608
>> >> 
>> >> 
>> >> -77.375239
>> >> -77.375239
>> >> 
>> >>
>> >> Thanks
>> >>
>> >> Ravi Kiran Bhaskar
>> >>
>>


Re: PointType doc reindex issue

2012-10-10 Thread Gopal Patwa
Instead of the addField method, use setField.
On Oct 10, 2012 9:54 AM, "Ravi Solr"  wrote:

> Gopal I did in fact test the same and it worked when I delete ted the
> geolocation_0_coordinate and geolocation_1_coordinate. But that seems
> weird, so I was thinking if there is something else I need to do to
> avoid doing this awkward workaround.
>
> Ravi Kiran Bhaskar
>
> On Wed, Oct 10, 2012 at 12:36 PM, Gopal Patwa 
> wrote:
> > You need remove field after read solr doc,  when u add new field it will
> > add to list,  so when u try to commit the update field,  it will be multi
> > value and in your schema it is single value
> > On Oct 10, 2012 9:26 AM, "Ravi Solr"  wrote:
> >
> >> Hello,
> >> I have a weird problem, Whenever I read the doc from solr and
> >> then index the same doc that already exists in the index (aka
> >> reindexing) I get the following error. Can somebody tell me what I am
> >> doing wrong. I use solr 3.6 and the definition of the field is given
> >> below
> >>
> >>  >> subFieldSuffix="_coordinate"/>
> >>  >> stored="true"/>
> >>
> >> Exception in thread "main"
> >> org.apache.solr.client.solrj.SolrServerException: Server at
> >> http://testsolr:8080/solr/mycore returned non ok status:400,
> >> message:ERROR: [doc=1182684] multiple values encountered for non
> >> multiValued field geolocation_0_coordinate: [39.017608, 39.017608]
> >> at
> >>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:328)
> >> at
> >>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
> >> at com.wpost.search.indexing.MyTest.main(MyTest.java:31)
> >>
> >>
> >> The data in the index looks as follows
> >>
> >> 39.017608,-77.375239
> >> 
> >>  39.017608
> >>  39.017608
> >> 
> >> 
> >> -77.375239
> >> -77.375239
> >> 
> >>
> >> Thanks
> >>
> >> Ravi Kiran Bhaskar
> >>
>


Re: PointType doc reindex issue

2012-10-10 Thread Ravi Solr
Gopal I did in fact test the same and it worked when I deleted the
geolocation_0_coordinate and geolocation_1_coordinate. But that seems
weird, so I was thinking if there is something else I need to do to
avoid doing this awkward workaround.

Ravi Kiran Bhaskar

On Wed, Oct 10, 2012 at 12:36 PM, Gopal Patwa  wrote:
> You need remove field after read solr doc,  when u add new field it will
> add to list,  so when u try to commit the update field,  it will be multi
> value and in your schema it is single value
> On Oct 10, 2012 9:26 AM, "Ravi Solr"  wrote:
>
>> Hello,
>> I have a weird problem, Whenever I read the doc from solr and
>> then index the same doc that already exists in the index (aka
>> reindexing) I get the following error. Can somebody tell me what I am
>> doing wrong. I use solr 3.6 and the definition of the field is given
>> below
>>
>> > subFieldSuffix="_coordinate"/>
>> > stored="true"/>
>>
>> Exception in thread "main"
>> org.apache.solr.client.solrj.SolrServerException: Server at
>> http://testsolr:8080/solr/mycore returned non ok status:400,
>> message:ERROR: [doc=1182684] multiple values encountered for non
>> multiValued field geolocation_0_coordinate: [39.017608, 39.017608]
>> at
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:328)
>> at
>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
>> at com.wpost.search.indexing.MyTest.main(MyTest.java:31)
>>
>>
>> The data in the index looks as follows
>>
>> 39.017608,-77.375239
>> 
>>  39.017608
>>  39.017608
>> 
>> 
>> -77.375239
>> -77.375239
>> 
>>
>> Thanks
>>
>> Ravi Kiran Bhaskar
>>


Filter results based on custom scoring and _val_

2012-10-10 Thread jimtronic
I'm using solr function queries to generate my own custom score. I achieve
this using something along these lines:

q=_val_:"my_custom_function()"
This populates the score field as expected, but it also includes documents
that score 0. I need a way to filter the results so that zero (and negative)
scores are not included.

I realize that I'm using score in a non-standard way and that normally the
score that lucene/solr produce is not absolute. However, producing my own
score works really well for my needs.

I've tried using {!frange l=0} but this causes the score for all documents
to be "1.0".

I've found that I can do the following:

q=*:*&fl=foo:my_custom_function()&fq={!frange l=1}my_custom_function() 

This puts my custom score into foo, but it requires me to list all the logic
twice. Sometimes my logic is very long.









--
View this message in context: 
http://lucene.472066.n3.nabble.com/Filter-results-based-on-custom-scoring-and-val-tp4012968.html
Sent from the Solr - User mailing list archive at Nabble.com.


Memory Cost of group.cache.percent parameter

2012-10-10 Thread Mike Schultz
Does anyone have a clear understanding of how group.caching achieves its
performance improvements memory-wise?  Percent means percent of maxDoc, so
it's a function of that, but is it a function of that *per* item in the
cache (like filterCache) or altogether?  The speed improvement looks pretty
dramatic for our maxDoc=25M index but it would be helpful to understand what
the costs are.

Mike



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Memory-Cost-of-group-cache-percent-parameter-tp4012967.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: PointType doc reindex issue

2012-10-10 Thread Gopal Patwa
You need to remove the field after reading the Solr doc; when you add a new
field it will be appended to the list, so when you try to commit the updated
doc the field will be multi-valued, while in your schema it is single-valued.
On Oct 10, 2012 9:26 AM, "Ravi Solr"  wrote:

> Hello,
> I have a weird problem, Whenever I read the doc from solr and
> then index the same doc that already exists in the index (aka
> reindexing) I get the following error. Can somebody tell me what I am
> doing wrong. I use solr 3.6 and the definition of the field is given
> below
>
>  subFieldSuffix="_coordinate"/>
>  stored="true"/>
>
> Exception in thread "main"
> org.apache.solr.client.solrj.SolrServerException: Server at
> http://testsolr:8080/solr/mycore returned non ok status:400,
> message:ERROR: [doc=1182684] multiple values encountered for non
> multiValued field geolocation_0_coordinate: [39.017608, 39.017608]
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:328)
> at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
> at com.wpost.search.indexing.MyTest.main(MyTest.java:31)
>
>
> The data in the index looks as follows
>
> 39.017608,-77.375239
> 
>  39.017608
>  39.017608
> 
> 
> -77.375239
> -77.375239
> 
>
> Thanks
>
> Ravi Kiran Bhaskar
>


Re: Get report of keywords searched.

2012-10-10 Thread Rajani Maski
Hi Mikhail, Thank you for the reply. I will try with it .

Thanks
Rajani


On Sun, Oct 7, 2012 at 5:10 PM, Mikhail Khludnev  wrote:

> Rajani,
>
> IIRC solrmeter can grab search phrases from log. There is a special command
> for doing it there. Right - Tool/Extract Queries.
>
> Regards
>
> On Sun, Oct 7, 2012 at 10:02 AM, Rajani Maski 
> wrote:
>
> > Hi Davide,  Yes right. This can be done.
> >
> >  Just one question, I am not sure if I had to create new thread for
> > this question, Just wanted to know whether solrmeter or jmeter can help
> me
> > get the keywords searched list? I am novice to solrmeter, just know that
> > its used for stress test. Interested to know if I can use same tools for
> > this case of getting keywords searhed list.
> >
> >
> > Thanks
> > Rajani
> >
> > On Fri, Oct 5, 2012 at 7:23 PM, Davide Lorenzo Marino <
> > davide.mar...@gmail.com> wrote:
> >
> > > If you think this could be a problem for your performances you can try
> > two
> > > different solutions:
> > >
> > > 1 - Make the call to update the db in a different thread
> > > 2 - Make an asynchronous http call to a web application that update the
> > db
> > > (in this case the web app can be resident in a different machine, so
> the
> > > ram, cpu time and disk operations don't slow your solr engine)
> > >
> > >
> > > 2012/10/5 Rajani Maski 
> > >
> > > > Hi,
> > > >
> > > >  Thank you for the reply Davide.
> > > >
> > > >Writing to db you mean to insert into db the search queries? I was
> > > > thinking that this might effect search performance?
> > > > Yes you are right, Getting stats for particular key word is tough. It
> > > would
> > > > suffice if I can get q param and fq param values( when we search
> using
> > > > standard request handler).  Any open source solr log analysis tools?
> > Can
> > > we
> > > > achieve this with solrmeter? Has anyone tried with this?
> > > >
> > > > Thank You
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Oct 4, 2012 at 2:07 PM, Davide Lorenzo Marino <
> > > > davide.mar...@gmail.com> wrote:
> > > >
> > > > > If you need to analyze the search queries is not very difficult,
> just
> > > > > create a search plugin and put them in a db.
> > > > > If you need to search the single keywords it is more difficult and
> > you
> > > > need
> > > > > before starting to take some decision. In particular take the
> > following
> > > > > queries and try to answer how you would like to treat them for the
> > > > > keywards:
> > > > >
> > > > > 1) apple OR orange
> > > > > 2) apple AND orange
> > > > > 3) title:apple AND subject:orange
> > > > > 4) apple -orange
> > > > > 5) apple OR (orange AND banana)
> > > > > 6) title:apple OR subject:orange
> > > > >
> > > > > Ciao
> > > > >
> > > > > Davide Marino
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > 2012/10/3 Rajani Maski 
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > >I am using solrJ. When there is search query hit, I am logging
> > the
> > > > url
> > > > > > in a location and also it is getting logged into tomcat catalina
> > > logs.
> > > > > >  Now I wanted to implement a functionality of periodically(per
> > week)
> > > > > > analyzing search logs of solr and find out the keywords searched.
> > Is
> > > > > there
> > > > > > a way to do it using any of the existing functionality of solr?
> If
> > > not,
> > > > > > Anybody has tried this implementation with any open source tools?
> > > > > > Suggestions welcome. . Awaiting reply
> > > > > >
> > > > > >
> > > > > > Thank you.
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Tech Lead
> Grid Dynamics
>
> 
>  
>


Re: Problem with delete by query in Solr 4.0 beta

2012-10-10 Thread Ravi Solr
Do you have a "_version_" field in your schema. I believe SOLR 4.0
Beta requires that field.
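
For reference, the 4.0 example schema defines it roughly like this (assuming
the stock "long" field type):

<field name="_version_" type="long" indexed="true" stored="true"/>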

Ravi Kiran Bhaskar

On Wed, Oct 10, 2012 at 11:45 AM, Andrew Groh  wrote:
> I cannot seem to get delete by query working in my simple setup in Solr 4.0 
> beta.
>
> I have a single collection and I want to delete old documents from it.  There 
> is a single solr node in the config (no replication, not distributed). This 
> is something that I previously did in Solr 3.x
>
> My collection is called dine, so I do:
>
> curl "http://localhost:8080/solr/dine/update" -s -H 'Content-type:text/xml;
> charset=utf-8' -d "<delete><query>timestamp_dt:[2012-09-01T00:00:00Z TO
> 2012-09-27T00:00:00Z]</query></delete>"
>
> and then a commit.
>
> The problem is that the documents are not deleted.  When I run the same query
> in the admin page, it still returns documents.
>
> I walked through the code and find the code in 
> DistributedUpdateProcessor::doDeleteByQuery to be suspicious.
>
> Specifically, vinfo is not null, but I have no version field, so 
> versionsStored is false.
>
> So it gets to line 786, which looks like:
> if (versionsStored) {
>
> That then skips to line 813 (the finally clause) skipping all calls to 
> doLocalDelete
>
> Now, I do confess I don't understand exactly how this code should work.  
> However, in the add code, the check for versionsStored does not skip the call 
> to doLocalAdd.
>
> Any suggestions would be welcome.
>
> Andrew
>
>
>


PointType doc reindex issue

2012-10-10 Thread Ravi Solr
Hello,
I have a weird problem, Whenever I read the doc from solr and
then index the same doc that already exists in the index (aka
reindexing) I get the following error. Can somebody tell me what I am
doing wrong. I use solr 3.6 and the definition of the field is given
below




Exception in thread "main"
org.apache.solr.client.solrj.SolrServerException: Server at
http://testsolr:8080/solr/mycore returned non ok status:400,
message:ERROR: [doc=1182684] multiple values encountered for non
multiValued field geolocation_0_coordinate: [39.017608, 39.017608]
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:328)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:211)
at com.wpost.search.indexing.MyTest.main(MyTest.java:31)


The data in the index looks as follows

39.017608,-77.375239

 39.017608
 39.017608


-77.375239
-77.375239


Thanks

Ravi Kiran Bhaskar


RE: Unique terms without faceting

2012-10-10 Thread Phil Hoy
Hi,

I don't think you can use that component whilst taking into account any fq or q 
parameters.

Phil

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: 10 October 2012 16:51
To: solr-user@lucene.apache.org
Subject: Re: Unique terms without faceting

The Solr TermsComponent:

http://wiki.apache.org/solr/TermsComponent

-- Jack Krupansky

-Original Message-
From: Phil Hoy
Sent: Wednesday, October 10, 2012 11:45 AM
To: solr-user@lucene.apache.org
Subject: Unique terms without faceting

Hi,

I know that you can use a facet query to get the unique terms for a field 
taking account of any q or fq parameters but for our use case the counts are 
not needed. So is there a more efficient way of finding  just unique terms for 
a field?

Phil


__
This email has been scanned by the brightsolid Email Security System. Powered 
by MessageLabs 
__


Faceted search question (Tokenizing)

2012-10-10 Thread Grapes
Hey There, 

We have the following data structure: 


- Person 
-- Interest 1 
--- Subinterest 1 
--- Subinterest 1 Description 
--- Subinterest 1 ID 
-- Interest 2 
--- Subinterest 2 
--- Subinterest 2 Description 
--- Subinterest 2 ID 
. 
-- Interest 99 
--- Subinterest 99 
--- Subinterest 99 Description 
--- Subinterest 99 ID 

Interest, Subinterest, Subinterest Description and Subinterest IDs are all
multivalued fields. A person can have any number of
subinterests, descriptions and IDs.

How could we facet/search this based on this data structure? Right now we
tokenized everything in a separate multivalued column in the following
fashion:


|Interest='Interest 1',Subinterest='Subinterest 1',Subinterest='Another
Subinterest 1',Description='Interest 1 Description',ID='Interest 1 ID'| 
|Interest='Interest 2',Description='Interest 2 Description',ID='Interest 2
ID'| 

I have a feeling this is the wrong approach to this problem.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Faceted-search-question-Tokenizing-tp4012948.html
Sent from the Solr - User mailing list archive at Nabble.com.


Problem with delete by query in Solr 4.0 beta

2012-10-10 Thread Andrew Groh
I cannot seem to get delete by query working in my simple setup in Solr 4.0 
beta.

I have a single collection and I want to delete old documents from it.  There 
is a single solr node in the config (no replication, not distributed). This is 
something that I previously did in Solr 3.x

My collection is called dine, so I do:

curl "http://localhost:8080/solr/dine/update" -s -H 'Content-type:text/xml;
charset=utf-8' -d "<delete><query>timestamp_dt:[2012-09-01T00:00:00Z TO
2012-09-27T00:00:00Z]</query></delete>"

and then a commit.

The problem is that the documents are not deleted.  When I run the same query in
the admin page, it still returns documents.

I walked through the code and find the code in 
DistributedUpdateProcessor::doDeleteByQuery to be suspicious.

Specifically, vinfo is not null, but I have no version field, so versionsStored 
is false.

So it gets to line 786, which looks like:
if (versionsStored) {

That then skips to line 813 (the finally clause) skipping all calls to 
doLocalDelete

Now, I do confess I don't understand exactly how this code should work.  
However, in the add code, the check for versionsStored does not skip the call 
to doLocalAdd.

Any suggestions would be welcome.

Andrew





Re: Faceted search question (Tokenizing)

2012-10-10 Thread Grapes
Here is another simpler example of what I am trying to achieve:

Multi-Valued Field 1:
Data 1
Data 2
Data 3
Data 4

Multi-Valued Field 2:
Data 11
Data 12
Data 13
Data 14

Multi-Valued Field 3:
Data 21
Data 22
Data 23
Data 24


How can I specify that Data 1, Data 11 and Data 21 are all related? And if I
facet on Data 1 + Data 11, I only want to see Data 21.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Faceted-search-question-Tokenizing-tp4012948p4012956.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Unique terms without faceting

2012-10-10 Thread Jack Krupansky

The Solr TermsComponent:

http://wiki.apache.org/solr/TermsComponent
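
For example, assuming the /terms request handler from the example
solrconfig.xml is registered, something like this lists the distinct terms for
a field (the field name is an assumption):

http://localhost:8983/solr/terms?terms.fl=myfield&terms.limit=-1&terms.sort=index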

-- Jack Krupansky

-Original Message- 
From: Phil Hoy

Sent: Wednesday, October 10, 2012 11:45 AM
To: solr-user@lucene.apache.org
Subject: Unique terms without faceting

Hi,

I know that you can use a facet query to get the unique terms for a field 
taking account of any q or fq parameters but for our use case the counts are 
not needed. So is there a more efficient way of finding  just unique terms 
for a field?


Phil



Unique terms without faceting

2012-10-10 Thread Phil Hoy
Hi,

I know that you can use a facet query to get the unique terms for a field 
taking account of any q or fq parameters but for our use case the counts are 
not needed. So is there a more efficient way of finding  just unique terms for 
a field?

Phil



Re: SolrJ 4.0 Beta maxConnectionsPerHost

2012-10-10 Thread Briggs Thompson
They are both SolrJ.

What is happening is I have a "batch" indexer application that does a full
re-index once per day. I also have an "incremental" indexer that takes
items off a queue when they are updated.

The problem only happens when both are running at the same time - they also
run from the same machine. I am going to dig into this today and see what I
find - I didn't get around to it yesterday.

Question: I don't seem to see a StreamingUpdateSolrServer object on the 4.0
beta. I did see the ConcurrentUpdateSolrServer - this seems like a similar
choice. Is this correct?
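
If it helps, a minimal sketch of swapping it in (the URL, queue size and thread
count are just illustrative values):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;

// buffers up to 1000 update requests and drains them with 4 background threads
SolrServer server =
    new ConcurrentUpdateSolrServer("http://localhost:8983/solr/collection1", 1000, 4);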

On Wed, Oct 10, 2012 at 9:43 AM, Sami Siren  wrote:

> On Wed, Oct 10, 2012 at 5:36 PM, Briggs Thompson
>  wrote:
> > There are other updates that happen on the server that do not fail, so
> the
> > answer to your question is yes.
>
> The other updates are using solrj or something else?
>
> It would be helpful if you could prepare a simple java program that
> uses solrj to demonstrate the problem. Based on the available
> information it is really difficult to guess what's happening.
>
> --
>  Sami Siren
>


Re: SolrJ 4.0 Beta maxConnectionsPerHost

2012-10-10 Thread Sami Siren
On Wed, Oct 10, 2012 at 5:36 PM, Briggs Thompson
 wrote:
> There are other updates that happen on the server that do not fail, so the
> answer to your question is yes.

The other updates are using solrj or something else?

It would be helpful if you could prepare a simple java program that
uses solrj to demonstrate the problem. Based on the available
information it is really difficult to guess what's happening.

--
 Sami Siren


Re: SolrJ 4.0 Beta maxConnectionsPerHost

2012-10-10 Thread Briggs Thompson
There are other updates that happen on the server that do not fail, so the
answer to your question is yes.

On Wed, Oct 10, 2012 at 8:12 AM, Sami Siren  wrote:

> On Wed, Oct 10, 2012 at 12:02 AM, Briggs Thompson
>  wrote:
> > *Sami*
> > The client IS
> > instantiated only once and not for every request. I was curious if this
> was
> > part of the problem. Do I need to re-instantiate the object for each
> > request made?
>
> No, it is expensive if you instantiate the client every time.
>
> When the client seems to be hanging, can you still access the Solr
> instance normally and execute updates/searches from other clients?
>
> --
>  Sami Siren
>


Re: Synonym Filter: Removing all original tokens, retain matched synonyms

2012-10-10 Thread Daniel Rosher
Ah ha .. good thinking ... thanks!

Dan

On Wed, Oct 10, 2012 at 2:39 PM, Ahmet Arslan  wrote:

>
> > Token_Input:
> > the fox jumped over the lazy dog
> >
> > Synonym_Map:
> > fox => vulpes
> > dog => canine
> >
> > Token_Output:
> > vulpes canine
> >
> > So remove all tokens, but retain those matched against the
> > synonym map
>
> May be you can make use of
> http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/analysis/KeepWordFilterFactory.html
> .
>
> You need to copy entries (vulpes, canine) from synonym.txt into
> keepwords.txt file.
>


Re: Wild card searching - well sort of

2012-10-10 Thread Kissue Kissue
It is really not fixed. It could also be *-*-BAAN or BAAN-CAN20-*. In each
case I just want only the fixed character(s) to match; the * can match any
characters.


On Wed, Oct 10, 2012 at 2:05 PM, Toke Eskildsen wrote:

> On Wed, 2012-10-10 at 14:15 +0200, Kissue Kissue wrote:
> > I have added the string: *-BAAN-* to the index to a field called pattern
> > which is a string type. Now i want to be able to search for A100-BAAN-C20
> > or ZA20-BAAN-300 and have Solr return *-BAAN-*.
>
> That sounds a lot like the problem presented in the thread
> "Indexing wildcard patterns":
> http://web.archiveorange.com/archive/v/AAfXfcuIJY9BQJL3mjty
>
> The short answer is no, Solr does not support this in the general form.
> But maybe you can make it work anyway. In your example, the two queries
> A100-BAAN-C20 and ZA20-BAAN-300 share the form
> [4 random characters]-[4 significant characters]-[3 random characters]
> so a little bit of pre-processing would rewrite that to
> *-[4 significant characters]-*
> which would match *-BAAN-*
>
> If you describe the patterns and common elements to your indexed terms
> and to your queries, we might come up with something.
>
>


Re: Synonym Filter: Removing all original tokens, retain matched synonyms

2012-10-10 Thread Ahmet Arslan

> Token_Input:
> the fox jumped over the lazy dog
> 
> Synonym_Map:
> fox => vulpes
> dog => canine
> 
> Token_Output:
> vulpes canine
> 
> So remove all tokens, but retain those matched against the
> synonym map

May be you can make use of  
http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/analysis/KeepWordFilterFactory.html.

You need to copy entries (vulpes, canine) from synonym.txt into keepwords.txt 
file.
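
A sketch of what that might look like in schema.xml (tokenizer choice and file
names are assumptions):

<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true"/>
  <filter class="solr.KeepWordFilterFactory" words="keepwords.txt" ignoreCase="true"/>
</analyzer>

The synonym rules rewrite fox/dog to vulpes/canine, and the keep-word filter
then drops every token that is not listed in keepwords.txt.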


Re: SolrJ 4.0 Beta maxConnectionsPerHost

2012-10-10 Thread Sami Siren
On Wed, Oct 10, 2012 at 12:02 AM, Briggs Thompson
 wrote:
> *Sami*
> The client IS
> instantiated only once and not for every request. I was curious if this was
> part of the problem. Do I need to re-instantiate the object for each
> request made?

No, it is expensive if you instantiate the client every time.

When the client seems to be hanging, can you still access the Solr
instance normally and execute updates/searches from other clients?

--
 Sami Siren


Re: Synonym Filter: Removing all original tokens, retain matched synonyms

2012-10-10 Thread Jack Krupansky
The synonym filter does set the "type" attribute to TYPE_SYNONYM for 
synonyms, so you could write your own token filter that "keeps" only tokens 
with that type.


Try the Solr Admin "analysis" page to see how various terms are analyzed by 
the synonym filter. It will show TYPE_SYNONYM.
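
A sketch of such a filter (class name is made up; the "SYNONYM" type string is
an assumption -- check what the analysis page reports for your version):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

// Keeps only tokens whose type was set by the synonym filter.
public final class KeepSynonymsOnlyFilter extends TokenFilter {
  private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);

  public KeepSynonymsOnlyFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    while (input.incrementToken()) {
      if ("SYNONYM".equals(typeAtt.type())) {   // type set by the synonym filter
        return true;
      }
    }
    return false;
  }
}

Note this sketch does not adjust position increments for the dropped tokens,
and you would still need a small TokenFilterFactory to register it in the
schema.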


-- Jack Krupansky

-Original Message- 
From: Daniel Rosher

Sent: Wednesday, October 10, 2012 8:34 AM
To: solr-user@lucene.apache.org
Subject: Synonym Filter: Removing all original tokens, retain matched 
synonyms


Hi,

Is there a way to do this?

Token_Input:
the fox jumped over the lazy dog

Synonym_Map:
fox => vulpes
dog => canine

Token_Output:
vulpes canine

So remove all tokens, but retain those matched against the synonym map

Cheers,
Dan 



Re: Wild card searching - well sort of

2012-10-10 Thread Toke Eskildsen
On Wed, 2012-10-10 at 14:15 +0200, Kissue Kissue wrote:
> I have added the string: *-BAAN-* to the index to a field called pattern
> which is a string type. Now i want to be able to search for A100-BAAN-C20
> or ZA20-BAAN-300 and have Solr return *-BAAN-*.

That sounds a lot like the problem presented in the thread 
"Indexing wildcard patterns":
http://web.archiveorange.com/archive/v/AAfXfcuIJY9BQJL3mjty

The short answer is no, Solr does not support this in the general form.
But maybe you can make it work anyway. In your example, the two queries
A100-BAAN-C20 and ZA20-BAAN-300 share the form 
[4 random characters]-[4 significant characters]-[3 random characters]
so a little bit of pre-processing would rewrite that to 
*-[4 significant characters]-*
which would match *-BAAN-*

If you describe the patterns and common elements to your indexed terms
and to your queries, we might come up with something.
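
A rough sketch of that pre-processing step in Java, assuming the query always
has the shape prefix-SIGNIFICANT-suffix:

String query = "A100-BAAN-C20";
String[] parts = query.split("-");
// keep the significant middle chunk and wildcard the variable parts
String pattern = parts.length == 3 ? "*-" + parts[1] + "-*" : query;
// pattern is now "*-BAAN-*", which matches the indexed value exactly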



Re: Using additional dictionary with DirectSolrSpellChecker

2012-10-10 Thread O. Klein
I don't want to tweak the threshold. For the majority of cases it works fine.

It's for cases where a term has a low frequency but is spelled correctly.

If you lower the threshold you would also get incorrectly spelled terms as
suggestions.


Robert Muir wrote
> These thresholds are adjustable: read the javadocs and tweak them.
> 
> On Wed, Oct 10, 2012 at 5:59 AM, O. Klein <

> klein@

> > wrote:
>> Is there some way to supplement the DirectSolrSpellChecker with a
>> dictionary?
>>
>> (In some cases terms are not used because of threshold, but should be
>> offered as spellcheck suggestion)
>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Using-additional-dictionary-with-DirectSolrSpellChecker-tp4012873.html
>> Sent from the Solr - User mailing list archive at Nabble.com.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-additional-dictionary-with-DirectSolrSpellChecker-tp4012873p4012908.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: segment number during optimize of index

2012-10-10 Thread jame vaalet
Guys,
thanks for all the inputs, I was continuing my research to learn more about
segments in Lucene. Below are my conclusions, please correct me if I am wrong.

   1. Segments are independent sub-indexes in separate files. While indexing
   it's better to create a new segment since that doesn't have to modify an
   existing file, whereas while searching, the fewer the segments the better,
   since you open x (not exactly x, but a value proportional to x) physical
   files to search if you have x segments in the index.
   2. Since Lucene uses memory mapping, for each file/segment in the index a
   new m-mapped file is created and mapped to the physical file on disk. Can
   someone explain or correct this in detail? I am sure there are many
   people wondering how m-map works while you merge or optimize index segments.
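
Related to the advice quoted below about optimizing a static index down to one
segment, a sketch of how that can be issued per core (the URL is illustrative;
SolrJ's server.optimize(true, true, maxSegments) does the same):

curl 'http://localhost:8080/solr/core0/update?optimize=true&maxSegments=1'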



On 6 October 2012 07:41, Otis Gospodnetic wrote:

> If I were you and not knowing all your details...
>
> I would optimize indices that are static (not being modified) and
> would optimize down to 1 segment.
> I would do it when search traffic is low.
>
> Otis
> --
> Search Analytics - http://sematext.com/search-analytics/index.html
> Performance Monitoring - http://sematext.com/spm/index.html
>
>
> On Fri, Oct 5, 2012 at 4:27 PM, jame vaalet  wrote:
> > Hi Eric,
> > I  am in a major dilemma with my index now. I have got 8 cores each
> around
> > 300 GB in size and half of them are deleted documents in it and above
> that
> > each has got around 100 segments as well. Do i issue a expungeDelete and
> > allow the merge policy to take care of the segments or optimize them into
> > single segment. Search performance is not at par compared to usual solr
> > speed.
> > If i have to optimize what segment number should i choose? my RAM size
> > around 120 GB and JVM heap is around 45 GB (oldGen being 30 GB). Pleas
> > advice !
> >
> > thanks.
> >
> >
> > On 6 October 2012 00:00, Erick Erickson  wrote:
> >
> >> because eventually you'd run out of file handles. Imagine a
> >> long-running server with 100,000 segments. Totally
> >> unmanageable.
> >>
> >> I think shawn was emphasizing that RAM requirements don't
> >> depend on the number of segments. There are other
> >> resources that file consume however.
> >>
> >> Best
> >> Erick
> >>
> >> On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet 
> wrote:
> >> > hi Shawn,
> >> > thanks for the detailed explanation.
> >> > I have got one doubt, you said it doesn matter how many segments index
> >> have
> >> > but then why does solr has this merge policy which merges segments
> >> > frequently?  why can it leave the segments as it is rather than
> merging
> >> > smaller one's into bigger one?
> >> >
> >> > thanks
> >> > .
> >> >
> >> > On 5 October 2012 05:46, Shawn Heisey  wrote:
> >> >
> >> >> On 10/4/2012 3:22 PM, jame vaalet wrote:
> >> >>
> >> >>> so imagine i have merged the 150 Gb index into single segment, this
> >> would
> >> >>> make a single segment of 150 GB in memory. When new docs are
> indexed it
> >> >>> wouldn't alter this 150 Gb index unless i update or delete the older
> >> docs,
> >> >>> right? will 150 Gb single segment have problem with memory swapping
> at
> >> OS
> >> >>> level?
> >> >>>
> >> >>
> >> >> Supplement to my previous reply:  the real memory mentioned in the
> last
> >> >> paragraph does not include the memory that the OS uses to cache disk
> >> >> access.  If more memory is needed and all the free memory is being
> used
> >> by
> >> >> the disk cache, the OS will throw away part of the disk cache (a
> >> >> near-instantaneous operation that should never involve disk I/O) and
> >> give
> >> >> that memory to the application that requests it.
> >> >>
> >> >> Here's a very good breakdown of how memory gets used with
> MMapDirectory
> >> in
> >> >> Solr.  It's applicable to any program that uses memory mapping, not
> just
> >> >> Solr:
> >> >>
> >> >>
> http://java.dzone.com/**articles/use-lucene%E2%80%99s-**mmapdirectory<
> >> http://java.dzone.com/articles/use-lucene%E2%80%99s-mmapdirectory>
> >> >>
> >> >> Thanks,
> >> >> Shawn
> >> >>
> >> >>
> >> >
> >> >
> >> > --
> >> >
> >> > -JAME
> >>
> >
> >
> >
> > --
> >
> > -JAME
>



-- 

-JAME


Re: Solr - Make Exact Search on Field with Fuzzy Query

2012-10-10 Thread Erick Erickson
There's nothing really built in to Solr to allow this. Are you
absolutely sure you can't just use the copyfield? Have you
actually tried it?

But I don't think you need to store the contents twice. Just
store it once and always highlight on that field whether you
search it or not. Since it's the raw text, you should be fine.
You'll have two versions of the field tokenized of course, but
that should take less space than you might think. You
probably want to store the version with the stemming turned on...

That said, storing twice only uses up some disk space, it
doesn't require additional memory for searching. So unless
you're running out of disk space you can just keep two stored
versions around.

But

If none of that works you might write a custom filter that
emits two tokens for each input token at indexing
time, similar to what synonyms do. The original should
have some special character appended, say $ and the
second should be the results of stemming (note, there
will be two tokens even if there is no stemming done).
So, indexing "running" would index "running$" and "run".
Now, when you need to search for an exact match on
running, you search for running$.

This works for the reverse too. Since the rule is "append
$ to all original tokens" "run" gets indexed as "run$" and "run".
Now, searching for "run" matches as does "run$". But
"run$" does not match the doc that had "running" since the two
tokens emitted in that case are "run" and "running$".

But look at what's happened here. You're indexing two tokens
for every one token in the input. Furthermore, you're adding
a bunch of unique tokens to the index. It's hard to see how this
results in any savings over just using copyField. You have
to index the two tokens since you have to distinguish between
the stemmed and un-stemmed version.
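
A rough sketch of that kind of filter (the stem() method is a stand-in for
whatever stemmer the stemmed field already uses; class and method names are
made up):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

// For every input token, emits the original form with '$' appended plus the
// stemmed form at the same position.
public final class ExactAndStemmedFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);
  private State pendingState;   // saved attributes for the stemmed variant
  private String pendingStem;   // stemmed text still to be emitted

  public ExactAndStemmedFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (pendingStem != null) {
      restoreState(pendingState);
      termAtt.setEmpty().append(pendingStem);
      posIncAtt.setPositionIncrement(0);       // same position as the '$' token
      pendingStem = null;
      return true;
    }
    if (!input.incrementToken()) {
      return false;
    }
    String original = termAtt.toString();
    pendingStem = stem(original);              // placeholder for the real stemmer
    pendingState = captureState();
    termAtt.setEmpty().append(original).append('$');   // exact-match marker token
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    pendingStem = null;
  }

  private String stem(String s) {
    // assumption: delegate to the same stemmer the stemmed field uses;
    // this toy version only strips a trailing "ing"
    return s.endsWith("ing") ? s.substring(0, s.length() - 3) : s;
  }
}

Exact-match queries would then search for running$, while regular queries
search the stemmed form, as described above.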

You might be able to do something really exotic with payloads.
This is _really_ out of left field, but it just occurred to me. You'd
have to define a transformation from the original word into the
stemmed word that created a unique value. Something like
no stemming -> 0
removing ing -> 1
removing s-> 2

etc. Actually, this would have to be some kind of function on the
letters removed so that removing "ing" mapped to, say,
the ordinal position of the letter in the alphabet * position * 100. So
"ing" would map to 'i' - 'a' + ('n' - 'a') * 100 + ('g' - 'a') * 1 etc...
(you'd have to take considerable care to get this right for any
code sets that had more than 100 possible code points)...
Now, you've included the information about what the original
word was and could use the payload to fail to match in the
exact-match case. Of course the other issue would be to figure
out the syntax to get the fact that you wanted an exact match
down into your custom scorer.

But as you can see, any scheme is harder than just flipping a switch,
so I'd _really_ verify that you can't just use copyField

Best
Erick

On Wed, Oct 10, 2012 at 7:38 AM, meghana  wrote:
>
>
> We are using solr 3.6.
>
> We have field named Description. We want searching feature with stemming and
> also without stemming (exact word/phrase search), with highlighting in both
> .
>
> For that , we had made lot of research and come to conclusion, to use the
> copy field with data type which doesn't have stemming factory. it is working
> fine at now.
>
> (main field has stemming and copy field has not.)
>
> The data for that field is very large and we are having millions of
> documents; and as we want, both searching and highlighting on them; we need
> to keep this copy field stored and indexed both. which will increase index
> size a lot.
>
> we need to eliminate this duplication if possible any how.
>
> From the recent research, we read that combining fuzzy search with dismax
> will fulfill our requirement. (we have tried a bit but not getting success.)
>
> Please let me know , if this is possible, or any other solutions to make
> this happen.
>
> Thanks in Advance
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Make-Exact-Search-on-Field-with-Fuzzy-Query-tp4012888.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Wild card searching - well sort of

2012-10-10 Thread Jack Krupansky
1. What is your specific motivation for wanting to do this? (Sounds like yet 
another "XY problem"!)
2. What specific rules are you expecting to use for synthesis of patterns 
from the raw data?


For the latter, do you expect to index hand-coded specific patterns to be 
returned or do you have some sort of "machine learning" method in mind that 
will generate the patterns by examining all of the values?


-- Jack Krupansky

-Original Message- 
From: Kissue Kissue

Sent: Wednesday, October 10, 2012 8:15 AM
To: solr-user@lucene.apache.org
Subject: Wild card searching - well sort of

Hi,

I am wondering if there is a way i can get Solr to do this:

I have added the string: *-BAAN-* to the index to a field called pattern
which is a string type. Now i want to be able to search for A100-BAAN-C20
or ZA20-BAAN-300 and have Solr return *-BAAN-*.

Any ideas how i can accomplish something like this? I am currently using
Solr 3.5 with solrJ.

Thanks. 



Re: Questions about query times

2012-10-10 Thread Yuval Dotan
OK so I solved the question about the query that returns no results and
still takes time - I needed to add the facet.mincount=1 parameter and this
reduced the time to 200-300 ms instead of seconds.

I still couldn't figure out why a query that returns very few results (like
query number 2) still takes seconds to return even with
the facet.mincount=1 parameter.
I couldn't understand why the facet pivot takes so much time on 299 docs.

Does anyone have any idea?

Example Query:

(2)
q=*:*&fq=(trimTime:[2012-09-04T15:23:48Z TO *])&fq=(Severity:("High"
"Critical"))&fq=(trimTime:[2012-09-04T15:23:48Z TO
*])&fq=(Confidence_Level:("N/A")) OR (Confidence_Level:("Medium-High")) OR
(Confidence_Level:("High"))&f.product.facet.sort=index&f.product.facet.limit=-1&f.Severity.facet.sort=index&f.Severity.facet.limit=-1&f.trimTime.facet.sort=index&f.trimTime.facet.limit=-1&facet=true&f.product.facet.method=enum&facet.pivot=product,Severity,trimTime

NumFound: 299

Times(ms):
Qtime: 2,756 Query: 307 Facet: 2,449

On Thu, Sep 20, 2012 at 5:24 PM, Yuval Dotan  wrote:

> Hi,
>
> We have a system that inserts logs continuously (real-time).
> We have been using the Solr facet pivot feature for querying and have been
> experiencing slow query times and we were hoping to gain some insights with
> your help.
> schema and solrconfig are attached
>
> Here are our questions (data below):
>
>1. Why is facet time so long in (3) and (5) - in cases where there are
>0 or very few results?
>2. We ran two queries that are only differ in the time limit (for the
>second query - time range is very small) - we got the same time for both
>queries although the second one returned very few results - again why is
>that?
>3. Is there a way to improve pivot facet time?
>
> System Data:
>
> Index size: 63 GB
> RAM:4Gb
> CPU: 2 x Xeon E5410 2.33GHz
> Num of Documents: 109,278,476
>
>
> query examples:
>
> -
> (1)
> Query:
> q=*:*&fq=(trimTime:[2012-09-04T14:29:24Z TO
> *])&fq=(trimTime:[2012-09-04T14:29:24Z TO
> *])&f.product.facet.sort=index&f.product.facet.limit=-1&f.Severity.facet.sort=index&f.Severity.facet.limit=-1&f.trimTime.facet.sort=index&f.trimTime.facet.limit=-1&facet=true&f.product.facet.method=enum&facet.pivot=product,Severity,trimTime
>
> NumFound:
> 11,407,889
>
> Times (ms):
> Qtime: 3,239 Query: 353 Facet: 2,885
> -
>
> (2)
> Query:
> q=*:*&fq=(trimTime:[2012-09-04T15:23:48Z TO *])&fq=(Severity:("High"
> "Critical"))&fq=(trimTime:[2012-09-04T15:23:48Z TO
> *])&fq=(Confidence_Level:("N/A")) OR (Confidence_Level:("Medium-High")) OR
> (Confidence_Level:("High"))&f.product.facet.sort=index&f.product.facet.limit=-1&f.Severity.facet.sort=index&f.Severity.facet.limit=-1&f.trimTime.facet.sort=index&f.trimTime.facet.limit=-1&facet=true&f.product.facet.method=enum&facet.pivot=product,Severity,trimTime
>
> NumFound: 299
>
> Times(ms):
> Qtime: 2,756 Query: 307 Facet: 2,449
>
> -
> (3)
> Query:
> q=*:*&fq=(trimTime:[2012-09-11T12:55:00Z TO *])&fq=(Severity:("High"
> "Critical"))&fq=(trimTime:[2012-09-04T15:23:48Z TO
> *])&fq=(Confidence_Level:("N/A")) OR (Confidence_Level:("Medium-High")) OR
> (Confidence_Level:("High"))&f.product.facet.sort=index&f.product.facet.limit=-1&f.Severity.facet.sort=index&f.Severity.facet.limit=-1&f.trimTime.facet.sort=index&f.trimTime.facet.limit=-1&facet=true&f.product.facet.method=enum&facet.pivot=product,Severity,trimTime
>
> NumFound: 7
>
> Times(ms):
> Qtime: 2,798 Query: 312 Facet: 2,485
>
> -
> (4)
> Query:
> q=*:*&fq=(trimTime:[2012-09-04T15:43:16Z TO
> *])&fq=(trimTime:[2012-09-04T15:43:16Z TO *])&fq=(product:("Application
> Control")) OR (product:("URL
> Filtering"))&f.appi_name.facet.sort=index&f.appi_name.facet.limit=-1&f.app_risk.facet.sort=index&f.app_risk.facet.limit=-1&f.matched_category.facet.sort=index&f.matched_category.facet.limit=-1&f.trimTime.facet.sort=index&f.trimTime.facet.limit=-1&facet=true&f.appi_name.facet.method=enum&facet.pivot=appi_name,app_risk,matched_category,trimTimeex&f.trimTime.facet.limit=-1&facet=true&f.product.facet.method=enum&facet.pivot=product,Severity,trimTime
>
> NumFound: more than 30M
>
> Times(ms): Qtime: 23,288
> -
>
> (5)
> Query:
> q=*:*&fq=(trimTime:[2012-09-05T06:03:55Z TO *])&fq=(Severity:("High"
> "C

Re: Form too large error in SOLR4.0

2012-10-10 Thread Jack Krupansky
1611387 is 1,611,387, which is clearly greater than your revised limit of
500000 = 500,000.


Try setting the limit to 2000000 = 2,000,000. Or maybe even 5000000 =
5,000,000.
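
For reference, one common way to raise this in the jetty.xml shipped with Solr
(a sketch; exact placement depends on the bundled Jetty version) is to set it
as a server attribute, or alternatively to pass the system property
-Dorg.eclipse.jetty.server.Request.maxFormContentSize=2000000 on startup:

<Configure id="Server" class="org.eclipse.jetty.server.Server">
  ...
  <Call name="setAttribute">
    <Arg>org.eclipse.jetty.server.Request.maxFormContentSize</Arg>
    <Arg>2000000</Arg>
  </Call>
</Configure>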


-- Jack Krupansky

-Original Message- 
From: ravicv

Sent: Wednesday, October 10, 2012 4:49 AM
To: solr-user@lucene.apache.org
Subject: Form too large error in SOLR4.0

Hi,

Recently we have upgraded solr 1.4 version to 4.0 version. After upgrading
we are experiencing unusual behavior in SOLR4.0.
The same query is working properly in solr 1.4 and it is throwing SEVERE:
null:java.lang.IllegalStateException: Form too large1611387>200000 error in
solr4.0.

I have increased maxFormContentSize value in jetty.xml
   
 org.eclipse.jetty.server.Request.maxFormContentSize
  500000
   

But still i am facing same issue.

Can some one please help me to resolve this issue.

Full Stack trace:

Oct 10, 2012 3:20:43 AM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.IllegalStateException: Form too large1611387>200000
   at
org.eclipse.jetty.server.Request.extractParameters(Request.java:279)
   at
org.eclipse.jetty.server.Request.getParameterMap(Request.java:705)
   at
org.apache.solr.request.ServletSolrParams.(ServletSolrParams.java:29)
   at
org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:394)
   at
org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:115)
   at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
   at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
   at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
   at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
   at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
   at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
   at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
   at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
   at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
   at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
   at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
   at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
   at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
   at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
   at org.eclipse.jetty.server.Server.handle(Server.java:351)
   at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
   at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
   at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
   at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
   at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
   at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
   at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
   at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
   at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
   at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
   at java.lang.Thread.run(Thread.java:662)

Oct 10, 2012 3:20:43 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Server at
http://localhost:8983/solr/core0 returned non ok status:500, message:Server
Error
   at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
   at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
   at org.apache.solr.handler.component.HttpShardHandler$1.call(Htt

Thanks,
Ravi




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Form-too-large-error-in-SOLR4-0-tp4012868.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Form too large error in SOLR4.0

2012-10-10 Thread Otis Gospodnetic
Hi,

Check jetty configs,  this looks like an error from the container.

Otis
--
Performance Monitoring - http://sematext.com/spm
On Oct 10, 2012 4:50 AM, "ravicv"  wrote:

> Hi,
>
> Recently we have upgraded solr 1.4 version to 4.0 version. After upgrading
> we are experiencing unusual behavior in SOLR4.0.
> The same query is working properly in solr 1.4 and it is throwing SEVERE:
> null:java.lang.IllegalStateException: Form too large1611387>200000 error in
> solr4.0.
>
> I have increased maxFormContentSize value in jetty.xml
> 
>   org.eclipse.jetty.server.Request.maxFormContentSize
>   500000
> 
>
> But still i am facing same issue.
>
> Can some one please help me to resolve this issue.
>
> Full Stack trace:
>
> Oct 10, 2012 3:20:43 AM org.apache.solr.common.SolrException log
> SEVERE: null:java.lang.IllegalStateException: Form too large1611387>200000
> at
> org.eclipse.jetty.server.Request.extractParameters(Request.java:279)
> at
> org.eclipse.jetty.server.Request.getParameterMap(Request.java:705)
> at
> org.apache.solr.request.ServletSolrParams.(ServletSolrParams.java:29)
> at
>
> org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:394)
> at
>
> org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:115)
> at
>
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
> at
>
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
> at
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
> at
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
> at
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
> at
>
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
> at
>
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
> at
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
> at
>
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
> at
>
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> at
>
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
> at
>
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
> at
>
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
> at
>
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
> at org.eclipse.jetty.server.Server.handle(Server.java:351)
> at
>
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
> at
>
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
> at
>
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
> at
>
> org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
> at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
> at
> org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
> at
>
> org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
> at
>
> org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
> at
>
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
> at
>
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
> at java.lang.Thread.run(Thread.java:662)
>
> Oct 10, 2012 3:20:43 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Server at
> http://localhost:8983/solr/core0 returned non ok status:500,
> message:Server
> Error
> at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
> at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
> at org.apache.solr.handler.component.HttpShardHandler$1.call(Htt
>
> Thanks,
> Ravi
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Form-too-large-error-in-SOLR4-0-tp4012868.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


RE: Wild card searching - well sort of

2012-10-10 Thread Markus Jelsma
Hi - The WordDelimiterFilter can help you get *-BAAN-* for A100-BAAN-C20 but 
only because BAAN is surrounded with characters the filter splits and combines 
upon.
 
-Original message-
> From:Kissue Kissue 
> Sent: Wed 10-Oct-2012 14:20
> To: solr-user@lucene.apache.org
> Subject: Wild card searching - well sort of
> 
> Hi,
> 
> I am wondering if there is a way i can get Solr to do this:
> 
> I have added the string: *-BAAN-* to the index to a field called pattern
> which is a string type. Now i want to be able to search for A100-BAAN-C20
> or ZA20-BAAN-300 and have Solr return *-BAAN-*.
> 
> Any ideas how i can accomplish something like this? I am currently using
> Solr 3.5 with solrJ.
> 
> Thanks.
> 


Re: Installing Solr on a shared hosting server?

2012-10-10 Thread simon
some time back I used dreamhost for a Solr based project. Looks as though
all their offerings, including shared  hosting have Java support - see
http://wiki.dreamhost.com/What_We_Support. I was very happy with their
service and support.

-Simon

On Tue, Oct 9, 2012 at 10:44 AM, Michael Della Bitta <
michael.della.bi...@appinions.com> wrote:

> Bluehost doesn't seem to support Java processes, so unfortunately the
> answer seems to be no.
>
> You might want to look into getting a Linode or some other similar VPS
> hosting. Solr needs RAM to function well, though, so you're not going
> to be able to go with the cheapest option.
>
> Michael Della Bitta
>
> 
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
>
> www.appinions.com
>
> Where Influence Isn’t a Game
>
>
> On Tue, Oct 9, 2012 at 9:27 AM, caiod  wrote:
> > I was wondering if I can install Solr on bluehost's shared hosting to
> use as
> > a website search, and also how do I do so? Thank you...
> >
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/Installing-Solr-on-a-shared-hosting-server-tp4012708.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
>


Solr - Make Exact Search on Field with Fuzzy Query

2012-10-10 Thread meghana


We are using solr 3.6.

We have a field named Description. We want to search with stemming and
also without stemming (exact word/phrase search), with highlighting in
both.

For that, we did a lot of research and came to the conclusion to use a
copy field with a data type that doesn't have a stemming factory. It is
working fine now.

(The main field has stemming; the copy field does not.)
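
For illustration, a sketch of what such a setup typically looks like in
schema.xml (field and type names here are assumptions, not the actual schema):

<field name="Description" type="text_stemmed" indexed="true" stored="true"/>
<field name="Description_exact" type="text_unstemmed" indexed="true" stored="true"/>
<copyField source="Description" dest="Description_exact"/>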

The data for that field is very large and we have millions of
documents; and as we want both searching and highlighting on them, we need
to keep this copy field both stored and indexed, which will increase the
index size a lot.

We need to eliminate this duplication if at all possible.

From the recent research, we read that combining fuzzy search with dismax
will fulfill our requirement. (We have tried a bit but without success.)

Please let me know if this is possible, or any other solutions to make
this happen.

Thanks in Advance




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Make-Exact-Search-on-Field-with-Fuzzy-Query-tp4012888.html
Sent from the Solr - User mailing list archive at Nabble.com.


Using additional dictionary with DirectSolrSpellChecker

2012-10-10 Thread O. Klein
Is there some way to supplement the DirectSolrSpellChecker with a dictionary?

(In some cases terms are not used because of the threshold, but should still be
offered as spellcheck suggestions.)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Using-additional-dictionary-with-DirectSolrSpellChecker-tp4012873.html
Sent from the Solr - User mailing list archive at Nabble.com.


Form too large error in SOLR4.0

2012-10-10 Thread ravicv
Hi,

Recently we have upgraded from Solr 1.4 to Solr 4.0. After upgrading
we are experiencing unusual behavior in Solr 4.0.
The same query works properly in Solr 1.4 but throws a SEVERE:
null:java.lang.IllegalStateException: Form too large1611387>200000 error in
Solr 4.0.

I have increased maxFormContentSize value in jetty.xml

  org.eclipse.jetty.server.Request.maxFormContentSize
  500000


But still i am facing same issue.

Can someone please help me resolve this issue.

Full Stack trace:

Oct 10, 2012 3:20:43 AM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.IllegalStateException: Form too large1611387>20
at
org.eclipse.jetty.server.Request.extractParameters(Request.java:279)
at
org.eclipse.jetty.server.Request.getParameterMap(Request.java:705)
at
org.apache.solr.request.ServletSolrParams.(ServletSolrParams.java:29)
at
org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:394)
at
org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:115)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
at org.eclipse.jetty.server.Server.handle(Server.java:351)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
at
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
at
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
at java.lang.Thread.run(Thread.java:662)

Oct 10, 2012 3:20:43 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Server at
http://localhost:8983/solr/core0 returned non ok status:500, message:Server
Error
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
at org.apache.solr.handler.component.HttpShardHandler$1.call(Htt

Thanks,
Ravi




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Form-too-large-error-in-SOLR4-0-tp4012868.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: search by multiple 'LIKE' operator connected with 'AND' operator

2012-10-10 Thread gremlin
I'm also unable to configure that type of search through schema.xml. As I use
Solr in Drupal, I've implemented it in hook_search_api_solr_query_alter by
exploding my search string into two (or more) chunks, and now search works
well.

Strangely, I couldn't do it through Solr configuration.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/search-by-multiple-LIKE-operator-connected-with-AND-operator-tp4012536p4012861.html
Sent from the Solr - User mailing list archive at Nabble.com.