Re: Solr - Make Exact Search on Field with Fuzzy Query

2012-10-11 Thread meghana
Hi Erickson,

Thanks for your valuable reply.

Actually, we had tried just storing one field and highlighting on that
field all the time, whether we searched on it or not.

That sometimes causes an issue: if I search for the exact term
'hospitality' and use the stemmed field for highlighting, it returns
highlights on both 'hospital' and 'hospitality', whereas it should
highlight only 'hospitality' since I am doing an exact-term search. Can you
suggest anything for this, i.e. whether we can eliminate this issue while
highlighting on the original field (which has stemming applied)?

The other solutions sound really good, but as you said they are hard to
implement, and at this point we want to use built-in solutions if possible.

Please suggest whether we can eliminate the highlighting issue explained
above.

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Make-Exact-Search-on-Field-with-Fuzzy-Query-tp4012888p4013067.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Question about MoreLikeThis query with solrj

2012-10-11 Thread Dominique Bejean

Hi,

Are you using a correct stopword file for the French language? It is
very important for the MLT component to work well.
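
For reference, the stock text_fr type in the example schema looks roughly
like the following; this is a sketch from memory, so verify the filter chain
and file names against your own schema.xml, but the StopFilterFactory line
is where the French stopword file gets wired in:

<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ElisionFilterFactory" ignoreCase="true"
            articles="lang/contractions_fr.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_fr.txt" format="snowball"/>
    <filter class="solr.FrenchLightStemFilterFactory"/>
  </analyzer>
</fieldType>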

You should also take a look at this document:
http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/

MLT support in SolrJ is an old story. Maybe this can also help:
https://issues.apache.org/jira/browse/SOLR-1085

Regards

--
Dominique
www.eolya.fr
www.crawl-anywhere.com
www.mysolrserver.com




Le 02/10/12 18:14, G.Long a écrit :

Hi :)

I'm using Solr 3.6.1 and I'm trying to use the similarity features of
Lucene/Solr to compare texts.


The content of my documents is in French, so I defined a field like:

<field name="content_mlt" type="text_fr" termVectors="true" indexed="true" stored="true"/>

(it uses the default text_fr fieldType provided with the default
schema.xml file)


I'm using the following method to query my index:

SolrQuery sQuery = new SolrQuery();
sQuery.setQueryType("/" + MoreLikeThisParams.MLT);
sQuery.set(MoreLikeThisParams.MATCH_INCLUDE, false);
sQuery.set(MoreLikeThisParams.MIN_DOC_FREQ, 1);
sQuery.set(MoreLikeThisParams.MIN_TERM_FREQ, 1);
sQuery.set(MoreLikeThisParams.MAX_QUERY_TERMS, 50);
sQuery.set(MoreLikeThisParams.SIMILARITY_FIELDS, field);
sQuery.set("fl", "*,id,score");
sQuery.setRows(5);
sQuery.setQuery("content_mlt:" + content); // 'content' stands in for the text to find

QueryResponse rsp = server.query(sQuery);
return rsp.getResults();
return rsp.getResults();

The problem is that the returned results and the associated scores 
look strange to me.


I indexed the following three texts:

sample 1:
Le 1° de l'article 81 du CGI exige que les allocations pour frais 
soient utilisées conformément à leur objet
pour être affranchies de l'impôt. Lorsque la réalité du versement des 
allocations est établie,
le bénéficiaire doit cependant être en mesure de justifier de leur 
utilisation;


sample 2:
Le premier alinéa du 1° de l'article 81 du CGI prévoit que les 
rémunérations des journalistes,
rédacteurs, photographes, directeurs de journaux et critiques 
dramatiques et musicaux
perçues ès qualités constituent des allocations pour frais d'emploi 
affranchies d'impôt

à concurrence de 7 650 EUR.;

sample 3:
Par ailleurs, lorsque leur montant est fixé par voie législative, les 
allocations
pour frais prévues au 1° de l'article 81 du CGI sont toujours réputées 
utilisées
conformément à leur objet et ne peuvent donner lieu à aucune 
vérification de la part de l'administration.
Il s'agit d'une présomption irréfragable, qui ne peut donc pas être 
renversée par la preuve contraire qui
serait apportée par l'administration d'une utilisation non conforme à 
son objet de l'allocation concernée.
Pour que le deuxième alinéa du 1° de l'article 81 du CGI s'applique, 
deux conditions doivent être réunies
simultanément : - la nature d'allocation spéciale inhérente à la 
fonction ou à l'emploi résulte directement de la loi ;

- son montant est fixé par la loi;

I tried to query the index by passing the first sample as the content 
to query and the result is the following :

MLT result: id: dc3 - score: 0.114195324 (corresponds to sample 3)
MLT result: id: dc2 - score: 0.035233106 (corresponds to sample 2)

The results don't even contain the first sample, although it is
exactly the same text as the one put into the query :/


Any idea why I get these results?
Maybe the query parameters are incorrect, or there is something to
change in the Solr config?


Thanks :)

Gary









Re: Unique terms without faceting

2012-10-11 Thread Toke Eskildsen
On Wed, 2012-10-10 at 17:45 +0200, Phil Hoy wrote:
 I know that you can use a facet query to get the unique terms for a
 field taking account of any q or fq parameters but for our use case the
 counts are not needed. So is there a more efficient way of finding 
 just unique terms for a field?

Short answer: Not at this moment.


If the number of unique terms is large (millions), a fair amount of
temporary memory could be spared by just keeping track of matched terms
with a boolean vs. the full int for standard faceting. Reduced memory
requirements mean less garbage collection and faster processing due to
better cache utilization. So yes, there is a more efficient way.

Guessing from your other posts, you are building a social network and
need to query on surnames and similar large fields. The question is of
course how large the payoff will be and if it is worth the investment in
development hours. I would suggest hacking the current faceting code to
use OpenBitSet instead of int[] and doing performance tests on that.
PerSegmentSingleValuedFaceting.SegFacet and UnInvertedField.getCounts
seem to be the right places to look in Solr 4.
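
For what it's worth, a rough single-reader sketch of the one-bit-per-term
idea (this is illustrative Lucene 4 usage, term-at-a-time rather than the
per-segment structure above; check class and method names against your
exact version):

import java.io.IOException;
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.OpenBitSet;

public final class UniqueTermsSketch {
  /** Marks which term ordinals occur in the matching docs: one bit per
      term instead of the int counter per term that faceting keeps. */
  public static OpenBitSet collect(AtomicReader reader, OpenBitSet matchingDocs,
                                   String field) throws IOException {
    Terms terms = reader.terms(field);
    long numTerms = terms == null ? 0 : Math.max(terms.size(), 0);
    OpenBitSet seen = new OpenBitSet(numTerms + 1);
    if (terms == null) return seen;
    TermsEnum te = terms.iterator(null);
    DocsEnum docs = null;
    long ord = 0;
    for (BytesRef term = te.next(); term != null; term = te.next(), ord++) {
      docs = te.docs(reader.getLiveDocs(), docs, DocsEnum.FLAG_NONE);
      for (int doc = docs.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS;
           doc = docs.nextDoc()) {
        if (matchingDocs.get(doc)) {
          seen.set(ord); // one hit is enough, no counting needed
          break;
        }
      }
    }
    return seen;
  }
}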

Regards,
Toke Eskildsen, State and University Library, Denmark



Re: segment number during optimize of index

2012-10-11 Thread jame vaalet
Hi Lance,
My earlier point may be misleading:
   1. Segments are independent sub-indexes in separate files; while
| indexing it's better to create a new segment, as it doesn't have to
| modify an existing file, whereas while searching, *smaller the segment*
| the better it is, since you open x (not exactly x, but a value
| proportional to x) physical files to search if you have got x segments
| in the index.

The *smaller* was referring to the segment number rather than the segment
size.

When you said Large Pages, did you mean segment size should be less than a
threshold for better performance from the OS point of view? My main concern
here is: what would be the main disadvantage (indexing or searching) if I
merge my entire 150 GB index (right now 100 segments) into a single
segment?





On 11 October 2012 07:28, Lance Norskog goks...@gmail.com wrote:

 Study index merging. This is awesome.

 http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

 Jame- opening lots of segments is not a problem. A major performance
 problem you will find is 'Large Pages'. This is an operating-system
 strategy for managing servers with 10s of gigabytes of memory. Without it,
 all large programs run much more slowly than they could. It is not a Solr
 or JVM problem.


 - Original Message -
 | From: jun Wang wangjun...@gmail.com
 | To: solr-user@lucene.apache.org
 | Sent: Wednesday, October 10, 2012 6:36:09 PM
 | Subject: Re: segment number during optimize of index
 |
 | I have an other question, does the number of segment affect speed for
 | update index?
 |
 | 2012/10/10 jame vaalet jamevaa...@gmail.com
 |
 |  Guys,
 |  thanks for all the inputs, I was continuing my research to know
 |  more about
 |  segments in Lucene. Below are my conclusion, please correct me if
 |  am wrong.
 | 
 | 1. Segments are independent sub-indexes in separate files; while
 | indexing it's better to create a new segment, as it doesn't have to
 | modify an existing file, whereas while searching, smaller the segment
 | the better it is, since you open x (not exactly x, but a value
 | proportional to x) physical files to search if you have got x segments
 | in the index.
 | 2. since lucene has memory map concept, for each file/segment in
 | index a
 | new m-map file is created and mapped to the physcial file in
 | disk. Can
 | someone explain or correct this in detail, i am sure there are
 | lot many
 | people wondering how m-map works while you merge or optimze
 | index
 |  segments.
 | 
 | 
 | 
 |  On 6 October 2012 07:41, Otis Gospodnetic
 |  otis.gospodne...@gmail.com
 |  wrote:
 | 
 |   If I were you and not knowing all your details...
 |  
 |   I would optimize indices that are static (not being modified) and
 |   would optimize down to 1 segment.
 |   I would do it when search traffic is low.
 |  
 |   Otis
 |   --
 |   Search Analytics -
 |   http://sematext.com/search-analytics/index.html
 |   Performance Monitoring - http://sematext.com/spm/index.html
 |  
 |  
 |   On Fri, Oct 5, 2012 at 4:27 PM, jame vaalet
 |   jamevaa...@gmail.com
 |  wrote:
 |Hi Eric,
 |I am in a major dilemma with my index now. I have got 8 cores, each
 |around 300 GB in size; half of the documents in them are deleted, and
 |above that each has got around 100 segments as well. Do I issue an
 |expungeDeletes and allow the merge policy to take care of the segments,
 |or optimize them into a single segment? Search performance is not at par
 |compared to usual Solr speed.
 |If I have to optimize, what segment number should I choose? My RAM size
 |is around 120 GB and JVM heap is around 45 GB (oldGen being 30 GB).
 |Please advise!
 |
 |thanks.
 |   
 |   
 |On 6 October 2012 00:00, Erick Erickson
 |erickerick...@gmail.com
 |  wrote:
 |   
 |because eventually you'd run out of file handles. Imagine a
 |long-running server with 100,000 segments. Totally
 |unmanageable.
 |   
 |I think Shawn was emphasizing that RAM requirements don't
 |depend on the number of segments. There are other
 |resources that files consume, however.
 |   
 |Best
 |Erick
 |   
 |On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet
 |jamevaa...@gmail.com
 |   wrote:
 | hi Shawn,
 | thanks for the detailed explanation.
 | I have got one doubt: you said it doesn't matter how many segments the
 | index has, but then why does Solr have this merge policy which merges
 | segments frequently? Why can't it leave the segments as they are
 | rather than merging smaller ones into bigger ones?
 |
 | thanks.
 |
 | On 5 October 2012 05:46, Shawn Heisey s...@elyograg.org
 | wrote:
 |
 | On 10/4/2012 3:22 PM, jame vaalet wrote:
 |
 | so 

Re: Auto Correction?

2012-10-11 Thread Ahmet Arslan
 so other than commercial solutions,
 it seems like I need to have a plugin,
 right? I couldn't find any open source solutions yet...

Yes, you need to implement a custom SearchComponent (plugin).
http://wiki.apache.org/solr/SearchComponent
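
To give a flavour of what such a plugin looks like, here is a minimal
skeleton against the Solr 3.x API (the class name and behaviour are made
up, and the exact set of SolrInfoMBean methods you must implement differs
slightly between versions):

package com.example.solr;

import java.io.IOException;

import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class AutoCorrectComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // Runs before the query executes: a natural place to rewrite
    // rb.getQueryString() with a corrected query.
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // Runs when the query executes: e.g. inspect the results and
    // attach a "did you mean" section to the response.
  }

  @Override
  public String getDescription() { return "auto-correction component"; }

  @Override
  public String getSource() { return "$URL$"; }

  @Override
  public String getSourceId() { return "$Id$"; }

  @Override
  public String getVersion() { return "1.0"; }
}

You then register it in solrconfig.xml with something like
<searchComponent name="autocorrect" class="com.example.solr.AutoCorrectComponent"/>
and add it to the components list of your request handler.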

Alternatively, you can re-run the search with the suggestions on the client side.


SLOR And OpenNlp integration

2012-10-11 Thread ahmed
Hello,
I am a new user of Apache Solr and I have to integrate OpenNLP with Solr.
The problem is that I can't find a tutorial for this integration, so I am
asking if there is someone who can help me with it?
thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SLOR-And-OpenNlp-integration-tp4013094.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: SLOR And OpenNlp integration

2012-10-11 Thread Markus Jelsma
Hi - the wiki page will get you up and running quickly:
http://wiki.apache.org/solr/OpenNLP

 
 
-Original message-
 From:ahmed ahmed.missaoui...@gmail.com
 Sent: Thu 11-Oct-2012 13:32
 To: solr-user@lucene.apache.org
 Subject: SLOR And OpenNlp integration
 
 Hello,
 I am a new user of Apache Solr and I have to integrate OpenNLP with Solr.
 The problem is that I can't find a tutorial for this integration, so I am
 asking if there is someone who can help me with it?
 thanks,
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/SLOR-And-OpenNlp-integration-tp4013094.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 


RE: SLOR And OpenNlp integration

2012-10-11 Thread ahmed
Hi, thanks for the reply.
In fact I tried this tutorial, but when I execute 'ant compile' I get a
'class not found' problem even though the classes are there. I don't know
what the problem is.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SLOR-And-OpenNlp-integration-tp4013094p4013101.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr - Make Exact Search on Field with Fuzzy Query

2012-10-11 Thread Erick Erickson
Right, and going the other way (storing and highlighting on the non-stemmed
field) would be unsatisfactory because you'd get a hit on 'hospital' in the
stemmed field, but wouldn't highlight it if you searched on 'hospitality'.

I really don't see a good solution here. Highlighting seems to be one of those
things that's easy in concept but has a zillion ways to go wrong.

I guess I'd really just go with the copyField approach unless you can prove that
it's really a problem. Perhaps lost in my first e-mail is that storing
the field twice
doesn't really affect search speed or _search_ requirements at all. Take a
look here:
http://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/fileformats.html#file-names

note that the *.fdt and *.fdx files are where the original raw copy goes
(i.e. where data gets written when you specify stored=true)
and they are completely independent of the files that contain the searchable
data. So unless you're disk-space constrained, the additional storage really
doesn't cost you much.
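
For concreteness, a minimal sketch of the two-field setup (type and field
names here are placeholders, not from your schema): index both, store only
the exact field, and always point hl.fl at the exact field.

<field name="text_exact"   type="text_unstemmed" indexed="true" stored="true"/>
<field name="text_stemmed" type="text_en"        indexed="true" stored="false"/>
<copyField source="text_exact" dest="text_stemmed"/>

The stemmed copy adds index data but no second stored copy, which is the
*.fdt/*.fdx point above.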

Best
Erick

On Thu, Oct 11, 2012 at 2:31 AM, meghana meghana.rav...@amultek.com wrote:
 Hi Erickson,

 Thanks for your valuable reply.

 Actually, we had tried just storing one field and highlighting on that
 field all the time, whether we searched on it or not.

 That sometimes causes an issue: if I search for the exact term
 'hospitality' and use the stemmed field for highlighting, it returns
 highlights on both 'hospital' and 'hospitality', whereas it should
 highlight only 'hospitality' since I am doing an exact-term search. Can you
 suggest anything for this, i.e. whether we can eliminate this issue while
 highlighting on the original field (which has stemming applied)?

 The other solutions sound really good, but as you said they are hard to
 implement, and at this point we want to use built-in solutions if possible.

 Please suggest whether we can eliminate the highlighting issue explained
 above.

 Thanks.



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Solr-Make-Exact-Search-on-Field-with-Fuzzy-Query-tp4012888p4013067.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: anyone have any clues about this exception

2012-10-11 Thread Erick Erickson
Well, you'll actually be able to optimize, it's just called forceMerge.

But the point is that optimize seems like something that _of course_
you want to do, when in reality it's not something you usually should
do at all. Optimize does two things:
1) merges all the segments into one (usually)
2) removes all of the info associated with deleted documents.

Of the two, point 2) is the one that really counts, and that's done
whenever segment merging is done anyway. So unless you have
a very large number of deletes (or updates of the same document),
optimize buys you very little. You can tell this by the difference
between numDocs and maxDoc in the admin page.

So what happens if you just don't bother to optimize? Take a look at
the merge policy, perhaps as an alternative, to help control how merging
happens.
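
If you do decide you need it occasionally, a partial merge is gentler than
a full single-segment optimize. A SolrJ sketch (assuming server is an
HttpSolrServer; the arguments are waitFlush, waitSearcher, maxSegments):

// Merge down to at most 5 segments instead of one big segment.
UpdateResponse rsp = server.optimize(true, true, 5);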

Best
Erick

On Wed, Oct 10, 2012 at 3:04 PM, Petersen, Robert rober...@buy.com wrote:
 You could be right. Going back in the logs, I noticed it used to happen less
 frequently and always towards the end of an optimize operation. It is
 probably my indexer timing out waiting for updates to occur during optimizes.
 The errors grew recently due to my upping the indexer thread count to 22
 threads, so there are a lot more timeouts occurring now. Also, our index has
 grown to double the old size, so the optimize operation has started taking a
 lot longer, also contributing to what I'm seeing. I have just changed my
 optimize frequency from three times a day to once a day after reading the
 following:

 Here they are talking about completely deprecating the optimize command in
 the next version of Solr:
 https://issues.apache.org/jira/browse/SOLR-3141


 -Original Message-
 From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
 Sent: Wednesday, October 10, 2012 11:10 AM
 To: solr-user@lucene.apache.org
 Subject: Re: anyone have any clues about this exception

 Something timed out, the other end closed the connection. This end tried to 
 write to closed pipe and died, something tried to catch that exception and 
 write its own and died even worse? Just making it up really, but sounds good 
 (plus a 3-year Java tech-support hunch).

 If it happens often enough, see if you can run WireShark on that machine's
 network interface and catch the whole network conversation in action. Often,
 there are enough clues there from looking at tcp packets and/or the stuff
 transmitted. WireShark is a power-tool, so it takes a little while the first
 time, but the learning will pay for itself over and over again.

 Regards,
Alex.

 Personal blog: http://blog.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at once. 
 Lately, it doesn't seem to be working.  (Anonymous  - via GTD
 book)


 On Wed, Oct 10, 2012 at 11:31 PM, Petersen, Robert rober...@buy.com wrote:
 Tomcat localhost log (not the catalina log) for my  solr 3.6.1 (master) 
 instance contains lots of these exceptions but solr itself seems to be doing 
 fine... any ideas?  I'm not seeing these exceptions being logged on my slave 
 servers btw, just the master where we do our indexing only.



 Oct 9, 2012 5:34:11 PM org.apache.catalina.core.StandardWrapperValve
 invoke
 SEVERE: Servlet.service() for servlet default threw exception
 java.lang.IllegalStateException
 at 
 org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:389)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:291)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at 
 com.googlecode.psiprobe.Tomcat60AgentValve.invoke(Tomcat60AgentValve.java:30)
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
 at 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at 
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
 at java.lang.Thread.run(Unknown Source)



Re: unsuscribe

2012-10-11 Thread Erick Erickson
Please follow the instructions here:
https://wiki.apache.org/solr/Unsubscribing%20from%20mailing%20lists



On Wed, Oct 10, 2012 at 6:03 PM, zMk Bnc zig...@hotmail.com wrote:

 unsuscribe


Re: SLOR And OpenNlp integration

2012-10-11 Thread Koji Sekiguchi

(12/10/11 20:40), ahmed wrote:

Hi, thanks for the reply.
In fact I tried this tutorial, but when I execute 'ant compile' I get a
'class not found' problem even though the classes are there. I don't know
what the problem is.



I think attaching the error you got would help us understand your problem.
Also, before that: what do you want to do with the Solr and OpenNLP
integration?

koji
--
http://soleami.com/blog/starting-lab-work.html


Re: Unique terms without faceting

2012-10-11 Thread Otis Gospodnetic
Hi,

Are you looking for
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/contrib-misc/org/apache/lucene/misc/HighFreqTerms.html
?
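
It also has a command-line entry point; invocation is roughly as follows
(check the usage message of your exact Lucene version; -t ranks by total
term frequency instead of document frequency, and "surname" is just an
example field name):

java -cp lucene-core-3.5.0.jar:lucene-misc-3.5.0.jar \
    org.apache.lucene.misc.HighFreqTerms /path/to/index -t 100 surname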

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Thu, Oct 11, 2012 at 4:40 AM, Toke Eskildsen t...@statsbiblioteket.dk 
wrote:
 On Wed, 2012-10-10 at 17:45 +0200, Phil Hoy wrote:
 I know that you can use a facet query to get the unique terms for a
 field taking account of any q or fq parameters but for our use case the
 counts are not needed. So is there a more efficient way of finding
 just unique terms for a field?

 Short answer: Not at this moment.


 If the number of unique terms is large (millions), a fair amount of
 temporary memory could be spared by just keeping track of matched terms
 with a boolean vs. the full int for standard faceting. Reduced memory
 requirements mean less garbage collection and faster processing due to
 better cache utilization. So yes, there is a more efficient way.

 Guessing from your other posts, you are building a social network and
 need to query on surnames and similar large fields. The question is of
 course how large the payoff will be and if it is worth the investment in
 development hours. I would suggest hacking the current faceting code to
 use OpenBitSet instead of int[] and doing performance tests on that.
 PerSegmentSingleValuedFaceting.SegFacet and UnInvertedField.getCounts
 seem to be the right places to look in Solr 4.

 Regards,
 Toke Eskildsen, State and University Library, Denmark



Re: SLOR And OpenNlp integration

2012-10-11 Thread ahmed
In fact:
- I downloaded the Solr source using an SVN client
- then I applied the OpenNLP patch
- then I ran 'ant compile -lib /usr/share/ivy'

and I got this error:

[javac]   public synchronized Span[] splitSentences(String line) {
[javac]   ^
[javac]
/home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/opennlp/NLPTokenizerOp.java:36:
cannot find symbol
[javac] symbol  : class Tokenizer
[javac] location: class org.apache.solr.analysis.opennlp.NLPTokenizerOp
[javac]   private final Tokenizer tokenizer;
[javac] ^
[javac]
/home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/opennlp/NLPTokenizerOp.java:38:
cannot find symbol
[javac] symbol  : class TokenizerModel
[javac] location: class org.apache.solr.analysis.opennlp.NLPTokenizerOp
[javac]   public NLPTokenizerOp(TokenizerModel model) {
[javac] ^
[javac]
/home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/opennlp/NLPTokenizerOp.java:46:
cannot find symbol
[javac] symbol  : class Span
[javac] location: class org.apache.solr.analysis.opennlp.NLPTokenizerOp
[javac]   public synchronized Span[] getTerms(String sentence) {
[javac]   ^
[javac]
/home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/OpenNLPTokenizerFactory.java:26:
package opennlp.tools.util does not exist
[javac] import opennlp.tools.util.InvalidFormatException;
[javac]  ^
[javac]
/home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/opennlp/OpenNLPOpsFactory.java:9:
package opennlp.tools.chunker does not exist
[javac] import opennlp.tools.chunker.ChunkerModel;
[javac] ^
[javac] 100 errors

BUILD FAILED
/home/pfe/Téléchargements/dev/trunk/build.xml:112: The following error
occurred while executing this line:
/home/pfe/Téléchargements/dev/trunk/solr/common-build.xml:419: The following
error occurred while executing this line:
/home/pfe/Téléchargements/dev/trunk/solr/common-build.xml:410: The following
error occurred while executing this line:
/home/pfe/Téléchargements/dev/trunk/lucene/common-build.xml:418: The
following error occurred while executing this line:
/home/pfe/Téléchargements/dev/trunk/lucene/common-build.xml:1482: Compile
failed; see the compiler error output for details.

I want to apply semantic analysis to the documents that will be indexed
with Solr, so Solr will index and then analyse the content using OpenNLP
instead of Tika.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/SLOR-And-OpenNlp-integration-tp4013094p4013144.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: displaying search results in map

2012-10-11 Thread Gora Mohanty
On 11 October 2012 23:16, Harish Rawat harish.s.ra...@gmail.com wrote:

 Hi

 I am working on a project to display the search results on the map. The
 idea is to divide the map into N*N grids and show counts for each grid and
 allow users to view top result on each grid

 any suggestions on how best to accomplish this with solr?


Your description is not very clear. What search results
are you seeking to display on what kind of a map? Are you
talking about a geographical map, or something like a 3D
histogram (which is what your N x N grid seems to refer to)?
Please clarify.

In either case, it is quite unlikely that Solr will handle the
presentation for you. Solr is a search engine that will return
you desired search results. What to do with the search results
is an issue for a presentation layer.

Regards,
Gora


Re: displaying search results in map

2012-10-11 Thread Gora Mohanty
On 11 October 2012 23:55, Harish Rawat harish.s.ra...@gmail.com wrote:

 Sorry for not being clear. Here are more details:

 1.) The results are displayed on a geographical map
 2.) Each document has latitude and longitude fields, and other fields that
 can be searched on
 3.) The search will be done for all documents within a lat/long range.
 4.) The lat/lon range is divided into N*N grids (let's say 64) and for each
 grid we want the following:
 a.) no. of documents in that grid
 b.) top K documents in that grid
 c.) avg of the latitude and longitude values for all results in that grid

 In Lucene I can implement my own custom collector and do all the
 calculations listed in #4. I wanted to understand the best way to implement
 (or reuse, if any exists :) this logic in Solr

[...]

Hmm, I am not that familiar with Lucene, so maybe someone
else will chip in with advice.

However, what you describe in point 4 seems to be a clustering
strategy for geographical points. Typically, we use pre-defined
strategies from OpenLayers ( http://openlayers.org ), or custom
strategies.

Regards,
Gora


Re: displaying search results in map

2012-10-11 Thread Jamie Johnson
Did you look at
http://stackoverflow.com/questions/11319465/geoclusters-in-solr?  This
sounds similar to what you're asking for based on geohashes of the
points of interest.
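
Another stock-Solr option for the per-cell counts (4a), for what it's
worth, is one facet.query per grid cell. A rough SolrJ sketch, assuming
fields named latitude and longitude and a bounding box split into n x n
cells (top-K docs and averages per cell would still need per-cell queries
or the geohash approach above):

static void addGridFacets(SolrQuery q, double minLat, double minLon,
                          double maxLat, double maxLon, int n) {
  double cellH = (maxLat - minLat) / n;
  double cellW = (maxLon - minLon) / n;
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
      double lat0 = minLat + i * cellH;
      double lon0 = minLon + j * cellW;
      // One facet.query per cell; counts come back keyed by these strings.
      q.addFacetQuery("latitude:[" + lat0 + " TO " + (lat0 + cellH) + "]"
          + " AND longitude:[" + lon0 + " TO " + (lon0 + cellW) + "]");
    }
  }
}

For 64 cells that is 64 facet queries per request, which is usually fine.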

On Thu, Oct 11, 2012 at 2:25 PM, Harish Rawat harish.s.ra...@gmail.com wrote:
 Sorry for not being clear. Here are more details:

 1.) The results are displayed on a geographical map
 2.) Each document has latitude and longitude fields, and other fields that
 can be searched on
 3.) The search will be done for all documents within a lat/long range.
 4.) The lat/lon range is divided into N*N grids (let's say 64) and for each
 grid we want the following:
 a.) no. of documents in that grid
 b.) top K documents in that grid
 c.) avg of the latitude and longitude values for all results in that grid

 In Lucene I can implement my own custom collector and do all the
 calculations listed in #4. I wanted to understand the best way to implement
 (or reuse, if any exists :) this logic in Solr

 Regards
 Harish



 On Thu, Oct 11, 2012 at 11:08 AM, Gora Mohanty g...@mimirtech.com wrote:

 On 11 October 2012 23:16, Harish Rawat harish.s.ra...@gmail.com wrote:

  Hi
 
  I am working on a project to display the search results on the map. The
  idea is to divide the map into N*N grids and show counts for each grid
 and
  allow users to view top result on each grid
 
  any suggestions on how best to accomplish this with solr?
 

 Your description is not very clear. What search results
 are you seeking to display on what kind of a map? Are you
 talking about a geographical map, or something like a 3D
 histogram (which is what your N x N grid seems to refer to)?
 Please clarify.

 In either case, it is quite unlikely that Solr will handle the
 presentation for you. Solr is a search engine that will return
 you desired search results. What to do with the search results
 is an issue for a presentation layer.

 Regards,
 Gora



SolrJ, optimize, maxSegments

2012-10-11 Thread Shawn Heisey
Currently my indexing code calls optimize.  Once a night, one of my six 
large shards is optimized, so each one only gets optimized once every 
six days. Here is the SolrJ call, server is an instance of HttpSolrServer:


UpdateResponse ur = server.optimize();

I only do this because I want deleted documents regularly removed from 
the index.  Whatever speed gains I might see from getting down to one 
segment are just an added bonus.  After watching all the discussion on 
the -dev list regarding what to do in Solr due to the Lucene forceMerge 
rename, I am considering changing this to something like the following:


UpdateResponse ur = server.optimize(true, true, 20);

What happens with this if I am already below 20 segments? Will it still 
expunge all of my (typically several thousand) deleted documents?  I am 
hoping that what it will do is rebuild any segment that contains deleted 
documents and leave the other segments alone.  Possibly irrelevant info: 
I'm using the following MP config:


  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">35</int>
    <int name="segmentsPerTier">35</int>
    <int name="maxMergeAtOnceExplicit">105</int>
  </mergePolicy>

Thanks,
Shawn



Issue using SpatialRecursivePrefixTreeFieldType

2012-10-11 Thread Eric Khoury

Hi David,

I'm defining my field as such:

<fieldType name="rectangle" class="solr.SpatialRecursivePrefixTreeFieldType"
           geo="false" distErrPct="0" maxDetailDist="1"
           worldBounds="0 0 10916173 20"/>

When I create a large rectangle, say 10 10 500 11, Solr seems to freeze for
quite some time. I haven't looked at your code, but I can imagine the
algorithm basically fills in some sort of indexing matrix, and that's
what's taking so long for large rectangles? Is there a limit to how big the
worldBounds should be?

Thanks!
Eric

Open Source Social (London) - 23rd Oct

2012-10-11 Thread Richard Marr
Hi all,

The next Open Source Search Social is on the 23rd Oct at The Plough, in
Bloomsbury.

We usually get a good mix of regulars and newcomers, and a good mix of
backgrounds and experience levels, so please come along if you can. As
usual the format is completely open so we'll be talking about whatever is
most interesting at any one particular moment... ooo, a shiny thing...

Details and RSVP options on the Meetup page:
http://www.meetup.com/london-search-social/events/86580442/

Hope to see you there,

Richard

@richmarr


Re: Custom html headers/footers to solr admin console

2012-10-11 Thread Erick Erickson
Uhhmmm, why do you want to do this? The admin screen is pretty
much purely intended for developers/in-house use. Mostly I just
want to be sure you aren't thinking about letting users, say, see
this page. Consider
/update?stream.body=<delete><query>*:*</query></delete>

Best
Erick

On Thu, Oct 11, 2012 at 4:57 PM, Billy Newman newman...@gmail.com wrote:
 Hello all,


 I was just poking around in my solr distribution and I noticed some files:
 admin-extra.html
 admin-extra.menu-top.html
 admin-extra.menu-bottom.html


 I was really hoping that that was html inserted into the solr admin
 page and I could modify the:
 admin-extra.menu-top.html
 admin-extra.menu-bottom.html

 files to make a header/footer.

 I un-commented admin-extra.html and can now see that html in the
 admin extras section for my core, so that's not exactly what I was looking
 for.

 Are the top/bottom html files used and are they really inserted at the
 top and bottom of the page?

 Any way to get some headers in the static admin page?  I would usually
 just modify the html, but in this case there might already be
 something I can use.

 Thanks,
 Billy


NewSearcher old cache

2012-10-11 Thread shreejay
Hello Everyone,

I was configuring a Solr installation and had a few questions about
newSearcher. As I understand it, a newSearcher event is triggered when
there is an already existing registered searcher.

Q1)
As soon as a new searcher is opened, the caches begin populating from the
older caches. What happens if the newSearcher event has queries defined in
it? Do these queries ignore the old cache altogether and load only the
results of the queries defined in the listener event? Or do they get added
after the new caches have been warmed from the old caches?

Q2)
I am running edismax queries on the Solr server. Can I specify these queries
in newSearcher and firstSearcher as well? Or are the queries supposed to be
simple queries?

Thanks. 

--Shreejay



--
View this message in context: 
http://lucene.472066.n3.nabble.com/NewSearcher-old-cache-tp4013225.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: NewSearcher old cache

2012-10-11 Thread Tomás Fernández Löbbe

 Q1)
 As soon as a new searcher is opened, the caches begin populating from the
 older caches. What happens if the newSearcher event has queries defined in
 it? Do these queries ignore the old cache altogether and load only the
 results of the queries defined in the listener event? Or do they get added
 after the new caches have been warmed from the old caches?


Those queries are going to be executed after the cache auto-warm and before
the searcher is registered.


 Q2)
 I am running edismax queries on the Solr server. Can I specify these
 queries
 in newSearcher and firstSearcher as well? Or are the queries supposed to be
 simple queries?


You can use all the parameters you want here. You can use your custom
request handler configuration if you want. With these queries you should
try to warm those things that are not warmed by the caches' autowarm
process; for example, a good idea here is to facet on all the fields where
your real users will be faceting. The same goes for sorting.
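
For illustration, a listener along those lines in solrconfig.xml (the
query, fields, and facet/sort values are placeholders for whatever your
users actually do):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">some popular query</str>
      <str name="defType">edismax</str>
      <str name="qf">title^2 body</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
      <str name="sort">price asc</str>
    </lst>
  </arr>
</listener>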

Be careful with warming time in relation to your commit frequency (or
searcher-opening frequency, really). If you are going to use NRT, you may
not want to warm caches.

Also, the whole idea of warming caches is to avoid making your users pay
the penalty of searching with empty caches and hitting slow queries; make
sure the resources you spend warming are not causing worse query times.

Tomás


 Thanks.

 --Shreejay



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/NewSearcher-old-cache-tp4013225.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: SolrJ, optimize, maxSegments

2012-10-11 Thread Shawn Heisey

On 10/11/2012 2:02 PM, Shawn Heisey wrote:

UpdateResponse ur = server.optimize(true, true, 20);

What happens with this if I am already below 20 segments? Will it 
still expunge all of my (typically several thousand) deleted 
documents?  I am hoping that what it will do is rebuild any segment 
that contains deleted documents and leave the other segments alone.


I have just tried this on a test system with 11 segments via curl, not 
SolrJ.  I don't expect that it would be any different with SolrJ, though.


curl 
'http://localhost:8981/solr/s0live/update?optimize=true&maxSegments=20&expungeDeletes=true&waitFlush=true'


It didn't work.  When I changed maxSegments to 10, it did reduce the 
index from 11 segments to 10, but there are still deleted documents in 
the index -- maxDoc > numDocs on the statistics screen.


numDocs : 12782762
maxDoc : 12788156

I don't think expungeDeletes is actually a valid parameter for optimize, 
but I included it anyway.  I also tried doing a commit with 
expungeDeletes=true and that didn't work either.


Is this a bug?  The server is 3.5.0.  Because I haven't finished getting 
my configuration worked out, I don't have the ability right now to try 
this on 4.0.0.


Thanks,
Shawn



Re: Custom html headers/footers to solr admin console

2012-10-11 Thread Billy Newman
I take that answer as a no ;)

And no, it's an admin-only page. But you can query from that page, and the
data returned could be sensitive. As such, our company requires us to flag
in a header/footer that the contents of the page could be sensitive. So
even though it will just be for admin access, I still need those headers.

Sounds like I am gonna have to dive into the HTML and make custom changes.

Thanks for the quick response.
Billy

Sent from my iPhone

On Oct 11, 2012, at 3:26 PM, Erick Erickson erickerick...@gmail.com wrote:

 Uhhmmm, why do you want to do this? The admin screen is pretty
 much purely intended for developers/in-house use. Mostly I just
 want to be sure you aren't thinking about letting users, say, see
 this page. Consider
  /update?stream.body=<delete><query>*:*</query></delete>
 
 Best
 Erick
 
 On Thu, Oct 11, 2012 at 4:57 PM, Billy Newman newman...@gmail.com wrote:
 Hello all,
 
 
 I was just poking around in my solr distribution and I noticed some files:
 admin-extra.html
 admin-extra.menu-top.html
 admin-extra.menu-bottom.html
 
 
 I was really hoping that that was html inserted into the solr admin
 page and I could modify the:
 admin-extra.menu-top.html
 admin-extra.menu-bottom.html
 
 files to make a header/footer.
 
 I un-commented admin-extra.html and can now see that html in the
 admin extras section for my core, so that's not exactly what I was looking
 for.
 
 Are the top/bottom html files used and are they really inserted at the
 top and bottom of the page?
 
 Any way to get some headers in the static admin page?  I would usually
 just modify the html, but in this case there might already be
 something I can use.
 
 Thanks,
 Billy


Any filter to map mutiple tokens into one ?

2012-10-11 Thread T. Kuro Kurosaka
I am looking for a way to fold a particular sequence of tokens into one
token. Concretely, I'd like to detect the three-token sequence "*", ":",
"*" and replace it with a single token with the text "*:*".
I tried SynonymFilter but it seems it can only deal with a single input
token: the rule "* : * => *:*" seems to be interpreted
as one input token of 5 characters: '*', space, ':', space and '*'.

I'm using Solr 3.5.

Background:
My tokenizer separates the three-character sequence "*:*" into 3 tokens
of one character each.
The edismax parser, when given the query "*:*", i.e. find every doc,
seems to pass the entire string "*:*" to the query analyzer (I suspect
a bug), and feed the tokenized result to a DisjunctionMaxQuery object,
according to this debug output:

<lst name="debug">
<str name="rawquerystring">*:*</str>
<str name="querystring">*:*</str>
<str name="parsedquery">+MatchAllDocsQuery(*:*)
DisjunctionMaxQuery((body:"* : *"~100^0.5 | title:"* :
*"~100^1.2)~0.01)</str>
<str name="parsedquery_toString">+*:* (body:"* : *"~100^0.5 | title:"* :
*"~100^1.2)~0.01</str>

Notice that there is a space between "*" and ":" in
DisjunctionMaxQuery((body:"* : *" ...).

Probably because of this, the hit score is as low as 0.109, while it is
1.000 if an analyzer that doesn't break "*:*" is used.
So I'd like to stitch "*", ":", "*" back together into "*:*" to make
DisjunctionMaxQuery happy.



Thanks.


T. Kuro Kurosaka




Re: Any filter to map mutiple tokens into one ?

2012-10-11 Thread Jack Krupansky
The ":" which normally separates a field name from a term (or quoted string
or parenthesized sub-query) is parsed by the query parser before analysis
gets called, and "*:*" is recognized before analysis as well. So any
attempt to recreate "*:*" in analysis will be too late to affect query
parsing and other pre-analysis processing.

But what is it you are really trying to do? What's the real problem? (This
sounds like a proverbial XY Problem.)
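
A quick way to see the pre-analysis special-casing, as a sketch against the
Lucene 3.x classic query parser (the field name "body" is just an example):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.util.Version;

public class MatchAllDemo {
  public static void main(String[] args) throws ParseException {
    QueryParser qp = new QueryParser(Version.LUCENE_35, "body",
        new StandardAnalyzer(Version.LUCENE_35));
    // The parser special-cases the bare *:* before any analyzer runs:
    System.out.println(qp.parse("*:*").getClass().getSimpleName());
    // Prints MatchAllDocsQuery -- no tokenizer ever saw the string.
  }
}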


-- Jack Krupansky

-Original Message- 
From: T. Kuro Kurosaka

Sent: Thursday, October 11, 2012 7:35 PM
To: solr-user@lucene.apache.org
Subject: Any filter to map mutiple tokens into one ?

I am looking for a way to fold a particular sequence of tokens into one
token. Concretely, I'd like to detect the three-token sequence "*", ":",
"*" and replace it with a single token with the text "*:*".
I tried SynonymFilter but it seems it can only deal with a single input
token: the rule "* : * => *:*" seems to be interpreted
as one input token of 5 characters: '*', space, ':', space and '*'.

I'm using Solr 3.5.

Background:
My tokenizer separates the three-character sequence "*:*" into 3 tokens
of one character each.
The edismax parser, when given the query "*:*", i.e. find every doc,
seems to pass the entire string "*:*" to the query analyzer (I suspect
a bug), and feed the tokenized result to a DisjunctionMaxQuery object,
according to this debug output:

<lst name="debug">
<str name="rawquerystring">*:*</str>
<str name="querystring">*:*</str>
<str name="parsedquery">+MatchAllDocsQuery(*:*)
DisjunctionMaxQuery((body:"* : *"~100^0.5 | title:"* :
*"~100^1.2)~0.01)</str>
<str name="parsedquery_toString">+*:* (body:"* : *"~100^0.5 | title:"* :
*"~100^1.2)~0.01</str>

Notice that there is a space between "*" and ":" in
DisjunctionMaxQuery((body:"* : *" ...).

Probably because of this, the hit score is as low as 0.109, while it is
1.000 if an analyzer that doesn't break "*:*" is used.
So I'd like to stitch "*", ":", "*" back together into "*:*" to make
DisjunctionMaxQuery happy.


Thanks.


T. Kuro Kurosaka



Re: Does Zookeeper notify slave to replication about record update in master

2012-10-11 Thread Otis Gospodnetic
Hi,

I could be mistaken, but there is no pull-replication in Solr 4 unless
one is trying to catch up using traditional Java replication that
pulls from one node to the other.  I believe replication is push
style and immediate, and replicas don't talk to ZK for that.  Masters and
slaves are also a thing of the past; now we have leaders and
replicas.  See http://wiki.apache.org/solr/SolrCloud

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Thu, Oct 11, 2012 at 11:10 PM, Zeng Lames lezhi.z...@gmail.com wrote:
 Dear All,

 We are doing a POC of Solr 4.0 with ZooKeeper, and want to know whether
 ZooKeeper will notify the slave to pull when the master gets a record
 update. If not, does it mean there is a time gap where data is out of sync
 between the master and slave nodes?

 thanks a lot!

 Best Wishes!