Re: solr highlight problem

2012-10-11 Thread Maurizio Cucchiara
What do you mean by "missing vicky"?
Did you mean a second fragment?

Sent from my mobile device, so please excuse typos and brevity.

Maurizio Cucchiara
On 12/Oct/2012 08:42, "rayvicky"  wrote:

> StringBuilder sb = new StringBuilder();
> sb.append("title:test vicky");
> SolrQuery query = new SolrQuery();
> query.setQuery(sb.toString());
>
> query.setHighlight(true);
> query.addHighlightField("title");
> query.setHighlightSimplePre("");
> query.setHighlightSimplePost("");
> query.setHighlightSnippets(2);
> query.setHighlightFragsize(500);
>
> title: my name test is vicky
> result: my name test is vicky
>
> Why is vicky missing?
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/solr-highlight-problem-tp4013273.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


solr highlight problem

2012-10-11 Thread rayvicky
StringBuilder sb = new StringBuilder();
sb.append("title:test vicky");
SolrQuery query = new SolrQuery();
query.setQuery(sb.toString());

query.setHighlight(true);
query.addHighlightField("title");
query.setHighlightSimplePre("");
query.setHighlightSimplePost("");
query.setHighlightSnippets(2);
query.setHighlightFragsize(500);

title: my name test is vicky
result: my name test is vicky

Why is vicky missing?
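For reference, a minimal SolrJ sketch of reading the fragments back (a sketch only — "server" is assumed to be an existing HttpSolrServer, and with setHighlightSnippets(2) each field can return up to two fragments, wrapped in whatever pre/post delimiters were configured):

import java.util.List;
import java.util.Map;
import org.apache.solr.client.solrj.response.QueryResponse;

QueryResponse rsp = server.query(query);
// unique key -> field name -> highlight fragments
Map<String, Map<String, List<String>>> hl = rsp.getHighlighting();
for (Map.Entry<String, Map<String, List<String>>> doc : hl.entrySet()) {
    // fragments for the "title" field of this document
    List<String> fragments = doc.getValue().get("title");
    System.out.println(doc.getKey() + " -> " + fragments);
}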



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-highlight-problem-tp4013273.html
Sent from the Solr - User mailing list archive at Nabble.com.


Search in specific website

2012-10-11 Thread Tolga

Hi,

I use Nutch to crawl my website and index into Solr. However, how can I
search for a piece of content on a specific website? I use multiple URLs.


Regards,


Is there any way to specify config name for core in solr.xml?

2012-10-11 Thread jun Wang
Hi, all

I have two collections, and two machines. So, my deployment is like
|machine a  |machine b  |
|core a1 | core a2 | core b1 | core b2|

core a1 is for collection 1 shard1, core a2 is for collection 1 shard2.
The config for collection 1 is config 1.
core b1 is for collection 2 shard1, core b2 is for collection 2 shard2.
The config for collection 2 is config 2.

Is there any way to specify the core config in solr.xml so that the two shards
on each machine start up with the correct config name?
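For what it's worth, a sketch of what machine a's solr.xml could look like under Solr 4.0 (core names, instanceDirs, and the shard layout are assumptions; note that as of 4.0 the config name itself is not set in solr.xml — it is linked to the collection in ZooKeeper, for instance with zkcli's linkconfig command):

<solr persistent="true">
  <cores adminPath="/admin/cores" host="${host:}" hostPort="${jetty.port:8983}">
    <!-- assumed layout: one shard of each collection on this machine -->
    <core name="a1" instanceDir="a1" collection="collection1" shard="shard1"/>
    <core name="b1" instanceDir="b1" collection="collection2" shard="shard1"/>
  </cores>
</solr>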

-- 
from Jun Wang


Re: Does Zookeeper notify slave to replicate about record update in master

2012-10-11 Thread Otis Gospodnetic
Hi,

I could be mistaken, but there is no pull-replication in Solr 4 unless
one is trying to catch up using traditional Java replication that
pulls from one node to the other.  I believe replication is push
style, immediate, and replicas don't talk to ZK for that.  Master and
slaves are also a thing of the past and now we have leaders and
replicas.  See http://wiki.apache.org/solr/SolrCloud

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Thu, Oct 11, 2012 at 11:10 PM, Zeng Lames  wrote:
> Dear All,
>
> We are doing a POC of Solr 4.0 with ZooKeeper and want to know whether
> ZooKeeper will notify the slave to pull when the master gets a record update.
> If not, does it mean there is a time gap where data is out of sync between
> the master and slave nodes?
>
> thanks a lot!
>
> Best Wishes!


Re: Any filter to map multiple tokens into one?

2012-10-11 Thread Jack Krupansky
The ":" which normally separates a field name from a term (or quoted string 
or parenthesized sub-query) is "parsed" by the query parser before analysis 
gets called, and "*:*" is recognized before analysis as well. So, any 
attempt to recreate "*:*" in analysis will be too late to affect query 
parsing and other pre-analysis processing.


But, what is it you are really trying to do? What's the real problem? (This 
sounds like a proverbial "XY Problem".)


-- Jack Krupansky

-Original Message- 
From: T. Kuro Kurosaka

Sent: Thursday, October 11, 2012 7:35 PM
To: solr-user@lucene.apache.org
Subject: Any filter to map multiple tokens into one?

I am looking for a way to fold a particular sequence of tokens into one
token.
Concretely, I'd like to detect a three-token sequence of "*", ":" and
"*", and replace it with a token of the text "*:*".
I tried SynonymFilter but it seems it can only deal with a single input
token. "* : * => *:*" seems to be interpreted
as one input token of 5 characters: "*", space, ":", space and "*".

I'm using Solr 3.5.

Background:
My tokenizer separates the three-character sequence "*:*" into 3 tokens
of one character each.
The edismax parser, when given the query "*:*", i.e. find every doc,
seems to pass the entire string "*:*" to the query analyzer  (I suspect
a bug.),
and feed the tokenized result to DisjunctionMaxQuery object,
according to this debug output:


<str name="rawquerystring">*:*</str>
<str name="querystring">*:*</str>
<str name="parsedquery">+MatchAllDocsQuery(*:*) DisjunctionMaxQuery((body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01)</str>
<str name="parsedquery_toString">+*:* (body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01</str>

Notice that there is a space between * and : in
DisjunctionMaxQuery((body:"* : *" )

Probably because of this, the hit score is as low as 0.109, while it is
1.000 if an analyzer that doesn't break "*:*" is used.
So I'd like to stitch together "*", ":", "*" into "*:*" again to make
DisjunctionMaxQuery happy.


Thanks.


T. "Kuro" Kurosaka



Any filter to map multiple tokens into one?

2012-10-11 Thread T. Kuro Kurosaka
I am looking for a way to fold a particular sequence of tokens into one 
token.
Concretely, I'd like to detect a three-token sequence of "*", ":" and 
"*", and replace it with a token of the text "*:*".
I tried SynonymFilter but it seems it can only deal with a single input
token. "* : * => *:*" seems to be interpreted
as one input token of 5 characters: "*", space, ":", space and "*".

I'm using Solr 3.5.
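For reference, the attempted mapping would be wired up roughly like this (a sketch only — the field type name is made up, and the tokenizer line is a stand-in for the custom tokenizer that splits "*:*" into three tokens):

# synonyms.txt
* : * => *:*

<!-- schema.xml -->
<fieldType name="text_custom" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>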

Background:
My tokenizer separates the three-character sequence "*:*" into 3 tokens
of one character each.
The edismax parser, when given the query "*:*", i.e. find every doc, 
seems to pass the entire string "*:*" to the query analyzer  (I suspect 
a bug.),

and feed the tokenized result to DisjunctionMaxQuery object,
according to this debug output:


<str name="rawquerystring">*:*</str>
<str name="querystring">*:*</str>
<str name="parsedquery">+MatchAllDocsQuery(*:*) DisjunctionMaxQuery((body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01)</str>
<str name="parsedquery_toString">+*:* (body:"* : *"~100^0.5 | title:"* : *"~100^1.2)~0.01</str>


Notice that there is a space between * and : in 
DisjunctionMaxQuery((body:"* : *" )


Probably because of this, the hit score is as low as 0.109, while it is 
1.000 if an analyzer that doesn't break "*:*" is used.
So I'd like to stitch together "*", ":", "*" into "*:*" again to make 
DisjunctionMaxQuery happy.



Thanks.


T. "Kuro" Kurosaka




Re: Custom html headers/footers to solr admin console

2012-10-11 Thread Billy Newman
I take that answer as a no ;)

And no, it will be an admin-only page. But you can query from that page, and
the data returned could be sensitive. As such, our company requires us to flag
in a header/footer that the contents of the page could be sensitive. So even
though it will just be for admin access I still need those headers.

Sounds like I am gonna have to dive into the HTML and make custom changes.

Thanks for the quick response. 
Billy

Sent from my iPhone

On Oct 11, 2012, at 3:26 PM, Erick Erickson  wrote:

> Uhhmmm, why do you want to do this? The admin screen is pretty
> much purely intended for developers/in-house use. Mostly I just
> want to be sure you aren't thinking about letting users, say, see
> this page. Consider
> /update?stream.body=<delete><query>*:*</query></delete>
> 
> Best
> Erick
> 
> On Thu, Oct 11, 2012 at 4:57 PM, Billy Newman  wrote:
>> Hello all,
>> 
>> 
>> I was just poking around in my solr distribution and I noticed some files:
>> admin-extra.html
>> admin-extra.menu-top.html
>> admin-extra.menu-bottom.html
>> 
>> 
>> I was really hoping that that was html inserted into the solr admin
>> page and I could modify the:
>> admin-extra.menu-top.html
>> admin-extra.menu-bottom.html
>> 
>> files to make a header/footer.
>> 
>> I un-commented out admin-extra.html and can now see that html in the
>> admin extras section for my core so not exactly what I was looking
>> for.
>> 
>> Are the top/bottom html files used and are they really inserted at the
>> top and bottom of the page?
>> 
>> Any way to get some headers in the static admin page?  I would usually
>> just modify the html, but in this case there might already be
>> something I can use.
>> 
>> Thanks,
>> Billy


Re: SolrJ, optimize, maxSegments

2012-10-11 Thread Shawn Heisey

On 10/11/2012 2:02 PM, Shawn Heisey wrote:

UpdateResponse ur = server.optimize(true, true, 20);

What happens with this if I am already below 20 segments? Will it 
still expunge all of my (typically several thousand) deleted 
documents?  I am hoping that what it will do is rebuild any segment 
that contains deleted documents and leave the other segments alone.


I have just tried this on a test system with 11 segments via curl, not 
SolrJ.  I don't expect that it would be any different with SolrJ, though.


curl 
'http://localhost:8981/solr/s0live/update?optimize=true&maxSegments=20&expungeDeletes=true&waitFlush=true'


It didn't work.  When I changed maxSegments to 10, it did reduce the 
index from 11 segments to 10, but there are still deleted documents in 
the index -- maxDoc > numDocs on the statistics screen.


numDocs : 12782762
maxDoc : 12788156

I don't think expungeDeletes is actually a valid parameter for optimize, 
but I included it anyway.  I also tried doing a commit with 
expungeDeletes=true and that didn't work either.


Is this a bug?  The server is 3.5.0.  Because I haven't finished getting 
my configuration worked out, I don't have the ability right now to try 
this on 4.0.0.


Thanks,
Shawn



Re: NewSearcher old cache

2012-10-11 Thread Tomás Fernández Löbbe
>
> Q1)
> As soon as a new searcher is opened, the caches begin populating from the
> older caches. What happens if the NewSearcher event has queries defined in
> it? Do these queries ignore the old cache altogether and load only the
> results of the queries defined in the listener event? Or do these get added
> after the new caches have been warmed by the old caches?
>

Those queries are going to be executed after the cache auto-warm and before
the searcher is registered.

>
> Q2)
> I am running edismax queries on the Solr Server. Can I specify these
> queries
> in NewSearcher and FirstSearcher also? Or are the queries supposed to be
> simple queries?
>

You can use all the parameters you want here. You can use your custom
request handler configuration if you want. With these queries you should
try to warm those things that are not warmed by the caches' "autowarm"
process; for example, a good idea here is to facet on all the fields where
your real users will be faceting. The same goes for sorting.
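For example, a newSearcher listener whose queries carry full edismax-style parameters might look like this in solrconfig.xml (the query, field, facet, and sort values below are all made up):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">some popular query</str>
      <str name="defType">edismax</str>
      <str name="qf">title^2 body</str>
      <str name="facet">true</str>
      <str name="facet.field">category</str>
      <str name="sort">price asc</str>
    </lst>
  </arr>
</listener>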

Be careful with warming time, in relation to your commit frequency (or open
searcher frequency really). If you are going to use NRT, you may not want
to warm caches.

Also, the whole idea of warming caches is to avoid making your users pay
the penalty of searching with empty caches and the slow queries that result;
make sure the resources you spend warming are not causing worse query times.

Tomás


> Thanks.
>
> --Shreejay
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/NewSearcher-old-cache-tp4013225.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


NewSearcher old cache

2012-10-11 Thread shreejay
Hello Everyone, 

I was configuring a Solr installation and had a few queries about
NewSearcher. As I understand it, a NewSearcher event will be triggered when
there is an already existing registered searcher.

Q1) 
As soon as a new searcher is opened, the caches begin populating from the
older caches. What happens if the NewSearcher event has queries defined in
it? Do these queries ignore the old cache altogether and load only the
results of the queries defined in the listener event? Or do these get added
after the new caches have been warmed by the old caches?

Q2) 
I am running edismax queries on the Solr Server. Can I specify these queries
in NewSearcher and FirstSearcher also? Or are the queries supposed to be
simple queries? 

Thanks. 

--Shreejay



--
View this message in context: 
http://lucene.472066.n3.nabble.com/NewSearcher-old-cache-tp4013225.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Custom html headers/footers to solr admin console

2012-10-11 Thread Erick Erickson
Uhhmmm, why do you want to do this? The admin screen is pretty
much purely intended for developers/in-house use. Mostly I just
want to be sure you aren't thinking about letting users, say, see
this page. Consider
/update?stream.body=<delete><query>*:*</query></delete>

Best
Erick

On Thu, Oct 11, 2012 at 4:57 PM, Billy Newman  wrote:
> Hello all,
>
>
> I was just poking around in my solr distribution and I noticed some files:
> admin-extra.html
> admin-extra.menu-top.html
> admin-extra.menu-bottom.html
>
>
> I was really hoping that that was html inserted into the solr admin
> page and I could modify the:
> admin-extra.menu-top.html
> admin-extra.menu-bottom.html
>
> files to make a header/footer.
>
> I un-commented out admin-extra.html and can now see that html in the
> admin extras section for my core so not exactly what I was looking
> for.
>
> Are the top/bottom html files used and are they really inserted at the
> top and bottom of the page?
>
> Any way to get some headers in the static admin page?  I would usually
> just modify the html, but in this case there might already be
> something I can use.
>
> Thanks,
> Billy


Open Source Social (London) - 23rd Oct

2012-10-11 Thread Richard Marr
Hi all,

The next Open Source Search Social is on the 23rd Oct at The Plough in
Bloomsbury.

We usually get a good mix of regulars and newcomers, and a good mix of
backgrounds and experience levels, so please come along if you can. As
usual the format is completely open so we'll be talking about whatever is
most interesting at any one particular moment... ooo, a shiny thing...

Details and RSVP options on the Meetup page:
http://www.meetup.com/london-search-social/events/86580442/

Hope to see you there,

Richard

@richmarr


Issue using SpatialRecursivePrefixTreeFieldType

2012-10-11 Thread Eric Khoury

Hi David,

I'm defining my field as such:

When I create a large rectangle, say "10 10 500 11", Solr seems to freeze for
quite some time. I haven't looked at your code, but I can imagine the
algorithm basically fills in some sort of indexing matrix, and that's what's
taking so long for large rectangles? Is there a limit to how big the
worldBounds should be?

Thanks!
Eric.

SolrJ, optimize, maxSegments

2012-10-11 Thread Shawn Heisey
Currently my indexing code calls optimize.  Once a night, one of my six 
large shards is optimized, so each one only gets optimized once every 
six days. Here is the SolrJ call, server is an instance of HttpSolrServer:


UpdateResponse ur = server.optimize();

I only do this because I want deleted documents regularly removed from 
the index.  Whatever speed gains I might see from getting down to one 
segment are just an added bonus.  After watching all the discussion on 
the -dev list regarding what to do in Solr due to the Lucene forceMerge 
rename, I am considering changing this to something like the following:


UpdateResponse ur = server.optimize(true, true, 20);

What happens with this if I am already below 20 segments? Will it still 
expunge all of my (typically several thousand) deleted documents?  I am 
hoping that what it will do is rebuild any segment that contains deleted 
documents and leave the other segments alone.  Possibly irrelevant info: 
I'm using the following MP config:


  
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">35</int>
  <int name="segmentsPerTier">35</int>
  <int name="maxMergeAtOnceExplicit">105</int>
</mergePolicy>
  

Thanks,
Shawn



Re: displaying search results in map

2012-10-11 Thread Jamie Johnson
Did you look at
http://stackoverflow.com/questions/11319465/geoclusters-in-solr?  This
sounds similar to what you're asking for based on geohashes of the
points of interest.
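For what it's worth, the per-grid counts (point 4a) can also be had with plain faceting — one facet.query per grid cell. A SolrJ sketch, where the "lat"/"lon" field names and the bounding box are made up and "server" is assumed to be an existing SolrServer:

import java.util.Map;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;

SolrQuery q = new SolrQuery("*:*");
q.addFilterQuery("lat:[10.0 TO 20.0]", "lon:[30.0 TO 40.0]");
q.setFacet(true);
q.setRows(0);
// one facet.query per grid cell (here a 4x4 grid over the box above)
for (double lat = 10.0; lat < 20.0; lat += 2.5) {
    for (double lon = 30.0; lon < 40.0; lon += 2.5) {
        q.addFacetQuery("lat:[" + lat + " TO " + (lat + 2.5) + "]"
                + " AND lon:[" + lon + " TO " + (lon + 2.5) + "]");
    }
}
QueryResponse rsp = server.query(q);
Map<String, Integer> gridCounts = rsp.getFacetQuery(); // facet.query -> count

The top-K documents and the lat/lon averages per grid (points 4b and 4c) would still need per-grid queries or custom code.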

On Thu, Oct 11, 2012 at 2:25 PM, Harish Rawat  wrote:
> Sorry for not being clear. Here are more details
>
> 1.) The results are displayed on a geographical map
> 2.) Each document has latitude and longitude fields, and other fields that
> can be searched on
> 3.) The search will be done for all documents within a lat/long range.
> 4.) The lat/lon range is divided into N*N (let's say 64) grids, and for each
> grid we want the following:
> a.) no. of documents in that grid
> b.) top K documents in that grid
> c.) avg of the latitude and longitude values for all results in that grid
>
> In lucene I can implement my own custom collector and do all the
> calculations listed in #4. I wanted to understand the best way to implement
> (or use an existing implementation, if any :) this logic in solr
>
> Regards
> Harish
>
>
>
> On Thu, Oct 11, 2012 at 11:08 AM, Gora Mohanty  wrote:
>
>> On 11 October 2012 23:16, Harish Rawat  wrote:
>>
>> > Hi
>> >
>> > I am working on a project to display the search results on the map. The
>> > idea is to divide the map into N*N grids and show counts for each grid
>> and
>> > allow users to view top result on each grid
>> >
>> > any suggestions on how best to accomplish this with solr?
>> >
>>
>> Your description is not very clear. What search results
>> are you seeking to display on what kind of a map? Are you
>> talking about a geographical map, or something like a 3D
>> histogram (which is what your N x N grid seems to refer to)?
>> Please clarify.
>>
>> In either case, it is quite unlikely that Solr will handle the
>> presentation for you. Solr is a search engine that will return
>> you desired search results. What to do with the search results
>> is an issue for a presentation layer.
>>
>> Regards,
>> Gora
>>


Re: displaying search results in map

2012-10-11 Thread Gora Mohanty
On 11 October 2012 23:55, Harish Rawat  wrote:

> Sorry for not being clear. Here are more details
>
> 1.) The results are displayed on a geographical map
> 2.) Each document has latitude and longitude fields, and other fields that
> can be searched on
> 3.) The search will be done for all documents within a lat/long range.
> 4.) The lat/lon range is divided into N*N (let's say 64) grids, and for each
> grid we want the following:
> a.) no. of documents in that grid
> b.) top K documents in that grid
> c.) avg of the latitude and longitude values for all results in that grid
>
> In lucene I can implement my own custom collector and do all the
> calculations listed in #4. I wanted to understand the best way to implement
> (or use an existing implementation, if any :) this logic in solr
>
[...]

Hmm, I am  not that familiar with Lucene, so maybe someone
else will chip in with advice.

However, what you describe in point 4 seems to be a clustering
strategy for geographical points. Typically, we use pre-defined
strategies from OpenLayers ( http://openlayers.org ), or custom
strategies.

Regards,
Gora


Re: displaying search results in map

2012-10-11 Thread Gora Mohanty
On 11 October 2012 23:16, Harish Rawat  wrote:

> Hi
>
> I am working on a project to display the search results on the map. The
> idea is to divide the map into N*N grids and show counts for each grid and
> allow users to view top result on each grid
>
> any suggestions on how best to accomplish this with solr?
>

Your description is not very clear. What search results
are you seeking to display on what kind of a map? Are you
talking about a geographical map, or something like a 3D
histogram (which is what your N x N grid seems to refer to)?
Please clarify.

In either case, it is quite unlikely that Solr will handle the
presentation for you. Solr is a search engine that will return
you desired search results. What to do with the search results
is an issue for a presentation layer.

Regards,
Gora


Re: Solr And OpenNLP integration

2012-10-11 Thread ahmed
In fact I downloaded the source of Solr using an SVN client,
then applied the OpenNLP patch,
then ran: ant compile -lib /usr/share/ivy

I got this error:

[javac]   public synchronized Span[] splitSentences(String line) {
[javac]   ^
[javac]
/home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/opennlp/NLPTokenizerOp.java:36:
cannot find symbol
[javac] symbol  : class Tokenizer
[javac] location: class org.apache.solr.analysis.opennlp.NLPTokenizerOp
[javac]   private final Tokenizer tokenizer;
[javac] ^
[javac]
/home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/opennlp/NLPTokenizerOp.java:38:
cannot find symbol
[javac] symbol  : class TokenizerModel
[javac] location: class org.apache.solr.analysis.opennlp.NLPTokenizerOp
[javac]   public NLPTokenizerOp(TokenizerModel model) {
[javac] ^
[javac]
/home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/opennlp/NLPTokenizerOp.java:46:
cannot find symbol
[javac] symbol  : class Span
[javac] location: class org.apache.solr.analysis.opennlp.NLPTokenizerOp
[javac]   public synchronized Span[] getTerms(String sentence) {
[javac]   ^
[javac]
/home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/OpenNLPTokenizerFactory.java:26:
package opennlp.tools.util does not exist
[javac] import opennlp.tools.util.InvalidFormatException;
[javac]  ^
[javac]
/home/pfe/Téléchargements/dev/trunk/solr/contrib/opennlp/src/java/org/apache/solr/analysis/opennlp/OpenNLPOpsFactory.java:9:
package opennlp.tools.chunker does not exist
[javac] import opennlp.tools.chunker.ChunkerModel;
[javac] ^
[javac] 100 errors

BUILD FAILED
/home/pfe/Téléchargements/dev/trunk/build.xml:112: The following error
occurred while executing this line:
/home/pfe/Téléchargements/dev/trunk/solr/common-build.xml:419: The following
error occurred while executing this line:
/home/pfe/Téléchargements/dev/trunk/solr/common-build.xml:410: The following
error occurred while executing this line:
/home/pfe/Téléchargements/dev/trunk/lucene/common-build.xml:418: The
following error occurred while executing this line:
/home/pfe/Téléchargements/dev/trunk/lucene/common-build.xml:1482: Compile
failed; see the compiler error output for details.

I want to apply semantic analysis to the documents that will be indexed
using Solr. So Solr will index and then analyse the content using OpenNLP
instead of Tika.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/SLOR-And-OpenNlp-integration-tp4013094p4013144.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Unique terms without faceting

2012-10-11 Thread Otis Gospodnetic
Hi,

Are you looking for
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/contrib-misc/org/apache/lucene/misc/HighFreqTerms.html
?

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Thu, Oct 11, 2012 at 4:40 AM, Toke Eskildsen  
wrote:
> On Wed, 2012-10-10 at 17:45 +0200, Phil Hoy wrote:
>> I know that you can use a facet query to get the unique terms for a
>> field taking account of any q or fq parameters but for our use case the
>> counts are not needed. So is there a more efficient way of finding
>> just unique terms for a field?
>
> Short answer: Not at this moment.
>
>
> If the amount of unique terms is large (millions), a fair amount of
> temporary memory could be spared by just keeping track of matched terms
> with a boolean vs. the full int for standard faceting. Reduced memory
> requirements means less garbage collection and faster processing due to
> better cache utilization. So yes, there is a more efficient way.
>
> Guessing from your other posts, you are building a social network and
> need to query on surnames and similar large fields. Question is of
> course how large the payoff will be and if it is worth the investment in
> development hours. I would suggest hacking the current faceting code to
> use OpenBitSet instead of int[] and doing performance tests on that.
> PerSegmentSingleValuedFaceting.SegFacet and UnInvertedField.getCounts
> seem to be the right places to look in Solr 4.
>
> Regards,
> Toke Eskildsen, State and University Library, Denmark
>


Re: Solr And OpenNLP integration

2012-10-11 Thread Koji Sekiguchi

(12/10/11 20:40), ahmed wrote:

Hi, thanks for the reply.
In fact I tried this tutorial, but when I execute 'ant compile' I have a
problem that classes are not found even though the classes are there. I don't
know what the problem is.



I think attaching the error you got will help us understand your problem.
Also, before that: what do you want to do with the Solr and OpenNLP integration?

koji
--
http://soleami.com/blog/starting-lab-work.html


Re: unsuscribe

2012-10-11 Thread Erick Erickson
Please follow the instructions here:
https://wiki.apache.org/solr/Unsubscribing%20from%20mailing%20lists



On Wed, Oct 10, 2012 at 6:03 PM, zMk Bnc  wrote:
>
> unsuscribe


Re: anyone have any clues about this exception

2012-10-11 Thread Erick Erickson
Well, you'll actually be able to optimize, it's just called forceMerge.

But the point is that optimize seems like something that _of course_
you want to do, when in reality it's not something you usually should
do at all. Optimize does two things:
1> merges all the segments into one (usually)
2> removes all of the info associated with deleted documents.

Of the two, point <2> is the one that really counts and that's done
whenever segment merging is done anyway. So unless you have
a very large number of deletes (or updates of the same document),
optimize buys you very little. You can tell this by the difference
between numDocs and maxDoc in the admin page.

So what happens if you just don't bother to optimize? Take a look at
merge policy to help control how merging happens perhaps as an
alternative.
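As a sketch of where those knobs live in solrconfig.xml (the values below are made up; lower values merge more aggressively, keeping the segment count and the number of lingering deleted documents down):

<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicy>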

Best
Erick

On Wed, Oct 10, 2012 at 3:04 PM, Petersen, Robert  wrote:
> You could be right.  Going back in the logs, I noticed it used to happen less 
> frequently and always towards the end of an optimize operation.  It is 
> probably my indexer timing out waiting for updates to occur during optimizes. 
>  The errors grew recently due to my upping the indexer threadcount to 22 
> threads, so there's a lot more timeouts occurring now.  Also our index has 
> grown to double the old size so the optimize operation has started taking a 
> lot longer, also contributing to what I'm seeing.   I have just changed my 
> optimize frequency from three times a day to one time a day after reading the 
> following:
>
> Here they are talking about completely deprecating the optimize command in 
> the next version of solr…
> https://issues.apache.org/jira/browse/SOLR-3141
>
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Wednesday, October 10, 2012 11:10 AM
> To: solr-user@lucene.apache.org
> Subject: Re: anyone have any clues about this exception
>
> Something timed out, the other end closed the connection. This end tried to 
> write to closed pipe and died, something tried to catch that exception and 
> write its own and died even worse? Just making it up really, but sounds good 
> (plus a 3-year Java tech-support hunch).
>
> If it happens often enough, see if you can run WireShark on that machine's 
> network interface and catch the whole network conversation in action. Often, 
> there is enough clues there by looking at tcp packets and/or stuff 
> transmitted. WireShark is a power-tool, so takes a little while the first 
> time, but the learning will pay for itself over and over again.
>
> Regards,
>Alex.
>
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at once. 
> Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Wed, Oct 10, 2012 at 11:31 PM, Petersen, Robert  wrote:
>> Tomcat localhost log (not the catalina log) for my  solr 3.6.1 (master) 
>> instance contains lots of these exceptions but solr itself seems to be doing 
>> fine... any ideas?  I'm not seeing these exceptions being logged on my slave 
>> servers btw, just the master where we do our indexing only.
>>
>>
>>
>> Oct 9, 2012 5:34:11 PM org.apache.catalina.core.StandardWrapperValve
>> invoke
>> SEVERE: Servlet.service() for servlet default threw exception
>> java.lang.IllegalStateException
>> at 
>> org.apache.catalina.connector.ResponseFacade.sendError(ResponseFacade.java:407)
>> at 
>> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:389)
>> at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:291)
>> at 
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>> at 
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>> at 
>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>> at 
>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>> at 
>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>> at 
>> com.googlecode.psiprobe.Tomcat60AgentValve.invoke(Tomcat60AgentValve.java:30)
>> at 
>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>> at 
>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>> at 
>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>> at 
>> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
>> at 
>> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>> at 
>> org.apache.tomcat.util.net.JIoEndpoin

Re: Solr - Make Exact Search on Field with Fuzzy Query

2012-10-11 Thread Erick Erickson
Right, and going the other way (storing and highlighting on the non-stemmed
field) would be unsatisfactory because you'd get a hit on "hospital" in the
stemmed field, but wouldn't highlight it if you searched on "hospitality".

I really don't see a good solution here. Highlighting seems to be one of those
things that's easy in concept but has a zillion ways to go wrong.

I guess I'd really just go with the copyField approach unless you can prove that
it's really a problem. Perhaps lost in my first e-mail is that storing
the field twice
doesn't really affect search speed or _search_ requirements at all. Take a
look here:
http://lucene.apache.org/core/old_versioned_docs/versions/3_0_0/fileformats.html#file-names

note that the *.fdt and *.fdx files are where the original raw copy goes
(i.e. where data gets written when you specify stored="true")
and they are completely independent of the files that contain the searchable
data. So unless you're disk-space constrained, the additional storage really
doesn't cost you much.
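A sketch of that setup in schema.xml (field and type names made up): both variants get indexed, only one is stored, and the stored one is what display and highlighting read from:

<field name="body" type="text_stemmed" indexed="true" stored="true"/>
<field name="body_exact" type="text_unstemmed" indexed="true" stored="false"/>
<copyField source="body" dest="body_exact"/>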

Best
Erick

On Thu, Oct 11, 2012 at 2:31 AM, meghana  wrote:
> Hi Erickson,
>
> Thanks for your valuable reply.
>
> Actually we had tried just storing one field and highlighting on that
> field all the time, whether we search on it or not.
>
> An issue sometimes occurs: if I search with the term 'hospitality' and use
> the field with stemming applied for highlighting, it returns highlights on
> both 'hospital' and 'hospitality', whereas it should highlight only
> 'hospitality' since I am doing an exact term search. Can you suggest
> anything on this? Can we eliminate this issue while highlighting on the
> original field (which has stemming applied)?
>
> The other solutions are sounds really good, but as you said they are hard to
> implement and we at this point , wanted to implement inbuilt solutions if
> possible.
>
> Please suggest if we can eliminate above explained issue on highlighting.
>
> Thanks.
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Make-Exact-Search-on-Field-with-Fuzzy-Query-tp4012888p4013067.html
> Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr And OpenNLP integration

2012-10-11 Thread ahmed
Hi, thanks for the reply.
In fact I tried this tutorial, but when I execute 'ant compile' I have a
problem that classes are not found even though the classes are there. I don't
know what the problem is.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SLOR-And-OpenNlp-integration-tp4013094p4013101.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr And OpenNLP integration

2012-10-11 Thread Markus Jelsma
Hi - the wiki page will get you up and running quickly:
http://wiki.apache.org/solr/OpenNLP

 
 
-Original message-
> From:ahmed 
> Sent: Thu 11-Oct-2012 13:32
> To: solr-user@lucene.apache.org
> Subject: Solr And OpenNLP integration
> 
> Hello,
> I am a new user of Apache Solr and I have to integrate OpenNLP with Solr.
> The problem is that I can't find a tutorial for this integration, so I am
> asking if there is someone who can help me with it?
> Thanks,
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/SLOR-And-OpenNlp-integration-tp4013094.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


Solr And OpenNLP integration

2012-10-11 Thread ahmed
Hello,
I am a new user of Apache Solr and I have to integrate OpenNLP with Solr.
The problem is that I can't find a tutorial for this integration, so I am
asking if there is someone who can help me with it?
Thanks,



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SLOR-And-OpenNlp-integration-tp4013094.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Auto Correction?

2012-10-11 Thread Ahmet Arslan
> so other than commercial solutions,
> it seems like I need to have a plugin,
> right? I couldn't find any open source solutions yet...

Yes you need to implement custom SearchComponent (plugin).   
http://wiki.apache.org/solr/SearchComponent
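A bare-bones skeleton of such a component (the package and class names are made up, and depending on the Solr version a few more SolrInfoMBean methods may need implementing); it would be registered in solrconfig.xml with <searchComponent name="autocorrect" class="com.example.AutoCorrectComponent"/> and added to a handler's component list:

package com.example;

import java.io.IOException;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class AutoCorrectComponent extends SearchComponent {
  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // runs before the query executes: e.g. inspect rb.getQueryString()
    // and swap in a corrected form
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // runs after the query executes: e.g. if there were zero hits,
    // re-run the search with the top spellcheck suggestion
  }

  @Override
  public String getDescription() { return "auto-correct search component"; }

  @Override
  public String getSource() { return "$URL$"; }
}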

Or alternatively you can re-run the search with the suggestions on the client side.


Re: segment number during optimize of index

2012-10-11 Thread jame vaalet
Hi Lance,
My earlier point may be misleading
"   1. Segments are independent sub-indexes in seperate file, while
| >indexing
| >its better to create new segment as it doesnt have to modify an
| >existing
| >file. where as while searching, *smaller the segment* the better
| >it is
| > since
| >you open x (not exactly x but xn a value proportional to x)
| >physical
| > files
| >to search if you have got x segments in the index."

The "smaller"was referencing to the segment number rather than segment
size.

When you said "Large Pages", did you mean segment size should be below a
threshold for better performance from the OS point of view?  My main concern
here is: what would be the main disadvantage (for indexing or searching) if I
merge my entire 150 GB index (right now 100 segments) into a single segment?





On 11 October 2012 07:28, Lance Norskog  wrote:

> Study index merging. This is awesome.
>
> http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
>
> Jame- opening lots of segments is not a problem. A major performance
> problem you will find is 'Large Pages'. This is an operating-system
> strategy for managing servers with 10s of gigabytes of memory. Without it,
> all large programs run much more slowly than they could. It is not a Solr
> or JVM problem.
>
>
> - Original Message -
> | From: "jun Wang" 
> | To: solr-user@lucene.apache.org
> | Sent: Wednesday, October 10, 2012 6:36:09 PM
> | Subject: Re: segment number during optimize of index
> |
> | I have an other question, does the number of segment affect speed for
> | update index?
> |
> | 2012/10/10 jame vaalet 
> |
> | > Guys,
> | > thanks for all the inputs, I was continuing my research to know
> | > more about
> | > segments in Lucene. Below are my conclusion, please correct me if
> | > am wrong.
> | >
> | >1. Segments are independent sub-indexes in seperate file, while
> | >indexing
> | >its better to create new segment as it doesnt have to modify an
> | >existing
> | >file. where as while searching, smaller the segment the better
> | >it is
> | > since
> | >you open x (not exactly x but xn a value proportional to x)
> | >physical
> | > files
> | >to search if you have got x segments in the index.
> | >2. since lucene has memory map concept, for each file/segment in
> | >index a
> | >new m-map file is created and mapped to the physcial file in
> | >disk. Can
> | >someone explain or correct this in detail, i am sure there are
> | >lot many
> | >people wondering how m-map works while you merge or optimze
> | >index
> | > segments.
> | >
> | >
> | >
> | > On 6 October 2012 07:41, Otis Gospodnetic
> | >  | > >wrote:
> | >
> | > > If I were you and not knowing all your details...
> | > >
> | > > I would optimize indices that are static (not being modified) and
> | > > would optimize down to 1 segment.
> | > > I would do it when search traffic is low.
> | > >
> | > > Otis
> | > > --
> | > > Search Analytics -
> | > > http://sematext.com/search-analytics/index.html
> | > > Performance Monitoring - http://sematext.com/spm/index.html
> | > >
> | > >
> | > > On Fri, Oct 5, 2012 at 4:27 PM, jame vaalet
> | > > 
> | > wrote:
> | > > > Hi Eric,
> | > > > I  am in a major dilemma with my index now. I have got 8 cores
> | > > > each
> | > > around
> | > > > 300 GB in size and half of them are deleted documents in it and
> | > > > above
> | > > that
> | > > > each has got around 100 segments as well. Do i issue a
> | > > > expungeDelete
> | > and
> | > > > allow the merge policy to take care of the segments or optimize
> | > > > them
> | > into
> | > > > single segment. Search performance is not at par compared to
> | > > > usual solr
> | > > > speed.
> | > > > If i have to optimize what segment number should i choose? my
> | > > > RAM size
> | > > > around 120 GB and JVM heap is around 45 GB (oldGen being 30
> | > > > GB). Pleas
> | > > > advice !
> | > > >
> | > > > thanks.
> | > > >
> | > > >
> | > > > On 6 October 2012 00:00, Erick Erickson
> | > > > 
> | > wrote:
> | > > >
> | > > >> because eventually you'd run out of file handles. Imagine a
> | > > >> long-running server with 100,000 segments. Totally
> | > > >> unmanageable.
> | > > >>
> | > > >> I think shawn was emphasizing that RAM requirements don't
> | > > >> depend on the number of segments. There are other
> | > > >> resources that file consume however.
> | > > >>
> | > > >> Best
> | > > >> Erick
> | > > >>
> | > > >> On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet
> | > > >> 
> | > > wrote:
> | > > >> > hi Shawn,
> | > > >> > thanks for the detailed explanation.
> | > > >> > I have got one doubt, you said it doesn matter how many
> | > > >> > segments
> | > index
> | > > >> have
> | > > >> > but then why does solr has this merge policy which merges
> | > > >> > segments
> | > > >> > frequently?  why can it leave the segments as it is rather
> | > > >> > than
> | 

Re: Unique terms without faceting

2012-10-11 Thread Toke Eskildsen
On Wed, 2012-10-10 at 17:45 +0200, Phil Hoy wrote:
> I know that you can use a facet query to get the unique terms for a
> field taking account of any q or fq parameters but for our use case the
> counts are not needed. So is there a more efficient way of finding 
> just unique terms for a field?

Short answer: Not at this moment.


If the amount of unique terms is large (millions), a fair amount of
temporary memory could be spared by just keeping track of matched terms
with a boolean vs. the full int for standard faceting. Reduced memory
requirements means less garbage collection and faster processing due to
better cache utilization. So yes, there is a more efficient way.

Guessing from your other posts, you are building a social network and
need to query on surnames and similar large fields. Question is of
course how large the payoff will be and if it is worth the investment in
development hours. I would suggest hacking the current faceting code to
use OpenBitSet instead of int[] and doing performance tests on that.
PerSegmentSingleValuedFaceting.SegFacet and UnInvertedField.getCounts
seem to be the right places to look in Solr 4.

Regards,
Toke Eskildsen, State and University Library, Denmark