RE: out of memory during indexing due to large incoming queue

2013-06-17 Thread Yoni Amir
Thanks Shawn,
This was very helpful. Indeed, I had a terminology problem regarding segment merging. In any case, I tweaked the parameters you recommended and it helped a lot.

I was wondering about your recommendation to use facet.method=enum. Can you explain the trade-off here? I understand that I gain the benefit of using less memory, but what do I lose? Is it speed?

Also, do you know if there is an answer to my original question in this thread? Solr has a queue of incoming requests which, in my case, kept on growing. I looked at the code but couldn't find it; I think it may be an implicit queue in the form of a Java concurrent thread pool or something similar.

Is it possible to limit the size of this queue, or to determine its size at runtime? This is the last issue that I am trying to figure out right now.

Also, to answer your question about the field all_text: all the fields are stored in order to support partial update of documents. Most of the fields are used for highlighting; all_text is used for searching. I'd gladly stop storing all_text, but then partial update won't work.
The reason I didn't use edismax to search all the fields is that the list of all fields is very long. Can edismax handle several hundred fields in the list? And what about dynamic fields? Edismax requires the list to be fixed in the configuration file, so I can't include dynamic fields there. I could pass the full list in the 'qf' parameter with every search request, but that seems wasteful, and what about performance? I was told that the best practice in this case (you have lots of fields and want to search everything) is to copy everything to a catch-all field, as in the sketch below.
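
For reference, the catch-all setup looks roughly like this in schema.xml (a sketch; the field type and attributes here are assumptions based on my description above):

<!-- Catch-all search field; stored only because partial update requires all fields stored -->
<field name="all_text" type="text_general" indexed="true" stored="true" multiValued="true"/>
<!-- Copy every field, including dynamic fields, into the catch-all -->
<copyField source="*" dest="all_text"/>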

Thanks again,
Yoni

-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Monday, June 03, 2013 17:08
To: solr-user@lucene.apache.org
Subject: Re: out of memory during indexing due to large incoming queue

On 6/3/2013 1:06 AM, Yoni Amir wrote:
> Solrconfig.xml -> http://apaste.info/dsbv
> 
> Schema.xml -> http://apaste.info/67PI
> 
> This solrconfig.xml file has optimization enabled. I had another file which I 
> can't locate at the moment, in which I defined a custom merge scheduler in 
> order to disable optimization.
> 
> When I say 1000 segments, I mean that's the number I saw in the Solr UI. I 
> assume there were many more files than that.

I think we have a terminology problem happening here.  There's nothing you can 
put in a solrconfig.xml file to enable optimization.  Solr will only optimize 
when you explicitly send an optimize command to it.  There is segment merging, 
but that's not the same thing.  Segment merging is completely normal.  Normally 
it's in the background and indexing will continue while it's occurring, but if 
you get too many merges happening at once, that can stop indexing.  I have a 
solution for that:

At the following URL is my indexConfig section, geared towards heavy indexing.  
The TieredMergePolicy settings are the equivalent of a legacy mergeFactor of 
35.  I've gone with a lower-than-default ramBufferSizeMB here, to reduce memory 
usage.  The default value for this setting as of version 4.1 is 100:

http://apaste.info/4gaD

One thing this configuration does that might directly impact your setup is increase maxMergeCount.  I believe the default value for this is 3.  This means that if you get more than three "levels" of merging happening at the same time, indexing will stop until the number of levels drops.  Because Solr always does the biggest merge first, this can take a really long time.  The combination of a large mergeFactor and a larger-than-normal maxMergeCount will ensure that this situation never happens.

If you are not using SSD, don't increase maxThreadCount beyond one.  The 
random-access characteristics of regular hard disks will make things go slower 
with more threads, not faster.  With SSD, increasing the threads can make 
things go faster.
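
In case the paste link goes stale, the shape of that indexConfig is roughly as follows (a sketch from memory using Solr 4.x element names; the exact values in my paste may differ):

<indexConfig>
  <!-- Equivalent of a legacy mergeFactor of 35 -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">35</int>
    <int name="segmentsPerTier">35</int>
  </mergePolicy>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <!-- Keep at 1 unless the index is on SSD -->
    <int name="maxThreadCount">1</int>
    <!-- Raised from the default of 3 so a big merge doesn't stall indexing -->
    <int name="maxMergeCount">6</int>
  </mergeScheduler>
  <!-- Lower than the 4.1+ default of 100, to reduce memory use -->
  <ramBufferSizeMB>48</ramBufferSizeMB>
</indexConfig>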

There are a few high-memory things going on in your config/schema.

The first thing that jumped out at me is facets.  They use a lot of memory.  
You can greatly reduce the memory use by adding &facet.method=enum to the 
query.  The default for the method is fc, which means fieldcache.  The size of 
the Lucene fieldcache cannot be directly controlled by Solr, unlike Solr's own 
caches.  It gets as big as it needs to be, and facets using the fc method will 
put all the facet data for the entire index in the fieldcache.
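
For example, from SolrJ (a sketch; the facet field name is hypothetical):

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery query = new SolrQuery("*:*");
query.setFacet(true);
query.addFacetField("category");    // hypothetical facet field
query.set("facet.method", "enum");  // walk terms via the filterCache instead of the fieldcache

The rough trade-off: enum iterates every distinct term in the field and checks it against the filterCache, so it favors low-cardinality fields and can be slower on fields with many distinct values.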

The second thing that jumped out at me is the fact that all_text is being 
stored.  Apparently this is for highlighting.  I will admit that I do not know 
anything about highlighting, so you might need separate help there.  You are 
using edismax for your query parser, which is perfectly capable of searching multiple fields.

statistics about segments

2013-06-11 Thread Yoni Amir
In the Solr UI, there is the number of segments in the core.
Is there information in Solr (in the UI or in XML responses) about their histogram, i.e., their sizes and how many there are at each size level? Are any currently being merged, etc.?
Thanks,
Yoni



RE: out of memory during indexing due to large incoming queue

2013-06-03 Thread Yoni Amir
Solrconfig.xml -> http://apaste.info/dsbv

Schema.xml -> http://apaste.info/67PI

This solrconfig.xml file has optimization enabled. I had another file which I 
can't locate at the moment, in which I defined a custom merge scheduler in 
order to disable optimization.

When I say 1000 segments, I mean that's the number I saw in the Solr UI. I assume there were many more files than that.

Thanks,
Yoni



-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Sunday, June 02, 2013 22:53
To: solr-user@lucene.apache.org
Subject: Re: out of memory during indexing due to large incoming queue

On 6/2/2013 12:25 PM, Yoni Amir wrote:
> Hi Shawn and Shreejay, thanks for the response.
> Here is some more information:
> 1) The machine is a virtual machine on ESX server. It has 4 CPUs and 
> 8GB of RAM. I don't remember what CPU but something modern enough. It 
> is running Java 7 without any special parameters, and 4GB allocated 
> for Java (-Xmx)
> 2) After successful indexing, I have 2.5 Million documents, 117GB index size. 
> This is the size after it was optimized.
> 3) I plan to upgrade to 4.3; I just didn't have time. 4.0 beta is what was 
> available at the time that we had a release deadline.
> 4) The setup uses master-slave replication, not SolrCloud. The server that I 
> am discussing is the indexing server, and in these tests there were actually 
> no slaves involved, and virtually zero searches performed.
> 5) Attached is my configuration. I tried to disable the warm-up and opening 
> of searchers, it didn't change anything. The commits are done by Solr, using 
> autocommit. The client sends the updates without a commit command.
> 6) I want to disable optimization, but when I disabled it, the OOME occurred 
> even faster. The number of segments reached around a thousand within an hour 
> or so. I don't know if it's normal or not, but at that point if I restarted 
> Solr it immediately took about 1GB of heap space just on start-up, instead of 
> the usual 50MB or so.
> 
> If I commit less frequently, don't I increase the risk of losing data, e.g., 
> if the power goes down, etc.?
> If I disable optimization, is it necessary to avoid such a large number of 
> segments? Is it possible?

Last part first: Losing data is much less of a risk with Solr 4.x, if you have 
enabled the updateLog.
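
Enabling it is a small addition to the updateHandler section of solrconfig.xml (a sketch using the standard 4.x syntax):

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Transaction log: replays uncommitted updates after a crash -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>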

We'll need some more info.  See the end of the message for specifics.

Right off the bat, I can tell you that with an index that's 117GB, you're going 
to need a LOT of RAM.

Each of my 4.2.1 servers has 42GB of index and about 37 million documents 
between all the index shards.  The web application never uses facets, which 
tend to use a lot of memory.  My index is a lot smaller than yours, and I need 
a 6GB heap, seeing OOM errors if it's only 4GB.
You probably need at least an 8GB heap, and possibly larger.

Beyond the amount of memory that Solr itself uses, for good performance you 
will also need a large amount of memory for OS disk caching.  Unless the server 
is using SSD, you need to allocate at least 64GB of real memory to the virtual 
machine.  If you've got your index on SSD, 32GB might be enough.  I've got 64GB 
total on my servers.

http://wiki.apache.org/solr/SolrPerformanceProblems

When you say that there are over 1000 segments, are you seeing 1000 files, or 
are there literally 1000 segments, giving you between 12000 and 15000 files?  
Even if your mergeFactor were higher than the default 10, that just shouldn't 
happen.

Can you share your solrconfig.xml and schema.xml?  Use a paste website like 
http://apaste.info and share the URLs.

Thanks,
Shawn





RE: out of memory during indexing due to large incoming queue

2013-06-02 Thread Yoni Amir
Hi Shawn and Shreejay, thanks for the response.
Here is some more information:
1) The machine is a virtual machine on ESX server. It has 4 CPUs and 8GB of 
RAM. I don't remember what CPU but something modern enough. It is running Java 
7 without any special parameters, and 4GB allocated for Java (-Xmx)
2) After successful indexing, I have 2.5 Million documents, 117GB index size. 
This is the size after it was optimized.
3) I plan to upgrade to 4.3; I just didn't have time. 4.0 beta is what was 
available at the time that we had a release deadline.
4) The setup uses master-slave replication, not SolrCloud. The server that I 
am discussing is the indexing server, and in these tests there were actually no 
slaves involved, and virtually zero searches performed.
5) Attached is my configuration. I tried to disable the warm-up and opening of 
searchers, it didn't change anything. The commits are done by Solr, using 
autocommit. The client sends the updates without a commit command.
6) I want to disable optimization, but when I disabled it, the OOME occurred 
even faster. The number of segments reached around a thousand within an hour or 
so. I don't know if it's normal or not, but at that point if I restarted Solr 
it immediately took about 1GB of heap space just on start-up, instead of the 
usual 50MB or so.

If I commit less frequently, don't I increase the risk of losing data, e.g., if 
the power goes down, etc.?
If I disable optimization, is it necessary to avoid such a large number of 
segments? Is it possible?

Thanks again,
Yoni



-Original Message-
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Sunday, June 02, 2013 18:05
To: solr-user@lucene.apache.org
Subject: Re: out of memory during indexing do to large incoming queue

On 6/2/2013 8:16 AM, Yoni Amir wrote:
> Hello,
> I am receiving OutOfMemoryError during indexing, and after investigating the 
> heap dump, I am still missing some information, and I thought this might be a 
> good place for help.
> 
> I am using Solr 4.0 beta, and I have 5 threads that send update requests to 
> Solr. Each request is a bulk of 100 SolrInputDocuments (using solrj), and my 
> goal is to index around 2.5 million documents.
> Solr is configured to do a hard commit every 10 seconds, so initially I 
> thought that it could only accumulate 10 seconds' worth of updates in memory, 
> but that's not the case. I can see in a profiler how it accumulates memory 
> over time, even with 4 to 6 GB of memory. It is also configured to optimize 
> with mergeFactor=10.

4.0-BETA came out several months ago.  Even at the time, support for the alpha 
and beta releases was limited.  Now it has been superseded by 4.0.0, 4.1.0, 
4.2.0, 4.2.1, and 4.3.0, all of which are full releases.
There is a 4.3.1 release currently in the works.  Please upgrade.

Ten seconds is a very short interval for hard commits, even if you have 
openSearcher=false.  Frequent hard commits can cause a whole host of problems.  
It's better to have an interval of several minutes, and I wouldn't go less than 
a minute.  Soft commits can be much more frequent, but if you are frequently 
opening new searchers, you'll probably want to disable cache warming.
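
For example (a sketch; pick an interval that suits your indexing rate):

<autoCommit>
  <!-- Hard commit every five minutes instead of every ten seconds -->
  <maxTime>300000</maxTime>
  <!-- Don't open a new searcher on every hard commit -->
  <openSearcher>false</openSearcher>
</autoCommit>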

On optimization: don't do it unless you absolutely must.  Most of the time, 
optimization is only needed if you delete a lot of documents and you need to 
get them removed from your index.  If you must optimize to get rid of deleted 
documents, do it on a very long interval (once a day, once a week) and pause 
indexing during optimization.

You haven't said anything about your index size, java heap size, total RAM, 
etc.  With those numbers I could offer some guesses about what you need, but 
I'll warn you that they would only be guesses - watching a system with real 
data under load is the only way to get concrete information.  Here are some 
basic guidelines on performance problems and RAM information:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn




[Attached solrconfig.xml excerpt - the XML tags were stripped by the archive. Recoverable values: luceneMatchVersion LUCENE_40, dataDir ${solr.data.dir:}, the numbers 128 and 10 (most likely ramBufferSizeMB and mergeFactor), the values 1 and true, an updateLog directory of ${solr.data.dir:}, and a cache size of 3072.]

out of memory during indexing due to large incoming queue

2013-06-02 Thread Yoni Amir
Hello,
I am receiving OutOfMemoryError during indexing, and after investigating the 
heap dump, I am still missing some information, and I thought this might be a 
good place for help.

I am using Solr 4.0 beta, and I have 5 threads that send update requests to 
Solr. Each request is a bulk of 100 SolrInputDocuments (using solrj), and my 
goal is to index around 2.5 million documents.
Solr is configured to do a hard commit every 10 seconds, so initially I thought that it could only accumulate 10 seconds' worth of updates in memory, but that's not the case. I can see in a profiler how it accumulates memory over time, even with 4 to 6 GB of memory. It is also configured to optimize with mergeFactor=10.

At first I thought that optimization is a blocking, synchronous operation. It 
is, in the sense that the index can't be updated during optimization. However, 
it is not synchronous, in the sense that the update request coming from my code 
is not blocked - Solr just returns an OK response, even while the index is 
optimizing.
This indicates that Solr has an internal queue of inbound requests, and that the OK response just means the request is in the queue. I got confirmation of this from a friend who is a Solr expert (or so I hope).

My main question is: how can I put a bound on this internal queue, and make update requests synchronous when the queue is full? Put another way, I need to know when Solr is really ready to receive more requests, so I don't overload it and cause an OOME.
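
For now, the only workaround I can think of is bounding things on my side. A sketch of a client-side submission pool with a bounded queue and back-pressure (this limits what my code sends; it does not bound Solr's internal queue):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedSubmitter {
    // 5 sender threads and at most 10 queued batches; when the queue is full,
    // CallerRunsPolicy makes the submitting thread run the task itself,
    // which naturally throttles the producers.
    private final ExecutorService pool = new ThreadPoolExecutor(
            5, 5, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<Runnable>(10),
            new ThreadPoolExecutor.CallerRunsPolicy());

    public void submit(Runnable sendBatch) {
        pool.execute(sendBatch); // sendBatch wraps a solrServer.add(batch) call
    }
}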

I performed several tests with slow and fast disks, and on the really fast disk the problem didn't occur. However, I can't demand such a fast disk from all our clients, and even with a fast disk the problem will eventually occur when I try to index 10 million documents.
I also tried to perform indexing with optimization disabled, but it didn't help.

Thanks,
Yoni



problem with hl.mergeContiguous

2012-10-04 Thread Yoni Amir
I am using a configuration roughly as follows (with solr 4 beta):

[XML tags stripped by the archive; the recoverable parameters are roughly:]
   hl=true
   hl.mergeContiguous=true
   hl.snippets=4
   [one more flag, name stripped: true]

The fragment/snippet size is 100 by default. I found a strange case as follows:

The word that I search for appears in a field somewhere between the 300th and 400th characters. Solr, instead of returning a snippet of 100 characters, returns 400 characters, from the beginning of the text up to the highlighted word and a bit beyond. This happens even though there is no hit in the first 300 characters.

I found out that the length of the snippet (400) is proportional to the number of snippets (in this case, 100 times 4).
This is a problem because I want to show the user only around 250 characters.

Is it a bug? Is it configurable?

Thanks,
Yoni


RE: Solr 4.0alpha: edismax complaints on certain characters

2012-09-06 Thread Yoni Amir
As far as I understand, / is a special character and needs to be escaped.
Maybe "foo\/bar" should work?

I found this when I looked at the code of ClientUtils.escapeQueryChars:

// These characters are part of the query syntax and must be escaped
if (c == '\\' || c == '+' || c == '-' || c == '!' || c == '(' || c == ')' || c == ':'
    || c == '^' || c == '[' || c == ']' || c == '\"' || c == '{' || c == '}' || c == '~'
    || c == '*' || c == '?' || c == '|' || c == '&' || c == ';' || c == '/'
    || Character.isWhitespace(c)) {
  sb.append('\\');
}
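
So on the client side the fix may be as simple as running the user's text through that helper before building the query (a sketch):

import org.apache.solr.client.solrj.util.ClientUtils;

// Escapes all Lucene/edismax query-syntax characters, including '/'
String safe = ClientUtils.escapeQueryChars("foo/bar"); // yields foo\/bar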


-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Thursday, September 06, 2012 4:35 PM
To: solr-user@lucene.apache.org
Subject: Solr 4.0alpha: edismax complaints on certain characters

Hello,

I was under the impression that edismax was supposed to be crash-proof and just ignore bad syntax, but I am either misconfiguring it or hitting a weird bug. I basically searched for text containing '/' and got this:

{
  'responseHeader'=>{
'status'=>400,
'QTime'=>9,
'params'=>{
  'qf'=>'TitleEN DescEN',
  'indent'=>'true',
  'wt'=>'ruby',
  'q'=>'foo/bar',
  'defType'=>'edismax'}},
  'error'=>{
'msg'=>'org.apache.lucene.queryparser.classic.ParseException:
Cannot parse \'foo/bar \': Lexical error at line 1, column 9.
Encountered: <EOF> after : "/bar "',
'code'=>400}}

Is that normal? If it is, is there a known list of characters I need to escape, or do I just have to catch the exception and tell the user not to do that again?

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. 
Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


RE: exception in highlighter when using phrase search

2012-09-05 Thread Yoni Amir
I think I found the cause of this. It is partially my fault, because I sent Solr a field with an empty value, but this is also a configuration problem.

https://issues.apache.org/jira/browse/SOLR-3792


-Original Message-
From: Yoni Amir [mailto:yoni.a...@actimize.com] 
Sent: Tuesday, September 04, 2012 3:53 PM
To: solr-user@lucene.apache.org
Subject: exception in highlighter when using phrase search

I got this problem with solr 4 beta and the highlighting component.

When I search for a phrase, such as "foo bar", everything works OK.
When I add highlighting, I get the exception below.
You can see from the first log line that I am searching only one field (all_text), but what is not visible in the log is that I am highlighting all fields in the document, with hl.requireFieldMatch=false and hl.fl=*.

INFO  (SolrCore.java:1670) - [rcmCore] webapp=/solr path=/select params={fq={!edismax}module:"Alerts"+and+bu:"abcd+Region1"&qf=attachment&qf=all_text&version=2&rows=20&wt=javabin&start=0&q="foo bar"} hits=103 status=500 QTime=38
ERROR (SolrException.java:104) - null:java.lang.NullPointerException
   at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.fill(CharacterUtils.java:191)
   at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:152)
   at org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter.incrementToken(WordDelimiterFilter.java:209)
   at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:50)
   at org.apache.lucene.analysis.miscellaneous.RemoveDuplicatesTokenFilter.incrementToken(RemoveDuplicatesTokenFilter.java:54)
   at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
   at org.apache.solr.highlight.TokenOrderingFilter.incrementToken(DefaultSolrHighlighter.java:629)
   at org.apache.lucene.analysis.CachingTokenFilter.fillCache(CachingTokenFilter.java:78)
   at org.apache.lucene.analysis.CachingTokenFilter.incrementToken(CachingTokenFilter.java:50)
   at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:225)
   at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:510)
   at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
   at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:136)
   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
   at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
   at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
   at java.lang.Thread.run(Thread.java:736)

Any idea?

Thanks,
Yoni


exception in highlighter when using phrase search

2012-09-04 Thread Yoni Amir
I got this problem with solr 4 beta and the highlighting component.

When I search for a phrase, such as "foo bar", everything works OK.
When I add highlighting, I get the exception below.
You can see from the first log line that I am searching only one field (all_text), but what is not visible in the log is that I am highlighting all fields in the document, with hl.requireFieldMatch=false and hl.fl=*.

INFO  (SolrCore.java:1670) - [rcmCore] webapp=/solr path=/select params={fq={!edismax}module:"Alerts"+and+bu:"abcd+Region1"&qf=attachment&qf=all_text&version=2&rows=20&wt=javabin&start=0&q="foo bar"} hits=103 status=500 QTime=38
ERROR (SolrException.java:104) - null:java.lang.NullPointerException
   at org.apache.lucene.analysis.util.CharacterUtils$Java5CharacterUtils.fill(CharacterUtils.java:191)
   at org.apache.lucene.analysis.util.CharTokenizer.incrementToken(CharTokenizer.java:152)
   at org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter.incrementToken(WordDelimiterFilter.java:209)
   at org.apache.lucene.analysis.util.FilteringTokenFilter.incrementToken(FilteringTokenFilter.java:50)
   at org.apache.lucene.analysis.miscellaneous.RemoveDuplicatesTokenFilter.incrementToken(RemoveDuplicatesTokenFilter.java:54)
   at org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:54)
   at org.apache.solr.highlight.TokenOrderingFilter.incrementToken(DefaultSolrHighlighter.java:629)
   at org.apache.lucene.analysis.CachingTokenFilter.fillCache(CachingTokenFilter.java:78)
   at org.apache.lucene.analysis.CachingTokenFilter.incrementToken(CachingTokenFilter.java:50)
   at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:225)
   at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:510)
   at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:401)
   at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:136)
   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1656)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:454)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
   at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
   at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
   at java.lang.Thread.run(Thread.java:736)

Any idea?

Thanks,
Yoni


RE: solrj api for partial document update

2012-09-02 Thread Yoni Amir
In the solrj api, the value of a SolrInputField can be a map, in which case 
solrj adds an additional attribute to the field's xml element.
For example,
This code:

import java.util.HashMap;
import java.util.Map;
import org.apache.solr.common.SolrInputDocument;

SolrInputDocument doc = new SolrInputDocument();
// The map key is the update operation; the value is the new field value
Map<String, Object> partialUpdate = new HashMap<String, Object>();
partialUpdate.put("set", "foo");
doc.addField("id", "test_123");
doc.addField("description", partialUpdate);

yields this document (reconstructed - the XML tags were stripped by the archive):

<doc>
  <field name="id">test_123</field>
  <field name="description" update="set">foo</field>
</doc>

In this example I used the value "set" for this additional attribute, but it 
doesn't work. Solr doesn't update the field as I expected.
According to this link: 
http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/
valid values are "set" and "add".

Any idea?

Thanks,
Yoni


-Original Message-
From: Yoni Amir [mailto:yoni.a...@actimize.com] 
Sent: Saturday, September 01, 2012 1:48 PM
To: solr-user@lucene.apache.org
Subject: RE: solrj api for partial document update

Any word on this?
I inspected the solrj code and found nothing. It's a shame if the GA version comes out without such an API.
Thanks again,
Yoni

-Original Message-
From: Yoni Amir [mailto:yoni.a...@actimize.com] 
Sent: Thursday, August 30, 2012 8:48 AM
To: solr-user@lucene.apache.org
Subject: solrj api for partial document update

Is there a solrj api for partial document update in solr 4?

It is described here: 
http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/

That article explains what the XML structure should look like. I want to use the solrj API, but I can't figure out if it is supported.

Thanks,
Yoni



RE: solrj api for partial document update

2012-09-01 Thread Yoni Amir
Any word on this?
I inspected the solrj code and found nothing. It's a shame if the GA version comes out without such an API.
Thanks again,
Yoni

-Original Message-
From: Yoni Amir [mailto:yoni.a...@actimize.com] 
Sent: Thursday, August 30, 2012 8:48 AM
To: solr-user@lucene.apache.org
Subject: solrj api for partial document update

Is there a solrj api for partial document update in solr 4?

It is described here: 
http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/

That article explains what the XML structure should look like. I want to use the solrj API, but I can't figure out if it is supported.

Thanks,
Yoni

-Original Message-
From: aniljayanti [mailto:anil.jaya...@gmail.com]
Sent: Thursday, August 30, 2012 7:41 AM
To: solr-user@lucene.apache.org
Subject: Re: AW: AW: auto completion search with solr using NGrams in SOLR

Hi,

thanks,

I added "PatternReplaceFilterFactory" like below.Getting results 
differently(not like suggester). You suggested to remove 
"KeywordTokenizerFactory" , "PatternReplace" is a FilterFactory, then which 
"TokenizerFactory" need to use ? 

  

  
   

  
  
   
   
  
 
  
 
   
  

Result (response XML tags stripped by the archive; recoverable content):

  For "michael" (numFound=10, startOffset=0, endOffset=7):
    michael, michael, michael ", michael j, michael ja, michael jac,
    michael jack, michael jacks, michael jackso, michael jackson
  For "ja" (numFound=10, startOffset=8, endOffset=10):
    ja, ja, jag, jam, jami, jamp, jampp, jamppa, jamppa, jamppa t
  Collation: michael ja

Please let me know if anything is missing.

Regards,

AnilJayanti



--
View this message in context: 
http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4004231.html
Sent from the Solr - User mailing list archive at Nabble.com.


solrj api for partial document update

2012-08-29 Thread Yoni Amir
Is there a solrj api for partial document update in solr 4?

It is described here: 
http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/

That article explains what the XML structure should look like. I want to use the solrj API, but I can't figure out if it is supported.

Thanks,
Yoni

-Original Message-
From: aniljayanti [mailto:anil.jaya...@gmail.com] 
Sent: Thursday, August 30, 2012 7:41 AM
To: solr-user@lucene.apache.org
Subject: Re: AW: AW: auto completion search with solr using NGrams in SOLR

Hi,

thanks,

I added "PatternReplaceFilterFactory" like below.Getting results
differently(not like suggester). You suggested to remove
"KeywordTokenizerFactory" , "PatternReplace" is a FilterFactory, then which
"TokenizerFactory" need to use ? 

  

  
   

  
  
   
   
  
 
  
 
   
  

Result (response XML tags stripped by the archive; recoverable content):

  For "michael" (numFound=10, startOffset=0, endOffset=7):
    michael, michael, michael ", michael j, michael ja, michael jac,
    michael jack, michael jacks, michael jackso, michael jackson
  For "ja" (numFound=10, startOffset=8, endOffset=10):
    ja, ja, jag, jam, jami, jamp, jampp, jamppa, jamppa, jamppa t
  Collation: michael ja

Please let me know if anything is missing.

Regards,

AnilJayanti



--
View this message in context: 
http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559p4004231.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: hl.fl is ignored in solr 4 beta?

2012-08-27 Thread Yoni Amir
Thanks for your help. Using the regex trick actually worked and this is the 
direction we are taking now, but I think I'll open an enhancement request as 
well. I'll try to see if we can improve that code locally for our product first.
Thanks,
Yoni

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Monday, August 27, 2012 3:19 PM
To: solr-user@lucene.apache.org
Subject: Re: hl.fl is ignored in solr 4 beta?

Highlighting does work on dynamic fields. I verified that there is no bug there.

The issue is that glob does not work the way you used it. Whether that is a sloppy doc description that needs to be cleaned up or a bug in the code is an open question. I would suggest that the doc needs to be updated, and maybe then you have an "improvement" request. So it is really two issues, I think. But feel free to open a bug for "Glob does not work as expected for hl.fl field list".

I wouldn't suggest digging into that code, but if you really want to, see the 
"getHighlightFields" method here:
http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/solr/core/src/java/org/apache/solr/highlight/SolrHighlighter.java?view=markup

-- Jack Krupansky

-Original Message-
From: Yoni Amir
Sent: Monday, August 27, 2012 3:31 AM
To: solr-user@lucene.apache.org
Subject: RE: hl.fl is ignored in solr 4 beta?

Thanks for this information. Can you please send me to the right place in the 
code, I'll check it out.

Regardless, it sounds like a bug to me, highlighting should work on dynamic 
fields too. Should I open a bug for this?

Thanks,
Yoni

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Sunday, August 26, 2012 8:11 PM
To: solr-user@lucene.apache.org
Subject: Re: hl.fl is ignored in solr 4 beta?

I am unable to reproduce this scenario of all fields being highlighted. In 
fact, from looking at the code, I don't see a way that could happen.
Further, the code, both 4.0-BETA and 3.6.1 as well as trunk, does NOT support 
"glob" patterns within a comma/space-delimited list of fields. Only a single 
glob pattern is supported, such as "*" or "*_custom_txt".

Actually, the code has an undocumented feature that almost full regular 
expressions are supported. If your single hl.fl parameter contains at least one 
"*", all the code does is take your hl.fl value and replace all "*" with ".*". 
So, you might be able to do what you want as follows:

identifier|type|owner_name|*_custom_txt|...

Where "..." is a list of field names separated by vertical bars. But, realize 
that this is an undocumented feature, at least for now.

Actually, it turns out that the vertical bar can also be used for many 
parameters where space and comma are supported as delimiters. But for hl.fl, 
glob is only supported using the vertical bar as a list delimiter.
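
For example, from SolrJ (a sketch; the field names are taken from your handler config):

import org.apache.solr.client.solrj.SolrQuery;

SolrQuery q = new SolrQuery("foo");
q.setHighlight(true);
// A single vertical-bar-delimited value; each "*" becomes ".*" internally
q.set("hl.fl", "identifier|type|owner_name|*_custom_txt");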

-- Jack Krupansky

-Original Message-
From: Yoni Amir
Sent: Sunday, August 26, 2012 5:07 AM
To: solr-user@lucene.apache.org
Subject: RE: hl.fl is ignored in solr 4 beta?

I managed to narrow it down to the presence of the dynamic field "*_custom_txt" in the hl.fl list. If the list contains only regular fields, the highlighting works fine.
However, I also want to highlight some dynamic fields.

Is this a bug?

Thanks,
Yoni

-Original Message-
From: Yoni Amir [mailto:yoni.a...@actimize.com]
Sent: Sunday, August 26, 2012 11:31 AM
To: solr-user@lucene.apache.org
Subject: hl.fl is ignored in solr 4 beta?

I have a setup for /select handler which, regarding highlighting, looks roughly 
like this:

[XML tags stripped by the archive; the recoverable parameters are:]
   defType=edismax
   qf=all_text
   ...
   fl=id, module, identifier, type, category, ..., score
   hl=on
   hl.fl=identifier type owner_name *_custom_txt ...
   hl.requireFieldMatch=false
   [plus five more values whose names were stripped: true, true, 10, true, -1]




As you can see, I search on only one field, "all_text". I use hl.requireFieldMatch=false in order to highlight other fields, but I limit the set of other fields using the hl.fl parameter. However, it is not working: Solr returns highlights for a number of fields that are not in the list (one of them is the all_text field itself, but other fields get highlighted too).

Is this a known bug?

Thanks,
Yoni 



RE: hl.fl is ignored in solr 4 beta?

2012-08-27 Thread Yoni Amir
Thanks for this information. Can you please send me to the right place in the 
code, I'll check it out.

Regardless, it sounds like a bug to me, highlighting should work on dynamic 
fields too. Should I open a bug for this?

Thanks,
Yoni

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Sunday, August 26, 2012 8:11 PM
To: solr-user@lucene.apache.org
Subject: Re: hl.fl is ignored in solr 4 beta?

I am unable to reproduce this scenario of all fields being highlighted. In 
fact, from looking at the code, I don't see a way that could happen. 
Further, the code, both 4.0-BETA and 3.6.1 as well as trunk, does NOT support 
"glob" patterns within a comma/space-delimited list of fields. Only a single 
glob pattern is supported, such as "*" or "*_custom_txt".

Actually, the code has an undocumented feature that almost full regular 
expressions are supported. If your single hl.fl parameter contains at least one 
"*", all the code does is take your hl.fl value and replace all "*" with ".*". 
So, you might be able to do what you want as follows:

identifier|type|owner_name|*_custom_txt|...

Where "..." is a list of field names separated by vertical bars. But, realize 
that this is an undocumented feature, at least for now.

Actually, it turns out that the vertical bar can also be used for many 
parameters where space and comma are supported as delimiters. But for hl.fl, 
glob is only supported using the vertical bar as a list delimiter.

-- Jack Krupansky

-Original Message-
From: Yoni Amir
Sent: Sunday, August 26, 2012 5:07 AM
To: solr-user@lucene.apache.org
Subject: RE: hl.fl is ignored in solr 4 beta?

I managed to narrow it down to the presence of the dynamic field "*_custom_txt" in the hl.fl list. If the list contains only regular fields, the highlighting works fine.
However, I also want to highlight some dynamic fields.

Is this a bug?

Thanks,
Yoni

-Original Message-
From: Yoni Amir [mailto:yoni.a...@actimize.com]
Sent: Sunday, August 26, 2012 11:31 AM
To: solr-user@lucene.apache.org
Subject: hl.fl is ignored in solr 4 beta?

I have a setup for /select handler which, regarding highlighting, looks roughly 
like this:

[XML tags stripped by the archive; the recoverable parameters are:]
   defType=edismax
   qf=all_text
   ...
   fl=id, module, identifier, type, category, ..., score
   hl=on
   hl.fl=identifier type owner_name *_custom_txt ...
   hl.requireFieldMatch=false
   [plus five more values whose names were stripped: true, true, 10, true, -1]




As you can see, I search on only one field, "all_text". I use hl.requireFieldMatch=false in order to highlight other fields, but I limit the set of other fields using the hl.fl parameter. However, it is not working: Solr returns highlights for a number of fields that are not in the list (one of them is the all_text field itself, but other fields get highlighted too).

Is this a known bug?

Thanks,
Yoni 



RE: hl.fl is ignored in solr 4 beta?

2012-08-26 Thread Yoni Amir
I managed to narrow it down to the presence of the dynamic field "*_custom_txt" in the hl.fl list. If the list contains only regular fields, the highlighting works fine.
However, I also want to highlight some dynamic fields.

Is this a bug?

Thanks,
Yoni

-Original Message-----
From: Yoni Amir [mailto:yoni.a...@actimize.com] 
Sent: Sunday, August 26, 2012 11:31 AM
To: solr-user@lucene.apache.org
Subject: hl.fl is ignored in solr 4 beta?

I have a setup for /select handler which, regarding highlighting, looks roughly 
like this:



[XML tags stripped by the archive; the recoverable parameters are:]
   defType=edismax
   qf=all_text
   ...
   fl=id, module, identifier, type, category, ..., score
   hl=on
   hl.fl=identifier type owner_name *_custom_txt ...
   hl.requireFieldMatch=false
   [plus five more values whose names were stripped: true, true, 10, true, -1]




As you can see, I search on only one field, "all_text". I use hl.requireFieldMatch=false in order to highlight other fields, but I limit the set of other fields using the hl.fl parameter. However, it is not working: Solr returns highlights for a number of fields that are not in the list (one of them is the all_text field itself, but other fields get highlighted too).

Is this a known bug?

Thanks,
Yoni


hl.fl is ignored in solr 4 beta?

2012-08-26 Thread Yoni Amir
I have a setup for /select handler which, regarding highlighting, looks roughly 
like this:



[XML tags stripped by the archive; the recoverable parameters are:]
   defType=edismax
   qf=all_text
   ...
   fl=id, module, identifier, type, category, ..., score
   hl=on
   hl.fl=identifier type owner_name *_custom_txt ...
   hl.requireFieldMatch=false
   [plus five more values whose names were stripped: true, true, 10, true, -1]




As you can see, I search on only one field, "all_text". I use hl.requireFieldMatch=false in order to highlight other fields, but I limit the set of other fields using the hl.fl parameter. However, it is not working: Solr returns highlights for a number of fields that are not in the list (one of them is the all_text field itself, but other fields get highlighted too).

Is this a known bug?

Thanks,
Yoni


highlighting tint fields

2012-08-05 Thread Yoni Amir
Hello,
(sorry for the empty message earlier, that was by mistake)

I am experiencing a strange problem with highlighting. I have a simple 
configuration roughly as follows:

[XML tags stripped by the archive; the recoverable parameters are:]
   defType=edismax
   qf=all_text
   ...
   hl=on
   hl.fl=*
   hl.requireFieldMatch=false

I run the search on a single catch-all field called "all_text", and I want the 
highlighting to work on other fields in the document.
All the other fields that are meant to be highlighted are indexed and stored, 
and they are copied to all_text with the copyField directive.

This works fine for text fields; however, if the field is of type tint or tdate, Solr doesn't return any highlight information for it.
E.g., I have a tint field with a matching copyField (the field definition and the example response were stripped by the archive), but no highlighting is returned for it.

Thanks,
Yoni