Re: HELP: CommonsHttpSolrServer.commit() time out after 1min

2013-02-19 Thread Siping Liu
SolrJ.


On Tue, Feb 19, 2013 at 9:08 PM, Erick Erickson wrote:

> Well, your commits may have to wait until any merges are done, which _may_
> be merging your entire index into a single segment. Possibly this could
> take more than 60 seconds.
>
> _How_ are you doing this? DIH? SolrJ? post.jar?
>
> Best
> Erick
>
>
> On Tue, Feb 19, 2013 at 8:00 PM, Siping Liu  wrote:
>
> > Thanks for the quick response. It's Solr 3.4. I'm pretty sure we have
> > plenty of memory.
> >
> >
> >
> > On Tue, Feb 19, 2013 at 7:50 PM, Alexandre Rafalovitch
> > wrote:
> >
> > > Which version of Solr?
> > > Are you sure you did not run out of memory half way through import?
> > >
> > > Regards,
> > >Alex.
> > >
> > > Personal blog: http://blog.outerthoughts.com/
> > > LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> > > - Time is the quality of nature that keeps events from happening all at
> > > once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
> > >
> > >
> > > On Tue, Feb 19, 2013 at 7:44 PM, Siping Liu 
> wrote:
> > >
> > > > Hi,
> > > > We have an index with 2 million documents in it. From time to time we
> > > > rewrite about 1/10 of the documents (just under 200k). No autocommit.
> > > > At the end we issue a single commit, and it times out after 60 sec.
> > > > My questions are:
> > > > 1. Is it normal for a commit of this size to take more than 1 min? I
> > > > know it probably depends on the server ...
> > > > 2. I know there are a few parameters I can set in the
> > > > CommonsHttpSolrServer class: setConnectionManagerTimeout(),
> > > > setConnectionTimeout(), setSoTimeout(). Which should I use?
> > > >
> > > > TIA
> > > >
> > >
> >
>
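Of the three setters, the socket timeout is the one that fires after a long commit: as far as the 3.x API goes, setSoTimeout() bounds how long the client waits for the response, setConnectionTimeout() bounds establishing the TCP connection, and setConnectionManagerTimeout() bounds waiting for a pooled connection. A small stdlib sketch of the first two knobs (the URL is a placeholder; openConnection() does not touch the network):

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class TimeoutDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder URL -- no network traffic happens here.
        URL solr = new URL("http://localhost:8983/solr/update");
        HttpURLConnection conn = (HttpURLConnection) solr.openConnection();

        // Connect timeout: how long to wait for the TCP connection itself.
        conn.setConnectTimeout(5000);
        // Read (SO) timeout: how long to wait for the response -- the one
        // that trips after 60s when a commit stalls behind a segment merge.
        conn.setReadTimeout(300000);

        System.out.println(conn.getReadTimeout());
    }
}
```

With CommonsHttpSolrServer the analogous call would be solrServer.setSoTimeout(300000), raised high enough to outlast the worst-case merge.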


Re: custom sorter

2012-07-22 Thread Siping Liu
Hi -- thanks for the response. It's the right direction, but on closer look
I don't think I can use it directly. The reason is that in my case the query
string is always "*:*"; we use filter queries to get different results. When
fq=(field1:"xyz") we want to boost one document and let sort= take care of
the remaining results, and when field1 has any other value, sort= takes care
of all results.

Maybe I can define my own SearchComponent class, and specify it in
solrconfig.xml:

  <str>my_search_component</str>

I have to try and see if that'd work.

thanks.
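A sketch of that idea under assumed names (the component name and class here are hypothetical): register the component in solrconfig.xml and chain it after the built-in ones so it sees the collected results:

```xml
<!-- solrconfig.xml sketch: names and class are placeholders -->
<searchComponent name="my_search_component"
                 class="com.example.MySearchComponent"/>

<requestHandler name="/select" class="solr.SearchHandler">
  <arr name="last-components">
    <str>my_search_component</str>
  </arr>
</requestHandler>
```

A last-component runs after QueryComponent has populated the response, which is the natural hook for reordering a result page.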


On Fri, Jul 20, 2012 at 3:24 AM, Lee Carroll
wrote:

> take a look at
> http://wiki.apache.org/solr/QueryElevationComponent
>
> On 20 July 2012 03:48, Siping Liu  wrote:
>
> > Hi,
> > I have a requirement to place a document at a pre-determined position for
> > special filter query values; for instance, when the filter query is
> > fq=(field1:"xyz"), place document abc as the first result (the rest of the
> > result set will be ordered by sort=field2). I guess I have to plug in my
> > own Java code as a custom sorter. I'd appreciate it if someone could shed
> > light on this (how to add a custom sorter, etc.)
> > TIA.
> >
>
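For reference, the component Lee points at pins documents per query text via an elevate.xml file -- a sketch with assumed values (the doc id "abc" and query text "xyz" are placeholders), which also shows why it keys on q rather than fq:

```xml
<!-- elevate.xml sketch: pin document "abc" when the user's query text is "xyz" -->
<elevate>
  <query text="xyz">
    <doc id="abc"/>
  </query>
</elevate>
```

Since elevation matches on the q parameter, a fixed q of "*:*" with varying fq values falls outside what it handles, which is the limitation discussed above.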


custom sorter

2012-07-19 Thread Siping Liu
Hi,
I have a requirement to place a document at a pre-determined position for
special filter query values; for instance, when the filter query is
fq=(field1:"xyz"), place document abc as the first result (the rest of the
result set will be ordered by sort=field2). I guess I have to plug in my own
Java code as a custom sorter. I'd appreciate it if someone could shed light
on this (how to add a custom sorter, etc.)
TIA.


match to non tokenizable word ("helloworld")

2010-05-16 Thread siping liu

I get no match when searching for "helloworld", even though I have "hello 
world" in my index. How do people usually deal with this? Write a custom 
analyzer, with help from a collection of all dictionary words?

 

thanks for suggestions/comments.
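One low-cost approach, sketched for a Solr schema of this era (the field type name and the synonyms.txt entry are assumptions): map the concatenated form to its words with a synonym filter at index and query time:

```xml
<!-- schema.xml sketch: hypothetical field type -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- synonyms.txt would contain a line such as: helloworld => hello world -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

A dictionary-driven decompounding analyzer is the general solution, but a synonym list covers the known concatenations cheaply.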
  
_
Hotmail has tools for the New Busy. Search, chat and e-mail from your inbox.
http://www.windowslive.com/campaign/thenewbusy?ocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_1

weird problem with solr.DateField

2009-11-11 Thread siping liu

Hi,

I'm using Solr 1.4 (from a nightly build about 2 months ago), with a
"lastUpdate" date field defined in the config, and the following code that
gets executed once every night:

CommonsHttpSolrServer solrServer = new CommonsHttpSolrServer("http://...");
solrServer.setRequestWriter(new BinaryRequestWriter());

solrServer.add(documents);
solrServer.commit();

UpdateResponse deleteResult =
    solrServer.deleteByQuery("lastUpdate:[* TO NOW-2HOUR]");
solrServer.commit();

 

The purpose is to refresh index with latest data (in "documents").

This works fine, except that after a few days I start to see a few documents 
with no "lastUpdate" field (query "-lastUpdate:[* TO *]") -- how can that be 
possible?

 

thanks in advance.

 
  
_
Windows 7: Unclutter your desktop.
http://go.microsoft.com/?linkid=9690331&ocid=PID24727::T:WLMTAGL:ON:WL:en-US:WWL_WIN_evergreen:112009

RE: Solr and Garbage Collection

2009-10-02 Thread siping liu

Hi,

I read pretty much all the posts on this thread (before and after this one).
It looks like the main suggestion from you and others is to keep the max heap
size (-Xmx) as small as possible (as long as you don't see an OOM exception).
This raises more questions than answers (for me at least -- I'm new to Solr).

 

First, our environment and the problem encountered: Solr 1.4 (nightly build,
downloaded about 2 months ago), Sun JDK 1.6, Tomcat 5.5, running on Solaris
(multi-CPU/core). The cache settings are from the default solrconfig.xml
(they look very small). At first we used minimal JAVA_OPTS and quickly ran
into a problem similar to the one the original poster reported -- long pauses
(seconds to minutes) under load test. jconsole showed that it pauses on GC.
So more JAVA_OPTS were added: "-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:ParallelGCThreads=8 -XX:SurvivorRatio=2 -XX:NewSize=128m
-XX:MaxNewSize=512m -XX:MaxGCPauseMillis=200", the thinking being that with
multiple CPUs/cores we can get GC over with as quickly as possible. With the
new setup it works fine until Tomcat reaches the heap size limit; then it
blocks and takes minutes on a "full GC" to reclaim space from the tenured
generation. We tried different Xmx values (from very small to large) with no
difference in the long GC times. We never ran into OOM.

 

Questions:

* In general various caches are good for performance; we have more RAM
available and want to use more caching to boost performance. Isn't your
suggestion (of lowering the heap limit) going against that?

* It looks like Solr's caches make their way into the tenured generation on
the heap; that's good. But why do they get GC'ed eventually? I did a quick
check of the Solr code (Solr 1.3, not 1.4) and saw a single instance of
WeakReference being used. Is that what is causing all this? This seems to
suggest a flaw in Solr's memory management strategy (or just my ignorance
about Solr?). I mean, wouldn't this be the "right" way of doing it -- you
allow the user to specify the cache sizes in solrconfig.xml, the user sets
the heap limit in JAVA_OPTS accordingly, and there is no need to use
WeakReference (BTW, why not SoftReference)?

* Right now I have a single Tomcat hosting Solr and other applications. I
guess it's better to have Solr in its own Tomcat, given how tricky it is to
adjust the Java options.

 

thanks.


 
> From: wun...@wunderwood.org
> To: solr-user@lucene.apache.org
> Subject: RE: Solr and Garbage Collection
> Date: Fri, 25 Sep 2009 09:51:29 -0700
> 
> 30ms is not better or worse than 1s until you look at the service
> requirements. For many applications, it is worth dedicating 10% of your
> processing time to GC if that makes the worst-case pause short.
> 
> On the other hand, my experience with the IBM JVM was that the maximum query
> rate was 2-3X better with the concurrent generational GC compared to any of
> their other GC algorithms, so we got the best throughput along with the
> shortest pauses.
> 
> Solr garbage generation (for queries) seems to have two major components:
> per-request garbage and cache evictions. With a generational collector,
> these two are handled by separate parts of the collector. Per-request
> garbage should completely fit in the short-term heap (nursery), so that it
> can be collected rapidly and returned to use for further requests. If the
> nursery is too small, the per-request allocations will be made in tenured
> space and sit there until the next major GC. Cache evictions are almost
> always in long-term storage (tenured space) because an LRU algorithm
> guarantees that the garbage will be old.
> 
> Check the growth rate of tenured space (under constant load, of course)
> while increasing the size of the nursery. That rate should drop when the
> nursery gets big enough, then not drop much further as it is increased more.
> 
> After that, reduce the size of tenured space until major GCs start happening
> "too often" (a judgment call). A bigger tenured space means longer major GCs
> and thus longer pauses, so you don't want it oversized by too much.
> 
> Also check the hit rates of your caches. If the hit rate is low, say 20% or
> less, make that cache much bigger or set it to zero. Either one will reduce
> the number of cache evictions. If you have an HTTP cache in front of Solr,
> zero may be the right choice, since the HTTP cache is cherry-picking the
> easily cacheable requests.
> 
> Note that a commit nearly doubles the memory required, because you have two
> live Searcher objects with all their caches. Make sure you have headroom for
> a commit.
> 
> If you want to test the tenured space usage, you must test with real world
> queries. Those are the only way to get accurate cache eviction rates.
> 
> wunder
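The tuning loop described above might start from flags like these (illustrative sizes only -- the right values must come from measuring tenured-space growth under real queries, as wunder says):

```sh
# Fixed heap, generous fixed nursery, GC logging to watch tenured growth.
# All sizes are placeholders to be adjusted from measurement, not advice.
JAVA_OPTS="-Xms1536m -Xmx1536m \
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -XX:NewSize=512m -XX:MaxNewSize=512m \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
```

Pinning NewSize equal to MaxNewSize keeps the nursery from shrinking under pressure, which makes the "does tenured growth flatten?" measurement repeatable.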
  
_
Bing™  brings you maps, menus, and reviews organized in one place.   Try it now.
http://www.bing.com/search?q=restaurants&form=MLOGEN&publ=WLHMTAG&crea=TEXT_MLOGEN_Core_tagline_local_1x1

anyway to get Document update time stamp

2009-09-17 Thread siping liu

I understand there's no "update" in Solr/Lucene -- it's really delete+insert.
Is there any way to get a document's insert timestamp, without explicitly
creating such a data field in the document? If so, how can I query it, for
instance "get all documents that are older than 24 hours"? Thanks.
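There is no built-in per-document timestamp, but a field with a default value gets the same effect without any client-side changes -- a schema.xml sketch (the field name here is an assumption):

```xml
<!-- schema.xml sketch: stamped automatically at index time for every document -->
<field name="timestamp" type="date" indexed="true" stored="true"
       default="NOW" multiValued="false"/>
```

Documents older than 24 hours can then be selected with a date-math range query such as timestamp:[* TO NOW-24HOURS].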
_
Hotmail: Free, trusted and rich email service.
http://clk.atdmt.com/GBL/go/171222984/direct/01/

DisMaxRequestHandler usage

2009-06-16 Thread siping liu

Hi,

I have this standard query:

q=(field1:hello OR field2:hello) AND (field3:world)

 

Can I use the dismax handler for this (applying the same search term to
field1 and field2, but keeping field3 separate)? If it can be done, what's
the advantage of doing it this way over using the standard query?

 

thanks.
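One way this maps onto dismax (the handler name and defaults below are assumptions): qf spreads the user's term across field1 and field2, and the field3 clause moves into a filter query, which Solr can also cache independently of the main query:

```xml
<!-- solrconfig.xml sketch: handler name is a placeholder -->
<requestHandler name="/dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">field1 field2</str>
  </lst>
</requestHandler>
```

A request like q=hello&fq=field3:world then matches "hello" in field1 or field2 while requiring field3:world, roughly mirroring the standard query above.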

_
Microsoft brings you a new way to search the web.  Try  Bing™ now
http://www.bing.com?form=MFEHPG&publ=WLHMTAG&crea=TEXT_MFEHPG_Core_tagline_try 
bing_1x1

Query faceting

2009-06-08 Thread siping liu

Hi,

I have a field called "service" with following values:

- Shuttle Services
- Senior Discounts
- Laundry Rooms
- ...

 

When I run a query with "facet=true&facet.field=service&facet.limit=-1", I
get something like this back:

- shuttle 2
- service 3
- senior 0
- laundry 0
- room 3
- ...

 

Questions:

- How do I keep field values from being broken up into words, so I can get
something like "Shuttle Services 2" back?

- How do I tell Solr not to return facets with a 0 count? The query takes a
long time to finish, seemingly because of the long list of items with a 0
count.

 

thanks for any advice.
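A common fix for both symptoms (the field names below are assumptions): facet on an untokenized string copy of the field, and drop zero counts with facet.mincount:

```xml
<!-- schema.xml sketch: "string" is the untokenized field type -->
<field name="service_facet" type="string" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="service" dest="service_facet"/>
```

Querying with facet=true&facet.field=service_facet&facet.mincount=1 then returns whole values like "Shuttle Services 2", and mincount=1 suppresses the long tail of zero-count entries.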

_
Insert movie times and more without leaving Hotmail®. 
http://windowslive.com/Tutorial/Hotmail/QuickAdd?ocid=TXT_TAGLM_WL_HM_Tutorial_QuickAdd_062009

RE: Creating a distributed search in a searchComponent

2009-05-21 Thread siping liu

I was looking for an answer to the same question, and have a similar concern.
It looks like any serious customization work requires developing a custom
SearchComponent, but it's not clear to me how the Solr designers intended
this to be done. I'm more confident either doing it at the Lucene level, or
staying on the client side and using something like multi-core (as discussed
at http://wiki.apache.org/solr/MultipleIndexes).


 
> Date: Wed, 20 May 2009 13:47:20 -0400
> Subject: RE: Creating a distributed search in a searchComponent
> From: nicholas.bai...@rackspace.com
> To: solr-user@lucene.apache.org
> 
> It seems I sent this out a bit too soon. After looking at the source, it seems 
> there are two separate paths for distributed and regular queries; however, the 
> prepare method for all components is run before the shards parameter is 
> checked. So I can build the shards portion in the prepare method of my own 
> search component. 
> 
> However I'm not sure if this is the greatest idea in case solr changes at 
> some point.
> 
> -Nick
> 
> -Original Message-
> From: "Nick Bailey" 
> Sent: Wednesday, May 20, 2009 1:29pm
> To: solr-user@lucene.apache.org
> Subject: Creating a distributed search in a searchComponent
> 
> Hi,
> 
> I am wondering if it is possible to basically add the distributed portion of 
> a search query inside of a searchComponent.
> 
> I am hoping to build my own component and add it as a first-component to the 
> StandardRequestHandler. Then hopefully I will be able to use this component 
> to build the "shards" parameter of the query and have the Handler then treat 
> the query as a distributed search. Anyone have any experience or know if this 
> is possible?
> 
> Thanks,
> Nick
> 
> 
> 

_
Hotmail® has ever-growing storage! Don’t worry about storage limits.
http://windowslive.com/Tutorial/Hotmail/Storage?ocid=TXT_TAGLM_WL_HM_Tutorial_Storage1_052009

adding plug-in after search is done

2009-04-27 Thread siping liu

I'm trying to manipulate search results (e.g. further filtering out unwanted
documents) and ordering the results differently. Where is the suitable place
to do this? I've been using QueryResponseWriter, but that doesn't seem to be
the right place.

thanks.

_
Rediscover Hotmail®: Get quick friend updates right in your inbox. 
http://windowslive.com/RediscoverHotmail?ocid=TXT_TAGLM_WL_HM_Rediscover_Updates2_042009