Re: Multi CPU Cores

2011-10-16 Thread Mikhail Khludnev
when I'm puzzled by the JVM's CPU consumption I use the following combo:

$ top -H -p <pid>

gives you the hottest thread's TID; then convert it to hex and find the
thread in the output of
$ jstack <pid>
as "nid=0x<hex-tid>"


Regards

On Sun, Oct 16, 2011 at 4:25 PM, Rob Brown  wrote:

> Thanks, Java is completely new to me (Perl/C background), so a little
> guidance would be great with config options like this, while I get to
> grips with Java...
>
> Or pointing to a useful resource to start filling in these gaps too.
>
>
>
> -Original Message-
> From: Johannes Goll 
> Reply-to: solr-user@lucene.apache.org
> To: solr-user@lucene.apache.org 
> Subject: Re: Multi CPU Cores
> Date: Sun, 16 Oct 2011 08:18:47 -0400
>
> Try using -XX:+UseParallelGC as a VM option.
>
> Johannes
>
> On Oct 16, 2011, at 7:51 AM, Ken Krugler 
> wrote:
>
> >
> > On Oct 16, 2011, at 1:44pm, Rob Brown wrote:
> >
> >> Looks like I checked the load during a quiet period, ab -n 1 -c 1000
> >> saw a decent 40% load on each core.
> >>
> >> Still a little confused as to why 1 core stays at 100% constantly - even
> >> during the quiet periods?
> >
> > Could be background GC, depending on what you've got your JVM configured
> to use.
> >
> > Though that shouldn't stay at 100% for very long.
> >
> > -- Ken
> >
> >
> >> -Original Message-
> >> From: Johannes Goll 
> >> Reply-to: solr-user@lucene.apache.org
> >> To: solr-user@lucene.apache.org 
> >> Subject: Re: Multi CPU Cores
> >> Date: Sat, 15 Oct 2011 21:30:11 -0400
> >>
> >> Did you try to submit multiple search requests in parallel? The Apache
> ab tool is a great tool for simulating simultaneous load (using -n and -c).
> >> Johannes
> >>
> >> On Oct 15, 2011, at 7:32 PM, Rob Brown  wrote:
> >>
> >>> Hi,
> >>>
> >>> I'm running Solr on a machine with 16 CPU cores, yet watching "top"
> >>> shows that java is only apparently using 1 and maxing it out.
> >>>
> >>> Is there anything that can be done to take advantage of more CPU cores?
> >>>
> >>> Solr 3.4 under Tomcat
> >>>
> >>> [root@solr01 ~]# java -version
> >>> java version "1.6.0_20"
> >>> OpenJDK Runtime Environment (IcedTea6 1.9.8)
> >>> (rhel-1.22.1.9.8.el5_6-x86_64)
> >>> OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
> >>>
> >>>
> >>> top - 14:36:18 up 22 days, 21:54,  4 users,  load average: 1.89, 1.24,
> >>> 1.08
> >>> Tasks: 317 total,   1 running, 315 sleeping,   0 stopped,   1 zombie
> >>> Cpu0  :  0.0%us,  0.0%sy,  0.0%ni, 99.6%id,  0.4%wa,  0.0%hi,  0.0%si,
> >>> 0.0%st
> >>> Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> >>> 0.0%st
> >>> Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> >>> 0.0%st
> >>> Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> >>> 0.0%st
> >>> Cpu4  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> >>> 0.0%st
> >>> Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> >>> 0.0%st
> >>> Cpu6  : 99.6%us,  0.4%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> >>> 0.0%st
> >>> Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> >>> 0.0%st
> >>> Cpu8  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> >>> 0.0%st
> >>> Cpu9  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> >>> 0.0%st
> >>> Cpu10 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> >>> 0.0%st
> >>> Cpu11 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> >>> 0.0%st
> >>> Cpu12 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> >>> 0.0%st
> >>> Cpu13 :  0.7%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,
> >>> 0.0%st
> >>> Cpu14 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> >>> 0.0%st
> >>> Cpu15 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> >>> 0.0%st
> >>> Mem:  132088928k total, 23760584k used, 108328344k free,   318228k
> >>> buffers
> >>> Swap: 25920868k total,0k used, 25920868k free, 18371128k cached
> >>>
> >>> PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+
> >>> COMMAND
> >>> 4466 tomcat20   0 31.2g 4.0g 171m S 101.0  3.2   2909:38
> >>> java
> >>> 6495 root  15   0 42416 3892 1740 S  0.4  0.0   9:34.71
> >>> openvpn
> >>> 11456 root  16   0 12892 1312  836 R  0.4  0.0   0:00.08
> >>> top
> >>>  1 root  15   0 10368  632  536 S  0.0  0.0   0:04.69
> >>> init
> >>>
> >>>
> >>>
> >>
> >
> > --
> > Ken Krugler
> > +1 530-210-6378
> > http://bixolabs.com
> > custom big data solutions & training
> > Hadoop, Cascading, Mahout & Solr
> >
> >
> >
>
>


-- 
Sincerely yours
Mikhail Khludnev
Developer
Grid Dynamics
tel. 1-415-738-8644
Skype: mkhludnev

 


Question about near query order

2011-10-16 Thread Jason, Kim
Hi, all

I have a near query like "analyze term"~2.
That is matched only in that order,
but I want to search regardless of order.
So far, I just queried "analyze term"~2 OR "term analyze"~2.
Is there a better way than what I did?

Thanks in advance.
Jason.
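As a sketch, Jason's workaround can be generated for any two-term phrase. (The helper name is hypothetical. In raw Lucene, SpanNearQuery with inOrder=false matches either order natively, though as far as I know that is not exposed by the stock Solr 3.x query parsers; note also that for two adjacent terms a slop of 2 may already match the transposed order, since a transposition costs two position moves in Lucene's sloppy phrase matching.)

```shell
# Build the order-insensitive workaround query for a two-term phrase:
# "a b"~slop OR "b a"~slop. The helper name is hypothetical.
unordered_near() {
  a=$1; b=$2; slop=${3:-2}
  printf '"%s %s"~%s OR "%s %s"~%s\n' "$a" "$b" "$slop" "$b" "$a" "$slop"
}
unordered_near analyze term 2
# prints: "analyze term"~2 OR "term analyze"~2
```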

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Question-about-near-query-order-tp3427312p3427312.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multiple search analyzers on the same field type possible?

2011-10-16 Thread Victor van der Wolf
I don't think this will be a problem. I'll contact you tomorrow directly by
email for some details.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-search-analyzers-on-the-same-field-type-possible-tp3417898p3426678.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Field Collapsing and Record Filtering

2011-10-16 Thread Michael Sokolov

On 10/13/2011 5:04 PM, lee carroll wrote:

current: bool //for fq which searches only current versions
last_current_at: date time // for date range queries or group sorting
what was current for a given date

sorry if i've missed a requirement

lee c

Lee, the idea of "last_current_at" is interesting; could you expand on 
what you mean by "group sorting", though?  Would that provide a means to 
get only the most recent version?  Say I have access to versions 1, 3, and 4 
of some document and the current version is 5.  I'd like to get version 
4 as the result.  Would you use field collapsing/grouping for that? 
Something else?
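Result grouping can express exactly this scenario; here is a sketch of the request, with field names (doc_id, version) and the access filter assumed rather than taken from the schema in this thread:

```shell
# Sketch of a Solr result-grouping request for "newest version this user can
# see": group hits by a shared document id, sort within each group by version
# descending, and keep one hit per group. Field names (doc_id, version) and
# the access filter are illustrative assumptions.
select_url="http://localhost:8983/solr/select"
q='q=*:*&fq=version:(1 OR 3 OR 4)'            # versions this user may access
group='group=true&group.field=doc_id&group.sort=version desc&group.limit=1'
echo "${select_url}?${q}&${group}"
# (a real request needs URL-encoding, e.g. curl -G --data-urlencode)
```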


-Mike


Re: In-document highlighting DocValues?

2011-10-16 Thread Michael Sokolov

On 10/14/2011 7:20 PM, Jan Høydahl wrote:

Hi,

The Highlighter is way too slow for this customer's particular use case - which 
is very large documents. We don't need highlighted snippets for now, but we 
need to accurately decide what words (offsets) in the real HTML display of the 
resulting page to highlight. For this we only need offset info, not the 
snippets/fragments from the stored field.

But I have not looked at the Highlighter code. Perhaps we could fork it into a 
new search component which pulls out only the necessary meta info and payloads 
for us and returns it to client?

Jan, I've looked into this, and I believe the slowness of Highlighter 
comes not so much from constructing the snippets as from the analysis 
required to find the locations of matching terms in the 
document text, so I think your problem is basically the same as 
highlighting.


There seem to be basically two approaches right now. One is Highlighter, 
which, as you point out, is a bit slow because it has to 
re-analyze the entire document, but this does have the virtue of 
matching the semantics of the original query exactly.  
FastVectorHighlighter works by doing some cheap mimicry of the original 
query: extracting terms from the query (intersected with the 
document's terms in the MultiTermQuery case) and finding the offsets of 
those terms (which have to be stored in the index).  It is smart enough 
to respect phrase boundaries, but does not support every kind of Query; 
still, it might be good enough, and is quite a bit faster than 
Highlighter (5-10x, I think?).


The work in LUCENE-2878 is the only thing I know of that could represent 
an improvement.  I did some tests there, including storing character 
offsets as payloads, and got some additional speedup (maybe another 2x?) 
beyond FVH.  There doesn't seem to be a lot of energy behind pushing that 
forward right now, though, and it requires some fundamental changes to the 
way that searching is done.


-Mike


Re: Solr Open File Descriptors

2011-10-16 Thread Shawn Heisey

On 10/16/2011 12:01 PM, samarth s wrote:

Hi,

Is it safe to assume that with a mergeFactor of 10 the open file descriptors
required by Solr would be around (1 + 10) * 10 = 110?
ref: http://onjava.com/pub/a/onjava/2003/03/05/lucene.html#indexing_speed
The Solr wiki:
http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations
states that the FDs required per segment are around 7.

Are these estimates appropriate? Do they in any way depend on the size of the
index &  number of docs (assuming the same number of segments in any case) as
well?


My index has 10 files per normal segment (the usual 7 plus three more 
for term vectors).  Some of the segments also have a ".del" file, and 
there is a segments_* file and a segments.gen file.  Your servlet 
container and other parts of the OS will also have to open files.


I have personally seen three levels of segment merging taking place at 
the same time on a slow filesystem during a full-import, along with new 
content coming in at the same time.  With a mergeFactor of 10, each 
merge involves 11 segments - the ten that are being merged and the merged 
segment.  If you have three going on at the same time, that's 33 
segments, and you can have up to 10 more that are actively being built 
by ongoing index activity, so that's 43 potential segments.  If your 
filesystem is REALLY slow, you might end up with even more segments as 
existing merges are paused for new ones to start, but if you run into 
that, you'll want to upgrade your hardware, so I won't consider it.


Multiplying 43 segments by 11 files per segment yields a working 
theoretical maximum of 473 files.  Add in the segments files, you're up 
to 475.


Most operating systems have a default FD limit that's at least 1024.  If 
you only have one index (core) on your Solr server, Solr is the only 
thing running on that server, and it's using the default mergeFactor of 
10, you should be fine with the default.  If you are going to have more 
than one index on your Solr server (such as a build core and a live 
core), you plan to run other things on the server, or you want to 
increase your mergeFactor significantly, you might need to adjust the OS 
configuration to allow more file descriptors.
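Shawn's worst case is easy to recompute, and the actual count on a live process is easy to check; a sketch follows (the Linux /proc path and the pid are assumptions):

```shell
# Recompute Shawn's worst-case estimate: 43 potential segments (3 concurrent
# merges * 11 segments each, plus up to 10 being built) times 11 files per
# segment, plus the two segments_* bookkeeping files.
segments=43
files_per_segment=11
estimate=$((segments * files_per_segment + 2))
echo "worst-case open files: $estimate"    # 475

# To check a live Solr on Linux (pid is an assumption):
#   ls /proc/<solr-pid>/fd | wc -l    # FDs currently open
#   ulimit -n                         # per-process limit
```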


Thanks,
Shawn



Re: Callback on starting solr?

2011-10-16 Thread Jan Høydahl
Your app-server will start listening on the port some time before the Solr 
webapp is ready, so you should check directly with Solr. You could also use JMX 
to check Solr's status. If you want help with your failing reindex, 
please provide more context. 25MB is very low; please try giving your VM more 
memory and see if indexing succeeds then.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 16. okt. 2011, at 20:38, Jithin wrote:

> I am doing something similar to that. checking netstat for any connection on
> port. Wanted to know if there is anything solr can do built in.
> 
> [rest of quoted message snipped; the stack trace is quoted in full in
> Jithin's own message below]

Re: Callback on starting solr?

2011-10-16 Thread Jithin
I am doing something similar to that: checking netstat for any connection on
the port. I wanted to know if there is anything built into Solr for this.

Also, I notice that my reindex fails when I have to reindex some 7k+
docs. Solr logs this error -


Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at org.mortbay.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:368)
at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:129)
at org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:161)
at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:714)
... 25 more

2011-10-16 18:05:05.431:WARN::Committed before 500
null||org.mortbay.jetty.EofException|?at
org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:791)|?at
org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)|?at
org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)|?at
sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:296)|?at
sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:140)|?at
java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)|?at
org.apache.solr.common.util.FastWriter.flush(FastWriter.java:115)|?at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:344)|?at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265)|?at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)|?at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)|?at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)|?at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)|?at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)|?at
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)|?at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)|?at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)|?at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)|?at
org.mortbay.jetty.Server.handle(Server.java:326)|?at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)|?at
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)|?at
org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)|?at
org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)|?at
org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)|?at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)|?at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)|Caused
by: java.net.SocketException: Broken pipe|?at
java.net.SocketOutputStream.socketWrite0(Native Method)|?at
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)|?at
java.net.SocketOutputStream.write(SocketOutputStream.java:153)|?at
org.mortbay.io.ByteArrayBuffer.writeTo(ByteArrayBuffer.java:368)|?at
org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:129)|?at
org.mortbay.io.bio.StreamEndPoint.flush(StreamEndPoint.java:161)|?at
org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:714)|?... 25 more|
2011-10-16 18:05:05.432:WARN::/solr/core0/update/
java.lang.IllegalStateException: Committed


Is it a case of Solr not being able to handle the load? Currently Solr is
running with a max memory setting of 25MB. All the docs are very small; each
one contains just a few words.
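For reference, the heap is set with -Xms/-Xmx on the container's JVM; a hedged sketch follows (the 512m/2g values are illustrative assumptions, not tuned recommendations):

```shell
# A 25MB heap (-Xmx25m) is far too small for almost any Solr install.
# Standalone Jetty example (values illustrative):
#   java -Xms512m -Xmx2g -jar start.jar
# Tomcat typically picks the flags up from CATALINA_OPTS / JAVA_OPTS:
export CATALINA_OPTS="-Xms512m -Xmx2g"
echo "$CATALINA_OPTS"
```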

On Sun, Oct 16, 2011 at 11:52 PM, Jan Høydahl / Cominvent [via Lucene] <
ml-node+s472066n3426389...@n3.nabble.com> wrote:

> [quoted reply snipped; Jan's message appears in full below]

Re: Callback on starting solr?

2011-10-16 Thread Jan Høydahl
Hi,

This depends on your application server and config. A very simple option is to 
let your client poll with a ping request http://localhost:8983/solr/admin/ping/ 
until it succeeds.
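Jan's polling advice can be sketched as a small shell function (the URL, the 60-attempt timeout, and the one-second interval are assumptions for a stock Jetty install):

```shell
# Poll Solr's ping handler until it answers, then return 0; give up after
# $tries attempts. URL and timeout defaults are assumptions -- adjust to
# your setup.
wait_for_solr() {
  url="${1:-http://localhost:8983/solr/admin/ping}"
  tries="${2:-60}"
  n=0
  until curl -s -f -o /dev/null "$url"; do
    n=$((n + 1))
    [ "$n" -ge "$tries" ] && return 1   # give up
    sleep 1
  done
  return 0
}
# usage: wait_for_solr && ./reindex.sh
```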

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 16. okt. 2011, at 19:47, Jithin wrote:

> Hi,
> Is it possible to have a callback after Solr starts listening on the
> configured port? What I have found is that there is a certain delay between
> restarting Solr and Solr actually listening on the port.
> So if I try to reindex during this period, it fails. What I want is a
> notification mechanism for when Solr starts listening on the port.
> Is it doable?
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Callback-on-starting-solr-tp3426349p3426349.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Solr Open File Descriptors

2011-10-16 Thread samarth s
Hi,

Is it safe to assume that with a mergeFactor of 10 the open file descriptors
required by Solr would be around (1 + 10) * 10 = 110?
ref: http://onjava.com/pub/a/onjava/2003/03/05/lucene.html#indexing_speed
The Solr wiki:
http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations
states that the FDs required per segment are around 7.

Are these estimates appropriate? Do they in any way depend on the size of the
index & number of docs (assuming the same number of segments in any case) as
well?


-- 
Regards,
Samarth


Callback on starting solr?

2011-10-16 Thread Jithin
Hi,
Is it possible to have a callback after Solr starts listening on the
configured port? What I have found is that there is a certain delay
between restarting Solr and Solr actually listening on the port.
If I try to reindex during this period, it fails. What I want is a
notification mechanism for when Solr starts listening on the port.
Is it doable?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Callback-on-starting-solr-tp3426349p3426349.html
Sent from the Solr - User mailing list archive at Nabble.com.


help with phrase query

2011-10-16 Thread Vijay Ramachandran
Hello. I have an application where I try to match longer queries (sentences)
to short documents (search phrases). Typically, the documents are 3-5 terms
in length. I am facing a problem where the phrase fields indicated via "pf"
don't seem to produce a phrase match in most cases, and I am stumped.
Please help!

For instance, when my query is "should I buy a house now while the rates are
low. We filed BR 2 yrs ago. Rent now, w/ some sch loan debt"

I expect the document "buy a house" to match much higher than "house
loan rates".
However, the latter is the document which always matches higher.


I tried to do this the following way (Solr 3.1):
1. Score phrase matches high
2. Score single-word matches lower
3. Use dismax with an "mm" of 1 and a very high boost for exact phrase matches.

I used the standard "text" type definition in the schema for the single words, and the
following for the phrase:

   [custom fieldType definition stripped by the mail archive]

and my schema fields look like this:

   [field definitions stripped by the mail archive]

This is my search handler config (the XML element names were stripped by the
mail archive; the surviving values, in order, were):

   edismax
   explicit
   0.1
   kpid,advid,campaign,keywords
   1
   kw_stopped^1.0
   kw_phrases^50.0
   3
   3
   *:*
   keywords
   0
   title
   regex

These are the match score debugQuery explanations:

8.480054E-4 = (MATCH) sum of:
  8.480054E-4 = (MATCH) product of:
0.0031093531 = (MATCH) sum of:
  0.0015556295 = (MATCH) weight(kw_stopped:hous in 1812), product of:
2.8209004E-4 = queryWeight(kw_stopped:hous), product of:
  5.514656 = idf(docFreq=25, maxDocs=2375)
  5.1152787E-5 = queryNorm
5.514656 = (MATCH) fieldWeight(kw_stopped:hous in 1812), product of:
  1.0 = tf(termFreq(kw_stopped:hous)=1)
  5.514656 = idf(docFreq=25, maxDocs=2375)
  1.0 = fieldNorm(field=kw_stopped, doc=1812)
  8.192911E-4 = (MATCH) weight(kw_stopped:rate in 1812), product of:
2.0471694E-4 = queryWeight(kw_stopped:rate), product of:
  4.002068 = idf(docFreq=117, maxDocs=2375)
  5.1152787E-5 = queryNorm
4.002068 = (MATCH) fieldWeight(kw_stopped:rate in 1812), product of:
  1.0 = tf(termFreq(kw_stopped:rate)=1)
  4.002068 = idf(docFreq=117, maxDocs=2375)
  1.0 = fieldNorm(field=kw_stopped, doc=1812)
  7.344327E-4 = (MATCH) weight(kw_stopped:loan in 1812), product of:
1.9382538E-4 = queryWeight(kw_stopped:loan), product of:
  3.7891462 = idf(docFreq=145, maxDocs=2375)
  5.1152787E-5 = queryNorm
3.7891462 = (MATCH) fieldWeight(kw_stopped:loan in 1812), product
of:
  1.0 = tf(termFreq(kw_stopped:loan)=1)
  3.7891462 = idf(docFreq=145, maxDocs=2375)
  1.0 = fieldNorm(field=kw_stopped, doc=1812)
0.27272728 = coord(3/11)

for "house loan rates" vs

8.480054E-4 = (MATCH) sum of:
  8.480054E-4 = (MATCH) product of:
0.0031093531 = (MATCH) sum of:
  0.0015556295 = (MATCH) weight(kw_stopped:hous in 1812), product of:
2.8209004E-4 = queryWeight(kw_stopped:hous), product of:
  5.514656 = idf(docFreq=25, maxDocs=2375)
  5.1152787E-5 = queryNorm
5.514656 = (MATCH) fieldWeight(kw_stopped:hous in 1812), product of:
  1.0 = tf(termFreq(kw_stopped:hous)=1)
  5.514656 = idf(docFreq=25, maxDocs=2375)
  1.0 = fieldNorm(field=kw_stopped, doc=1812)
  8.192911E-4 = (MATCH) weight(kw_stopped:rate in 1812), product of:
2.0471694E-4 = queryWeight(kw_stopped:rate), product of:
  4.002068 = idf(docFreq=117, maxDocs=2375)
  5.1152787E-5 = queryNorm
4.002068 = (MATCH) fieldWeight(kw_stopped:rate in 1812), product of:
  1.0 = tf(termFreq(kw_stopped:rate)=1)
  4.002068 = idf(docFreq=117, maxDocs=2375)
  1.0 = fieldNorm(field=kw_stopped, doc=1812)
  7.344327E-4 = (MATCH) weight(kw_stopped:loan in 1812), product of:
1.9382538E-4 = queryWeight(kw_stopped:loan), product of:
  3.7891462 = idf(docFreq=145, maxDocs=2375)
  5.1152787E-5 = queryNorm
3.7891462 = (MATCH) fieldWeight(kw_stopped:loan in 1812), product
of:
  1.0 = tf(termFreq(kw_stopped:loan)=1)
  3.7891462 = idf(docFreq=145, maxDocs=2375)
  1.0 = fieldNorm(field=kw_stopped, doc=1812)
0.27272728 = coord(3/11)

for "buy a house".

Unless I try the exact phrase "buy a house" as the query, the kw_phrases
field never shows up in the explanation.

What am I doing wrong? Please help!

thanks,
Vijay


Re: Multi CPU Cores

2011-10-16 Thread Johannes Goll
we use the following in production:

java -server -XX:+UseParallelGC -XX:+AggressiveOpts
-XX:+DisableExplicitGC -Xms3G -Xmx40G -Djetty.port=<port>
-Dsolr.solr.home=<solr-home> -jar start.jar

more information
http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html

Johannes


Re: Multi CPU Cores

2011-10-16 Thread Li Li
For indexing, you can make use of multiple cores easily by calling
IndexWriter.addDocument from multiple threads.
As far as I know, for searching, if there is only one request, you can't
make good use of the CPUs.
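On the client side the same idea applies to Solr updates: send documents from several processes at once. A sketch with xargs -P follows (the file layout, the update URL, and the count of 8 workers are illustrative assumptions):

```shell
# Sketch: parallel client-side indexing -- POST each XML document to Solr's
# update handler from up to 8 concurrent curl processes. Directory layout,
# URL, and worker count are illustrative assumptions.
post_all() {
  dir="${1:-docs}"
  url="${2:-http://localhost:8983/solr/update}"
  ls "$dir"/*.xml | xargs -P 8 -I{} \
    curl -s "$url" -H 'Content-Type: text/xml' --data-binary @{}
}
# afterwards, commit once:
#   curl "http://localhost:8983/solr/update?commit=true"
```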

On Sat, Oct 15, 2011 at 9:37 PM, Rob Brown  wrote:

> Hi,
>
> I'm running Solr on a machine with 16 CPU cores, yet watching "top" shows
> that java is only apparently using 1 and maxing it out.
>
> Is there anything that can be done to take advantage of more CPU cores?
>
> Solr 3.4 under Tomcat
>
> [java -version and top output snipped; identical to the listing quoted
> earlier in this thread]
>


Re: Combine XML data with DIH

2011-10-16 Thread O. Klein

O. Klein wrote:
> 
> 
> O. Klein wrote:
>> 
>> I have a folder with XML files
>> 
>> 1.xml contains:
>> http://www.site.com/1.html
>> blacontent
>> blatitle
>> 
>> 2.xml contains:
>> http://www.site.com/1.html
>> blatitle2
>> 
>> I want to create a document in Solr:
>> 
>> http://www.site.com/1.html
>> blacontent
>> blatitle2
>> 
>> 
> 
> I changed my problem in the quotes as it's a little different and
> hopefully easier to solve.
> 
> Can this be done with DIH? And how?
> 

Hmm, I tried to index all docs and JOIN them on id. This didn't work, as it
only shows the fields of the linked document.

Is there some way to show all the fields of the combined documents?


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Combine-XML-data-with-DIH-tp3209413p3425844.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multi CPU Cores

2011-10-16 Thread Rob Brown
Thanks, Java is completely new to me (Perl/C background), so a little
guidance would be great with config options like this, while I get to
grips with Java...

Or pointing to a useful resource to start filling in these gaps too.



-Original Message-
From: Johannes Goll 
Reply-to: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org 
Subject: Re: Multi CPU Cores
Date: Sun, 16 Oct 2011 08:18:47 -0400

Try using -XX:+UseParallelGC as a VM option.

Johannes

On Oct 16, 2011, at 7:51 AM, Ken Krugler  wrote:

> 
> On Oct 16, 2011, at 1:44pm, Rob Brown wrote:
> 
>> Looks like I checked the load during a quiet period, ab -n 1 -c 1000
>> saw a decent 40% load on each core.
>> 
>> Still a little confused as to why 1 core stays at 100% constantly - even
>> during the quiet periods?
> 
> Could be background GC, depending on what you've got your JVM configured to 
> use.
> 
> Though that shouldn't stay at 100% for very long.
> 
> -- Ken
> 
> 
>> -Original Message-
>> From: Johannes Goll 
>> Reply-to: solr-user@lucene.apache.org
>> To: solr-user@lucene.apache.org 
>> Subject: Re: Multi CPU Cores
>> Date: Sat, 15 Oct 2011 21:30:11 -0400
>> 
>> Did you try to submit multiple search requests in parallel? The Apache ab
>> tool is a great tool to simulate simultaneous load (using -n and -c).
>> Johannes
>> 
>> On Oct 15, 2011, at 7:32 PM, Rob Brown  wrote:
>> 
>>> Hi,
>>> 
>>> I'm running Solr on a machine with 16 CPU cores, yet watching "top"
>>> shows that java is only apparently using 1 and maxing it out.
>>> 
>>> Is there anything that can be done to take advantage of more CPU cores?
>>> 
>>> Solr 3.4 under Tomcat
>>> 
>>> [root@solr01 ~]# java -version
>>> java version "1.6.0_20"
>>> OpenJDK Runtime Environment (IcedTea6 1.9.8)
>>> (rhel-1.22.1.9.8.el5_6-x86_64)
>>> OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
>>> 
>>> 
>>> top - 14:36:18 up 22 days, 21:54,  4 users,  load average: 1.89, 1.24,
>>> 1.08
>>> Tasks: 317 total,   1 running, 315 sleeping,   0 stopped,   1 zombie
>>> Cpu0  :  0.0%us,  0.0%sy,  0.0%ni, 99.6%id,  0.4%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu4  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu6  : 99.6%us,  0.4%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu8  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu9  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu10 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu11 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu12 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu13 :  0.7%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu14 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu15 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Mem:  132088928k total, 23760584k used, 108328344k free,   318228k
>>> buffers
>>> Swap: 25920868k total,0k used, 25920868k free, 18371128k cached
>>> 
>>>   PID USER    PR  NI  VIRT   RES   SHR S  %CPU %MEM    TIME+  COMMAND
>>>  4466 tomcat  20   0  31.2g  4.0g  171m S 101.0  3.2  2909:38  java
>>>  6495 root    15   0  42416  3892  1740 S   0.4  0.0  9:34.71  openvpn
>>> 11456 root    16   0  12892  1312   836 R   0.4  0.0  0:00.08  top
>>>     1 root    15   0  10368   632   536 S   0.0  0.0  0:04.69  init
>>> 
>>> 
>>> 
>> 
> 
> --
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> custom big data solutions & training
> Hadoop, Cascading, Mahout & Solr
> 
> 
> 



Re: Multi CPU Cores

2011-10-16 Thread Johannes Goll
Try using -XX:+UseParallelGC as a JVM option.

Johannes

On Oct 16, 2011, at 7:51 AM, Ken Krugler  wrote:

> 
> On Oct 16, 2011, at 1:44pm, Rob Brown wrote:
> 
>> Looks like I checked the load during a quiet period, ab -n 1 -c 1000
>> saw a decent 40% load on each core.
>> 
>> Still a little confused as to why 1 core stays at 100% constantly - even
>> during the quiet periods?
> 
> Could be background GC, depending on what you've got your JVM configured to 
> use.
> 
> Though that shouldn't stay at 100% for very long.
> 
> -- Ken
> 
> 
>> -Original Message-
>> From: Johannes Goll 
>> Reply-to: solr-user@lucene.apache.org
>> To: solr-user@lucene.apache.org 
>> Subject: Re: Multi CPU Cores
>> Date: Sat, 15 Oct 2011 21:30:11 -0400
>> 
>> Did you try to submit multiple search requests in parallel? The Apache ab
>> tool is a great tool to simulate simultaneous load (using -n and -c).
>> Johannes
>> 
>> On Oct 15, 2011, at 7:32 PM, Rob Brown  wrote:
>> 
>>> Hi,
>>> 
>>> I'm running Solr on a machine with 16 CPU cores, yet watching "top"
>>> shows that java is only apparently using 1 and maxing it out.
>>> 
>>> Is there anything that can be done to take advantage of more CPU cores?
>>> 
>>> Solr 3.4 under Tomcat
>>> 
>>> [root@solr01 ~]# java -version
>>> java version "1.6.0_20"
>>> OpenJDK Runtime Environment (IcedTea6 1.9.8)
>>> (rhel-1.22.1.9.8.el5_6-x86_64)
>>> OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
>>> 
>>> 
>>> top - 14:36:18 up 22 days, 21:54,  4 users,  load average: 1.89, 1.24,
>>> 1.08
>>> Tasks: 317 total,   1 running, 315 sleeping,   0 stopped,   1 zombie
>>> Cpu0  :  0.0%us,  0.0%sy,  0.0%ni, 99.6%id,  0.4%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu4  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu6  : 99.6%us,  0.4%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu8  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu9  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu10 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu11 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu12 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu13 :  0.7%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu14 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Cpu15 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>>> 0.0%st
>>> Mem:  132088928k total, 23760584k used, 108328344k free,   318228k
>>> buffers
>>> Swap: 25920868k total,0k used, 25920868k free, 18371128k cached
>>> 
>>>   PID USER    PR  NI  VIRT   RES   SHR S  %CPU %MEM    TIME+  COMMAND
>>>  4466 tomcat  20   0  31.2g  4.0g  171m S 101.0  3.2  2909:38  java
>>>  6495 root    15   0  42416  3892  1740 S   0.4  0.0  9:34.71  openvpn
>>> 11456 root    16   0  12892  1312   836 R   0.4  0.0  0:00.08  top
>>>     1 root    15   0  10368   632   536 S   0.0  0.0  0:04.69  init
>>> 
>>> 
>>> 
>> 
> 
> --
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> custom big data solutions & training
> Hadoop, Cascading, Mahout & Solr
> 
> 
> 
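The parallel collector suggested above is spelled -XX:+UseParallelGC in HotSpot. A minimal sketch of wiring it into Tomcat follows; the file location, heap sizes, and thread count are assumptions to adapt, not a recommendation:

```shell
# Hedged sketch: where this goes depends on your Tomcat setup; a setenv.sh
# or /etc/sysconfig file is a common place (assumption).
# -XX:+UseParallelGC enables the multi-threaded throughput collector;
# -XX:ParallelGCThreads caps its worker threads.
export CATALINA_OPTS="-Xms4g -Xmx4g -XX:+UseParallelGC -XX:ParallelGCThreads=8"
echo "$CATALINA_OPTS"
```

Note that GC threads only explain multi-core use during collection; query-time parallelism still comes from serving concurrent requests.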


RE: Implement Custom Soundex

2011-10-16 Thread Momo..Lelo ..

Dear Gora, 

Thank you for the quick response. 

Actually I need to do Soundex for the Arabic language. The code is already
done in Java, but I couldn't work out how to implement it as a Solr filter.

Regards,



> From: g...@mimirtech.com
> Date: Sun, 16 Oct 2011 16:19:48 +0530
> Subject: Re: Implement Custom Soundex
> To: solr-user@lucene.apache.org
> 
> 2011/10/16 Momo..Lelo .. :
> >
> > Dear,
> >
> > Does anyone here have experience developing a custom Soundex?
> >
> > If you have experience doing this and can offer some help and share your
> > experience, I'd really appreciate it.
> 
> I presume that this is in the context of Solr, and spell-checking.
> We did this as an exercise for Indian-language words transliterated
> into English, hooking into the open-source spell-checking library,
> aspell, which provided us  with a soundex-like algorithm (the actual
> algorithm is quite different, but works better than soundex, at
> least for our use case). We were quite satisfied with the results,
> though unfortunately this never went into production.
> 
> Would be glad to help, though I am going to be really busy the
> next few days. Please do provide us with more details on your
> requirements.
> 
> Regards,
> Gora
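Since the question is how to hook existing Java code into Solr as a filter, here is a hedged sketch. The encoder below is the classic English Soundex, standing in for the Arabic mapping (which would replace the code table); the class name is illustrative and this is not the poster's actual code. In Solr 3.x you would call such a method from a class extending Lucene's TokenFilter, rewriting the term in incrementToken() via CharTermAttribute, and register it with a factory referenced in schema.xml:

```java
// Sketch only: classic English Soundex as a stand-in encoder. An Arabic
// implementation would swap in its own letter-to-code mapping.
public class SoundexSketch {

    // One code digit per letter A..Z; '0' marks vowels/ignored letters.
    private static final String CODES = "01230120022455012623010202";

    public static String soundex(String term) {
        String up = term.toUpperCase();
        StringBuilder out = new StringBuilder();
        char last = '0';
        for (int i = 0; i < up.length() && out.length() < 4; i++) {
            char c = up.charAt(i);
            if (c < 'A' || c > 'Z') continue;       // skip non-letters
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);                      // keep the first letter
            } else if (code != '0' && code != last) {
                out.append(code);                   // new consonant group
            }
            last = code;                            // simplified H/W handling
        }
        while (out.length() > 0 && out.length() < 4) out.append('0');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(soundex("Robert"));   // R163
        System.out.println(soundex("Jackson"));  // J250
    }
}
```

To plug this into Solr 3.x, the filter's incrementToken() would copy the CharTermAttribute into soundex() and write the result back with setEmpty().append(...), and the accompanying factory class would be listed as a <filter class="..."/> in the field type's analyzer chain.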
  

Re: Multi CPU Cores

2011-10-16 Thread Ken Krugler

On Oct 16, 2011, at 1:44pm, Rob Brown wrote:

> Looks like I checked the load during a quiet period, ab -n 1 -c 1000
> saw a decent 40% load on each core.
> 
> Still a little confused as to why 1 core stays at 100% constantly - even
> during the quiet periods?

Could be background GC, depending on what you've got your JVM configured to use.

Though that shouldn't stay at 100% for very long.

-- Ken


> -Original Message-
> From: Johannes Goll 
> Reply-to: solr-user@lucene.apache.org
> To: solr-user@lucene.apache.org 
> Subject: Re: Multi CPU Cores
> Date: Sat, 15 Oct 2011 21:30:11 -0400
> 
> Did you try to submit multiple search requests in parallel? The Apache ab
> tool is a great tool to simulate simultaneous load (using -n and -c).
> Johannes
> 
> On Oct 15, 2011, at 7:32 PM, Rob Brown  wrote:
> 
>> Hi,
>> 
>> I'm running Solr on a machine with 16 CPU cores, yet watching "top"
>> shows that java is only apparently using 1 and maxing it out.
>> 
>> Is there anything that can be done to take advantage of more CPU cores?
>> 
>> Solr 3.4 under Tomcat
>> 
>> [root@solr01 ~]# java -version
>> java version "1.6.0_20"
>> OpenJDK Runtime Environment (IcedTea6 1.9.8)
>> (rhel-1.22.1.9.8.el5_6-x86_64)
>> OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
>> 
>> 
>> top - 14:36:18 up 22 days, 21:54,  4 users,  load average: 1.89, 1.24,
>> 1.08
>> Tasks: 317 total,   1 running, 315 sleeping,   0 stopped,   1 zombie
>> Cpu0  :  0.0%us,  0.0%sy,  0.0%ni, 99.6%id,  0.4%wa,  0.0%hi,  0.0%si,
>> 0.0%st
>> Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>> 0.0%st
>> Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>> 0.0%st
>> Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>> 0.0%st
>> Cpu4  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>> 0.0%st
>> Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>> 0.0%st
>> Cpu6  : 99.6%us,  0.4%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>> 0.0%st
>> Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>> 0.0%st
>> Cpu8  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>> 0.0%st
>> Cpu9  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>> 0.0%st
>> Cpu10 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>> 0.0%st
>> Cpu11 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>> 0.0%st
>> Cpu12 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>> 0.0%st
>> Cpu13 :  0.7%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,
>> 0.0%st
>> Cpu14 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>> 0.0%st
>> Cpu15 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
>> 0.0%st
>> Mem:  132088928k total, 23760584k used, 108328344k free,   318228k
>> buffers
>> Swap: 25920868k total,0k used, 25920868k free, 18371128k cached
>> 
>>   PID USER    PR  NI  VIRT   RES   SHR S  %CPU %MEM    TIME+  COMMAND
>>  4466 tomcat  20   0  31.2g  4.0g  171m S 101.0  3.2  2909:38  java
>>  6495 root    15   0  42416  3892  1740 S   0.4  0.0  9:34.71  openvpn
>> 11456 root    16   0  12892  1312   836 R   0.4  0.0  0:00.08  top
>>     1 root    15   0  10368   632   536 S   0.0  0.0  0:04.69  init
>> 
>> 
>> 
> 

--
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr





Re: Multi CPU Cores

2011-10-16 Thread Rob Brown
Looks like I checked the load during a quiet period, ab -n 1 -c 1000
saw a decent 40% load on each core.

Still a little confused as to why 1 core stays at 100% constantly - even
during the quiet periods?


-- 

IntelCompute
Web Design and Online Marketing

http://www.intelcompute.com


-Original Message-
From: Johannes Goll 
Reply-to: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org 
Subject: Re: Multi CPU Cores
Date: Sat, 15 Oct 2011 21:30:11 -0400

Did you try to submit multiple search requests in parallel? The Apache ab tool
is a great tool to simulate simultaneous load (using -n and -c).
Johannes

On Oct 15, 2011, at 7:32 PM, Rob Brown  wrote:

> Hi,
> 
> I'm running Solr on a machine with 16 CPU cores, yet watching "top"
> shows that java is only apparently using 1 and maxing it out.
> 
> Is there anything that can be done to take advantage of more CPU cores?
> 
> Solr 3.4 under Tomcat
> 
> [root@solr01 ~]# java -version
> java version "1.6.0_20"
> OpenJDK Runtime Environment (IcedTea6 1.9.8)
> (rhel-1.22.1.9.8.el5_6-x86_64)
> OpenJDK 64-Bit Server VM (build 19.0-b09, mixed mode)
> 
> 
> top - 14:36:18 up 22 days, 21:54,  4 users,  load average: 1.89, 1.24,
> 1.08
> Tasks: 317 total,   1 running, 315 sleeping,   0 stopped,   1 zombie
> Cpu0  :  0.0%us,  0.0%sy,  0.0%ni, 99.6%id,  0.4%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu1  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu2  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu3  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu4  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu5  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu6  : 99.6%us,  0.4%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu8  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu9  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu10 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu11 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu12 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu13 :  0.7%us,  0.0%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu14 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Cpu15 :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
> 0.0%st
> Mem:  132088928k total, 23760584k used, 108328344k free,   318228k
> buffers
> Swap: 25920868k total,0k used, 25920868k free, 18371128k cached
> 
>   PID USER    PR  NI  VIRT   RES   SHR S  %CPU %MEM    TIME+  COMMAND
>  4466 tomcat  20   0  31.2g  4.0g  171m S 101.0  3.2  2909:38  java
>  6495 root    15   0  42416  3892  1740 S   0.4  0.0  9:34.71  openvpn
> 11456 root    16   0  12892  1312   836 R   0.4  0.0  0:00.08  top
>     1 root    15   0  10368   632   536 S   0.0  0.0  0:04.69  init
> 
> 
> 
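The ab invocation behind the 40%-per-core result above can be composed like this; the URL, request count, and concurrency level are assumptions to substitute with your own (the echo makes it a dry run -- remove it to actually fire the requests):

```shell
# Hedged sketch of an ab load test against a Solr select handler.
URL="http://localhost:8983/solr/select?q=*:*"
N=1000       # total requests (-n)
C=16         # concurrent clients (-c), roughly one per CPU core
echo ab -n "$N" -c "$C" "$URL"
# prints: ab -n 1000 -c 16 http://localhost:8983/solr/select?q=*:*
```

While it runs, `top -H` (press 1 for the per-CPU view) shows whether the load spreads across cores.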



Re: multiple document types in a core

2011-10-16 Thread lee carroll
Hi Chris thanks for the response

> It's an inverted index, so *terms* exist once (per segment) and those terms
> "point" to the documents -- so having the same terms (in the same fields)
> for multiple types of documents in one index is going to take up less
> overall space than having distinct collections for each type of document.

I'm not asking about the indexed terms but rather the stored values.
By having two doc types, are we gaining anything by "storing"
attributes only for that doc type?

cheers lee c
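On the stored-values side of the question: stored fields are written per document, so a field that a document simply omits costs that document nothing. A hedged schema.xml sketch (field names are illustrative, not from the thread):

```xml
<!-- Sketch only. Two doc types can share one schema; a document that
     leaves a field out consumes no stored-field space for it. -->
<field name="doctype" type="string" indexed="true" stored="true"/>
<field name="title"   type="text"   indexed="true" stored="true"/>
<!-- only "product" docs supply price; "article" docs just omit it -->
<field name="price"   type="float"  indexed="true" stored="true"/>
```

So splitting into two cores would not save stored-field space over one shared schema with per-type optional fields.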


Re: Implement Custom Soundex

2011-10-16 Thread Gora Mohanty
2011/10/16 Momo..Lelo .. :
>
> Dear,
>
> Does anyone here have experience developing a custom Soundex?
>
> If you have experience doing this and can offer some help and share your
> experience, I'd really appreciate it.

I presume that this is in the context of Solr, and spell-checking.
We did this as an exercise for Indian-language words transliterated
into English, hooking into the open-source spell-checking library,
aspell, which provided us  with a soundex-like algorithm (the actual
algorithm is quite different, but works better than soundex, at
least for our use case). We were quite satisfied with the results,
though unfortunately this never went into production.

Would be glad to help, though I am going to be really busy the
next few days. Please do provide us with more details on your
requirements.

Regards,
Gora


Implement Custom Soundex

2011-10-16 Thread Momo..Lelo ..

Dear,

Does anyone here have experience developing a custom Soundex?

If you have experience doing this and can offer some help and share your
experience, I'd really appreciate it.