Maximum solr processes per machine

2013-09-29 Thread adfel70
Hi,
I'm thinking about Solr cluster architecture before purchasing machines.


My total index size is around 5TB, and I want a replication factor of 3:
15TB in total.
I've understood that I should have 50-100% of the index size as RAM for the
OS cache, so let's say we're talking about around 10TB of memory.
Now I need to split this memory across multiple servers and settle on the
machine spec I want to buy.
I'm thinking of running multiple Solr processes per machine.
Is there an upper limit on the number of Solr processes per machine, assuming
I make sure that the total size of the indexes of all nodes on the machine
stays within the RAM percentage I've defined?
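
Spelling out the arithmetic behind those numbers:

    5 TB per copy x 3 replicas        = 15 TB on disk
    OS cache at 50-100% of index size = 7.5-15 TB of RAM, roughly 10 TB in the middle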





Re: Maximum solr processes per machine

2013-09-29 Thread Erick Erickson
bq: Is there an upper limit on the number of Solr processes per machine?

No, assuming they're all in separate JVMs. I've seen reports, though,
that increasing the number of JVMs past the number of CPU
cores gets into iffy territory.

And, depending on your disk storage they may all be contending for
disk access.

FWIW,
Erick



Re: Maximum solr processes per machine

2013-09-29 Thread adfel70
How can I configure the disk storage so that disk access is optimized?
I'm considering RAID-10,
and I think I'll have around 4-8 disks per machine.
Should I point each Solr JVM at a data dir on a different disk, or is
there some other way to optimize this?



Erick Erickson wrote
 And, depending on your disk storage they may all be contending for
 disk access.



ClusteringComponent under Tomcat 7

2013-09-29 Thread Lieberman, Ariel
Hi,

I'm trying to run Solr 4.3 (and 4.4) with -Dsolr.clustering.enabled=true

I've copied all the relevant jars to the ./lib directory under the instance.

With Jetty it runs OK, but under Tomcat I receive the error (exception) below.

Any idea/help?

Thanks,

-Ariel


org.apache.solr.common.SolrException: Error Instantiating SearchComponent, solr.clustering.ClusteringComponent failed to instantiate org.apache.solr.handler.component.SearchComponent
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:835)
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:629)
 at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:622)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:657)
 at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
 at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Error Instantiating SearchComponent, solr.clustering.ClusteringComponent failed to instantiate org.apache.solr.handler.component.SearchComponent
 at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:551)
 at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:586)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2173)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2167)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2200)
 at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:1231)
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:766)
 ... 13 more
Caused by: java.lang.ClassCastException: class org.apache.solr.handler.clustering.ClusteringComponent
 at java.lang.Class.asSubclass(Unknown Source)
 at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:443)
 at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:381)
 at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:530)
 ... 19 more
ERROR - 2013-09-29 05:58:13.519; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: Unable to create core: att150K
 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1150)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:666)
 at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
 at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
 at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
 at java.util.concurrent.FutureTask.run(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
 at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Error Instantiating SearchComponent, solr.clustering.ClusteringComponent failed to instantiate org.apache.solr.handler.component.SearchComponent
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:835)
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:629)
 at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:622)
 at org.apache.solr.core.CoreContainer.create(CoreContainer.java:657)
 ... 10 more
Caused by: org.apache.solr.common.SolrException: Error Instantiating SearchComponent, solr.clustering.ClusteringComponent failed to instantiate org.apache.solr.handler.component.SearchComponent
 at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:551)
 at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:586)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2173)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2167)
 at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2200)
 at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:1231)
 at org.apache.solr.core.SolrCore.<init>(SolrCore.java:766)
 ... 13 more
Caused by: java.lang.ClassCastException: class org.apache.solr.handler.clustering.ClusteringComponent
 at java.lang.Class.asSubclass(Unknown Source)
 at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:443)

Re: Maximum solr processes per machine

2013-09-29 Thread Bram Van Dam

On 09/29/2013 04:03 PM, adfel70 wrote:

How can I configure the disk storage so that disk access is optimized?
I'm considering RAID-10,
and I think I'll have around 4-8 disks per machine.
Should I point each Solr JVM at a data dir on a different disk, or is
there some other way to optimize this?


The best way to deal with this is trial and error. There are many factors 
that can contribute to your hardware decisions. Will there be concurrent 
access on all Solr instances? Will some be used more than others?


If you have a couple of highly used and many seldom used instances, then 
there's no problem in running each in a different JVM. If you have 
more highly used instances/JVMs than CPU cores... you're in trouble.


Are you doing real-time search? Or is the data mostly static? If the 
data doesn't change much, then good warm-up queries will be a lot 
more useful than trying to tie Solr to specific disks.
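
Normally you'd configure warm-up via newSearcher listener queries in
solrconfig.xml; as an external alternative, here is a minimal SolrJ sketch
that fires a few warming queries after startup (the URL and the queries are
placeholders, not recommendations):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class Warmup {
    public static void main(String[] args) throws Exception {
        // Placeholder core URL - adjust to your own deployment.
        HttpSolrServer server =
            new HttpSolrServer("http://localhost:8983/solr/collection1");
        // Hit the fields, sorts and facets your real traffic uses, so the
        // relevant index files get pulled into the OS cache.
        String[] queries = {"*:*", "title:test", "body:example"};
        for (String q : queries) {
            SolrQuery query = new SolrQuery(q);
            query.setRows(10);
            server.query(query);
        }
        server.shutdown();
    }
}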


If you're doing real time on a 5TB index then you'll probably want to 
throw your money at the fastest storage you can afford (SSDs vs spinning 
rust made a huge difference in our benchmarks) and the fastest CPUs you 
can get your hands on. Memory is important too, but in our benchmarks 
that didn't have as much impact as the other factors. Keeping a 5TB 
index in memory is going to be tricky, so in my opinion you'd be better 
off investing in faster disks instead.


 - Bram


Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-29 Thread Andreas Owen
How dumb can you get... obviously quite dumb. I would have to analyze the 
html-pages with a nested instance like this:

<entity name="rec" processor="XPathEntityProcessor"
        url="file:///C:\ColdFusion10\cfusion\solr\solr\tkbintranet\docImportUrl.xml"
        forEach="/docs/doc" dataSource="main">

    <entity name="htm" processor="XPathEntityProcessor"
            url="${rec.urlParse}" forEach="/xhtml:html" dataSource="dataUrl">
        <field column="text" xpath="//content" />
        <field column="h_2" xpath="//body" />
        <field column="text_nohtml" xpath="//text" />
        <field column="h_1" xpath="//h:h1" />
    </entity>
</entity>

but I'm pretty sure the forEach and the XPath expressions are wrong. At the
moment I'm getting the following error:

Caused by: java.lang.RuntimeException: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.ClassCastException: sun.net.www.protocol.http.HttpURLConnection$HttpInputStream cannot be cast to java.io.Reader





On 28. Sep 2013, at 1:39 AM, Andreas Owen wrote:

 OK, I see what you're getting at, but why doesn't the following work:
   
   <field xpath="//h:h1" column="h_1" />
   <field column="text" xpath="/xhtml:html/xhtml:body" />
 
 I removed the Tika processor. What am I missing? I haven't found anything in 
 the wiki.
 
 
 On 28. Sep 2013, at 12:28 AM, P Williams wrote:
 
 I spent some more time thinking about this.  Do you really need to use the
 TikaEntityProcessor?  From what I can tell, it doesn't offer anything new to
 the document you are building that couldn't be accomplished by the
 XPathEntityProcessor alone.
 
 I also tried to get the Advanced Parsing example
 (http://wiki.apache.org/solr/TikaEntityProcessor) to work, without success.
 There are some obvious typos (<document> instead of </document>) and an odd
 order to the pieces (dataSources is enclosed by document).  It also looks like
 FieldStreamDataSource
 (http://lucene.apache.org/solr/4_3_1/solr-dataimporthandler/org/apache/solr/handler/dataimport/FieldStreamDataSource.html)
 is the one that is meant to work in this context.  If Koji is still around
 maybe he could offer some help?  Otherwise this bit of erroneous
 instruction should probably be removed from the wiki.
 
 Cheers,
 Tricia
 
 $ svn diff
 Index: solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java
 ===================================================================
 --- solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java (revision 1526990)
 +++ solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java (working copy)
 @@ -99,13 +99,13 @@
      runFullImport(getConfigHTML("identity"));
      assertQ(req("*:*"), testsHTMLIdentity);
    }
 -
 +
    private String getConfigHTML(String htmlMapper) {
      return
          "<dataConfig>" +
          "<dataSource type='BinFileDataSource'/>" +
          "<document>" +
 -        "<entity name='Tika' format='xml' processor='TikaEntityProcessor' " +
 +        "<entity name='Tika' format='html' processor='TikaEntityProcessor' " +
          " url='" + getFile("dihextras/structured.html").getAbsolutePath() + "' " +
          ((htmlMapper == null) ? "" : (" htmlMapper='" + htmlMapper + "'")) + ">" +
          "<field column='text'/>" +
 @@ -114,4 +114,36 @@
          "</dataConfig>";
  
    }
 +  private String[] testsHTMLH1 = {
 +      "//*[@numFound='1']"
 +      , "//str[@name='h1'][contains(.,'H1 Header')]"
 +  };
 +
 +  @Test
 +  public void testTikaHTMLMapperSubEntity() throws Exception {
 +    runFullImport(getConfigSubEntity("identity"));
 +    assertQ(req("*:*"), testsHTMLH1);
 +  }
 +
 +  private String getConfigSubEntity(String htmlMapper) {
 +    return
 +    "<dataConfig>" +
 +    "<dataSource type='BinFileDataSource' name='bin'/>" +
 +    "<dataSource type='FieldStreamDataSource' name='fld'/>" +
 +    "<document>" +
 +    "<entity name='tika' processor='TikaEntityProcessor' url='" + getFile("dihextras/structured.html").getAbsolutePath() + "' dataSource='bin' format='html' rootEntity='false'>" +
 +    "<!-- Do appropriate mapping here; meta=\"true\" means it is a metadata field -->" +
 +    "<field column='Author' meta='true' name='author'/>" +
 +    "<field column='title' meta='true' name='title'/>" +
 +    "<!-- 'text' is an implicit field emitted by TikaEntityProcessor. Map it appropriately. -->" +
 +    "<field name='text' column='text'/>" +
 +    "<entity name='detail' type='XPathEntityProcessor' forEach='/html' dataSource='fld' dataField='tika.text' rootEntity='true'>" +
 +    "<field xpath='//div' column='foo'/>" +
 +    "<field xpath='//h1' column='h1'/>" +
 +    "</entity>" +
 +    "</entity>" +
 +    "</document>" +
 +    "</dataConfig>";
 +  }
 +
  }
 Index:

Re: Hello and help :)

2013-09-29 Thread Matheus Salvia
Thanks for the answer. Yes, you understood it correctly.
The method you proposed should work perfectly, except I do have one more
requirement that I forgot to mention earlier, and I apologize for that.
The true problem we are facing is:
* find all documents for userID=x, where userID=x has more than y
 documents in the index between dateA and dateB

And since dateA and dateB can be any dates, it's impossible to save the
count, since we cannot foresee what date and what count will be requested.
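
For reference, here is how I understood the counter idea (a minimal SolrJ
sketch, without the date dimension; the "users" core and the field names are
made up, and it assumes user_id is the uniqueKey and the updateLog is
enabled so atomic updates work):

import java.util.Collections;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class UserDocCounter {
    private final SolrServer users =
        new HttpSolrServer("http://localhost:8983/solr/users");

    // Called with +1 whenever a document is added for a user, -1 on delete.
    public void adjustCount(String userId, int delta) throws Exception {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("user_id", userId);
        // Atomic update: increment doc_count in place instead of reindexing.
        doc.addField("doc_count", Collections.singletonMap("inc", delta));
        users.add(doc);
    }
}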


2013/9/28 Upayavira u...@odoko.co.uk

 To phrase your need more generically:

  * find all documents for userID=x, where userID=x has more than y
  documents in the index

 Is that correct?

 If it is, I'd probably do some work at index time. First guess, I'd keep
 a separate core, which has a very small document per user, storing just:

  * userID
  * docCount

 Then, when you add/delete a document, you use atomic updates to either
 increase or decrease the docCount on that user doc.

 Then you can use a pseudo join between these two cores relatively
 easily.

 q=user_id:x {!join fromIndex=user from=user_id to=user_id}+user_id:x
 +doc_count:[y TO *]

 Worst case, if you don't want to mess with your indexing code, I wonder
 if you could use a ScriptUpdateProcessor to do this work - not sure if
 you can have one add an entirely new, additional, document to the list,
 but may be possible.

 Upayavira

 On Fri, Sep 27, 2013, at 09:50 PM, Matheus Salvia wrote:
  Sure, sorry for the inconvenience.
 
  I'm having a little trouble trying to make a query in Solr. The problem
  is:
  I must be able to retrieve documents that have the same value for a
  specified field, but they should only be retrieved if this value appeared
  more than X times for a specified user. In pseudo-SQL it would be something
  like:
 
  select user_id from documents
  where my_field=my_value
  and
  (select count(*) from documents where my_field=my_value and
  user_id=super.user_id) > X
 
  I know that Solr returns a 'numFound' for each query you make, but I don't
  know how to retrieve this value in a subquery.
 
  My Solr is organized in a way that a user is a document, and the properties
  of the user (such as name, age, etc.) are grouped in another document with
  a 'root_id' field. So let's suppose the following query that gets all the
  root documents whose children have the prefix some_prefix.
 
  is_root:true AND _query_:"{!join from=root_id
  to=id}requests_prefix:\"some_prefix\""
 
  Now, how can I get the root documents (users in some sense) that have more
  than X children matching 'requests_prefix:some_prefix' or any other
  condition? Is it possible?
 
  P.S. It must be done in a single query, fields can be added at will, but
  the root/children structure should be preserved (preferentially).
 
 
  2013/9/27 Upayavira u...@odoko.co.uk
 
   Mattheus,
  
   Given these mails form a part of an archive that are themselves
   self-contained, can you please post your actual question here? You're
   more likely to get answers that way.
  
   Thanks, Upayavira
  
   On Fri, Sep 27, 2013, at 04:36 PM, Matheus Salvia wrote:
Hello everyone,
I'm having a problem regarding how to make a solr query, I've posted
 it
on
stackoverflow.
Can someone help me?
   
  
 http://stackoverflow.com/questions/19039099/apache-solr-count-of-subquery-as-a-superquery-parameter
   
Thanks in advance!
   
  
 
 
 




-- 
--
 // Matheus Salvia
Desenvolvedor Mobile
Celular: +55 11 9-6446-2332
Skype: meta.faraday


Nagle's Algorithm

2013-09-29 Thread William Bell
How do I set TCP_NODELAY on the http sockets for Jetty in SOLR 4?

Is there an option in jetty.xml ?

/* Create new stream socket */

sock = socket( AF_INET, SOCK_STREAM, 0 );


/* Disable the Nagle (TCP No Delay) algorithm */

flag = 1;

ret = setsockopt( sock, IPPROTO_TCP, TCP_NODELAY, (char *)&flag,
sizeof(flag) );




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Nagle's Algorithm

2013-09-29 Thread Dan Davis
I don't keep up with this list well enough to know whether anyone else
answered.  I don't know how to do it in jetty.xml, but you can certainly
tweak the code.   java.net.Socket has a method setTcpNoDelay() that
corresponds with the standard Unix system calls.
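
For instance (a trivial sketch; the host and port are placeholders):

import java.io.IOException;
import java.net.Socket;

public class NoDelay {
    public static void main(String[] args) throws IOException {
        // 8983 is just the usual Solr example port, not a recommendation.
        try (Socket sock = new Socket("localhost", 8983)) {
            sock.setTcpNoDelay(true); // same effect as setsockopt(..., TCP_NODELAY, ...)
            System.out.println("TCP_NODELAY enabled: " + sock.getTcpNoDelay());
        }
    }
}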

A long time ago, my suggestion of this made Apache Axis 2.0 250ms faster per
call (1).   Now I want to know whether Apache Solr sets it.

One common way to test the overhead portion of latency is to project the
latency of a zero-size request from larger requests.   What you do is
warm requests (all in memory) for progressively fewer and fewer
rows.   You can make requests for 100, 90, 80, 70 ... 10 rows, each more
than once so that everything is warmed.   If you plot this, it should look
like a linear function latency(rows) = m*rows + b, since everything is
cached in memory.   You have to control what else is going on on the server
to get the linear plot of course - it can be quite hard to get this to work
right on modern Linux.   But once you have it, you can simply calculate
latency(0) and you have the latency of a theoretical zero-size request.
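
As an illustration, a least-squares sketch of that projection (the
measurements below are invented numbers, not real benchmarks):

public class ZeroSizeLatency {
    public static void main(String[] args) {
        // Invented warmed measurements: latency in ms per row count.
        double[] rows    = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
        double[] latency = {12.1, 13.8, 16.2, 18.0, 20.3, 22.1, 24.0, 25.9, 28.2, 30.1};

        int n = rows.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx  += rows[i];
            sy  += latency[i];
            sxx += rows[i] * rows[i];
            sxy += rows[i] * latency[i];
        }
        // Fit latency(rows) = m*rows + b; b is the projected zero-size latency.
        double m = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double b = (sy - m * sx) / n;
        System.out.printf("per-row cost = %.3f ms, zero-size latency = %.3f ms%n", m, b);
    }
}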

This is a tangential answer at best - I wish I just knew a setting to give
you.

(1) Latency Performance of SOAP Implementations:
http://citeseer.ist.psu.edu/viewdoc/similar?doi=10.1.1.21.8556&type=ab


On Sun, Sep 29, 2013 at 9:22 PM, William Bell billnb...@gmail.com wrote:

 How do I set TCP_NODELAY on the http sockets for Jetty in SOLR 4?


Re: Nagle's Algorithm

2013-09-29 Thread Michael Sokolov

I dunno, but this makes it look as if this may already be taken care of:

http://jira.codehaus.org/browse/JETTY-1196

On 9/29/2013 9:22 PM, William Bell wrote:

How do I set TCP_NODELAY on the http sockets for Jetty in SOLR 4?

Is there an option in jetty.xml ?









Re: Maximum solr processes per machine

2013-09-29 Thread Shawn Heisey
On 9/29/2013 7:21 AM, adfel70 wrote:
 My total index size is around 5TB, and I want a replication factor of 3.
 I'm thinking of running multiple Solr processes per machine.

Running multiple Solr instances per machine is a really bad idea.  One
Solr instance can run many indexes, and there will be far less memory
overhead if you're not running multiple servlet containers.  Everything can
also run on the same TCP port - no need to figure out a different port per
instance.  Configuration and deployment are also less complicated.

When you have multiple Solr instances per machine, the SolrCloud
collections API has a tendency to place some or all of the replicas for
each shard on the same machine, which means that it won't be fault
tolerant.  With one instance per machine, you can be absolutely sure
that created collections will have all replicas for each shard on
different machines.
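
For illustration, a collection could be created from SolrJ along these lines
(a sketch only: the URL, collection name and counts are hypothetical, and
maxShardsPerNode=1 is the knob that keeps replicas on separate machines):

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class CreateCollection {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("action", "CREATE");
        params.set("name", "bigindex");
        params.set("numShards", 5);
        params.set("replicationFactor", 3);
        // At most one replica per node, so two replicas of the same shard
        // can never end up on the same machine.
        params.set("maxShardsPerNode", 1);

        QueryRequest request = new QueryRequest(params);
        request.setPath("/admin/collections");
        server.request(request);
        server.shutdown();
    }
}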

I will echo the advice you've been given about using SSD.  You'll need
much less OS disk cache memory with SSD.

Thanks,
Shawn



Re: Nagle's Algorithm

2013-09-29 Thread Shawn Heisey
On 9/29/2013 7:22 PM, William Bell wrote:
 How do I set TCP_NODELAY on the http sockets for Jetty in SOLR 4?

The client usually makes that decision, not the server.  This parameter
is turned on by default for recent HttpClient versions, the library used
by SolrJ.  Even the JETTY issue uncovered by Michael Sokolov refers to a
client connection.
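
If you want to make the setting explicit on the SolrJ side anyway, a sketch
with the HttpClient 4.x API of that era (the URL is a placeholder):

import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.params.HttpConnectionParams;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ClientNoDelay {
    public static void main(String[] args) {
        DefaultHttpClient httpClient = new DefaultHttpClient();
        // TCP_NODELAY is already true by default in recent HttpClient
        // versions; this just spells it out.
        HttpConnectionParams.setTcpNoDelay(httpClient.getParams(), true);

        HttpSolrServer server =
            new HttpSolrServer("http://localhost:8983/solr/collection1", httpClient);
        // ... use server as usual ...
        server.shutdown();
    }
}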

Thanks,
Shawn