subject:"\[jira\] Commented\: \(SOLR\-1044\) Use Hadoop RPC for inter Solr communication"

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

2009-03-05 Thread Mike Klaas (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12679466#action_12679466
]

Mike Klaas commented on SOLR-1044:
--

{quote} I haven't yet seen a HTTP server serving more than around 1200 req/sec
(apache HTTPD). A call based server can serve 4k-5k messages easily. (I am
yet to test hadoop RPC) . The proliferation of a large no: of frameworks around
that is a testimony to the superiority of that approach. {/quote}

up to 50,000 req/sec, with keepalive:
http://www.litespeedtech.com/web-server-performance-comparison-litespeed-2.0-vs.html

Use Hadoop RPC for inter Solr communication
---

Key: SOLR-1044
URL: https://issues.apache.org/jira/browse/SOLR-1044
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Noble Paul

Solr uses http for distributed search . We can make it a whole lot faster if
we use an RPC mechanism which is more lightweight/efficient.
Hadoop RPC looks like a good candidate for this.
The implementation should just have one protocol. It should follow the Solr's
idiom of making remote calls . A uri + params +[optional stream(s)] . The
response can be a stream of bytes.
To make this work we must make the SolrServer implementation pluggable in
distributed search. Users should be able to choose between the current
CommonshttpSolrServer, or a HadoopRpcSolrServer .

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

2009-03-04 Thread Yonik Seeley (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12678987#action_12678987
]

Yonik Seeley commented on SOLR-1044:

bq. An HTTP connection can be re-used only after the request-response is
complete. meanwhile, If there is another request to be fired to the same server
from the same client , a new connection will have to be created.

But the system quickly reaches steady state, right? That new connection will
be persistent and hang around for a while to be reused again when needed.

For a high-fanout distributed search, a more important part might actually be
message parsing (independent of transport used). I think we've done a decent
job with the binary protocol for both CPU and network bandwidth... the actual
requests themselves (hitting the lucene index, doing faceting and highlighting,
retrieving stored fields) should hopefully be the bottleneck.

Use Hadoop RPC for inter Solr communication
---

Key: SOLR-1044
URL: https://issues.apache.org/jira/browse/SOLR-1044
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Noble Paul

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

2009-03-04 Thread Noble Paul (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12679061#action_12679061
]

Noble Paul commented on SOLR-1044:
--

bq.But the system quickly reaches steady state, right? That new connection will
be persistent and hang around for a while to be reused again when needed.
the system reaches a steady state where the no:of connections would be slightly
greater than the maximum no:of parallel requests. whereas a system using a
message based RPC will still have only a single connection between 2 Solrs.

bq.For a high-fanout distributed search, a more important part might actually
be message parsing (independent of transport used)

As Solr move towards other applications such as mapreduce/mahout where the
operations do not involve disk IO and where the payload is small there can be a
problem.

My tests with hadoop RPC showed it outperforming tomcat when I used a small
payload (5 bytes)

Use Hadoop RPC for inter Solr communication
---

Key: SOLR-1044
URL: https://issues.apache.org/jira/browse/SOLR-1044
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Noble Paul

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

2009-03-03 Thread Noble Paul (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12678585#action_12678585
 ] 

Noble Paul commented on SOLR-1044:
--

bq.do not know much about Solr needs there, but we are using one of prehistoric 
versions of hadoop RPC (no NIO version)

disclaimer : I am not a hadoop user . have you used the NIO version ? how is 
the perf?



 Use Hadoop RPC for inter Solr communication
 ---

 Key: SOLR-1044
 URL: https://issues.apache.org/jira/browse/SOLR-1044
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Noble Paul

 Solr uses http for distributed search . We can make it a whole lot faster if 
 we use an RPC mechanism which is more lightweight/efficient. 
 Hadoop RPC looks like a good candidate for this.  
 The implementation should just have one protocol. It should follow the Solr's 
 idiom of making remote calls . A uri + params +[optional stream(s)] . The 
 response can be a stream of bytes.
 To make this work we must make the SolrServer implementation pluggable in 
 distributed search. Users should be able to choose between the current 
 CommonshttpSolrServer, or a HadoopRpcSolrServer . 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

2009-03-03 Thread Yonik Seeley (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12678587#action_12678587
]

Yonik Seeley commented on SOLR-1044:

We're using persistent HTTP connections, so socket creation overhead should not
be much of an issue.
As far as NIO - servlet containers often have NIO connectors (I guess so idle
persistent connections don't take up a thread to listen on them). That handles
the receive-side. On the sender-side, NIO shouldn't matter... all of our
clients need a thread to keep the request context anyway - we really have no
way of using NIO there.

There could be an issue surrounding the number of TCP connections in a large
cluster (that's an orthogonal issue to NIO), but modern OSs seem to handle high
numbers of connections efficiently do switches? Or perhaps the real limit
has to do with exhausting port numbers (65536)?

Use Hadoop RPC for inter Solr communication
---

Key: SOLR-1044
URL: https://issues.apache.org/jira/browse/SOLR-1044
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Noble Paul

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

2009-03-03 Thread Noble Paul (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12678597#action_12678597
]

Noble Paul commented on SOLR-1044:
--

bq.We're using persistent HTTP connections, so socket creation overhead should
not be much of an issue.

An HTTP connection can be re-used only after the request-response is complete.
meanwhile, If there is another request to be fired to the same server from the
same client , a new connection will have to be created. So the no:of
connections we create will be quite high if we have a large no:of nodes in
distributed search .

I haven't yet seen a HTTP server serving more than around 1200 req/sec (apache
HTTPD). A call based server can serve 4k-5k messages easily. (I am yet to test
hadoop RPC) . The proliferation of a large no: of frameworks around that is a
testimony to the superiority of that approach.

Use Hadoop RPC for inter Solr communication
---

Key: SOLR-1044
URL: https://issues.apache.org/jira/browse/SOLR-1044
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Noble Paul

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

2009-03-03 Thread Walter Underwood (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12678601#action_12678601
 ] 

Walter Underwood commented on SOLR-1044:


During the Oscars, the HTTP cache in front of our Solr farm had a 90% hit rate. 
I think a 10X reduction in server load is a testimony to the superiority of the 
HTTP approach.


 Use Hadoop RPC for inter Solr communication
 ---

 Key: SOLR-1044
 URL: https://issues.apache.org/jira/browse/SOLR-1044
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Noble Paul

 Solr uses http for distributed search . We can make it a whole lot faster if 
 we use an RPC mechanism which is more lightweight/efficient. 
 Hadoop RPC looks like a good candidate for this.  
 The implementation should just have one protocol. It should follow the Solr's 
 idiom of making remote calls . A uri + params +[optional stream(s)] . The 
 response can be a stream of bytes.
 To make this work we must make the SolrServer implementation pluggable in 
 distributed search. Users should be able to choose between the current 
 CommonshttpSolrServer, or a HadoopRpcSolrServer . 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

2009-03-03 Thread Shalin Shekhar Mangar (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12678605#action_12678605
]

Shalin Shekhar Mangar commented on SOLR-1044:
-

bq. During the Oscars, the HTTP cache in front of our Solr farm had a 90% hit
rate. I think a 10X reduction in server load is a testimony to the superiority
of the HTTP approach.

Nobody is replacing HTTP with RPC :)

HTTP is great but on a distributed solr deployment, it can be a bottleneck, I
guess. I think if we do find RPC giving a better throughput than HTTP, the
distributed search part is the right place to start using it. We do not need to
move to non-HTTP communication (at least not now).

Use Hadoop RPC for inter Solr communication
---

Key: SOLR-1044
URL: https://issues.apache.org/jira/browse/SOLR-1044
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Noble Paul

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

2009-03-02 Thread Yonik Seeley (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12678105#action_12678105
]

Yonik Seeley commented on SOLR-1044:

Is our use of HTTP really a bottleneck?

My feeling has been that if we go to a call mechanism, it should be based on
something more standard that will have many off the shelf bindings - perl,
python, php, C, etc.

On the plus side of hadoop RPC, it could handle multiple requests per socket.
That can also be a potential weakness though I think... a slow reader or writer
for one request/response hangs up all the others.

Use Hadoop RPC for inter Solr communication
---

Key: SOLR-1044
URL: https://issues.apache.org/jira/browse/SOLR-1044
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Noble Paul

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

2009-03-02 Thread Ken Krugler (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12678108#action_12678108
]

Ken Krugler commented on SOLR-1044:
---

I agree with both of Yonik's points:

# We'd first want to measure real-world performance before deciding that using
something other than HTTP was important.
# Using something other than HTTP has related costs that should be considered.

At Krugle we used Hadoop RPC to handle remote searchers. In general it worked
well, but we did run into the problem similar to what Yonik voiced as a
potential concern - occasionally a remote searcher would hang, and when that
happened the socket would essentially become a zombie. Under very heavy load
testing this wound up eventually causing the entire system to lock up.

Though we heard that there were subsequent changes to the Hadoop RPC that fixed
a number of similar bugs. Not sure about any details, though, and we never
re-ran tests with the latest Hadoop (at that time, which was about a year ago).

If there are performance issues, I would be curious if using a long-lasting
connection via keep-alive significantly reduces the overhead. I know that Jetty
(for example) has a very efficient implementation of the Comet web app model,
where you don't wind up needing a gazillion threads to handle many
requests/second.

Use Hadoop RPC for inter Solr communication
---

Key: SOLR-1044
URL: https://issues.apache.org/jira/browse/SOLR-1044
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Noble Paul

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

2009-03-02 Thread Eks Dev (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12678121#action_12678121
]

Eks Dev commented on SOLR-1044:
---

I do not know much about Solr needs there, but we are using one of prehistoric
versions of hadoop RPC (no NIO version) as everything else proved to eat far
to much time (in 800+ rq/sec environment every millisecond counts). Creating
new Sockets is not working there as OSs start having problems to keep up with
this rate (especially with java , slower Socket release due to gc() latency).

We are anyhow contemplating to give etch (or thrift) a try. Etch looks like
really good peace of work, with great flexibility. Someone tried it?

Use Hadoop RPC for inter Solr communication
---

Key: SOLR-1044
URL: https://issues.apache.org/jira/browse/SOLR-1044
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Noble Paul

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

2009-03-02 Thread Noble Paul (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12678242#action_12678242
]

Noble Paul commented on SOLR-1044:
--

bq.Is our use of HTTP really a bottleneck?
we are limited by the servlet engine's ability to serve requests . I guess it
would easily peak out at 600-800 req/sec .Whereas a NIO based system can serve
far more with lower latency (http://www.jboss.org/netty/performance.html). If
we have a request served out of cache (no lucene search involved) the only
overhead will be that of the HTTP . Then there is the overhead of servlet
engine itself . Moreover HTTP is not a very efficient for large volume small
sized requests

bq.My feeling has been that if we go to a call mechanism, it should be based on
something more standard that will have many off the shelf bindings - perl,
python, php, C, etc.

I agree. Hadoop looked like a simple RPC mechanism .

bq. That can also be a potential weakness though I think... a slow reader or
writer for one request/response hangs up all the others.

The requests on the server are served by multiple handlers (each one is a
thread). One request will not block another if there are enough
handlers/threads

Use Hadoop RPC for inter Solr communication
---

Key: SOLR-1044
URL: https://issues.apache.org/jira/browse/SOLR-1044
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Noble Paul

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication

12 matches

Site Navigation

Mail list logo

Footer information