subject:"inconsistent results"

This might explain another thing I'm seeing. If I take a node down,
clusterstate.json still shows it as active. Also if I'm running 4 nodes,
take one down and assign it a new port, clusterstate.json will show 5 nodes
running.

On Sat, Mar 17, 2012 at 10:10 PM, Mark Miller markrmil...@gmail.com wrote:

 Nodes talk to ZooKeeper as well as to each other. You can see the
 addresses they are trying to use to communicate with each other in the
 'cloud' view of the Solr Admin UI. Sometimes you have to override these, as
 the detected default may not be an address that other nodes can reach. As a
 limited example: for some reason my mac cannot talk to my linux box with
 its default detected host address of halfmetal:8983/solr - but the mac can
 reach my linux box if I use halfmetal.Local - so I have to override the
 published address of my linux box using the host attribute if I want to
 setup a cluster between my macbook and linux box.

 Each nodes talks to ZooKeeper to learn about the other nodes, including
 their addresses. Recovery is then done node to node using the appropriate
 addresses.


 - Mark Miller
 lucidimagination.com

 On Mar 16, 2012, at 3:00 PM, Matthew Parker wrote:

  I'm still having issues replicating in my work environment. Can anyone
  explain how the replication mechanism works? Is it communicating across
  ports or through zookeeper to manager the process?
 
 
 
 
  On Thu, Mar 8, 2012 at 10:57 PM, Matthew Parker 
  mpar...@apogeeintegration.com wrote:
 
  All,
 
  I recreated the cluster on my machine at home (Windows 7, Java 1.6.0.23,
  apache-solr-4.0-2012-02-29_09-07-30) , sent some document through
 Manifold
  using its crawler, and it looks like it's replicating fine once the
  documents are committed.
 
  This must be related to my environment somehow. Thanks for your help.
 
  Regards,
 
  Matt
 
  On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  Matt:
 
  Just for paranoia's sake, when I was playing around with this (the
  _version_ thing was one of my problems too) I removed the entire data
  directory as well as the zoo_data directory between experiments (and
  recreated just the data dir). This included various index.2012
  files and the tlog directory on the theory that *maybe* there was some
  confusion happening on startup with an already-wonky index.
 
  If you have the energy and tried that it might be helpful information,
  but it may also be a total red-herring
 
  FWIW
  Erick
 
  On Thu, Mar 1, 2012 at 8:28 PM, Mark Miller markrmil...@gmail.com
  wrote:
  I assuming the windows configuration looked correct?
 
  Yeah, so far I can not spot any smoking gun...I'm confounded at the
  moment. I'll re read through everything once more...
 
  - Mark
 
 
 















--
This e-mail and any files transmitted with it may be proprietary.  Please note 
that any views or opinions presented in this e-mail are solely those of the 
author and do not necessarily represent those of Apogee Integration.

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-18 Thread Mark Miller

From every node in your cluster you can hit http://MACHINE1:8084/solr in your
browser and get a response?

On Mar 18, 2012, at 1:46 PM, Matthew Parker wrote:

My cloud instance finally tried to sync. It looks like it's having connection
issues, but I can bring the SOLR instance up in the browser so I'm not sure
why it cannot connect to it. I got the following condensed log output:

org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
I/O exception (java.net.ConnectException) caught when processing request:
Connection refused: connect

Retrying request

shard update error StdNode:
http://MACHINE1:8084/solr/:org.apache.solr.client.solrj.SolrServerException:
http://MACHINE1:8084/solr
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:
483)
..
..
..
Caused by: java.net.ConnectException: Connection refused: connect
at java.net.DualStackPlainSocketImpl.connect0(Native Method)
..
..
..

try and ask http://MACHINE1:8084/solr to recover

Could not tell a replica to recover

org.apache.solr.client.solrj.SolrServerException: http://MACHINE1:8084/solr
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483)
...
...
...
Caused by: java.net.ConnectException: Connection refused: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native method)
..
..
..

On Sat, Mar 17, 2012 at 10:10 PM, Mark Miller markrmil...@gmail.com wrote:
Nodes talk to ZooKeeper as well as to each other. You can see the addresses
they are trying to use to communicate with each other in the 'cloud' view of
the Solr Admin UI. Sometimes you have to override these, as the detected
default may not be an address that other nodes can reach. As a limited
example: for some reason my mac cannot talk to my linux box with its default
detected host address of halfmetal:8983/solr - but the mac can reach my linux
box if I use halfmetal.Local - so I have to override the published address of
my linux box using the host attribute if I want to setup a cluster between my
macbook and linux box.

Each nodes talks to ZooKeeper to learn about the other nodes, including their
addresses. Recovery is then done node to node using the appropriate addresses.

- Mark Miller
lucidimagination.com

On Mar 16, 2012, at 3:00 PM, Matthew Parker wrote:

I'm still having issues replicating in my work environment. Can anyone
explain how the replication mechanism works? Is it communicating across
ports or through zookeeper to manager the process?

On Thu, Mar 8, 2012 at 10:57 PM, Matthew Parker
mpar...@apogeeintegration.com wrote:

All,

I recreated the cluster on my machine at home (Windows 7, Java 1.6.0.23,
apache-solr-4.0-2012-02-29_09-07-30) , sent some document through Manifold
using its crawler, and it looks like it's replicating fine once the
documents are committed.

This must be related to my environment somehow. Thanks for your help.

Regards,

Matt

On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson
erickerick...@gmail.comwrote:

Matt:

Just for paranoia's sake, when I was playing around with this (the
_version_ thing was one of my problems too) I removed the entire data
directory as well as the zoo_data directory between experiments (and
recreated just the data dir). This included various index.2012
files and the tlog directory on the theory that *maybe* there was some
confusion happening on startup with an already-wonky index.

If you have the energy and tried that it might be helpful information,
but it may also be a total red-herring

FWIW
Erick

On Thu, Mar 1, 2012 at 8:28 PM, Mark Miller markrmil...@gmail.com
wrote:
I assuming the windows configuration looked correct?

Yeah, so far I can not spot any smoking gun...I'm confounded at the
moment. I'll re read through everything once more...

- Mark

--
This e-mail and any files transmitted with it may be proprietary. Please
note that any views or opinions presented in this e-mail are solely those of
the author and do not necessarily represent those of Apogee Integration.

- Mark Miller
lucidimagination.com

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

The cluster is running on one machine.

On Sun, Mar 18, 2012 at 2:07 PM, Mark Miller markrmil...@gmail.com wrote:

From every node in your cluster you can hit http://MACHINE1:8084/solr in
your browser and get a response?

On Mar 18, 2012, at 1:46 PM, Matthew Parker wrote:

My cloud instance finally tried to sync. It looks like it's having
connection issues, but I can bring the SOLR instance up in the browser so
I'm not sure why it cannot connect to it. I got the following condensed log
output:

org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
I/O exception (java.net.ConnectException) caught when processing
request: Connection refused: connect

Retrying request

try and ask http://MACHINE1:8084/solr to recover

Could not tell a replica to recover

org.apache.solr.client.solrj.SolrServerException:
http://MACHINE1:8084/solr
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483)
...
...
...
Caused by: java.net.ConnectException: Connection refused: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native method)
..
..
..

On Sat, Mar 17, 2012 at 10:10 PM, Mark Miller markrmil...@gmail.com
wrote:
Nodes talk to ZooKeeper as well as to each other. You can see the
addresses they are trying to use to communicate with each other in the
'cloud' view of the Solr Admin UI. Sometimes you have to override these, as
the detected default may not be an address that other nodes can reach. As a
limited example: for some reason my mac cannot talk to my linux box with
its default detected host address of halfmetal:8983/solr - but the mac can
reach my linux box if I use halfmetal.Local - so I have to override the
published address of my linux box using the host attribute if I want to
setup a cluster between my macbook and linux box.

Each nodes talks to ZooKeeper to learn about the other nodes, including
their addresses. Recovery is then done node to node using the appropriate
addresses.

- Mark Miller
lucidimagination.com

On Mar 16, 2012, at 3:00 PM, Matthew Parker wrote:

I'm still having issues replicating in my work environment. Can anyone
explain how the replication mechanism works? Is it communicating across
ports or through zookeeper to manager the process?

On Thu, Mar 8, 2012 at 10:57 PM, Matthew Parker
mpar...@apogeeintegration.com wrote:

All,

I recreated the cluster on my machine at home (Windows 7, Java
1.6.0.23,
apache-solr-4.0-2012-02-29_09-07-30) , sent some document through
Manifold
using its crawler, and it looks like it's replicating fine once the
documents are committed.

This must be related to my environment somehow. Thanks for your help.

Regards,

Matt

On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson
erickerick...@gmail.comwrote:

Matt:

Just for paranoia's sake, when I was playing around with this (the
_version_ thing was one of my problems too) I removed the entire data
directory as well as the zoo_data directory between experiments (and
recreated just the data dir). This included various index.2012
files and the tlog directory on the theory that *maybe* there was
some
confusion happening on startup with an already-wonky index.

If you have the energy and tried that it might be helpful
information,
but it may also be a total red-herring

FWIW
Erick

On Thu, Mar 1, 2012 at 8:28 PM, Mark Miller markrmil...@gmail.com
wrote:
I assuming the windows configuration looked correct?

Yeah, so far I can not spot any smoking gun...I'm confounded at the
moment. I'll re read through everything once more...

- Mark

--
This e-mail and any files transmitted with it may be proprietary.
Please note that any views or opinions presented in this e-mail are solely
those of the author and do not necessarily represent those of Apogee
Integration.

- Mark Miller
lucidimagination.com

--
This e-mail and any files transmitted with it may

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-18 Thread Darren Govoni

I think he's asking if all the nodes (same machine or not) return a
response. Presumably you have different ports for each node since they
are on the same machine.

On Sun, 2012-03-18 at 14:44 -0400, Matthew Parker wrote:
The cluster is running on one machine.

On Sun, Mar 18, 2012 at 2:07 PM, Mark Miller markrmil...@gmail.com wrote:

From every node in your cluster you can hit http://MACHINE1:8084/solr in
your browser and get a response?

On Mar 18, 2012, at 1:46 PM, Matthew Parker wrote:

My cloud instance finally tried to sync. It looks like it's having
connection issues, but I can bring the SOLR instance up in the browser so
I'm not sure why it cannot connect to it. I got the following condensed log
output:

org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
I/O exception (java.net.ConnectException) caught when processing
request: Connection refused: connect

Retrying request

try and ask http://MACHINE1:8084/solr to recover

Could not tell a replica to recover

org.apache.solr.client.solrj.SolrServerException:
http://MACHINE1:8084/solr
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483)
...
...
...
Caused by: java.net.ConnectException: Connection refused: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native method)
..
..
..

On Sat, Mar 17, 2012 at 10:10 PM, Mark Miller markrmil...@gmail.com
wrote:
Nodes talk to ZooKeeper as well as to each other. You can see the
addresses they are trying to use to communicate with each other in the
'cloud' view of the Solr Admin UI. Sometimes you have to override these, as
the detected default may not be an address that other nodes can reach. As a
limited example: for some reason my mac cannot talk to my linux box with
its default detected host address of halfmetal:8983/solr - but the mac can
reach my linux box if I use halfmetal.Local - so I have to override the
published address of my linux box using the host attribute if I want to
setup a cluster between my macbook and linux box.

Each nodes talks to ZooKeeper to learn about the other nodes, including
their addresses. Recovery is then done node to node using the appropriate
addresses.

- Mark Miller
lucidimagination.com

On Mar 16, 2012, at 3:00 PM, Matthew Parker wrote:

I'm still having issues replicating in my work environment. Can anyone
explain how the replication mechanism works? Is it communicating across
ports or through zookeeper to manager the process?

On Thu, Mar 8, 2012 at 10:57 PM, Matthew Parker
mpar...@apogeeintegration.com wrote:

All,

I recreated the cluster on my machine at home (Windows 7, Java
1.6.0.23,
apache-solr-4.0-2012-02-29_09-07-30) , sent some document through
Manifold
using its crawler, and it looks like it's replicating fine once the
documents are committed.

This must be related to my environment somehow. Thanks for your help.

Regards,

Matt

On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson
erickerick...@gmail.comwrote:

Matt:

Just for paranoia's sake, when I was playing around with this (the
_version_ thing was one of my problems too) I removed the entire data
directory as well as the zoo_data directory between experiments (and
recreated just the data dir). This included various index.2012
files and the tlog directory on the theory that *maybe* there was
some
confusion happening on startup with an already-wonky index.

If you have the energy and tried that it might be helpful
information,
but it may also be a total red-herring

FWIW
Erick

On Thu, Mar 1, 2012 at 8:28 PM, Mark Miller markrmil...@gmail.com
wrote:
I assuming the windows configuration looked correct?

Yeah, so far I can not spot any smoking gun...I'm confounded at the
moment. I'll re read through everything once more...

- Mark

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

I have nodes running on ports: 8081-8084

A couple of the other SOLR cloud nodes we complaining about not being talk
with 8081, which is the first node brought up in the cluster.

The startup process is:

1. start 3 zookeeper nodes

2. wait until complete

3. start first solr node.

4. wait until complete

5. start remaining 3 solr nodes.

I wiped the zookeper and solr nodes data directories to start fresh.

Another question: Would a Tika Exception cause the nodes not to replicate?
I can see the documents being commited on the first solr node, but nothing
replicates to the other 3.

On Sun, Mar 18, 2012 at 2:07 PM, Mark Miller markrmil...@gmail.com wrote:

From every node in your cluster you can hit http://MACHINE1:8084/solr in
your browser and get a response?

On Mar 18, 2012, at 1:46 PM, Matthew Parker wrote:

My cloud instance finally tried to sync. It looks like it's having
connection issues, but I can bring the SOLR instance up in the browser so
I'm not sure why it cannot connect to it. I got the following condensed log
output:

org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
I/O exception (java.net.ConnectException) caught when processing
request: Connection refused: connect

Retrying request

try and ask http://MACHINE1:8084/solr to recover

Could not tell a replica to recover

org.apache.solr.client.solrj.SolrServerException:
http://MACHINE1:8084/solr
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483)
...
...
...
Caused by: java.net.ConnectException: Connection refused: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native method)
..
..
..

On Sat, Mar 17, 2012 at 10:10 PM, Mark Miller markrmil...@gmail.com
wrote:
Nodes talk to ZooKeeper as well as to each other. You can see the
addresses they are trying to use to communicate with each other in the
'cloud' view of the Solr Admin UI. Sometimes you have to override these, as
the detected default may not be an address that other nodes can reach. As a
limited example: for some reason my mac cannot talk to my linux box with
its default detected host address of halfmetal:8983/solr - but the mac can
reach my linux box if I use halfmetal.Local - so I have to override the
published address of my linux box using the host attribute if I want to
setup a cluster between my macbook and linux box.

Each nodes talks to ZooKeeper to learn about the other nodes, including
their addresses. Recovery is then done node to node using the appropriate
addresses.

- Mark Miller
lucidimagination.com

On Mar 16, 2012, at 3:00 PM, Matthew Parker wrote:

I'm still having issues replicating in my work environment. Can anyone
explain how the replication mechanism works? Is it communicating across
ports or through zookeeper to manager the process?

On Thu, Mar 8, 2012 at 10:57 PM, Matthew Parker
mpar...@apogeeintegration.com wrote:

All,

I recreated the cluster on my machine at home (Windows 7, Java
1.6.0.23,
apache-solr-4.0-2012-02-29_09-07-30) , sent some document through
Manifold
using its crawler, and it looks like it's replicating fine once the
documents are committed.

This must be related to my environment somehow. Thanks for your help.

Regards,

Matt

On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson
erickerick...@gmail.comwrote:

Matt:

Just for paranoia's sake, when I was playing around with this (the
_version_ thing was one of my problems too) I removed the entire data
directory as well as the zoo_data directory between experiments (and
recreated just the data dir). This included various index.2012
files and the tlog directory on the theory that *maybe* there was
some
confusion happening on startup with an already-wonky index.

If you have the energy and tried that it might be helpful
information,
but it may also be a total red-herring

FWIW
Erick

On Thu, Mar 1, 2012 at 8:28 PM, Mark Miller markrmil...@gmail.com
wrote:
I assuming the windows configuration looked correct?

Yeah, so far I can

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

I had tried importing data from Manifold, and one document threw a Tika
Exception.

If I shut everything down and restart SOLR cloud, the system sync'd on
startup.

Could extraction errors be the issue?

On Sun, Mar 18, 2012 at 2:50 PM, Matthew Parker
mpar...@apogeeintegration.com wrote:

I have nodes running on ports: 8081-8084

A couple of the other SOLR cloud nodes we complaining about not being talk
with 8081, which is the first node brought up in the cluster.

The startup process is:

1. start 3 zookeeper nodes

2. wait until complete

3. start first solr node.

4. wait until complete

5. start remaining 3 solr nodes.

I wiped the zookeper and solr nodes data directories to start fresh.

Another question: Would a Tika Exception cause the nodes not to replicate?
I can see the documents being commited on the first solr node, but nothing
replicates to the other 3.

On Sun, Mar 18, 2012 at 2:07 PM, Mark Miller markrmil...@gmail.comwrote:

From every node in your cluster you can hit http://MACHINE1:8084/solr in
your browser and get a response?

On Mar 18, 2012, at 1:46 PM, Matthew Parker wrote:

My cloud instance finally tried to sync. It looks like it's having
connection issues, but I can bring the SOLR instance up in the browser so
I'm not sure why it cannot connect to it. I got the following condensed log
output:

org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
I/O exception (java.net.ConnectException) caught when processing
request: Connection refused: connect

Retrying request

try and ask http://MACHINE1:8084/solr to recover

Could not tell a replica to recover

org.apache.solr.client.solrj.SolrServerException:
http://MACHINE1:8084/solr
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483)
...
...
...
Caused by: java.net.ConnectException: Connection refused: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native method)
..
..
..

On Sat, Mar 17, 2012 at 10:10 PM, Mark Miller markrmil...@gmail.com
wrote:
Nodes talk to ZooKeeper as well as to each other. You can see the
addresses they are trying to use to communicate with each other in the
'cloud' view of the Solr Admin UI. Sometimes you have to override these, as
the detected default may not be an address that other nodes can reach. As a
limited example: for some reason my mac cannot talk to my linux box with
its default detected host address of halfmetal:8983/solr - but the mac can
reach my linux box if I use halfmetal.Local - so I have to override the
published address of my linux box using the host attribute if I want to
setup a cluster between my macbook and linux box.

Each nodes talks to ZooKeeper to learn about the other nodes, including
their addresses. Recovery is then done node to node using the appropriate
addresses.

- Mark Miller
lucidimagination.com

On Mar 16, 2012, at 3:00 PM, Matthew Parker wrote:

I'm still having issues replicating in my work environment. Can anyone
explain how the replication mechanism works? Is it communicating
across
ports or through zookeeper to manager the process?

On Thu, Mar 8, 2012 at 10:57 PM, Matthew Parker
mpar...@apogeeintegration.com wrote:

All,

I recreated the cluster on my machine at home (Windows 7, Java
1.6.0.23,
apache-solr-4.0-2012-02-29_09-07-30) , sent some document through
Manifold
using its crawler, and it looks like it's replicating fine once the
documents are committed.

This must be related to my environment somehow. Thanks for your help.

Regards,

Matt

On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson
erickerick...@gmail.comwrote:

Matt:

Just for paranoia's sake, when I was playing around with this (the
_version_ thing was one of my problems too) I removed the entire
data
directory as well as the zoo_data directory between experiments (and
recreated just the data dir). This included various index.2012
files and the tlog directory on the theory that *maybe* there was
some
confusion happening on startup with an already-wonky index.

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

That idea was short lived. I excluded the document. The cluster isn't
syncing even after shutting everything down and restarting.

On Sun, Mar 18, 2012 at 2:58 PM, Matthew Parker
mpar...@apogeeintegration.com wrote:

I had tried importing data from Manifold, and one document threw a Tika
Exception.

If I shut everything down and restart SOLR cloud, the system sync'd on
startup.

Could extraction errors be the issue?

On Sun, Mar 18, 2012 at 2:50 PM, Matthew Parker
mpar...@apogeeintegration.com wrote:

I have nodes running on ports: 8081-8084

A couple of the other SOLR cloud nodes we complaining about not being
talk with 8081, which is the first node brought up in the cluster.

The startup process is:

1. start 3 zookeeper nodes

2. wait until complete

3. start first solr node.

4. wait until complete

5. start remaining 3 solr nodes.

I wiped the zookeper and solr nodes data directories to start fresh.

Another question: Would a Tika Exception cause the nodes not to
replicate? I can see the documents being commited on the first solr node,
but nothing replicates to the other 3.

On Sun, Mar 18, 2012 at 2:07 PM, Mark Miller markrmil...@gmail.comwrote:

From every node in your cluster you can hit http://MACHINE1:8084/solrin
your browser and get a response?

On Mar 18, 2012, at 1:46 PM, Matthew Parker wrote:

My cloud instance finally tried to sync. It looks like it's having
connection issues, but I can bring the SOLR instance up in the browser so
I'm not sure why it cannot connect to it. I got the following condensed log
output:

org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
I/O exception (java.net.ConnectException) caught when processing
request: Connection refused: connect

Retrying request

try and ask http://MACHINE1:8084/solr to recover

Could not tell a replica to recover

org.apache.solr.client.solrj.SolrServerException:
http://MACHINE1:8084/solr
at
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:483)
...
...
...
Caused by: java.net.ConnectException: Connection refused: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native method)
..
..
..

On Sat, Mar 17, 2012 at 10:10 PM, Mark Miller markrmil...@gmail.com
wrote:
Nodes talk to ZooKeeper as well as to each other. You can see the
addresses they are trying to use to communicate with each other in the
'cloud' view of the Solr Admin UI. Sometimes you have to override these, as
the detected default may not be an address that other nodes can reach. As a
limited example: for some reason my mac cannot talk to my linux box with
its default detected host address of halfmetal:8983/solr - but the mac can
reach my linux box if I use halfmetal.Local - so I have to override the
published address of my linux box using the host attribute if I want to
setup a cluster between my macbook and linux box.

Each nodes talks to ZooKeeper to learn about the other nodes,
including their addresses. Recovery is then done node to node using the
appropriate addresses.

- Mark Miller
lucidimagination.com

On Mar 16, 2012, at 3:00 PM, Matthew Parker wrote:

I'm still having issues replicating in my work environment. Can
anyone
explain how the replication mechanism works? Is it communicating
across
ports or through zookeeper to manager the process?

On Thu, Mar 8, 2012 at 10:57 PM, Matthew Parker
mpar...@apogeeintegration.com wrote:

All,

I recreated the cluster on my machine at home (Windows 7, Java
1.6.0.23,
apache-solr-4.0-2012-02-29_09-07-30) , sent some document through
Manifold
using its crawler, and it looks like it's replicating fine once the
documents are committed.

This must be related to my environment somehow. Thanks for your
help.

Regards,

Matt

On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson
erickerick...@gmail.comwrote:

Matt:

Just for paranoia's sake, when I was playing around with this (the
_version_ thing was one of my problems too) I removed the entire
data
directory as well as the zoo_data directory between

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-17 Thread Mark Miller

Nodes talk to ZooKeeper as well as to each other. You can see the addresses 
they are trying to use to communicate with each other in the 'cloud' view of 
the Solr Admin UI. Sometimes you have to override these, as the detected 
default may not be an address that other nodes can reach. As a limited example: 
for some reason my mac cannot talk to my linux box with its default detected 
host address of halfmetal:8983/solr - but the mac can reach my linux box if I 
use halfmetal.Local - so I have to override the published address of my linux 
box using the host attribute if I want to setup a cluster between my macbook 
and linux box.

Each nodes talks to ZooKeeper to learn about the other nodes, including their 
addresses. Recovery is then done node to node using the appropriate addresses.


- Mark Miller
lucidimagination.com

On Mar 16, 2012, at 3:00 PM, Matthew Parker wrote:

 I'm still having issues replicating in my work environment. Can anyone
 explain how the replication mechanism works? Is it communicating across
 ports or through zookeeper to manager the process?
 
 
 
 
 On Thu, Mar 8, 2012 at 10:57 PM, Matthew Parker 
 mpar...@apogeeintegration.com wrote:
 
 All,
 
 I recreated the cluster on my machine at home (Windows 7, Java 1.6.0.23,
 apache-solr-4.0-2012-02-29_09-07-30) , sent some document through Manifold
 using its crawler, and it looks like it's replicating fine once the
 documents are committed.
 
 This must be related to my environment somehow. Thanks for your help.
 
 Regards,
 
 Matt
 
 On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson 
 erickerick...@gmail.comwrote:
 
 Matt:
 
 Just for paranoia's sake, when I was playing around with this (the
 _version_ thing was one of my problems too) I removed the entire data
 directory as well as the zoo_data directory between experiments (and
 recreated just the data dir). This included various index.2012
 files and the tlog directory on the theory that *maybe* there was some
 confusion happening on startup with an already-wonky index.
 
 If you have the energy and tried that it might be helpful information,
 but it may also be a total red-herring
 
 FWIW
 Erick
 
 On Thu, Mar 1, 2012 at 8:28 PM, Mark Miller markrmil...@gmail.com
 wrote:
 I assuming the windows configuration looked correct?
 
 Yeah, so far I can not spot any smoking gun...I'm confounded at the
 moment. I'll re read through everything once more...
 
 - Mark

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-16 Thread Matthew Parker

I'm still having issues replicating in my work environment. Can anyone
explain how the replication mechanism works? Is it communicating across
ports or through zookeeper to manager the process?




On Thu, Mar 8, 2012 at 10:57 PM, Matthew Parker 
mpar...@apogeeintegration.com wrote:

 All,

 I recreated the cluster on my machine at home (Windows 7, Java 1.6.0.23,
 apache-solr-4.0-2012-02-29_09-07-30) , sent some document through Manifold
 using its crawler, and it looks like it's replicating fine once the
 documents are committed.

 This must be related to my environment somehow. Thanks for your help.

 Regards,

 Matt

 On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson erickerick...@gmail.comwrote:

 Matt:

 Just for paranoia's sake, when I was playing around with this (the
 _version_ thing was one of my problems too) I removed the entire data
 directory as well as the zoo_data directory between experiments (and
 recreated just the data dir). This included various index.2012
 files and the tlog directory on the theory that *maybe* there was some
 confusion happening on startup with an already-wonky index.

 If you have the energy and tried that it might be helpful information,
 but it may also be a total red-herring

 FWIW
 Erick

 On Thu, Mar 1, 2012 at 8:28 PM, Mark Miller markrmil...@gmail.com
 wrote:
  I assuming the windows configuration looked correct?
 
  Yeah, so far I can not spot any smoking gun...I'm confounded at the
 moment. I'll re read through everything once more...
 
  - Mark

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-08 Thread Matthew Parker

All,

I recreated the cluster on my machine at home (Windows 7, Java 1.6.0.23,
apache-solr-4.0-2012-02-29_09-07-30) , sent some document through Manifold
using its crawler, and it looks like it's replicating fine once the
documents are committed.

This must be related to my environment somehow. Thanks for your help.

Regards,

Matt

On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson erickerick...@gmail.comwrote:

 Matt:

 Just for paranoia's sake, when I was playing around with this (the
 _version_ thing was one of my problems too) I removed the entire data
 directory as well as the zoo_data directory between experiments (and
 recreated just the data dir). This included various index.2012
 files and the tlog directory on the theory that *maybe* there was some
 confusion happening on startup with an already-wonky index.

 If you have the energy and tried that it might be helpful information,
 but it may also be a total red-herring

 FWIW
 Erick

 On Thu, Mar 1, 2012 at 8:28 PM, Mark Miller markrmil...@gmail.com wrote:
  I assuming the windows configuration looked correct?
 
  Yeah, so far I can not spot any smoking gun...I'm confounded at the
 moment. I'll re read through everything once more...
 
  - Mark


--
This e-mail and any files transmitted with it may be proprietary.  Please note 
that any views or opinions presented in this e-mail are solely those of the 
author and do not necessarily represent those of Apogee Integration.

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-02 Thread Erick Erickson

Matt:

Just for paranoia's sake, when I was playing around with this (the
_version_ thing was one of my problems too) I removed the entire data
directory as well as the zoo_data directory between experiments (and
recreated just the data dir). This included various index.2012
files and the tlog directory on the theory that *maybe* there was some
confusion happening on startup with an already-wonky index.

If you have the energy and tried that it might be helpful information,
but it may also be a total red-herring

FWIW
Erick

On Thu, Mar 1, 2012 at 8:28 PM, Mark Miller markrmil...@gmail.com wrote:
 I assuming the windows configuration looked correct?

 Yeah, so far I can not spot any smoking gun...I'm confounded at the moment. 
 I'll re read through everything once more...

 - Mark

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-02 Thread Matthew Parker

I've ensured the SOLR data subdirectories and files were completed cleaned
out, but the issue still occurs.

On Fri, Mar 2, 2012 at 9:06 AM, Erick Erickson erickerick...@gmail.comwrote:

 Matt:

 Just for paranoia's sake, when I was playing around with this (the
 _version_ thing was one of my problems too) I removed the entire data
 directory as well as the zoo_data directory between experiments (and
 recreated just the data dir). This included various index.2012
 files and the tlog directory on the theory that *maybe* there was some
 confusion happening on startup with an already-wonky index.

 If you have the energy and tried that it might be helpful information,
 but it may also be a total red-herring

 FWIW
 Erick

 On Thu, Mar 1, 2012 at 8:28 PM, Mark Miller markrmil...@gmail.com wrote:
  I assuming the windows configuration looked correct?
 
  Yeah, so far I can not spot any smoking gun...I'm confounded at the
 moment. I'll re read through everything once more...
 
  - Mark


--
This e-mail and any files transmitted with it may be proprietary.  Please note 
that any views or opinions presented in this e-mail are solely those of the 
author and do not necessarily represent those of Apogee Integration.

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-01 Thread Matthew Parker

 commands to start the other 3 instances from
 their
   home directories:
  
   java -Djetty.port=8082 -Dhostport=8082
   -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
  
   java -Djetty.port=8083 -Dhostport=8083
   -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
  
   java -Djetty.port=8084 -Dhostport=8084
   -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
  
   All startup without issue.
  
   Step 6 - Modified solrconfig.xml to have a custom request handler
   ===
  
   requestHandler name=/update/sharepoint startup=lazy
   class=solr.extraction.ExtractingRequestHandler
 lst name=defaults
str name=update.chainsharepoint-pipeline/str
str name=fmap.contenttext/str
str name=lowernamestrue/str
str name=uprefixignored/str
str name=caputreAttrtrue/str
str name=fmap.alinks/str
str name=fmap.divignored/str
 /lst
   /requestHandler
  
   updateRequestProcessorChain name=sharepoint-pipeline
  processor class=solr.processor.SignatureUpdateProcessorFactory
 bool name=enabledtrue/bool
 str name=signatureFieldid/str
 bool name=owerrightDupestrue/bool
 str name=fieldsurl/str
 str
 name=signatureClasssolr.processor.Lookup3Signature/str
  /processor
  processor class=solr.LogUpdateProcessorFactory/
  processor class=solr.RunUpdateProcessorFactory/
   /updateRequestProcessorChain
  
  
   Hopefully this will shed some light on why my configuration is having
   issues.
  
   Thanks for your help.
  
   Matt
  
  
  
   On Tue, Feb 28, 2012 at 8:29 PM, Mark Miller markrmil...@gmail.com
  wrote:
  
   Hmm...this is very strange - there is nothing interesting in any of
  the
   logs?
  
   In clusterstate.json, all of the shards have an active state?
  
  
   There are quite a few of us doing exactly this setup recently, so
  there
   must be something we are missing here...
  
   Any info you can offer might help.
  
   - Mark
  
   On Feb 28, 2012, at 1:00 PM, Matthew Parker wrote:
  
Mark,
   
I got the codebase from the 2/26/2012, and I got the same
  inconsistent
results.
   
I have solr running on four ports 8081-8084
   
8081 and 8082 are the leaders for shard 1, and shard 2,
 respectively
   
8083 - is assigned to shard 1
8084 - is assigned to shard 2
   
queries come in and sometime it seems the windows from 8081 and
 8083
   move
responding to the query but there are no results.
   
if the queries run on 8081/8082 or 8081/8084 then results come
 back
   ok.
   
The query is nothing more than: q=*:*
   
Regards,
   
Matt
   
   
On Mon, Feb 27, 2012 at 9:26 PM, Matthew Parker 
mpar...@apogeeintegration.com wrote:
   
I'll have to check on the commit situation. We have been pushing
   data from
SharePoint the last week or so. Would that somehow block the
   documents
moving between the solr instances?
   
I'll try another version tomorrow. Thanks for the suggestions.
   
On Mon, Feb 27, 2012 at 5:34 PM, Mark Miller 
  markrmil...@gmail.com
   wrote:
   
Hmmm...all of that looks pretty normal...
   
Did a commit somehow fail on the other machine? When you view
 the
   stats
for the update handler, are there a lot of pending adds for on
 of
   the
nodes? Do the commit counts match across nodes?
   
You can also query an individual node with distrib=false to
 check
   that.
   
If you build is a month old, I'd honestly recommend you try
   upgrading as
well.
   
- Mark
   
On Feb 27, 2012, at 3:34 PM, Matthew Parker wrote:
   
Here is most of the cluster state:
   
Connected to Zookeeper
localhost:2181, localhost: 2182, localhost:2183
   
/(v=0 children=7) 
 /CONFIGS(v=0, children=1)
/CONFIGURATION(v=0 children=25)
all the configuration files, velocity info,
  xslt,
   etc.
   
/NODE_STATES(v=0 children=4)
   MACHINE1:8083_SOLR (v=121)[{shard_id:shard1,
   
  
 state:active,core:,collection:collection1,node_name:...
   MACHINE1:8082_SOLR (v=101)[{shard_id:shard2,
   
  
 state:active,core:,collection:collection1,node_name:...
   MACHINE1:8081_SOLR (v=92)[{shard_id:shard1,
   
  
 state:active,core:,collection:collection1,node_name:...
   MACHINE1:8084_SOLR (v=73)[{shard_id:shard2,
   
  
 state:active,core:,collection:collection1,node_name:...
/ZOOKEEPER (v-0 children=1)
   QUOTA(v=0)
   
   
   
  
 
 /CLUSTERSTATE.JSON(V=272){collection1:{shard1:{MACHINE1:8081_solr_:{shard_id:shard1,leader:true,...
/LIVE_NODES (v=0 children=4)
   MACHINE1:8083_SOLR(ephemeral v=0)
   MACHINE1:8082_SOLR(ephemeral v=0)
   MACHINE1:8081_SOLR(ephemeral v=0)
   MACHINE1:8084_SOLR(ephemeral v=0)
/COLLECTIONS (v=1 children=1)
   COLLECTION1(v=0 children=2){configName:configuration1}
   LEADER_ELECT(v=0

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-01 Thread Mark Miller

 the following commands to start the other 3 instances from their
 home directories:
 
 java -Djetty.port=8082 -Dhostport=8082
 -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
 
 java -Djetty.port=8083 -Dhostport=8083
 -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
 
 java -Djetty.port=8084 -Dhostport=8084
 -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
 
 All startup without issue.
 
 Step 6 - Modified solrconfig.xml to have a custom request handler
 ===
 
 requestHandler name=/update/sharepoint startup=lazy
 class=solr.extraction.ExtractingRequestHandler
  lst name=defaults
 str name=update.chainsharepoint-pipeline/str
 str name=fmap.contenttext/str
 str name=lowernamestrue/str
 str name=uprefixignored/str
 str name=caputreAttrtrue/str
 str name=fmap.alinks/str
 str name=fmap.divignored/str
  /lst
 /requestHandler
 
 updateRequestProcessorChain name=sharepoint-pipeline
   processor class=solr.processor.SignatureUpdateProcessorFactory
  bool name=enabledtrue/bool
  str name=signatureFieldid/str
  bool name=owerrightDupestrue/bool
  str name=fieldsurl/str
  str name=signatureClasssolr.processor.Lookup3Signature/str
   /processor
   processor class=solr.LogUpdateProcessorFactory/
   processor class=solr.RunUpdateProcessorFactory/
 /updateRequestProcessorChain
 
 
 Hopefully this will shed some light on why my configuration is having
 issues.
 
 Thanks for your help.
 
 Matt
 
 
 
 On Tue, Feb 28, 2012 at 8:29 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
 Hmm...this is very strange - there is nothing interesting in any of
 the
 logs?
 
 In clusterstate.json, all of the shards have an active state?
 
 
 There are quite a few of us doing exactly this setup recently, so
 there
 must be something we are missing here...
 
 Any info you can offer might help.
 
 - Mark
 
 On Feb 28, 2012, at 1:00 PM, Matthew Parker wrote:
 
 Mark,
 
 I got the codebase from the 2/26/2012, and I got the same
 inconsistent
 results.
 
 I have solr running on four ports 8081-8084
 
 8081 and 8082 are the leaders for shard 1, and shard 2, respectively
 
 8083 - is assigned to shard 1
 8084 - is assigned to shard 2
 
 queries come in and sometime it seems the windows from 8081 and 8083
 move
 responding to the query but there are no results.
 
 if the queries run on 8081/8082 or 8081/8084 then results come back
 ok.
 
 The query is nothing more than: q=*:*
 
 Regards,
 
 Matt
 
 
 On Mon, Feb 27, 2012 at 9:26 PM, Matthew Parker 
 mpar...@apogeeintegration.com wrote:
 
 I'll have to check on the commit situation. We have been pushing
 data from
 SharePoint the last week or so. Would that somehow block the
 documents
 moving between the solr instances?
 
 I'll try another version tomorrow. Thanks for the suggestions.
 
 On Mon, Feb 27, 2012 at 5:34 PM, Mark Miller 
 markrmil...@gmail.com
 wrote:
 
 Hmmm...all of that looks pretty normal...
 
 Did a commit somehow fail on the other machine? When you view the
 stats
 for the update handler, are there a lot of pending adds for on of
 the
 nodes? Do the commit counts match across nodes?
 
 You can also query an individual node with distrib=false to check
 that.
 
 If you build is a month old, I'd honestly recommend you try
 upgrading as
 well.
 
 - Mark
 
 On Feb 27, 2012, at 3:34 PM, Matthew Parker wrote:
 
 Here is most of the cluster state:
 
 Connected to Zookeeper
 localhost:2181, localhost: 2182, localhost:2183
 
 /(v=0 children=7) 
 /CONFIGS(v=0, children=1)
/CONFIGURATION(v=0 children=25)
all the configuration files, velocity info,
 xslt,
 etc.
 
 /NODE_STATES(v=0 children=4)
   MACHINE1:8083_SOLR (v=121)[{shard_id:shard1,
 
 state:active,core:,collection:collection1,node_name:...
   MACHINE1:8082_SOLR (v=101)[{shard_id:shard2,
 
 state:active,core:,collection:collection1,node_name:...
   MACHINE1:8081_SOLR (v=92)[{shard_id:shard1,
 
 state:active,core:,collection:collection1,node_name:...
   MACHINE1:8084_SOLR (v=73)[{shard_id:shard2,
 
 state:active,core:,collection:collection1,node_name:...
 /ZOOKEEPER (v-0 children=1)
   QUOTA(v=0)
 
 
 
 
 /CLUSTERSTATE.JSON(V=272){collection1:{shard1:{MACHINE1:8081_solr_:{shard_id:shard1,leader:true,...
 /LIVE_NODES (v=0 children=4)
   MACHINE1:8083_SOLR(ephemeral v=0)
   MACHINE1:8082_SOLR(ephemeral v=0)
   MACHINE1:8081_SOLR(ephemeral v=0)
   MACHINE1:8084_SOLR(ephemeral v=0)
 /COLLECTIONS (v=1 children=1)
   COLLECTION1(v=0 children=2){configName:configuration1}
   LEADER_ELECT(v=0 children=2)
   SHARD1(V=0 children=1)
   ELECTION(v=0 children=2)
 
 87186203314552835-MACHINE1:8081_SOLR_-N_96(ephemeral v=0)
 
 87186203314552836-MACHINE1:8083_SOLR_-N_84(ephemeral v=0)
   SHARD2(v=0 children=1)
   ELECTION(v=0 children=2)
 
 231301391392833539-MACHINE1:8084_SOLR_-N_85(ephemeral
 v=0)
 
 159243797356740611

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-01 Thread Matthew Parker

\zkServer.cmd
start .\zookeeper2\bin\zkServer.cmd
start .\zookeeper3\bin\zkServer.cmd
   
Step 4 - Start Main SOLR instance
==
I ran the following command to start the main SOLR instance
   
java -Djetty.port=8081 -Dhostport=8081
-Dbootstrap_configdir=[DATA_DIRECTORY]/solr/conf -Dnumshards=2
-Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar
 start.jar
   
Starts up fine.
   
Step 5 - Start the Remaining 3 SOLR Instances
==
I ran the following commands to start the other 3 instances from
 their
home directories:
   
java -Djetty.port=8082 -Dhostport=8082
-Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar
 start.jar
   
java -Djetty.port=8083 -Dhostport=8083
-Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar
 start.jar
   
java -Djetty.port=8084 -Dhostport=8084
-Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar
 start.jar
   
All startup without issue.
   
Step 6 - Modified solrconfig.xml to have a custom request handler
===
   
requestHandler name=/update/sharepoint startup=lazy
class=solr.extraction.ExtractingRequestHandler
  lst name=defaults
 str name=update.chainsharepoint-pipeline/str
 str name=fmap.contenttext/str
 str name=lowernamestrue/str
 str name=uprefixignored/str
 str name=caputreAttrtrue/str
 str name=fmap.alinks/str
 str name=fmap.divignored/str
  /lst
/requestHandler
   
updateRequestProcessorChain name=sharepoint-pipeline
   processor
 class=solr.processor.SignatureUpdateProcessorFactory
  bool name=enabledtrue/bool
  str name=signatureFieldid/str
  bool name=owerrightDupestrue/bool
  str name=fieldsurl/str
  str
 name=signatureClasssolr.processor.Lookup3Signature/str
   /processor
   processor class=solr.LogUpdateProcessorFactory/
   processor class=solr.RunUpdateProcessorFactory/
/updateRequestProcessorChain
   
   
Hopefully this will shed some light on why my configuration is
 having
issues.
   
Thanks for your help.
   
Matt
   
   
   
On Tue, Feb 28, 2012 at 8:29 PM, Mark Miller 
 markrmil...@gmail.com
   wrote:
   
Hmm...this is very strange - there is nothing interesting in any
 of
   the
logs?
   
In clusterstate.json, all of the shards have an active state?
   
   
There are quite a few of us doing exactly this setup recently, so
   there
must be something we are missing here...
   
Any info you can offer might help.
   
- Mark
   
On Feb 28, 2012, at 1:00 PM, Matthew Parker wrote:
   
 Mark,

 I got the codebase from the 2/26/2012, and I got the same
   inconsistent
 results.

 I have solr running on four ports 8081-8084

 8081 and 8082 are the leaders for shard 1, and shard 2,
 respectively

 8083 - is assigned to shard 1
 8084 - is assigned to shard 2

 queries come in and sometime it seems the windows from 8081 and
 8083
move
 responding to the query but there are no results.

 if the queries run on 8081/8082 or 8081/8084 then results come
 back
ok.

 The query is nothing more than: q=*:*

 Regards,

 Matt


 On Mon, Feb 27, 2012 at 9:26 PM, Matthew Parker 
 mpar...@apogeeintegration.com wrote:

 I'll have to check on the commit situation. We have been
 pushing
data from
 SharePoint the last week or so. Would that somehow block the
documents
 moving between the solr instances?

 I'll try another version tomorrow. Thanks for the suggestions.

 On Mon, Feb 27, 2012 at 5:34 PM, Mark Miller 
   markrmil...@gmail.com
wrote:

 Hmmm...all of that looks pretty normal...

 Did a commit somehow fail on the other machine? When you view
 the
stats
 for the update handler, are there a lot of pending adds for
 on of
the
 nodes? Do the commit counts match across nodes?

 You can also query an individual node with distrib=false to
 check
that.

 If you build is a month old, I'd honestly recommend you try
upgrading as
 well.

 - Mark

 On Feb 27, 2012, at 3:34 PM, Matthew Parker wrote:

 Here is most of the cluster state:

 Connected to Zookeeper
 localhost:2181, localhost: 2182, localhost:2183

 /(v=0 children=7) 
  /CONFIGS(v=0, children=1)
 /CONFIGURATION(v=0 children=25)
 all the configuration files, velocity info,
   xslt,
etc.

 /NODE_STATES(v=0 children=4)
MACHINE1:8083_SOLR (v=121)[{shard_id:shard1,

   
 state:active,core:,collection:collection1,node_name:...
MACHINE1:8082_SOLR (v=101)[{shard_id:shard2,

   
 state:active,core:,collection:collection1,node_name

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-01 Thread Matthew Parker

 the following command to start the main SOLR instance
 
  java -Djetty.port=8081 -Dhostport=8081
  -Dbootstrap_configdir=[DATA_DIRECTORY]/solr/conf -Dnumshards=2
  -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
 
  Starts up fine.
 
  Step 5 - Start the Remaining 3 SOLR Instances
  ==
  I ran the following commands to start the other 3 instances from
 their
  home directories:
 
  java -Djetty.port=8082 -Dhostport=8082
  -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
 
  java -Djetty.port=8083 -Dhostport=8083
  -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
 
  java -Djetty.port=8084 -Dhostport=8084
  -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
 
  All startup without issue.
 
  Step 6 - Modified solrconfig.xml to have a custom request handler
  ===
 
  requestHandler name=/update/sharepoint startup=lazy
  class=solr.extraction.ExtractingRequestHandler
   lst name=defaults
  str name=update.chainsharepoint-pipeline/str
  str name=fmap.contenttext/str
  str name=lowernamestrue/str
  str name=uprefixignored/str
  str name=caputreAttrtrue/str
  str name=fmap.alinks/str
  str name=fmap.divignored/str
   /lst
  /requestHandler
 
  updateRequestProcessorChain name=sharepoint-pipeline
processor class=solr.processor.SignatureUpdateProcessorFactory
   bool name=enabledtrue/bool
   str name=signatureFieldid/str
   bool name=owerrightDupestrue/bool
   str name=fieldsurl/str
   str
 name=signatureClasssolr.processor.Lookup3Signature/str
/processor
processor class=solr.LogUpdateProcessorFactory/
processor class=solr.RunUpdateProcessorFactory/
  /updateRequestProcessorChain
 
 
  Hopefully this will shed some light on why my configuration is
 having
  issues.
 
  Thanks for your help.
 
  Matt
 
 
 
  On Tue, Feb 28, 2012 at 8:29 PM, Mark Miller markrmil...@gmail.com
  wrote:
 
  Hmm...this is very strange - there is nothing interesting in any of
  the
  logs?
 
  In clusterstate.json, all of the shards have an active state?
 
 
  There are quite a few of us doing exactly this setup recently, so
  there
  must be something we are missing here...
 
  Any info you can offer might help.
 
  - Mark
 
  On Feb 28, 2012, at 1:00 PM, Matthew Parker wrote:
 
  Mark,
 
  I got the codebase from the 2/26/2012, and I got the same
  inconsistent
  results.
 
  I have solr running on four ports 8081-8084
 
  8081 and 8082 are the leaders for shard 1, and shard 2,
 respectively
 
  8083 - is assigned to shard 1
  8084 - is assigned to shard 2
 
  queries come in and sometime it seems the windows from 8081 and
 8083
  move
  responding to the query but there are no results.
 
  if the queries run on 8081/8082 or 8081/8084 then results come
 back
  ok.
 
  The query is nothing more than: q=*:*
 
  Regards,
 
  Matt
 
 
  On Mon, Feb 27, 2012 at 9:26 PM, Matthew Parker 
  mpar...@apogeeintegration.com wrote:
 
  I'll have to check on the commit situation. We have been pushing
  data from
  SharePoint the last week or so. Would that somehow block the
  documents
  moving between the solr instances?
 
  I'll try another version tomorrow. Thanks for the suggestions.
 
  On Mon, Feb 27, 2012 at 5:34 PM, Mark Miller 
  markrmil...@gmail.com
  wrote:
 
  Hmmm...all of that looks pretty normal...
 
  Did a commit somehow fail on the other machine? When you view
 the
  stats
  for the update handler, are there a lot of pending adds for on
 of
  the
  nodes? Do the commit counts match across nodes?
 
  You can also query an individual node with distrib=false to
 check
  that.
 
  If you build is a month old, I'd honestly recommend you try
  upgrading as
  well.
 
  - Mark
 
  On Feb 27, 2012, at 3:34 PM, Matthew Parker wrote:
 
  Here is most of the cluster state:
 
  Connected to Zookeeper
  localhost:2181, localhost: 2182, localhost:2183
 
  /(v=0 children=7) 
  /CONFIGS(v=0, children=1)
 /CONFIGURATION(v=0 children=25)
 all the configuration files, velocity info,
  xslt,
  etc.
 
  /NODE_STATES(v=0 children=4)
MACHINE1:8083_SOLR (v=121)[{shard_id:shard1,
 
 
 state:active,core:,collection:collection1,node_name:...
MACHINE1:8082_SOLR (v=101)[{shard_id:shard2,
 
 
 state:active,core:,collection:collection1,node_name:...
MACHINE1:8081_SOLR (v=92)[{shard_id:shard1,
 
 
 state:active,core:,collection:collection1,node_name:...
MACHINE1:8084_SOLR (v=73)[{shard_id:shard2,
 
 
 state:active,core:,collection:collection1,node_name:...
  /ZOOKEEPER (v-0 children=1)
QUOTA(v=0)
 
 
 
 
 
 /CLUSTERSTATE.JSON(V=272){collection1:{shard1:{MACHINE1:8081_solr_:{shard_id:shard1,leader:true,...
  /LIVE_NODES (v=0 children=4)
MACHINE1:8083_SOLR(ephemeral v=0)
MACHINE1:8082_SOLR(ephemeral v=0)
MACHINE1:8081_SOLR(ephemeral v=0)
MACHINE1:8084_SOLR(ephemeral v=0)
  /COLLECTIONS (v=1 children=1

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-03-01 Thread Mark Miller

 I assuming the windows configuration looked correct?

Yeah, so far I can not spot any smoking gun...I'm confounded at the moment. 
I'll re read through everything once more...

- Mark

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

Mark,

Nothing appears to be wrong in the logs. I wiped the indexes and imported
37 files from SharePoint using Manifold. All 37 make it in, but SOLR still
has issues with the results being inconsistent.

Let me run my setup by you, and see whether that is the issue?

On one machine, I have three zookeeper instances, four solr instances, and
a data directory for solr and zookeeper config data.

Step 1. I modified each zoo.xml configuration file to have:

Zookeeper 1 - Create /zookeeper1/conf/zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=[DATA_DIRECTORY]/zk1_data
clientPort=2181
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

Zookeeper 1 - Create /[DATA_DIRECTORY]/zk1_data/myid with the following
contents:
==
1

Zookeep 2 - Create /zookeeper2/conf/zoo.cfg
==
tickTime=2000
initLimit=10
syncLimit=5
dataDir=[DATA_DIRECTORY]/zk2_data
clientPort=2182
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

Zookeeper 2 - Create /[DATA_DIRECTORY]/zk2_data/myid with the following
contents:
==
2

Zookeeper 3 - Create /zookeeper3/conf/zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=[DATA_DIRECTORY]/zk3_data
clientPort=2183
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890

Zookeeper 3 - Create /[DATA_DIRECTORY]/zk3_data/myid with the following
contents:

3

Step 2 - SOLR Build
===

I pulled the latest SOLR trunk down. I built it with the following commands:

   ant example dist

I modified the solr.war files and added the solr cell and extraction
libraries to WEB-INF/lib. I couldn't get the extraction to work
any other way. Will zookeper pickup jar files stored with the rest of the
configuration files in Zookeeper?

I copied the contents of the example directory to each of my SOLR
directories.

Step 3 - Starting Zookeeper instances
===

I ran the following commands to start the zookeeper instances:

start .\zookeeper1\bin\zkServer.cmd
start .\zookeeper2\bin\zkServer.cmd
start .\zookeeper3\bin\zkServer.cmd

Step 4 - Start Main SOLR instance
==
I ran the following command to start the main SOLR instance

java -Djetty.port=8081 -Dhostport=8081
-Dbootstrap_configdir=[DATA_DIRECTORY]/solr/conf -Dnumshards=2
-Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar

Starts up fine.

Step 5 - Start the Remaining 3 SOLR Instances
==
I ran the following commands to start the other 3 instances from their home
directories:

java -Djetty.port=8082 -Dhostport=8082
-Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar

java -Djetty.port=8083 -Dhostport=8083
-Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar

java -Djetty.port=8084 -Dhostport=8084
-Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar

All startup without issue.

Step 6 - Modified solrconfig.xml to have a custom request handler
===

requestHandler name=/update/sharepoint startup=lazy
class=solr.extraction.ExtractingRequestHandler
  lst name=defaults
 str name=update.chainsharepoint-pipeline/str
 str name=fmap.contenttext/str
 str name=lowernamestrue/str
 str name=uprefixignored/str
 str name=caputreAttrtrue/str
 str name=fmap.alinks/str
 str name=fmap.divignored/str
  /lst
/requestHandler

updateRequestProcessorChain name=sharepoint-pipeline
   processor class=solr.processor.SignatureUpdateProcessorFactory
  bool name=enabledtrue/bool
  str name=signatureFieldid/str
  bool name=owerrightDupestrue/bool
  str name=fieldsurl/str
  str name=signatureClasssolr.processor.Lookup3Signature/str
   /processor
   processor class=solr.LogUpdateProcessorFactory/
   processor class=solr.RunUpdateProcessorFactory/
/updateRequestProcessorChain


Hopefully this will shed some light on why my configuration is having
issues.

Thanks for your help.

Matt



On Tue, Feb 28, 2012 at 8:29 PM, Mark Miller markrmil...@gmail.com wrote:

 Hmm...this is very strange - there is nothing interesting in any of the
 logs?

 In clusterstate.json, all of the shards have an active state?


 There are quite a few of us doing exactly this setup recently, so there
 must be something we are missing here...

 Any info you can offer might help.

 - Mark

 On Feb 28, 2012, at 1:00 PM, Matthew Parker wrote:

  Mark,
 
  I got the codebase from the 2/26/2012, and I got the same inconsistent
  results.
 
  I have solr running on four ports 8081-8084
 
  8081 and 8082 are the leaders for shard 1, and shard 2, respectively
 
  8083 - is assigned to shard 1
  8084 - is assigned to shard 2
 
  queries come in and sometime it seems

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

 the codebase from the 2/26/2012, and I got the same inconsistent
  results.
 
  I have solr running on four ports 8081-8084
 
  8081 and 8082 are the leaders for shard 1, and shard 2, respectively
 
  8083 - is assigned to shard 1
  8084 - is assigned to shard 2
 
  queries come in and sometime it seems the windows from 8081 and 8083
 move
  responding to the query but there are no results.
 
  if the queries run on 8081/8082 or 8081/8084 then results come back ok.
 
  The query is nothing more than: q=*:*
 
  Regards,
 
  Matt
 
 
  On Mon, Feb 27, 2012 at 9:26 PM, Matthew Parker 
  mpar...@apogeeintegration.com wrote:
 
  I'll have to check on the commit situation. We have been pushing data
 from
  SharePoint the last week or so. Would that somehow block the documents
  moving between the solr instances?
 
  I'll try another version tomorrow. Thanks for the suggestions.
 
  On Mon, Feb 27, 2012 at 5:34 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
  Hmmm...all of that looks pretty normal...
 
  Did a commit somehow fail on the other machine? When you view the
 stats
  for the update handler, are there a lot of pending adds for on of the
  nodes? Do the commit counts match across nodes?
 
  You can also query an individual node with distrib=false to check
 that.
 
  If you build is a month old, I'd honestly recommend you try upgrading
 as
  well.
 
  - Mark
 
  On Feb 27, 2012, at 3:34 PM, Matthew Parker wrote:
 
  Here is most of the cluster state:
 
  Connected to Zookeeper
  localhost:2181, localhost: 2182, localhost:2183
 
  /(v=0 children=7) 
   /CONFIGS(v=0, children=1)
  /CONFIGURATION(v=0 children=25)
  all the configuration files, velocity info, xslt,
 etc.
 
  /NODE_STATES(v=0 children=4)
 MACHINE1:8083_SOLR (v=121)[{shard_id:shard1,
 
 state:active,core:,collection:collection1,node_name:...
 MACHINE1:8082_SOLR (v=101)[{shard_id:shard2,
 
 state:active,core:,collection:collection1,node_name:...
 MACHINE1:8081_SOLR (v=92)[{shard_id:shard1,
 
 state:active,core:,collection:collection1,node_name:...
 MACHINE1:8084_SOLR (v=73)[{shard_id:shard2,
 
 state:active,core:,collection:collection1,node_name:...
  /ZOOKEEPER (v-0 children=1)
 QUOTA(v=0)
 
 
 
 /CLUSTERSTATE.JSON(V=272){collection1:{shard1:{MACHINE1:8081_solr_:{shard_id:shard1,leader:true,...
  /LIVE_NODES (v=0 children=4)
 MACHINE1:8083_SOLR(ephemeral v=0)
 MACHINE1:8082_SOLR(ephemeral v=0)
 MACHINE1:8081_SOLR(ephemeral v=0)
 MACHINE1:8084_SOLR(ephemeral v=0)
  /COLLECTIONS (v=1 children=1)
 COLLECTION1(v=0 children=2){configName:configuration1}
 LEADER_ELECT(v=0 children=2)
 SHARD1(V=0 children=1)
 ELECTION(v=0 children=2)
 
  87186203314552835-MACHINE1:8081_SOLR_-N_96(ephemeral v=0)
 
  87186203314552836-MACHINE1:8083_SOLR_-N_84(ephemeral v=0)
 SHARD2(v=0 children=1)
 ELECTION(v=0 children=2)
 
  231301391392833539-MACHINE1:8084_SOLR_-N_85(ephemeral v=0)
 
  159243797356740611-MACHINE1:8082_SOLR_-N_84(ephemeral v=0)
 LEADERS (v=0 children=2)
 SHARD1 (ephemeral
  v=0){core:,node_name:MACHINE1:8081_solr,base_url:
  http://MACHINE1:8081/solr};
 SHARD2 (ephemeral
  v=0){core:,node_name:MACHINE1:8082_solr,base_url:
  http://MACHINE1:8082/solr};
  /OVERSEER_ELECT (v=0 children=2)
 ELECTION (v=0 children=4)
 231301391392833539-MACHINE1:8084_SOLR_-N_000251(ephemeral
  v=0)
 87186203314552835-MACHINE1:8081_SOLR_-N_000248(ephemeral
  v=0)
 159243797356740611-MACHINE1:8082_SOLR_-N_000250(ephemeral
  v=0)
 87186203314552836-MACHINE1:8083_SOLR_-N_000249(ephemeral
  v=0)
 LEADER (emphemeral
  v=0){id:87186203314552835-MACHINE1:8081_solr-n_00248}
 
 
 
  On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller markrmil...@gmail.com
  wrote:
 
 
  On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:
 
  Thanks for your reply Mark.
 
  I believe the build was towards the begining of the month. The
  solr.spec.version is 4.0.0.2012.01.10.38.09
 
  I cannot access the clusterstate.json contents. I clicked on it a
  couple
  of
  times, but nothing happens. Is that stored on disk somewhere?
 
  Are you using the new admin UI? That has recently been updated to
 work
  better with cloud - it had some troubles not too long ago. If you
 are,
  you
  should trying using the old admin UI's zookeeper page - that should
  show
  the cluster state.
 
  That being said, there has been a lot of bug fixes over the past
 month
  -
  so you may just want to update to a recent version.
 
 
  I configured a custom request handler to calculate an unique
 document
  id
  based on the file's url.
 
  On Mon, Feb 27, 2012 at 1:13 PM, Mark Miller 
 markrmil...@gmail.com
  wrote:
 
  Hey Matt - is your build recent?
 
  Can you visit the cloud/zookeeper page in the admin and send the
  contents
  of the clusterstate.json node?
 
  Are you using a custom index

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

 this setup recently, so there
 must be something we are missing here...

 Any info you can offer might help.

 - Mark

 On Feb 28, 2012, at 1:00 PM, Matthew Parker wrote:

  Mark,
 
  I got the codebase from the 2/26/2012, and I got the same inconsistent
  results.
 
  I have solr running on four ports 8081-8084
 
  8081 and 8082 are the leaders for shard 1, and shard 2, respectively
 
  8083 - is assigned to shard 1
  8084 - is assigned to shard 2
 
  queries come in and sometime it seems the windows from 8081 and 8083
 move
  responding to the query but there are no results.
 
  if the queries run on 8081/8082 or 8081/8084 then results come back ok.
 
  The query is nothing more than: q=*:*
 
  Regards,
 
  Matt
 
 
  On Mon, Feb 27, 2012 at 9:26 PM, Matthew Parker 
  mpar...@apogeeintegration.com wrote:
 
  I'll have to check on the commit situation. We have been pushing data
 from
  SharePoint the last week or so. Would that somehow block the documents
  moving between the solr instances?
 
  I'll try another version tomorrow. Thanks for the suggestions.
 
  On Mon, Feb 27, 2012 at 5:34 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
  Hmmm...all of that looks pretty normal...
 
  Did a commit somehow fail on the other machine? When you view the
 stats
  for the update handler, are there a lot of pending adds for on of the
  nodes? Do the commit counts match across nodes?
 
  You can also query an individual node with distrib=false to check
 that.
 
  If you build is a month old, I'd honestly recommend you try
 upgrading as
  well.
 
  - Mark
 
  On Feb 27, 2012, at 3:34 PM, Matthew Parker wrote:
 
  Here is most of the cluster state:
 
  Connected to Zookeeper
  localhost:2181, localhost: 2182, localhost:2183
 
  /(v=0 children=7) 
   /CONFIGS(v=0, children=1)
  /CONFIGURATION(v=0 children=25)
  all the configuration files, velocity info, xslt,
 etc.
 
  /NODE_STATES(v=0 children=4)
 MACHINE1:8083_SOLR (v=121)[{shard_id:shard1,
 
 state:active,core:,collection:collection1,node_name:...
 MACHINE1:8082_SOLR (v=101)[{shard_id:shard2,
 
 state:active,core:,collection:collection1,node_name:...
 MACHINE1:8081_SOLR (v=92)[{shard_id:shard1,
 
 state:active,core:,collection:collection1,node_name:...
 MACHINE1:8084_SOLR (v=73)[{shard_id:shard2,
 
 state:active,core:,collection:collection1,node_name:...
  /ZOOKEEPER (v-0 children=1)
 QUOTA(v=0)
 
 
 
 /CLUSTERSTATE.JSON(V=272){collection1:{shard1:{MACHINE1:8081_solr_:{shard_id:shard1,leader:true,...
  /LIVE_NODES (v=0 children=4)
 MACHINE1:8083_SOLR(ephemeral v=0)
 MACHINE1:8082_SOLR(ephemeral v=0)
 MACHINE1:8081_SOLR(ephemeral v=0)
 MACHINE1:8084_SOLR(ephemeral v=0)
  /COLLECTIONS (v=1 children=1)
 COLLECTION1(v=0 children=2){configName:configuration1}
 LEADER_ELECT(v=0 children=2)
 SHARD1(V=0 children=1)
 ELECTION(v=0 children=2)
 
  87186203314552835-MACHINE1:8081_SOLR_-N_96(ephemeral v=0)
 
  87186203314552836-MACHINE1:8083_SOLR_-N_84(ephemeral v=0)
 SHARD2(v=0 children=1)
 ELECTION(v=0 children=2)
 
  231301391392833539-MACHINE1:8084_SOLR_-N_85(ephemeral v=0)
 
  159243797356740611-MACHINE1:8082_SOLR_-N_84(ephemeral v=0)
 LEADERS (v=0 children=2)
 SHARD1 (ephemeral
  v=0){core:,node_name:MACHINE1:8081_solr,base_url:
  http://MACHINE1:8081/solr};
 SHARD2 (ephemeral
  v=0){core:,node_name:MACHINE1:8082_solr,base_url:
  http://MACHINE1:8082/solr};
  /OVERSEER_ELECT (v=0 children=2)
 ELECTION (v=0 children=4)
 231301391392833539-MACHINE1:8084_SOLR_-N_000251(ephemeral
  v=0)
 87186203314552835-MACHINE1:8081_SOLR_-N_000248(ephemeral
  v=0)
 159243797356740611-MACHINE1:8082_SOLR_-N_000250(ephemeral
  v=0)
 87186203314552836-MACHINE1:8083_SOLR_-N_000249(ephemeral
  v=0)
 LEADER (emphemeral
  v=0){id:87186203314552835-MACHINE1:8081_solr-n_00248}
 
 
 
  On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller markrmil...@gmail.com
 
  wrote:
 
 
  On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:
 
  Thanks for your reply Mark.
 
  I believe the build was towards the begining of the month. The
  solr.spec.version is 4.0.0.2012.01.10.38.09
 
  I cannot access the clusterstate.json contents. I clicked on it a
  couple
  of
  times, but nothing happens. Is that stored on disk somewhere?
 
  Are you using the new admin UI? That has recently been updated to
 work
  better with cloud - it had some troubles not too long ago. If you
 are,
  you
  should trying using the old admin UI's zookeeper page - that should
  show
  the cluster state.
 
  That being said, there has been a lot of bug fixes over the past
 month
  -
  so you may just want to update to a recent version.
 
 
  I configured a custom request handler to calculate an unique
 document
  id
  based on the file's url.
 
  On Mon, Feb 27, 2012 at 1:13 PM, Mark Miller 
 markrmil...@gmail.com

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-29 Thread Sami Siren

On Wed, Feb 29, 2012 at 7:03 PM, Matthew Parker
mpar...@apogeeintegration.com wrote:
 I also took out my requestHandler and used the standard /update/extract
 handler. Same result.

How did you install/start the system this time? The same way as
earlier? What kind of queries do you run?

Would it be possible for you to check out the latest version from svn.
In there we have some dev scripts for linux that can be used to setup
a test system easily (you need svn, jdk and ant).

Essentially the steps would be:

#Checkout the sources:
svn co http://svn.apache.org/repos/asf/lucene/dev/trunk

#build and start solrcloud (1 shard, no replicas)
cd solr/cloud-dev
sh ./control.sh rebuild
sh ./control.sh reinstall 1
sh ./control.sh start 1

#index content
java -jar ../example/exampledocs/post.jar ../example/exampledocs/*.xml

#after that you can run your queries

--
 Sami Siren

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

Sami,

I have the latest as of the 26th. My system is running on a standalone
network so it's not easy to get code updates without a wave of paperwork.

I installed as per the detailed instructions I laid out a couple of
messages ago from today (2/29/2012).

I'm running the following query:

http://localhost:8081/solr/collection1/select?q=*:*

which gets translated to the following:

http://localhost:8081/solr/collection1/select?q=*:*version=2.2start=0rows=10indent=on

I just tried it running only two solr nodes, and I get the same results.

Regards,

Matt

On Wed, Feb 29, 2012 at 12:25 PM, Sami Siren ssi...@gmail.com wrote:

 On Wed, Feb 29, 2012 at 7:03 PM, Matthew Parker
 mpar...@apogeeintegration.com wrote:
  I also took out my requestHandler and used the standard /update/extract
  handler. Same result.

 How did you install/start the system this time? The same way as
 earlier? What kind of queries do you run?

 Would it be possible for you to check out the latest version from svn.
 In there we have some dev scripts for linux that can be used to setup
 a test system easily (you need svn, jdk and ant).

 Essentially the steps would be:

 #Checkout the sources:
 svn co http://svn.apache.org/repos/asf/lucene/dev/trunk

 #build and start solrcloud (1 shard, no replicas)
 cd solr/cloud-dev
 sh ./control.sh rebuild
 sh ./control.sh reinstall 1
 sh ./control.sh start 1

 #index content
 java -jar ../example/exampledocs/post.jar ../example/exampledocs/*.xml

 #after that you can run your queries

 --
  Sami Siren


--
This e-mail and any files transmitted with it may be proprietary.  Please note 
that any views or opinions presented in this e-mail are solely those of the 
author and do not necessarily represent those of Apogee Integration.

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

=solr.processor.SignatureUpdateProcessorFactory
   bool name=enabledtrue/bool
   str name=signatureFieldid/str
   bool name=owerrightDupestrue/bool
   str name=fieldsurl/str
   str name=signatureClasssolr.processor.Lookup3Signature/str
/processor
processor class=solr.LogUpdateProcessorFactory/
processor class=solr.RunUpdateProcessorFactory/
 /updateRequestProcessorChain


 Hopefully this will shed some light on why my configuration is having
 issues.

 Thanks for your help.

 Matt



 On Tue, Feb 28, 2012 at 8:29 PM, Mark Miller markrmil...@gmail.comwrote:

 Hmm...this is very strange - there is nothing interesting in any of the
 logs?

 In clusterstate.json, all of the shards have an active state?


 There are quite a few of us doing exactly this setup recently, so there
 must be something we are missing here...

 Any info you can offer might help.

 - Mark

 On Feb 28, 2012, at 1:00 PM, Matthew Parker wrote:

  Mark,
 
  I got the codebase from the 2/26/2012, and I got the same inconsistent
  results.
 
  I have solr running on four ports 8081-8084
 
  8081 and 8082 are the leaders for shard 1, and shard 2, respectively
 
  8083 - is assigned to shard 1
  8084 - is assigned to shard 2
 
  queries come in and sometime it seems the windows from 8081 and 8083
 move
  responding to the query but there are no results.
 
  if the queries run on 8081/8082 or 8081/8084 then results come back
 ok.
 
  The query is nothing more than: q=*:*
 
  Regards,
 
  Matt
 
 
  On Mon, Feb 27, 2012 at 9:26 PM, Matthew Parker 
  mpar...@apogeeintegration.com wrote:
 
  I'll have to check on the commit situation. We have been pushing
 data from
  SharePoint the last week or so. Would that somehow block the
 documents
  moving between the solr instances?
 
  I'll try another version tomorrow. Thanks for the suggestions.
 
  On Mon, Feb 27, 2012 at 5:34 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
  Hmmm...all of that looks pretty normal...
 
  Did a commit somehow fail on the other machine? When you view the
 stats
  for the update handler, are there a lot of pending adds for on of
 the
  nodes? Do the commit counts match across nodes?
 
  You can also query an individual node with distrib=false to check
 that.
 
  If you build is a month old, I'd honestly recommend you try
 upgrading as
  well.
 
  - Mark
 
  On Feb 27, 2012, at 3:34 PM, Matthew Parker wrote:
 
  Here is most of the cluster state:
 
  Connected to Zookeeper
  localhost:2181, localhost: 2182, localhost:2183
 
  /(v=0 children=7) 
   /CONFIGS(v=0, children=1)
  /CONFIGURATION(v=0 children=25)
  all the configuration files, velocity info, xslt,
 etc.
 
  /NODE_STATES(v=0 children=4)
 MACHINE1:8083_SOLR (v=121)[{shard_id:shard1,
 
 state:active,core:,collection:collection1,node_name:...
 MACHINE1:8082_SOLR (v=101)[{shard_id:shard2,
 
 state:active,core:,collection:collection1,node_name:...
 MACHINE1:8081_SOLR (v=92)[{shard_id:shard1,
 
 state:active,core:,collection:collection1,node_name:...
 MACHINE1:8084_SOLR (v=73)[{shard_id:shard2,
 
 state:active,core:,collection:collection1,node_name:...
  /ZOOKEEPER (v-0 children=1)
 QUOTA(v=0)
 
 
 
 /CLUSTERSTATE.JSON(V=272){collection1:{shard1:{MACHINE1:8081_solr_:{shard_id:shard1,leader:true,...
  /LIVE_NODES (v=0 children=4)
 MACHINE1:8083_SOLR(ephemeral v=0)
 MACHINE1:8082_SOLR(ephemeral v=0)
 MACHINE1:8081_SOLR(ephemeral v=0)
 MACHINE1:8084_SOLR(ephemeral v=0)
  /COLLECTIONS (v=1 children=1)
 COLLECTION1(v=0 children=2){configName:configuration1}
 LEADER_ELECT(v=0 children=2)
 SHARD1(V=0 children=1)
 ELECTION(v=0 children=2)
 
  87186203314552835-MACHINE1:8081_SOLR_-N_96(ephemeral v=0)
 
  87186203314552836-MACHINE1:8083_SOLR_-N_84(ephemeral v=0)
 SHARD2(v=0 children=1)
 ELECTION(v=0 children=2)
 
  231301391392833539-MACHINE1:8084_SOLR_-N_85(ephemeral v=0)
 
  159243797356740611-MACHINE1:8082_SOLR_-N_84(ephemeral v=0)
 LEADERS (v=0 children=2)
 SHARD1 (ephemeral
  v=0){core:,node_name:MACHINE1:8081_solr,base_url:
  http://MACHINE1:8081/solr};
 SHARD2 (ephemeral
  v=0){core:,node_name:MACHINE1:8082_solr,base_url:
  http://MACHINE1:8082/solr};
  /OVERSEER_ELECT (v=0 children=2)
 ELECTION (v=0 children=4)
 
  231301391392833539-MACHINE1:8084_SOLR_-N_000251(ephemeral
  v=0)
 87186203314552835-MACHINE1:8081_SOLR_-N_000248(ephemeral
  v=0)
 
  159243797356740611-MACHINE1:8082_SOLR_-N_000250(ephemeral
  v=0)
 87186203314552836-MACHINE1:8083_SOLR_-N_000249(ephemeral
  v=0)
 LEADER (emphemeral
  v=0){id:87186203314552835-MACHINE1:8081_solr-n_00248}
 
 
 
  On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller 
 markrmil...@gmail.com
  wrote:
 
 
  On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:
 
  Thanks for your reply Mark.
 
  I believe the build was towards the begining of the month

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-29 Thread Mark Miller

 start.jar
 
  java -Djetty.port=8084 -Dhostport=8084
  -Dzkhost=localhost:2181,localhost:2182,localhost:2183 -jar start.jar
 
  All startup without issue.
 
  Step 6 - Modified solrconfig.xml to have a custom request handler
  ===
 
  requestHandler name=/update/sharepoint startup=lazy
  class=solr.extraction.ExtractingRequestHandler
lst name=defaults
   str name=update.chainsharepoint-pipeline/str
   str name=fmap.contenttext/str
   str name=lowernamestrue/str
   str name=uprefixignored/str
   str name=caputreAttrtrue/str
   str name=fmap.alinks/str
   str name=fmap.divignored/str
/lst
  /requestHandler
 
  updateRequestProcessorChain name=sharepoint-pipeline
 processor class=solr.processor.SignatureUpdateProcessorFactory
bool name=enabledtrue/bool
str name=signatureFieldid/str
bool name=owerrightDupestrue/bool
str name=fieldsurl/str
str name=signatureClasssolr.processor.Lookup3Signature/str
 /processor
 processor class=solr.LogUpdateProcessorFactory/
 processor class=solr.RunUpdateProcessorFactory/
  /updateRequestProcessorChain
 
 
  Hopefully this will shed some light on why my configuration is having
  issues.
 
  Thanks for your help.
 
  Matt
 
 
 
  On Tue, Feb 28, 2012 at 8:29 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
  Hmm...this is very strange - there is nothing interesting in any of
 the
  logs?
 
  In clusterstate.json, all of the shards have an active state?
 
 
  There are quite a few of us doing exactly this setup recently, so
 there
  must be something we are missing here...
 
  Any info you can offer might help.
 
  - Mark
 
  On Feb 28, 2012, at 1:00 PM, Matthew Parker wrote:
 
   Mark,
  
   I got the codebase from the 2/26/2012, and I got the same
 inconsistent
   results.
  
   I have solr running on four ports 8081-8084
  
   8081 and 8082 are the leaders for shard 1, and shard 2, respectively
  
   8083 - is assigned to shard 1
   8084 - is assigned to shard 2
  
   queries come in and sometime it seems the windows from 8081 and 8083
  move
   responding to the query but there are no results.
  
   if the queries run on 8081/8082 or 8081/8084 then results come back
  ok.
  
   The query is nothing more than: q=*:*
  
   Regards,
  
   Matt
  
  
   On Mon, Feb 27, 2012 at 9:26 PM, Matthew Parker 
   mpar...@apogeeintegration.com wrote:
  
   I'll have to check on the commit situation. We have been pushing
  data from
   SharePoint the last week or so. Would that somehow block the
  documents
   moving between the solr instances?
  
   I'll try another version tomorrow. Thanks for the suggestions.
  
   On Mon, Feb 27, 2012 at 5:34 PM, Mark Miller 
 markrmil...@gmail.com
  wrote:
  
   Hmmm...all of that looks pretty normal...
  
   Did a commit somehow fail on the other machine? When you view the
  stats
   for the update handler, are there a lot of pending adds for on of
  the
   nodes? Do the commit counts match across nodes?
  
   You can also query an individual node with distrib=false to check
  that.
  
   If you build is a month old, I'd honestly recommend you try
  upgrading as
   well.
  
   - Mark
  
   On Feb 27, 2012, at 3:34 PM, Matthew Parker wrote:
  
   Here is most of the cluster state:
  
   Connected to Zookeeper
   localhost:2181, localhost: 2182, localhost:2183
  
   /(v=0 children=7) 
/CONFIGS(v=0, children=1)
   /CONFIGURATION(v=0 children=25)
   all the configuration files, velocity info,
 xslt,
  etc.
  
   /NODE_STATES(v=0 children=4)
  MACHINE1:8083_SOLR (v=121)[{shard_id:shard1,
  
  state:active,core:,collection:collection1,node_name:...
  MACHINE1:8082_SOLR (v=101)[{shard_id:shard2,
  
  state:active,core:,collection:collection1,node_name:...
  MACHINE1:8081_SOLR (v=92)[{shard_id:shard1,
  
  state:active,core:,collection:collection1,node_name:...
  MACHINE1:8084_SOLR (v=73)[{shard_id:shard2,
  
  state:active,core:,collection:collection1,node_name:...
   /ZOOKEEPER (v-0 children=1)
  QUOTA(v=0)
  
  
  
 
 /CLUSTERSTATE.JSON(V=272){collection1:{shard1:{MACHINE1:8081_solr_:{shard_id:shard1,leader:true,...
   /LIVE_NODES (v=0 children=4)
  MACHINE1:8083_SOLR(ephemeral v=0)
  MACHINE1:8082_SOLR(ephemeral v=0)
  MACHINE1:8081_SOLR(ephemeral v=0)
  MACHINE1:8084_SOLR(ephemeral v=0)
   /COLLECTIONS (v=1 children=1)
  COLLECTION1(v=0 children=2){configName:configuration1}
  LEADER_ELECT(v=0 children=2)
  SHARD1(V=0 children=1)
  ELECTION(v=0 children=2)
  
   87186203314552835-MACHINE1:8081_SOLR_-N_96(ephemeral v=0)
  
   87186203314552836-MACHINE1:8083_SOLR_-N_84(ephemeral v=0)
  SHARD2(v=0 children=1)
  ELECTION(v=0 children=2)
  
   231301391392833539-MACHINE1:8084_SOLR_-N_85(ephemeral
 v=0)
  
   159243797356740611-MACHINE1:8082_SOLR_-N_84(ephemeral
 v=0

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-28 Thread Matthew Parker

Mark,

I got the codebase from the 2/26/2012, and I got the same inconsistent
results.

I have solr running on four ports 8081-8084

8081 and 8082 are the leaders for shard 1, and shard 2, respectively

8083 - is assigned to shard 1
8084 - is assigned to shard 2

queries come in and sometime it seems the windows from 8081 and 8083 move
responding to the query but there are no results.

if the queries run on 8081/8082 or 8081/8084 then results come back ok.

The query is nothing more than: q=*:*

Regards,

Matt


On Mon, Feb 27, 2012 at 9:26 PM, Matthew Parker 
mpar...@apogeeintegration.com wrote:

 I'll have to check on the commit situation. We have been pushing data from
 SharePoint the last week or so. Would that somehow block the documents
 moving between the solr instances?

 I'll try another version tomorrow. Thanks for the suggestions.

 On Mon, Feb 27, 2012 at 5:34 PM, Mark Miller markrmil...@gmail.comwrote:

 Hmmm...all of that looks pretty normal...

 Did a commit somehow fail on the other machine? When you view the stats
 for the update handler, are there a lot of pending adds for on of the
 nodes? Do the commit counts match across nodes?

 You can also query an individual node with distrib=false to check that.

 If you build is a month old, I'd honestly recommend you try upgrading as
 well.

 - Mark

 On Feb 27, 2012, at 3:34 PM, Matthew Parker wrote:

  Here is most of the cluster state:
 
  Connected to Zookeeper
  localhost:2181, localhost: 2182, localhost:2183
 
  /(v=0 children=7) 
/CONFIGS(v=0, children=1)
   /CONFIGURATION(v=0 children=25)
   all the configuration files, velocity info, xslt, etc.
 
   /NODE_STATES(v=0 children=4)
  MACHINE1:8083_SOLR (v=121)[{shard_id:shard1,
  state:active,core:,collection:collection1,node_name:...
  MACHINE1:8082_SOLR (v=101)[{shard_id:shard2,
  state:active,core:,collection:collection1,node_name:...
  MACHINE1:8081_SOLR (v=92)[{shard_id:shard1,
  state:active,core:,collection:collection1,node_name:...
  MACHINE1:8084_SOLR (v=73)[{shard_id:shard2,
  state:active,core:,collection:collection1,node_name:...
   /ZOOKEEPER (v-0 children=1)
  QUOTA(v=0)
 
 
 /CLUSTERSTATE.JSON(V=272){collection1:{shard1:{MACHINE1:8081_solr_:{shard_id:shard1,leader:true,...
   /LIVE_NODES (v=0 children=4)
  MACHINE1:8083_SOLR(ephemeral v=0)
  MACHINE1:8082_SOLR(ephemeral v=0)
  MACHINE1:8081_SOLR(ephemeral v=0)
  MACHINE1:8084_SOLR(ephemeral v=0)
   /COLLECTIONS (v=1 children=1)
  COLLECTION1(v=0 children=2){configName:configuration1}
  LEADER_ELECT(v=0 children=2)
  SHARD1(V=0 children=1)
  ELECTION(v=0 children=2)
 
  87186203314552835-MACHINE1:8081_SOLR_-N_96(ephemeral v=0)
 
  87186203314552836-MACHINE1:8083_SOLR_-N_84(ephemeral v=0)
  SHARD2(v=0 children=1)
  ELECTION(v=0 children=2)
 
  231301391392833539-MACHINE1:8084_SOLR_-N_85(ephemeral v=0)
 
  159243797356740611-MACHINE1:8082_SOLR_-N_84(ephemeral v=0)
  LEADERS (v=0 children=2)
  SHARD1 (ephemeral
  v=0){core:,node_name:MACHINE1:8081_solr,base_url:
  http://MACHINE1:8081/solr};
  SHARD2 (ephemeral
  v=0){core:,node_name:MACHINE1:8082_solr,base_url:
  http://MACHINE1:8082/solr};
   /OVERSEER_ELECT (v=0 children=2)
  ELECTION (v=0 children=4)
  231301391392833539-MACHINE1:8084_SOLR_-N_000251(ephemeral
 v=0)
  87186203314552835-MACHINE1:8081_SOLR_-N_000248(ephemeral
 v=0)
  159243797356740611-MACHINE1:8082_SOLR_-N_000250(ephemeral
 v=0)
  87186203314552836-MACHINE1:8083_SOLR_-N_000249(ephemeral
 v=0)
  LEADER (emphemeral
  v=0){id:87186203314552835-MACHINE1:8081_solr-n_00248}
 
 
 
  On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
 
  On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:
 
  Thanks for your reply Mark.
 
  I believe the build was towards the begining of the month. The
  solr.spec.version is 4.0.0.2012.01.10.38.09
 
  I cannot access the clusterstate.json contents. I clicked on it a
 couple
  of
  times, but nothing happens. Is that stored on disk somewhere?
 
  Are you using the new admin UI? That has recently been updated to work
  better with cloud - it had some troubles not too long ago. If you are,
 you
  should trying using the old admin UI's zookeeper page - that should
 show
  the cluster state.
 
  That being said, there has been a lot of bug fixes over the past month
 -
  so you may just want to update to a recent version.
 
 
  I configured a custom request handler to calculate an unique document
 id
  based on the file's url.
 
  On Mon, Feb 27, 2012 at 1:13 PM, Mark Miller markrmil...@gmail.com
  wrote:
 
  Hey Matt - is your build recent?
 
  Can you visit the cloud/zookeeper page in the admin and send the
  contents
  of the clusterstate.json node?
 
  Are you using a custom index chain or anything out of the ordinary

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-28 Thread Mark Miller

Hmm...this is very strange - there is nothing interesting in any of the logs?

In clusterstate.json, all of the shards have an active state?


There are quite a few of us doing exactly this setup recently, so there must be 
something we are missing here...

Any info you can offer might help.

- Mark

On Feb 28, 2012, at 1:00 PM, Matthew Parker wrote:

 Mark,
 
 I got the codebase from the 2/26/2012, and I got the same inconsistent
 results.
 
 I have solr running on four ports 8081-8084
 
 8081 and 8082 are the leaders for shard 1, and shard 2, respectively
 
 8083 - is assigned to shard 1
 8084 - is assigned to shard 2
 
 queries come in and sometime it seems the windows from 8081 and 8083 move
 responding to the query but there are no results.
 
 if the queries run on 8081/8082 or 8081/8084 then results come back ok.
 
 The query is nothing more than: q=*:*
 
 Regards,
 
 Matt
 
 
 On Mon, Feb 27, 2012 at 9:26 PM, Matthew Parker 
 mpar...@apogeeintegration.com wrote:
 
 I'll have to check on the commit situation. We have been pushing data from
 SharePoint the last week or so. Would that somehow block the documents
 moving between the solr instances?
 
 I'll try another version tomorrow. Thanks for the suggestions.
 
 On Mon, Feb 27, 2012 at 5:34 PM, Mark Miller markrmil...@gmail.comwrote:
 
 Hmmm...all of that looks pretty normal...
 
 Did a commit somehow fail on the other machine? When you view the stats
 for the update handler, are there a lot of pending adds for on of the
 nodes? Do the commit counts match across nodes?
 
 You can also query an individual node with distrib=false to check that.
 
 If you build is a month old, I'd honestly recommend you try upgrading as
 well.
 
 - Mark
 
 On Feb 27, 2012, at 3:34 PM, Matthew Parker wrote:
 
 Here is most of the cluster state:
 
 Connected to Zookeeper
 localhost:2181, localhost: 2182, localhost:2183
 
 /(v=0 children=7) 
  /CONFIGS(v=0, children=1)
 /CONFIGURATION(v=0 children=25)
 all the configuration files, velocity info, xslt, etc.
 
 /NODE_STATES(v=0 children=4)
MACHINE1:8083_SOLR (v=121)[{shard_id:shard1,
 state:active,core:,collection:collection1,node_name:...
MACHINE1:8082_SOLR (v=101)[{shard_id:shard2,
 state:active,core:,collection:collection1,node_name:...
MACHINE1:8081_SOLR (v=92)[{shard_id:shard1,
 state:active,core:,collection:collection1,node_name:...
MACHINE1:8084_SOLR (v=73)[{shard_id:shard2,
 state:active,core:,collection:collection1,node_name:...
 /ZOOKEEPER (v-0 children=1)
QUOTA(v=0)
 
 
 /CLUSTERSTATE.JSON(V=272){collection1:{shard1:{MACHINE1:8081_solr_:{shard_id:shard1,leader:true,...
 /LIVE_NODES (v=0 children=4)
MACHINE1:8083_SOLR(ephemeral v=0)
MACHINE1:8082_SOLR(ephemeral v=0)
MACHINE1:8081_SOLR(ephemeral v=0)
MACHINE1:8084_SOLR(ephemeral v=0)
 /COLLECTIONS (v=1 children=1)
COLLECTION1(v=0 children=2){configName:configuration1}
LEADER_ELECT(v=0 children=2)
SHARD1(V=0 children=1)
ELECTION(v=0 children=2)
 
 87186203314552835-MACHINE1:8081_SOLR_-N_96(ephemeral v=0)
 
 87186203314552836-MACHINE1:8083_SOLR_-N_84(ephemeral v=0)
SHARD2(v=0 children=1)
ELECTION(v=0 children=2)
 
 231301391392833539-MACHINE1:8084_SOLR_-N_85(ephemeral v=0)
 
 159243797356740611-MACHINE1:8082_SOLR_-N_84(ephemeral v=0)
LEADERS (v=0 children=2)
SHARD1 (ephemeral
 v=0){core:,node_name:MACHINE1:8081_solr,base_url:
 http://MACHINE1:8081/solr};
SHARD2 (ephemeral
 v=0){core:,node_name:MACHINE1:8082_solr,base_url:
 http://MACHINE1:8082/solr};
 /OVERSEER_ELECT (v=0 children=2)
ELECTION (v=0 children=4)
231301391392833539-MACHINE1:8084_SOLR_-N_000251(ephemeral
 v=0)
87186203314552835-MACHINE1:8081_SOLR_-N_000248(ephemeral
 v=0)
159243797356740611-MACHINE1:8082_SOLR_-N_000250(ephemeral
 v=0)
87186203314552836-MACHINE1:8083_SOLR_-N_000249(ephemeral
 v=0)
LEADER (emphemeral
 v=0){id:87186203314552835-MACHINE1:8081_solr-n_00248}
 
 
 
 On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
 
 On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:
 
 Thanks for your reply Mark.
 
 I believe the build was towards the begining of the month. The
 solr.spec.version is 4.0.0.2012.01.10.38.09
 
 I cannot access the clusterstate.json contents. I clicked on it a
 couple
 of
 times, but nothing happens. Is that stored on disk somewhere?
 
 Are you using the new admin UI? That has recently been updated to work
 better with cloud - it had some troubles not too long ago. If you are,
 you
 should trying using the old admin UI's zookeeper page - that should
 show
 the cluster state.
 
 That being said, there has been a lot of bug fixes over the past month
 -
 so you may just want to update to a recent version.
 
 
 I configured a custom request handler to calculate an unique document
 id
 based on the file's url

Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

TWIMC:

Environment
=
Apache SOLR rev-1236154
Apache Zookeeper 3.3.4
Windows 7
JDK 1.6.0_23.b05

I have built a SOLR Cloud instance with 4 nodes using the embeded Jetty
servers.

I created a 3 node zookeeper ensemble to manage the solr configuration data.

All the instances run on one server so I've had to move ports around for
the various applications.

I start the 3 zookeeper nodes.

I started the first instance of solr cloud with the parameter to have two
shards.

The start the remaining 3 solr nodes.

The system comes up fine. No errors thrown.

I can view the solr cloud console and I can see the SOLR configuration
files managed by ZooKeeper.

I published data into the SOLR Cloud instances from SharePoint using Apache
Manifold 0.4-incubating. Manifold is setup to publish the data into
collection1, which is the only collection defined in the cluster.

When I query the data from collection1 as per the solr wiki, the results
are inconsistent. Sometimes all the results are there, other times nothing
comes back at all.

It seems to be having an issue auto replicating the data across the cloud.

Is there some specific setting I might have missed? Based upon what I read,
I thought that SOLR cloud would take care of distributing and replicating
the data automatically. Do you have to tell it what shard to publish the
data into as well?

Any help would be appreciated.

Thanks,

Matt

--
This e-mail and any files transmitted with it may be proprietary.  Please note 
that any views or opinions presented in this e-mail are solely those of the 
author and do not necessarily represent those of Apogee Integration.

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-27 Thread Mark Miller

Hey Matt - is your build recent?

Can you visit the cloud/zookeeper page in the admin and send the contents of 
the clusterstate.json node?

Are you using a custom index chain or anything out of the ordinary?


- Mark

On Feb 27, 2012, at 12:26 PM, Matthew Parker wrote:

 TWIMC:
 
 Environment
 =
 Apache SOLR rev-1236154
 Apache Zookeeper 3.3.4
 Windows 7
 JDK 1.6.0_23.b05
 
 I have built a SOLR Cloud instance with 4 nodes using the embeded Jetty
 servers.
 
 I created a 3 node zookeeper ensemble to manage the solr configuration data.
 
 All the instances run on one server so I've had to move ports around for
 the various applications.
 
 I start the 3 zookeeper nodes.
 
 I started the first instance of solr cloud with the parameter to have two
 shards.
 
 The start the remaining 3 solr nodes.
 
 The system comes up fine. No errors thrown.
 
 I can view the solr cloud console and I can see the SOLR configuration
 files managed by ZooKeeper.
 
 I published data into the SOLR Cloud instances from SharePoint using Apache
 Manifold 0.4-incubating. Manifold is setup to publish the data into
 collection1, which is the only collection defined in the cluster.
 
 When I query the data from collection1 as per the solr wiki, the results
 are inconsistent. Sometimes all the results are there, other times nothing
 comes back at all.
 
 It seems to be having an issue auto replicating the data across the cloud.
 
 Is there some specific setting I might have missed? Based upon what I read,
 I thought that SOLR cloud would take care of distributing and replicating
 the data automatically. Do you have to tell it what shard to publish the
 data into as well?
 
 Any help would be appreciated.
 
 Thanks,
 
 Matt
 
 --
 This e-mail and any files transmitted with it may be proprietary.  Please 
 note that any views or opinions presented in this e-mail are solely those of 
 the author and do not necessarily represent those of Apogee Integration.

- Mark Miller
lucidimagination.com

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

Thanks for your reply Mark.

I believe the build was towards the begining of the month. The
solr.spec.version is 4.0.0.2012.01.10.38.09

I cannot access the clusterstate.json contents. I clicked on it a couple of
times, but nothing happens. Is that stored on disk somewhere?

I configured a custom request handler to calculate an unique document id
based on the file's url.

On Mon, Feb 27, 2012 at 1:13 PM, Mark Miller markrmil...@gmail.com wrote:

 Hey Matt - is your build recent?

 Can you visit the cloud/zookeeper page in the admin and send the contents
 of the clusterstate.json node?

 Are you using a custom index chain or anything out of the ordinary?


 - Mark

 On Feb 27, 2012, at 12:26 PM, Matthew Parker wrote:

  TWIMC:
 
  Environment
  =
  Apache SOLR rev-1236154
  Apache Zookeeper 3.3.4
  Windows 7
  JDK 1.6.0_23.b05
 
  I have built a SOLR Cloud instance with 4 nodes using the embeded Jetty
  servers.
 
  I created a 3 node zookeeper ensemble to manage the solr configuration
 data.
 
  All the instances run on one server so I've had to move ports around for
  the various applications.
 
  I start the 3 zookeeper nodes.
 
  I started the first instance of solr cloud with the parameter to have two
  shards.
 
  The start the remaining 3 solr nodes.
 
  The system comes up fine. No errors thrown.
 
  I can view the solr cloud console and I can see the SOLR configuration
  files managed by ZooKeeper.
 
  I published data into the SOLR Cloud instances from SharePoint using
 Apache
  Manifold 0.4-incubating. Manifold is setup to publish the data into
  collection1, which is the only collection defined in the cluster.
 
  When I query the data from collection1 as per the solr wiki, the results
  are inconsistent. Sometimes all the results are there, other times
 nothing
  comes back at all.
 
  It seems to be having an issue auto replicating the data across the
 cloud.
 
  Is there some specific setting I might have missed? Based upon what I
 read,
  I thought that SOLR cloud would take care of distributing and replicating
  the data automatically. Do you have to tell it what shard to publish the
  data into as well?
 
  Any help would be appreciated.
 
  Thanks,
 
  Matt
 
  --
  This e-mail and any files transmitted with it may be proprietary.
  Please note that any views or opinions presented in this e-mail are solely
 those of the author and do not necessarily represent those of Apogee
 Integration.

 - Mark Miller
 lucidimagination.com














-- 
Regards,

Matt Parker (CTR)
Senior Software Architect
Apogee Integration, LLC
5180 Parkstone Drive, Suite #160
Chantilly, Virginia 20151
703.272.4797 (site)
703.474.1918 (cell)
www.apogeeintegration.com

--
This e-mail and any files transmitted with it may be proprietary.  Please note 
that any views or opinions presented in this e-mail are solely those of the 
author and do not necessarily represent those of Apogee Integration.

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-27 Thread Mark Miller


On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:

 Thanks for your reply Mark.
 
 I believe the build was towards the begining of the month. The
 solr.spec.version is 4.0.0.2012.01.10.38.09
 
 I cannot access the clusterstate.json contents. I clicked on it a couple of
 times, but nothing happens. Is that stored on disk somewhere?

Are you using the new admin UI? That has recently been updated to work better 
with cloud - it had some troubles not too long ago. If you are, you should 
trying using the old admin UI's zookeeper page - that should show the cluster 
state.

That being said, there has been a lot of bug fixes over the past month - so you 
may just want to update to a recent version.

 
 I configured a custom request handler to calculate an unique document id
 based on the file's url.
 
 On Mon, Feb 27, 2012 at 1:13 PM, Mark Miller markrmil...@gmail.com wrote:
 
 Hey Matt - is your build recent?
 
 Can you visit the cloud/zookeeper page in the admin and send the contents
 of the clusterstate.json node?
 
 Are you using a custom index chain or anything out of the ordinary?
 
 
 - Mark
 
 On Feb 27, 2012, at 12:26 PM, Matthew Parker wrote:
 
 TWIMC:
 
 Environment
 =
 Apache SOLR rev-1236154
 Apache Zookeeper 3.3.4
 Windows 7
 JDK 1.6.0_23.b05
 
 I have built a SOLR Cloud instance with 4 nodes using the embeded Jetty
 servers.
 
 I created a 3 node zookeeper ensemble to manage the solr configuration
 data.
 
 All the instances run on one server so I've had to move ports around for
 the various applications.
 
 I start the 3 zookeeper nodes.
 
 I started the first instance of solr cloud with the parameter to have two
 shards.
 
 The start the remaining 3 solr nodes.
 
 The system comes up fine. No errors thrown.
 
 I can view the solr cloud console and I can see the SOLR configuration
 files managed by ZooKeeper.
 
 I published data into the SOLR Cloud instances from SharePoint using
 Apache
 Manifold 0.4-incubating. Manifold is setup to publish the data into
 collection1, which is the only collection defined in the cluster.
 
 When I query the data from collection1 as per the solr wiki, the results
 are inconsistent. Sometimes all the results are there, other times
 nothing
 comes back at all.
 
 It seems to be having an issue auto replicating the data across the
 cloud.
 
 Is there some specific setting I might have missed? Based upon what I
 read,
 I thought that SOLR cloud would take care of distributing and replicating
 the data automatically. Do you have to tell it what shard to publish the
 data into as well?
 
 Any help would be appreciated.
 
 Thanks,
 
 Matt
 
 --
 This e-mail and any files transmitted with it may be proprietary.
 Please note that any views or opinions presented in this e-mail are solely
 those of the author and do not necessarily represent those of Apogee
 Integration.
 
 - Mark Miller
 lucidimagination.com
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 -- 
 Regards,
 
 Matt Parker (CTR)
 Senior Software Architect
 Apogee Integration, LLC
 5180 Parkstone Drive, Suite #160
 Chantilly, Virginia 20151
 703.272.4797 (site)
 703.474.1918 (cell)
 www.apogeeintegration.com
 
 --
 This e-mail and any files transmitted with it may be proprietary.  Please 
 note that any views or opinions presented in this e-mail are solely those of 
 the author and do not necessarily represent those of Apogee Integration.

- Mark Miller
lucidimagination.com

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

I was trying to use the new interface. I see it using the old admin page.

Is there a piece of it you're interested in? I don't have access to the
Internet where it exists so it would mean transcribing it.

On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller markrmil...@gmail.com wrote:


 On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:

  Thanks for your reply Mark.
 
  I believe the build was towards the begining of the month. The
  solr.spec.version is 4.0.0.2012.01.10.38.09
 
  I cannot access the clusterstate.json contents. I clicked on it a couple
 of
  times, but nothing happens. Is that stored on disk somewhere?

 Are you using the new admin UI? That has recently been updated to work
 better with cloud - it had some troubles not too long ago. If you are, you
 should trying using the old admin UI's zookeeper page - that should show
 the cluster state.

 That being said, there has been a lot of bug fixes over the past month -
 so you may just want to update to a recent version.

 
  I configured a custom request handler to calculate an unique document id
  based on the file's url.
 
  On Mon, Feb 27, 2012 at 1:13 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
  Hey Matt - is your build recent?
 
  Can you visit the cloud/zookeeper page in the admin and send the
 contents
  of the clusterstate.json node?
 
  Are you using a custom index chain or anything out of the ordinary?
 
 
  - Mark
 
  On Feb 27, 2012, at 12:26 PM, Matthew Parker wrote:
 
  TWIMC:
 
  Environment
  =
  Apache SOLR rev-1236154
  Apache Zookeeper 3.3.4
  Windows 7
  JDK 1.6.0_23.b05
 
  I have built a SOLR Cloud instance with 4 nodes using the embeded Jetty
  servers.
 
  I created a 3 node zookeeper ensemble to manage the solr configuration
  data.
 
  All the instances run on one server so I've had to move ports around
 for
  the various applications.
 
  I start the 3 zookeeper nodes.
 
  I started the first instance of solr cloud with the parameter to have
 two
  shards.
 
  The start the remaining 3 solr nodes.
 
  The system comes up fine. No errors thrown.
 
  I can view the solr cloud console and I can see the SOLR configuration
  files managed by ZooKeeper.
 
  I published data into the SOLR Cloud instances from SharePoint using
  Apache
  Manifold 0.4-incubating. Manifold is setup to publish the data into
  collection1, which is the only collection defined in the cluster.
 
  When I query the data from collection1 as per the solr wiki, the
 results
  are inconsistent. Sometimes all the results are there, other times
  nothing
  comes back at all.
 
  It seems to be having an issue auto replicating the data across the
  cloud.
 
  Is there some specific setting I might have missed? Based upon what I
  read,
  I thought that SOLR cloud would take care of distributing and
 replicating
  the data automatically. Do you have to tell it what shard to publish
 the
  data into as well?
 
  Any help would be appreciated.
 
  Thanks,
 
  Matt
 
  --
  This e-mail and any files transmitted with it may be proprietary.
  Please note that any views or opinions presented in this e-mail are
 solely
  those of the author and do not necessarily represent those of Apogee
  Integration.
 
  - Mark Miller
  lucidimagination.com
 
 
 
 
 
 
 
 
 
 
 
 
 
 
  --
  Regards,
 
  Matt Parker (CTR)
  Senior Software Architect
  Apogee Integration, LLC
  5180 Parkstone Drive, Suite #160
  Chantilly, Virginia 20151
  703.272.4797 (site)
  703.474.1918 (cell)
  www.apogeeintegration.com
 
  --
  This e-mail and any files transmitted with it may be proprietary.
  Please note that any views or opinions presented in this e-mail are solely
 those of the author and do not necessarily represent those of Apogee
 Integration.

 - Mark Miller
 lucidimagination.com













--
This e-mail and any files transmitted with it may be proprietary.  Please note 
that any views or opinions presented in this e-mail are solely those of the 
author and do not necessarily represent those of Apogee Integration.

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

Here is most of the cluster state:

Connected to Zookeeper
localhost:2181, localhost: 2182, localhost:2183

/(v=0 children=7) 
   /CONFIGS(v=0, children=1)
  /CONFIGURATION(v=0 children=25)
  all the configuration files, velocity info, xslt, etc.

  /NODE_STATES(v=0 children=4)
 MACHINE1:8083_SOLR (v=121)[{shard_id:shard1,
state:active,core:,collection:collection1,node_name:...
 MACHINE1:8082_SOLR (v=101)[{shard_id:shard2,
state:active,core:,collection:collection1,node_name:...
 MACHINE1:8081_SOLR (v=92)[{shard_id:shard1,
state:active,core:,collection:collection1,node_name:...
 MACHINE1:8084_SOLR (v=73)[{shard_id:shard2,
state:active,core:,collection:collection1,node_name:...
  /ZOOKEEPER (v-0 children=1)
 QUOTA(v=0)

/CLUSTERSTATE.JSON(V=272){collection1:{shard1:{MACHINE1:8081_solr_:{shard_id:shard1,leader:true,...
  /LIVE_NODES (v=0 children=4)
 MACHINE1:8083_SOLR(ephemeral v=0)
 MACHINE1:8082_SOLR(ephemeral v=0)
 MACHINE1:8081_SOLR(ephemeral v=0)
 MACHINE1:8084_SOLR(ephemeral v=0)
  /COLLECTIONS (v=1 children=1)
 COLLECTION1(v=0 children=2){configName:configuration1}
 LEADER_ELECT(v=0 children=2)
 SHARD1(V=0 children=1)
 ELECTION(v=0 children=2)

87186203314552835-MACHINE1:8081_SOLR_-N_96(ephemeral v=0)

87186203314552836-MACHINE1:8083_SOLR_-N_84(ephemeral v=0)
 SHARD2(v=0 children=1)
 ELECTION(v=0 children=2)

231301391392833539-MACHINE1:8084_SOLR_-N_85(ephemeral v=0)

159243797356740611-MACHINE1:8082_SOLR_-N_84(ephemeral v=0)
 LEADERS (v=0 children=2)
 SHARD1 (ephemeral
v=0){core:,node_name:MACHINE1:8081_solr,base_url:
http://MACHINE1:8081/solr};
 SHARD2 (ephemeral
v=0){core:,node_name:MACHINE1:8082_solr,base_url:
http://MACHINE1:8082/solr};
  /OVERSEER_ELECT (v=0 children=2)
 ELECTION (v=0 children=4)
 231301391392833539-MACHINE1:8084_SOLR_-N_000251(ephemeral v=0)
 87186203314552835-MACHINE1:8081_SOLR_-N_000248(ephemeral v=0)
 159243797356740611-MACHINE1:8082_SOLR_-N_000250(ephemeral v=0)
 87186203314552836-MACHINE1:8083_SOLR_-N_000249(ephemeral v=0)
 LEADER (emphemeral
v=0){id:87186203314552835-MACHINE1:8081_solr-n_00248}



On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller markrmil...@gmail.com wrote:


 On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:

  Thanks for your reply Mark.
 
  I believe the build was towards the begining of the month. The
  solr.spec.version is 4.0.0.2012.01.10.38.09
 
  I cannot access the clusterstate.json contents. I clicked on it a couple
 of
  times, but nothing happens. Is that stored on disk somewhere?

 Are you using the new admin UI? That has recently been updated to work
 better with cloud - it had some troubles not too long ago. If you are, you
 should trying using the old admin UI's zookeeper page - that should show
 the cluster state.

 That being said, there has been a lot of bug fixes over the past month -
 so you may just want to update to a recent version.

 
  I configured a custom request handler to calculate an unique document id
  based on the file's url.
 
  On Mon, Feb 27, 2012 at 1:13 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
  Hey Matt - is your build recent?
 
  Can you visit the cloud/zookeeper page in the admin and send the
 contents
  of the clusterstate.json node?
 
  Are you using a custom index chain or anything out of the ordinary?
 
 
  - Mark
 
  On Feb 27, 2012, at 12:26 PM, Matthew Parker wrote:
 
  TWIMC:
 
  Environment
  =
  Apache SOLR rev-1236154
  Apache Zookeeper 3.3.4
  Windows 7
  JDK 1.6.0_23.b05
 
  I have built a SOLR Cloud instance with 4 nodes using the embeded Jetty
  servers.
 
  I created a 3 node zookeeper ensemble to manage the solr configuration
  data.
 
  All the instances run on one server so I've had to move ports around
 for
  the various applications.
 
  I start the 3 zookeeper nodes.
 
  I started the first instance of solr cloud with the parameter to have
 two
  shards.
 
  The start the remaining 3 solr nodes.
 
  The system comes up fine. No errors thrown.
 
  I can view the solr cloud console and I can see the SOLR configuration
  files managed by ZooKeeper.
 
  I published data into the SOLR Cloud instances from SharePoint using
  Apache
  Manifold 0.4-incubating. Manifold is setup to publish the data into
  collection1, which is the only collection defined in the cluster.
 
  When I query the data from collection1 as per the solr wiki, the
 results
  are inconsistent. Sometimes all the results are there, other times
  nothing
  comes back at all.
 
  It seems to be having an issue auto replicating the data across the
  cloud.
 
  Is there some specific setting I might have missed? Based upon what I
  read,
  I thought that SOLR cloud would take care of distributing and
 replicating
  the data automatically. Do you have to tell it what shard to publish

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes

2012-02-27 Thread Mark Miller

Hmmm...all of that looks pretty normal...

Did a commit somehow fail on the other machine? When you view the stats for the 
update handler, are there a lot of pending adds for on of the nodes? Do the 
commit counts match across nodes?

You can also query an individual node with distrib=false to check that.

If you build is a month old, I'd honestly recommend you try upgrading as well.

- Mark

On Feb 27, 2012, at 3:34 PM, Matthew Parker wrote:

 Here is most of the cluster state:
 
 Connected to Zookeeper
 localhost:2181, localhost: 2182, localhost:2183
 
 /(v=0 children=7) 
   /CONFIGS(v=0, children=1)
  /CONFIGURATION(v=0 children=25)
  all the configuration files, velocity info, xslt, etc.
 
  /NODE_STATES(v=0 children=4)
 MACHINE1:8083_SOLR (v=121)[{shard_id:shard1,
 state:active,core:,collection:collection1,node_name:...
 MACHINE1:8082_SOLR (v=101)[{shard_id:shard2,
 state:active,core:,collection:collection1,node_name:...
 MACHINE1:8081_SOLR (v=92)[{shard_id:shard1,
 state:active,core:,collection:collection1,node_name:...
 MACHINE1:8084_SOLR (v=73)[{shard_id:shard2,
 state:active,core:,collection:collection1,node_name:...
  /ZOOKEEPER (v-0 children=1)
 QUOTA(v=0)
 
 /CLUSTERSTATE.JSON(V=272){collection1:{shard1:{MACHINE1:8081_solr_:{shard_id:shard1,leader:true,...
  /LIVE_NODES (v=0 children=4)
 MACHINE1:8083_SOLR(ephemeral v=0)
 MACHINE1:8082_SOLR(ephemeral v=0)
 MACHINE1:8081_SOLR(ephemeral v=0)
 MACHINE1:8084_SOLR(ephemeral v=0)
  /COLLECTIONS (v=1 children=1)
 COLLECTION1(v=0 children=2){configName:configuration1}
 LEADER_ELECT(v=0 children=2)
 SHARD1(V=0 children=1)
 ELECTION(v=0 children=2)
 
 87186203314552835-MACHINE1:8081_SOLR_-N_96(ephemeral v=0)
 
 87186203314552836-MACHINE1:8083_SOLR_-N_84(ephemeral v=0)
 SHARD2(v=0 children=1)
 ELECTION(v=0 children=2)
 
 231301391392833539-MACHINE1:8084_SOLR_-N_85(ephemeral v=0)
 
 159243797356740611-MACHINE1:8082_SOLR_-N_84(ephemeral v=0)
 LEADERS (v=0 children=2)
 SHARD1 (ephemeral
 v=0){core:,node_name:MACHINE1:8081_solr,base_url:
 http://MACHINE1:8081/solr};
 SHARD2 (ephemeral
 v=0){core:,node_name:MACHINE1:8082_solr,base_url:
 http://MACHINE1:8082/solr};
  /OVERSEER_ELECT (v=0 children=2)
 ELECTION (v=0 children=4)
 231301391392833539-MACHINE1:8084_SOLR_-N_000251(ephemeral v=0)
 87186203314552835-MACHINE1:8081_SOLR_-N_000248(ephemeral v=0)
 159243797356740611-MACHINE1:8082_SOLR_-N_000250(ephemeral v=0)
 87186203314552836-MACHINE1:8083_SOLR_-N_000249(ephemeral v=0)
 LEADER (emphemeral
 v=0){id:87186203314552835-MACHINE1:8081_solr-n_00248}
 
 
 
 On Mon, Feb 27, 2012 at 2:47 PM, Mark Miller markrmil...@gmail.com wrote:
 
 
 On Feb 27, 2012, at 2:22 PM, Matthew Parker wrote:
 
 Thanks for your reply Mark.
 
 I believe the build was towards the begining of the month. The
 solr.spec.version is 4.0.0.2012.01.10.38.09
 
 I cannot access the clusterstate.json contents. I clicked on it a couple
 of
 times, but nothing happens. Is that stored on disk somewhere?
 
 Are you using the new admin UI? That has recently been updated to work
 better with cloud - it had some troubles not too long ago. If you are, you
 should trying using the old admin UI's zookeeper page - that should show
 the cluster state.
 
 That being said, there has been a lot of bug fixes over the past month -
 so you may just want to update to a recent version.
 
 
 I configured a custom request handler to calculate an unique document id
 based on the file's url.
 
 On Mon, Feb 27, 2012 at 1:13 PM, Mark Miller markrmil...@gmail.com
 wrote:
 
 Hey Matt - is your build recent?
 
 Can you visit the cloud/zookeeper page in the admin and send the
 contents
 of the clusterstate.json node?
 
 Are you using a custom index chain or anything out of the ordinary?
 
 
 - Mark
 
 On Feb 27, 2012, at 12:26 PM, Matthew Parker wrote:
 
 TWIMC:
 
 Environment
 =
 Apache SOLR rev-1236154
 Apache Zookeeper 3.3.4
 Windows 7
 JDK 1.6.0_23.b05
 
 I have built a SOLR Cloud instance with 4 nodes using the embeded Jetty
 servers.
 
 I created a 3 node zookeeper ensemble to manage the solr configuration
 data.
 
 All the instances run on one server so I've had to move ports around
 for
 the various applications.
 
 I start the 3 zookeeper nodes.
 
 I started the first instance of solr cloud with the parameter to have
 two
 shards.
 
 The start the remaining 3 solr nodes.
 
 The system comes up fine. No errors thrown.
 
 I can view the solr cloud console and I can see the SOLR configuration
 files managed by ZooKeeper.
 
 I published data into the SOLR Cloud instances from SharePoint using
 Apache
 Manifold 0.4-incubating. Manifold is setup to publish the data into
 collection1, which is the only collection defined in the cluster.
 
 When I query the data from collection1

Re: Inconsistent Results with ZooKeeper Ensemble and Four SOLR Cloud Nodes