Re: Testing Solr configuration, schema, and other fields

2015-12-31 Thread Erik Hatcher
Dan - I’m a fan of the idea of using EmbeddedSolrServer for the type of thing 
you mention, but since you’re already using SolrCloud, how about simply 
upconfig’ing the configuration from the Git repo, creating a temporary collection 
using that configset, and smoke testing it before making it ready for end 
client/customer/user use?   Maybe the configset and collection created for 
smoke testing are just temporary in order to validate it.
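
A rough SolrJ sketch of that flow, in case it helps. It assumes a dev cluster reachable 
through ZooKeeper at localhost:9983; the configset/collection names and the config path 
are placeholders, and the exact classes (CloudSolrClient constructor, ZkConfigManager, 
CollectionAdminRequest) shift a bit between SolrJ versions:

    import java.nio.file.Paths;

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;
    import org.apache.solr.common.cloud.ZkConfigManager;

    public class ConfigSmokeTest {
        public static void main(String[] args) throws Exception {
            String zkHost = "localhost:9983";        // assumption: the dev cluster's ZooKeeper
            String configName = "myproject-smoke";   // throwaway configset name
            String collection = "myproject-smoke";   // throwaway collection name

            try (CloudSolrClient client = new CloudSolrClient(zkHost)) {
                client.connect();

                // "upconfig": push the checked-out Git config directory into ZooKeeper
                new ZkConfigManager(client.getZkStateReader().getZkClient())
                        .uploadConfigDir(Paths.get("solr-conf/conf"), configName);

                // create a temporary collection bound to that configset;
                // a config that cannot load typically surfaces as an error here
                CollectionAdminRequest.Create create = new CollectionAdminRequest.Create();
                create.setCollectionName(collection);
                create.setConfigName(configName);
                create.setNumShards(1);
                create.setReplicationFactor(1);
                create.process(client);

                // ... smoke test: index a few sample documents, run a few sample queries ...

                // then throw the temporary collection away again
                CollectionAdminRequest.Delete delete = new CollectionAdminRequest.Delete();
                delete.setCollectionName(collection);
                delete.process(client);
            }
        }
    }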

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com 



> On Dec 30, 2015, at 3:09 PM, Davis, Daniel (NIH/NLM) [C] 
>  wrote:
> 
> At my organization, I want to create a tool that allows users to keep a solr 
> configuration as a Git repository.   Then, I want my Continuous Integration 
> environment to take some branch of the git repository and "publish" it into 
> ZooKeeper/SolrCloud.
> 
> Working on my own, it is only a very small pain to note foolish errors I've 
> made, fix them, and restart.However, I want my users to be able to edit 
> their own Solr schema and config *most* of the time, at least on development 
> servers.They will not have command-line access to these servers, and I 
> want to avoid endless restarts.
> 
> I'm not interested in fighting to maintain such a useless thing as a DTD/XSD 
> without community support; what I really want to know is whether Solr will 
> start and can index some sample documents.   I'm wondering whether I might be 
> able to build a tool to fire up an EmbeddedSolrServer and capture error 
> messages/exceptions in a reasonable way. This tool could then be run by 
> my users before they commit to git, and then again by the CI server before it 
> "publishes" the configuration to ZooKeeper/SolrCloud.
> 
> Any suggestions?
> 
> Dan Davis, Systems/Applications Architect (Contractor),
> Office of Computer and Communications Systems,
> National Library of Medicine, NIH
> 



RE: Testing Solr configuration, schema, and other fields

2015-12-31 Thread Alexandre Rafalovitch
Well, I guess NIH stands for Not Invented Here. No idea what NLM is for.

P.s. sorry, could not resist. I worked for orgs like that too :-(
On 1 Jan 2016 12:03 am, "Davis, Daniel (NIH/NLM) [C]" <daniel.da...@nih.gov>
wrote:

> That's incredibly cool.   Much easier than the chef/puppet scripts and
> stuff I've seen.I'm certain to play with this and get under the hood;
> however, we locally don't have permission to use AWS EC2 in this corner
> of NLM.There's some limited use of S3 and Glacier.   Maybe we'll
> negotiate EC2 for dev later this year, maybe not.
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Thursday, December 31, 2015 11:40 AM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Testing Solr configuration, schema, and other fields
>
> Makes sense.
>
> Answering the answer email in this thread, did you look at Solr Scale?
> Maybe it has the base infrastructure you need:
> https://github.com/LucidWorks/solr-scale-tk
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 31 December 2015 at 23:37, Davis, Daniel (NIH/NLM) [C] <
> daniel.da...@nih.gov> wrote:
> >> What is the next step you are stuck on?
> >>
> >> Regards,
> >>Alex
> >
> > I'm not really stuck.   My question has been about the best practices.
>  I am trying to work against "not-invented-here" syndrome,
> "only-useful-here" syndrome, and "boil-the-ocean" syndrome.I have to
> make the solution work with a Continuous Integration (CI) environment that
> will not be creating either docker images or VMs for each project, and so
> I've been seeking the wisdom of the crowd.
> >
> > -Original Message-
> > From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> > Sent: Thursday, December 31, 2015 12:42 AM
> > To: solr-user <solr-user@lucene.apache.org>
> > Subject: Re: Testing Solr configuration, schema, and other fields
> >
> > I might be just confused here, but I am not sure what your bottleneck
> > actually is. You seem to know your critical path already, so how can we
> > help?
> >
> > Starting a new Solr core from a given configuration directory is easy.
> > Catching hard errors from that is probably just grepping logs or a custom
> > logger.
> >
> > And you don't seem to be talking about lint-style soft sanity checks,
> > but rather the initialization-stopping hard checks.
> >
> > What is the next step you are stuck on?
> >
> > Regards,
> >Alex
> > On 31 Dec 2015 3:09 am, "Davis, Daniel (NIH/NLM) [C]"
> > <daniel.da...@nih.gov>
> > wrote:
> >
> >> At my organization, I want to create a tool that allows users to keep a
> >> solr configuration as a Git repository.   Then, I want my Continuous
> >> Integration environment to take some branch of the git repository and
> >> "publish" it into ZooKeeper/SolrCloud.
> >>
> >> Working on my own, it is only a very small pain to note foolish errors
> >> I've made, fix them, and restart.However, I want my users to be
> able to
> >> edit their own Solr schema and config *most* of the time, at least on
> >> development servers.They will not have command-line access to these
> >> servers, and I want to avoid endless restarts.
> >>
> >> I'm not interested in fighting to maintain such a useless thing as a
> >> DTD/XSD without community support; what I really want to know is whether
> >> Solr will start and can index some sample documents.   I'm wondering
> >> whether I might be able to build a tool to fire up an EmbeddedSolrServer
> >> and capture error messages/exceptions in a reasonable way. This tool
> >> could then be run by my users before they commit to git, and then
> >> again by the CI server before it "publishes" the configuration to
> >> ZooKeeper/SolrCloud.
> >>
> >> Any suggestions?
> >>
> >> Dan Davis, Systems/Applications Architect (Contractor), Office of
> >> Computer and Communications Systems, National Library of Medicine,
> >> NIH
> >>
> >>
>


Re: Testing Solr configuration, schema, and other fields

2015-12-31 Thread Alexandre Rafalovitch
Makes sense.

Answering the answer email in this thread, did you look at Solr Scale?
Maybe it has the base infrastructure you need:
https://github.com/LucidWorks/solr-scale-tk

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 31 December 2015 at 23:37, Davis, Daniel (NIH/NLM) [C]
<daniel.da...@nih.gov> wrote:
>> What is the next step you are stuck on?
>>
>> Regards,
>>Alex
>
> I'm not really stuck.   My question has been about the best practices.   I am 
> trying to work against "not-invented-here" syndrome, "only-useful-here" 
> syndrome, and "boil-the-ocean" syndrome.I have to make the solution work 
> with a Continuous Integration (CI) environment that will not be creating 
> either docker images or VMs for each project, and so I've been seeking the 
> wisdom of the crowd.
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Thursday, December 31, 2015 12:42 AM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Testing Solr configuration, schema, and other fields
>
> I might be just confused here, but I am not sure what your bottleneck 
> actually is. You seem to know your critical path already, so how can we help?
>
> Starting a new Solr core from a given configuration directory is easy. Catching 
> hard errors from that is probably just grepping logs or a custom logger.
>
> And you don't seem to be talking about lint-style soft sanity checks, but 
> rather the initialization-stopping hard checks.
>
> What is the next step you are stuck on?
>
> Regards,
>Alex
> On 31 Dec 2015 3:09 am, "Davis, Daniel (NIH/NLM) [C]" <daniel.da...@nih.gov>
> wrote:
>
>> At my organization, I want to create a tool that allows users to keep a
>> solr configuration as a Git repository.   Then, I want my Continuous
>> Integration environment to take some branch of the git repository and
>> "publish" it into ZooKeeper/SolrCloud.
>>
>> Working on my own, it is only a very small pain to note foolish errors
>> I've made, fix them, and restart.However, I want my users to be able to
>> edit their own Solr schema and config *most* of the time, at least on
>> development servers.They will not have command-line access to these
>> servers, and I want to avoid endless restarts.
>>
>> I'm not interested in fighting to maintain such a useless thing as a
>> DTD/XSD without community support; what I really want to know is whether
>> Solr will start and can index some sample documents.   I'm wondering
>> whether I might be able to build a tool to fire up an EmbeddedSolrServer
>> and capture error messages/exceptions in a reasonable way. This tool
>> could then be run by my users before they commit to git, and then
>> again by the CI server before it "publishes" the configuration to
>> ZooKeeper/SolrCloud.
>>
>> Any suggestions?
>>
>> Dan Davis, Systems/Applications Architect (Contractor), Office of
>> Computer and Communications Systems, National Library of Medicine, NIH
>>
>>


RE: Testing Solr configuration, schema, and other fields

2015-12-31 Thread Davis, Daniel (NIH/NLM) [C]
That's incredibly cool.   Much easier than the chef/puppet scripts and stuff 
I've seen.I'm certain to play with this and get under the hood; however, we 
locally don't have permission to use AWS EC2 in this corner of NLM.
There's some limited use of S3 and Glacier.   Maybe we'll negotiate EC2 for dev 
later this year, maybe not.
 
-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Thursday, December 31, 2015 11:40 AM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Testing Solr configuration, schema, and other fields

Makes sense.

Answering the answer email in this thread, did you look at Solr Scale?
Maybe it has the base infrastructure you need:
https://github.com/LucidWorks/solr-scale-tk

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 31 December 2015 at 23:37, Davis, Daniel (NIH/NLM) [C] 
<daniel.da...@nih.gov> wrote:
>> What is the next step you are stuck on?
>>
>> Regards,
>>Alex
>
> I'm not really stuck.   My question has been about the best practices.   I am 
> trying to work against "not-invented-here" syndrome, "only-useful-here" 
> syndrome, and "boil-the-ocean" syndrome.I have to make the solution work 
> with a Continuous Integration (CI) environment that will not be creating 
> either docker images or VMs for each project, and so I've been seeking the 
> wisdom of the crowd.
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Thursday, December 31, 2015 12:42 AM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Testing Solr configuration, schema, and other fields
>
> I might be just confused here, but I am not sure what your bottleneck 
> actually is. You seem to know your critical path already, so how can we help?
>
> Starting a new Solr core from a given configuration directory is easy. Catching 
> hard errors from that is probably just grepping logs or a custom logger.
>
> And you don't seem to be talking about lint-style soft sanity checks, but 
> rather the initialization-stopping hard checks.
>
> What is the next step you are stuck on?
>
> Regards,
>Alex
> On 31 Dec 2015 3:09 am, "Davis, Daniel (NIH/NLM) [C]" 
> <daniel.da...@nih.gov>
> wrote:
>
>> At my organization, I want to create a tool that allows users to keep a
>> solr configuration as a Git repository.   Then, I want my Continuous
>> Integration environment to take some branch of the git repository and 
>> "publish" it into ZooKeeper/SolrCloud.
>>
>> Working on my own, it is only a very small pain to note foolish errors
>> I've made, fix them, and restart.However, I want my users to be able to
>> edit their own Solr schema and config *most* of the time, at least on
>> development servers.They will not have command-line access to these
>> servers, and I want to avoid endless restarts.
>>
>> I'm not interested in fighting to maintain such a useless thing as a 
>> DTD/XSD without community support; what I really want to know is whether
>> Solr will start and can index some sample documents.   I'm wondering
>> whether I might be able to build a tool to fire up an EmbeddedSolrServer
>> and capture error messages/exceptions in a reasonable way. This tool
>> could then be run by my users before they commit to git, and then 
>> again by the CI server before it "publishes" the configuration to 
>> ZooKeeper/SolrCloud.
>>
>> Any suggestions?
>>
>> Dan Davis, Systems/Applications Architect (Contractor), Office of 
>> Computer and Communications Systems, National Library of Medicine, 
>> NIH
>>
>>


Re: Testing Solr configuration, schema, and other fields

2015-12-31 Thread Erick Erickson
Hmmm, a couple of things:

the bin/solr script could be used as a model in this scenario for
how to automate a lot of this. I'm thinking you can skip all the
argument parsing and just see how the SolrCLI jar file
is used to spin up collections, upload configs and the like. In fact,
assuming a unique collection name per developer, you could
use a common dev SolrCloud setup for this.

Or heck, perhaps just use the bin/solr script for all of that...

The other thing I was assuming is that you don't _really_ care
about starting/stopping Solr; it's more that you'd like to shorten
the cycle where your devs upload the configs, reload a collection,
find out whether the collection is running or not, and if not, find
the log files and see why.

FWIW,
Erick
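
As one rough sketch of that reload-and-check step with SolrJ (class names are from
memory for 5.x and may differ in your version; the client is assumed to be connected
already):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;
    import org.apache.solr.common.cloud.ClusterState;
    import org.apache.solr.common.cloud.Replica;
    import org.apache.solr.common.cloud.Slice;

    public class CollectionCheck {

        /** Reload a collection after a config change and report whether all replicas are ACTIVE. */
        public static boolean reloadAndCheck(CloudSolrClient client, String collection) throws Exception {
            CollectionAdminRequest.Reload reload = new CollectionAdminRequest.Reload();
            reload.setCollectionName(collection);
            reload.process(client);   // cores that cannot reload should show up as errors here

            ClusterState state = client.getZkStateReader().getClusterState();
            for (Slice slice : state.getCollection(collection).getSlices()) {
                for (Replica replica : slice.getReplicas()) {
                    if (replica.getState() != Replica.State.ACTIVE) {
                        return false;   // something did not come back; time to go read the logs
                    }
                }
            }
            return true;
        }
    }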

On Thu, Dec 31, 2015 at 8:31 AM, Davis, Daniel (NIH/NLM) [C]
<daniel.da...@nih.gov> wrote:
> Erik, that suggests an additional approach that seems to have "legs":
>
> * A webapp that acts as a sort of Cloud IDE for Solr configsets.   It 
> supports multiple projects and a single SolrCloud cluster.   For each 
> project, it upconfigs a git repository local to the webapp, and has the 
> ability to define tests that run against a "temporary" collection to verify 
> the configuration.
>
> * A command-line utility that upconfigs the configuration from a local directory, 
> creates a temporary collection, and supports optional "tests" by applying 
> an update query.
>
> Since the webapp would be based on something like the command-line utility 
> (maybe in library form), I think I'm still going to target the command-line 
> utility as my "minimum viable product".   I'll support SolrCloud first, and 
> then see about EmbeddedSolrServer.
>
> -Original Message-
> From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
> Sent: Thursday, December 31, 2015 10:00 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Testing Solr configuration, schema, and other fields
>
> Dan - I’m a fan of the idea of using EmbeddedSolrServer for the type of thing 
> you mention, but since you’re already using SolrCloud, how about simply 
> upconfig’ing the configuration from the Git repo, creating a temporary 
> collection using that configset, and smoke testing it before making it ready for 
> end client/customer/user use?   Maybe the configset and collection created 
> for smoke testing are just temporary in order to validate it.
>
> —
> Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com 
> <http://www.lucidworks.com/>
>
>
>
>> On Dec 30, 2015, at 3:09 PM, Davis, Daniel (NIH/NLM) [C] 
>> <daniel.da...@nih.gov> wrote:
>>
>> At my organization, I want to create a tool that allows users to keep a solr 
>> configuration as a Git repository.   Then, I want my Continuous Integration 
>> environment to take some branch of the git repository and "publish" it into 
>> ZooKeeper/SolrCloud.
>>
>> Working on my own, it is only a very small pain to note foolish errors I've 
>> made, fix them, and restart.However, I want my users to be able to edit 
>> their own Solr schema and config *most* of the time, at least on development 
>> servers.They will not have command-line access to these servers, and I 
>> want to avoid endless restarts.
>>
>> I'm not interested in fighting to maintain such a useless thing as a DTD/XSD 
>> without community support; what I really want to know is whether Solr will 
>> start and can index some sample documents.   I'm wondering whether I might 
>> be able to build a tool to fire up an EmbeddedSolrServer and capture error 
>> messages/exceptions in a reasonable way. This tool could then be run by 
>> my users before they commit to git, and then again by the CI server before 
>> it "publishes" the configuration to ZooKeeper/SolrCloud.
>>
>> Any suggestions?
>>
>> Dan Davis, Systems/Applications Architect (Contractor), Office of
>> Computer and Communications Systems, National Library of Medicine, NIH
>>
>


RE: Testing Solr configuration, schema, and other fields

2015-12-31 Thread Davis, Daniel (NIH/NLM) [C]
> What is the next step you are stuck on?
> 
> Regards,
>Alex

I'm not really stuck.   My question has been about the best practices.   I am 
trying to work against "not-invented-here" syndrome, "only-useful-here" 
syndrome, and "boil-the-ocean" syndrome.I have to make the solution work 
with a Continuous Integration (CI) environment that will not be creating either 
docker images or VMs for each project, and so I've been seeking the wisdom of 
the crowd.

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Thursday, December 31, 2015 12:42 AM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Testing Solr configuration, schema, and other fields

I might be just confused here, but I am not sure what your bottleneck actually 
is. You seem to know your critical path already, so how can we help?

Starting a new Solr core from a given configuration directory is easy. Catching 
hard errors from that is probably just grepping logs or a custom logger.

And you don't seem to be talking about lint-style soft sanity checks, but 
rather the initialization-stopping hard checks.

What is the next step you are stuck on?

Regards,
   Alex
On 31 Dec 2015 3:09 am, "Davis, Daniel (NIH/NLM) [C]" <daniel.da...@nih.gov>
wrote:

> At my organization, I want to create a tool that allows users to keep a
> solr configuration as a Git repository.   Then, I want my Continuous
> Integration environment to take some branch of the git repository and 
> "publish" it into ZooKeeper/SolrCloud.
>
> Working on my own, it is only a very small pain to note foolish errors
> I've made, fix them, and restart.However, I want my users to be able to
> edit their own Solr schema and config *most* of the time, at least on
> development servers.They will not have command-line access to these
> servers, and I want to avoid endless restarts.
>
> I'm not interested in fighting to maintain such a useless thing as a 
> DTD/XSD without community support; what I really want to know is whether
> Solr will start and can index some sample documents.   I'm wondering
> whether I might be able to build a tool to fire up an EmbeddedSolrServer
> and capture error messages/exceptions in a reasonable way. This tool
> could then be run by my users before they commit to git, and then 
> again by the CI server before it "publishes" the configuration to 
> ZooKeeper/SolrCloud.
>
> Any suggestions?
>
> Dan Davis, Systems/Applications Architect (Contractor), Office of 
> Computer and Communications Systems, National Library of Medicine, NIH
>
>


RE: Testing Solr configuration, schema, and other fields

2015-12-31 Thread Davis, Daniel (NIH/NLM) [C]
Heh

National Library of Medicine (NLM) is all over the map in terms of 
"not-invented-here", being a large organization within a large organization.  
It's my personal tendency towards "not-invented-here" that concerns me.

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Thursday, December 31, 2015 12:24 PM
To: solr-user <solr-user@lucene.apache.org>
Subject: RE: Testing Solr configuration, schema, and other fields

Well, I guess NIH stands for Not Invented Here. No idea what NLM is for.

P.s. sorry, could not resist. I worked for orgs like that too :-(
On 1 Jan 2016 12:03 am, "Davis, Daniel (NIH/NLM) [C]" <daniel.da...@nih.gov>
wrote:

> That's incredibly cool.   Much easier than the chef/puppet scripts and
> stuff I've seen.I'm certain to play with this and get under the hood;
> however, we locally don't have permission to use AWS EC2 in this corner
> of NLM.There's some limited use of S3 and Glacier.   Maybe we'll
> negotiate EC2 for dev later this year, maybe not.
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Thursday, December 31, 2015 11:40 AM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Testing Solr configuration, schema, and other fields
>
> Makes sense.
>
> Answering the answer email in this thread, did you look at Solr Scale?
> Maybe it has the base infrastructure you need:
> https://github.com/LucidWorks/solr-scale-tk
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 31 December 2015 at 23:37, Davis, Daniel (NIH/NLM) [C] < 
> daniel.da...@nih.gov> wrote:
> >> What is the next step you are stuck on?
> >>
> >> Regards,
> >>Alex
> >
> > I'm not really stuck.   My question has been about the best practices.
>  I am trying to work against "not-invented-here" syndrome,
> "only-useful-here" syndrome, and "boil-the-ocean" syndrome.I have to
> make the solution work with a Continuous Integration (CI) environment 
> that will not be creating either docker images or VMs for each 
> project, and so I've been seeking the wisdom of the crowd.
> >
> > -Original Message-
> > From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> > Sent: Thursday, December 31, 2015 12:42 AM
> > To: solr-user <solr-user@lucene.apache.org>
> > Subject: Re: Testing Solr configuration, schema, and other fields
> >
> > I might be just confused here, but I am not sure what your bottleneck
> > actually is. You seem to know your critical path already, so how can
> > we help?
> >
> > Starting a new Solr core from a given configuration directory is easy.
> > Catching hard errors from that is probably just grepping logs or a
> > custom logger.
> >
> > And you don't seem to be talking about lint-style soft sanity checks,
> > but rather the initialization-stopping hard checks.
> >
> > What is the next step you are stuck on?
> >
> > Regards,
> >Alex
> > On 31 Dec 2015 3:09 am, "Davis, Daniel (NIH/NLM) [C]"
> > <daniel.da...@nih.gov>
> > wrote:
> >
> >> At my organization, I want to create a tool that allows users to keep a
> >> solr configuration as a Git repository.   Then, I want my Continuous
> >> Integration environment to take some branch of the git repository 
> >> and "publish" it into ZooKeeper/SolrCloud.
> >>
> >> Working on my own, it is only a very small pain to note foolish errors
> >> I've made, fix them, and restart.However, I want my users to be
> able to
> >> edit their own Solr schema and config *most* of the time, at least on
> >> development servers.They will not have command-line access to these
> >> servers, and I want to avoid endless restarts.
> >>
> >> I'm not interested in fighting to maintain such a useless thing as 
> >> a DTD/XSD without community support; what I really want to know is whether
> >> Solr will start and can index some sample documents.   I'm wondering
> >> whether I might be able to build a tool to fire up an EmbeddedSolrServer
> >> and capture error messages/exceptions in a reasonable way. This tool
> >> could then be run by my users before they commit to git, and then 
> >> again by the CI server before it "publishes" the configuration to 
> >> ZooKeeper/SolrCloud.
> >>
> >> Any suggestions?
> >>
> >> Dan Davis, Systems/Applications Architect (Contractor), Office of 
> >> Computer and Communications Systems, National Library of Medicine, 
> >> NIH
> >>
> >>
>


RE: Testing Solr configuration, schema, and other fields

2015-12-31 Thread Davis, Daniel (NIH/NLM) [C]
Erik, that suggests an additional approach that seems to have "legs":
 
* A webapp that acts as a sort of Cloud IDE for Solr configsets.   It supports 
multiple projects and a single SolrCloud cluster.   For each project, it 
upconfigs a git repository local to the webapp, and has the ability to define 
tests that run against a "temporary" collection to verify the configuration.

* A command-line utility that upconfigs the configuration from a local directory, 
creates a temporary collection, and supports optional "tests" by applying an 
update query.

Since the webapp would be based on something like the command-line utility 
(maybe in library form), I think I'm still going to target the command-line 
utility as my "minimum viable product".   I'll support SolrCloud first, and 
then see about EmbeddedSolrServer.
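
For the optional "tests by applying an update query" step, a minimal SolrJ smoke test 
against the temporary collection could look roughly like the sketch below (field names 
and the sample document are placeholders):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class UpdateSmokeTest {

        /** Index one sample document into the temporary collection and verify it can be found. */
        public static void run(CloudSolrClient client, String collection) throws Exception {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "smoke-1");                  // placeholder fields; real sample
            doc.addField("title", "smoke test document");   // docs would live in the Git repo

            client.add(collection, doc);    // schema problems (unknown field, bad type) surface here
            client.commit(collection);

            long found = client.query(collection, new SolrQuery("id:smoke-1"))
                               .getResults().getNumFound();
            if (found != 1) {
                throw new IllegalStateException("smoke test document was not found after commit");
            }

            client.deleteByQuery(collection, "*:*");   // leave the temporary collection clean
            client.commit(collection);
        }
    }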

-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com] 
Sent: Thursday, December 31, 2015 10:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Testing Solr configuration, schema, and other fields

Dan - I’m a fan of the idea of using EmbeddedSolrServer for the type of thing 
you mention, but since you’re already using SolrCloud, how about simply 
upconfig’ing the configuration from the Git repo, creating a temporary collection 
using that configset, and smoke testing it before making it ready for end 
client/customer/user use?   Maybe the configset and collection created for 
smoke testing are just temporary in order to validate it.

—
Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com 
<http://www.lucidworks.com/>



> On Dec 30, 2015, at 3:09 PM, Davis, Daniel (NIH/NLM) [C] 
> <daniel.da...@nih.gov> wrote:
> 
> At my organization, I want to create a tool that allows users to keep a solr 
> configuration as a Git repository.   Then, I want my Continuous Integration 
> environment to take some branch of the git repository and "publish" it into 
> ZooKeeper/SolrCloud.
> 
> Working on my own, it is only a very small pain to note foolish errors I've 
> made, fix them, and restart.However, I want my users to be able to edit 
> their own Solr schema and config *most* of the time, at least on development 
> servers.They will not have command-line access to these servers, and I 
> want to avoid endless restarts.
> 
> I'm not interested in fighting to maintain such a useless thing as a DTD/XSD 
> without community support; what I really want to know is whether Solr will 
> start and can index some sample documents.   I'm wondering whether I might be 
> able to build a tool to fire up an EmbeddedSolrServer and capture error 
> messages/exceptions in a reasonable way. This tool could then be run by 
> my users before they commit to git, and then again by the CI server before it 
> "publishes" the configuration to ZooKeeper/SolrCloud.
> 
> Any suggestions?
> 
> Dan Davis, Systems/Applications Architect (Contractor), Office of 
> Computer and Communications Systems, National Library of Medicine, NIH
> 



Re: Testing Solr configuration, schema, and other fields

2015-12-30 Thread Alexandre Rafalovitch
I might be just confused here, but I am not sure what your bottleneck
actually is. You seem to know your critical path already, so how can we
help?

Starting a new Solr core from a given configuration directory is easy. Catching
hard errors from that is probably just grepping logs or a custom logger.

And you don't seem to be talking about lint-style soft sanity checks, but
rather the initialization-stopping hard checks.

What is the next step you are stuck on?

Regards,
   Alex
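
For what it's worth, a bare-bones sketch of that "start the core, catch the hard
errors" idea via SolrJ against a standalone dev Solr might look like this (the URL,
core name and client constructor are assumptions and vary by version):

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;
    import org.apache.solr.common.SolrException;

    public class CoreConfigCheck {
        public static void main(String[] args) throws Exception {
            // assumes an instance directory named "config-check" already sits under the
            // server's solr home, with conf/ being the Git checkout to validate
            try (SolrClient client = new HttpSolrClient("http://localhost:8983/solr")) {
                CoreAdminRequest.createCore("config-check", "config-check", client);
                System.out.println("configuration loaded cleanly");
            } catch (SolrException e) {
                // a schema or solrconfig error that stops initialization lands here;
                // the full stack trace usually ends up in the server log as well
                System.err.println("hard error loading configuration: " + e.getMessage());
                System.exit(1);
            }
        }
    }
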
On 31 Dec 2015 3:09 am, "Davis, Daniel (NIH/NLM) [C]" 
wrote:

> At my organization, I want to create a tool that allows users to keep a
> solr configuration as a Git repository.   Then, I want my Continuous
> Integration environment to take some branch of the git repository and
> "publish" it into ZooKeeper/SolrCloud.
>
> Working on my own, it is only a very small pain to note foolish errors
> I've made, fix them, and restart.However, I want my users to be able to
> edit their own Solr schema and config *most* of the time, at least on
> development servers.They will not have command-line access to these
> servers, and I want to avoid endless restarts.
>
> I'm not interested in fighting to maintain such a useless thing as a
> DTD/XSD without community support; what I really want to know is whether
> Solr will start and can index some sample documents.   I'm wondering
> whether I might be able to build a tool to fire up an EmbeddedSolrServer
> and capture error messages/exceptions in a reasonable way. This tool
> could then be run by my users before they commit to git, and then again by
> the CI server before it "publishes" the configuration to
> ZooKeeper/SolrCloud.
>
> Any suggestions?
>
> Dan Davis, Systems/Applications Architect (Contractor),
> Office of Computer and Communications Systems,
> National Library of Medicine, NIH
>
>


Testing Solr configuration, schema, and other fields

2015-12-30 Thread Davis, Daniel (NIH/NLM) [C]
At my organization, I want to create a tool that allows users to keep a solr 
configuration as a Git repository.   Then, I want my Continuous Integration 
environment to take some branch of the git repository and "publish" it into 
ZooKeeper/SolrCloud.

Working on my own, it is only a very small pain to note foolish errors I've 
made, fix them, and restart.However, I want my users to be able to edit 
their own Solr schema and config *most* of the time, at least on development 
servers.They will not have command-line access to these servers, and I want 
to avoid endless restarts.

I'm not interested in fighting to maintain such a useless thing as a DTD/XSD 
without community support; what I really want to know is whether Solr will 
start and can index some sample documents.   I'm wondering whether I might be 
able to build a tool to fire up an EmbeddedSolrServer and capture error 
messages/exceptions in a reasonable way. This tool could then be run by my 
users before they commit to git, and then again by the CI server before it 
"publishes" the configuration to ZooKeeper/SolrCloud.

Any suggestions?

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH



RE: Testing Solr configuration, schema, and other fields

2015-12-30 Thread Davis, Daniel (NIH/NLM) [C]
Your bottom line point is that EmbeddedSolrServer is different, and some 
configurations will not work on it where they would work on a SolrCloud.   This 
is well taken.   Maybe creating a new collection on existing dev nodes could be 
done.

As far as VDI and Puppet.   My requirements are different because my 
organization is different.   I would prefer not to go into how different.   I 
have written puppet modules for other system configurations, tested them on AWS 
EC2, and yet those modules have not been adopted by my organization.


-Original Message-
From: Mark Horninger [mailto:mhornin...@grayhairsoftware.com] 
Sent: Wednesday, December 30, 2015 3:25 PM
To: solr-user@lucene.apache.org
Subject: RE: Testing Solr configuration, schema, and other fields

Daniel,


Sounds almost like you're reinventing the wheel.  Could you possibly automate 
this through Puppet or Chef?  With a VDI environment, all you would need 
to do is build a new VM node based on the original setup.  Then you can just roll 
out the node as one of the zk nodes.

Just a thought on that subject.

v/r,

-Mark H.

-Original Message-
From: Davis, Daniel (NIH/NLM) [C] [mailto:daniel.da...@nih.gov]
Sent: Wednesday, December 30, 2015 3:10 PM
To: solr-user@lucene.apache.org
Subject: Testing Solr configuration, schema, and other fields

At my organization, I want to create a tool that allows users to keep a solr 
configuration as a Git repository.   Then, I want my Continuous Integration 
environment to take some branch of the git repository and "publish" it into 
ZooKeeper/SolrCloud.

Working on my own, it is only a very small pain to note foolish errors I've 
made, fix them, and restart.However, I want my users to be able to edit 
their own Solr schema and config *most* of the time, at least on development 
servers.They will not have command-line access to these servers, and I want 
to avoid endless restarts.

I'm not interested in fighting to maintain such a useless thing as a DTD/XSD 
without community support; what I really want to know is whether Solr will 
start and can index some sample documents.   I'm wondering whether I might be 
able to build a tool to fire up an EmbeddedSolrServer and capture error 
messages/exceptions in a reasonable way. This tool could then be run by my 
users before they commit to git, and then again by the CI server before it 
"publishes" the configuration to ZooKeeper/SolrCloud.

Any suggestions?

Dan Davis, Systems/Applications Architect (Contractor), Office of Computer and 
Communications Systems, National Library of Medicine, NIH




RE: Testing Solr configuration, schema, and other fields

2015-12-30 Thread Mark Horninger
Daniel,


Sounds almost like you're reinventing the wheel.  Could you possibly automate 
this through Puppet or Chef?  With a VDI environment, all you would need 
to do is build a new VM node based on the original setup.  Then you can just roll 
out the node as one of the zk nodes.

Just a thought on that subject.

v/r,

-Mark H.

-Original Message-
From: Davis, Daniel (NIH/NLM) [C] [mailto:daniel.da...@nih.gov]
Sent: Wednesday, December 30, 2015 3:10 PM
To: solr-user@lucene.apache.org
Subject: Testing Solr configuration, schema, and other fields

At my organization, I want to create a tool that allows users to keep a solr 
configuration as a Git repository.   Then, I want my Continuous Integration 
environment to take some branch of the git repository and "publish" it into 
ZooKeeper/SolrCloud.

Working on my own, it is only a very small pain to note foolish errors I've 
made, fix them, and restart.However, I want my users to be able to edit 
their own Solr schema and config *most* of the time, at least on development 
servers.They will not have command-line access to these servers, and I want 
to avoid endless restarts.

I'm not interested in fighting to maintain such a useless thing as a DTD/XSD 
without community support; what I really want to know is whether Solr will 
start and can index some sample documents.   I'm wondering whether I might be 
able to build a tool to fire up an EmbeddedSolrServer and capture error 
messages/exceptions in a reasonable way. This tool could then be run by my 
users before they commit to git, and then again by the CI server before it 
"publishes" the configuration to ZooKeeper/SolrCloud.

Any suggestions?

Dan Davis, Systems/Applications Architect (Contractor), Office of Computer and 
Communications Systems, National Library of Medicine, NIH




RE: Testing Solr configuration, schema, and other fields

2015-12-30 Thread Davis, Daniel (NIH/NLM) [C]
I think of enterprise search as very similar to RDBMS:

- It belongs in the backend behind your app.
- Each project ought to control its own schema and data.

So, I want the configset for each team's Solr collections to be stored in our 
Git server just as the RDBMS schema is, whether a developer is using a framework or a 
couple of SQL files, scripts, and a VERSION table.    It ought to be that easy.


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, December 30, 2015 5:37 PM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Testing Solr configuration, schema, and other fields

Yeah, the notion of DTDs has gone around several times but always founders on 
the fact that you can, say, define your own Filter with its own set of 
parameters etc. Sure, you can make a generic DTD that accommodates this, but 
then it becomes so general as to be little more than a syntax checker.

The managed schema stuff allows modifications of the schema via REST calls and 
there is some equivalent functionality for solrconfig.xml, but the interesting 
bit about that is that then your VCS is not the "one true source" of the 
configs, it almost goes backwards: Modify the configs in Zookeeper then check 
in to Git.
And even that doesn't really solve, say, putting default search fields in 
solrconfig.xml that do not exist in the schema file.

Frankly what I usually do when heavily editing either one is just do it on my 
local laptop, either stand alone or SolrCloud, _then_ check it in and/or test 
it on my cloud setup. So I guess the take-away is that I don't have any very 
good solution here.

Best,
Erick


On Wed, Dec 30, 2015 at 1:10 PM, Davis, Daniel (NIH/NLM) [C] 
<daniel.da...@nih.gov> wrote:
> Your bottom line point is that EmbeddedSolrServer is different, and some 
> configurations will not work on it where they would work on a SolrCloud.   
> This is well taken.   Maybe creating a new collection on existing dev nodes 
> could be done.
>
> As far as VDI and Puppet.   My requirements are different because my 
> organization is different.   I would prefer not to go into how different.   I 
> have written puppet modules for other system configurations, tested them on 
> AWS EC2, and yet those modules have not been adopted by my organization.
>
>
> -Original Message-
> From: Mark Horninger [mailto:mhornin...@grayhairsoftware.com]
> Sent: Wednesday, December 30, 2015 3:25 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Testing Solr configuration, schema, and other fields
>
> Daniel,
>
>
> Sounds almost like you're reinventing the wheel.  Could you possibly automate 
> this through Puppet or Chef?  With a VDI environment, all you would need 
> to do is build a new VM node based on the original setup.  Then you can just roll 
> out the node as one of the zk nodes.
>
> Just a thought on that subject.
>
> v/r,
>
> -Mark H.
>
> -Original Message-
> From: Davis, Daniel (NIH/NLM) [C] [mailto:daniel.da...@nih.gov]
> Sent: Wednesday, December 30, 2015 3:10 PM
> To: solr-user@lucene.apache.org
> Subject: Testing Solr configuration, schema, and other fields
>
> At my organization, I want to create a tool that allows users to keep a solr 
> configuration as a Git repository.   Then, I want my Continuous Integration 
> environment to take some branch of the git repository and "publish" it into 
> ZooKeeper/SolrCloud.
>
> Working on my own, it is only a very small pain to note foolish errors I've 
> made, fix them, and restart.However, I want my users to be able to edit 
> their own Solr schema and config *most* of the time, at least on development 
> servers.They will not have command-line access to these servers, and I 
> want to avoid endless restarts.
>
> I'm not interested in fighting to maintain such a useless thing as a DTD/XSD 
> without community support; what I really want to know is whether Solr will 
> start and can index some sample documents.   I'm wondering whether I might be 
> able to build a tool to fire up an EmbeddedSolrServer and capture error 
> messages/exceptions in a reasonable way. This tool could then be run by 
> my users before they commit to git, and then again by the CI server before it 
> "publishes" the configuration to ZooKeeper/SolrCloud.
>
> Any suggestions?
>
> Dan Davis, Systems/Applications Architect (Contractor), Office of 
> Computer and Communications Systems, National Library of Medicine, NIH
>
>


Re: Testing Solr configuration, schema, and other fields

2015-12-30 Thread Erick Erickson
Yeah, the notion of DTDs has gone around several times but always founders
on the fact that you can, say, define your own Filter with its own set of
parameters etc. Sure, you can make a generic DTD that accommodates
this, but then it becomes so general as to be little more than a syntax checker.

The managed schema stuff allows modifications of the schema via REST calls
and there is some equivalent functionality for solrconfig.xml, but the
interesting
bit about that is that then your VCS is not the "one true source" of
the configs,
it almost goes backwards: Modify the configs in Zookeeper then check in to Git.
And even that doesn't really solve, say, putting default search fields in
solrconfig.xml that do not exist in the schema file.

Frankly what I usually do when heavily editing either one is just do
it on my local
laptop, either stand alone or SolrCloud, _then_ check it in and/or test it on
my cloud setup. So I guess the take-away is that I don't have any very good
solution here.

Best,
Erick
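
For reference, the REST route looks roughly like this through SolrJ's SchemaRequest
helpers (which, if memory serves, showed up around 5.3); the field definition and
collection name are just examples:

    import java.util.LinkedHashMap;
    import java.util.Map;

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.schema.SchemaRequest;
    import org.apache.solr.client.solrj.response.schema.SchemaResponse;

    public class AddFieldExample {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient("localhost:9983")) {
                Map<String, Object> field = new LinkedHashMap<>();
                field.put("name", "description");    // example field definition only
                field.put("type", "text_general");
                field.put("stored", true);

                // issues an add-field command against the collection's managed schema
                SchemaResponse.UpdateResponse rsp =
                        new SchemaRequest.AddField(field).process(client, "mycollection");
                System.out.println("add-field status: " + rsp.getStatus());
            }
        }
    }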


On Wed, Dec 30, 2015 at 1:10 PM, Davis, Daniel (NIH/NLM) [C]
<daniel.da...@nih.gov> wrote:
> Your bottom line point is that EmbeddedSolrServer is different, and some 
> configurations will not work on it where they would work on a SolrCloud.   
> This is well taken.   Maybe creating a new collection on existing dev nodes 
> could be done.
>
> As far as VDI and Puppet.   My requirements are different because my 
> organization is different.   I would prefer not to go into how different.   I 
> have written puppet modules for other system configurations, tested them on 
> AWS EC2, and yet those modules have not been adopted by my organization.
>
>
> -Original Message-
> From: Mark Horninger [mailto:mhornin...@grayhairsoftware.com]
> Sent: Wednesday, December 30, 2015 3:25 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Testing Solr configuration, schema, and other fields
>
> Daniel,
>
>
> Sounds almost like you're reinventing the wheel.  Could you possibly automate 
> this through Puppet or Chef?  With a VDI environment, all you would need 
> to do is build a new VM node based on the original setup.  Then you can just roll 
> out the node as one of the zk nodes.
>
> Just a thought on that subject.
>
> v/r,
>
> -Mark H.
>
> -Original Message-
> From: Davis, Daniel (NIH/NLM) [C] [mailto:daniel.da...@nih.gov]
> Sent: Wednesday, December 30, 2015 3:10 PM
> To: solr-user@lucene.apache.org
> Subject: Testing Solr configuration, schema, and other fields
>
> At my organization, I want to create a tool that allows users to keep a solr 
> configuration as a Git repository.   Then, I want my Continuous Integration 
> environment to take some branch of the git repository and "publish" it into 
> ZooKeeper/SolrCloud.
>
> Working on my own, it is only a very small pain to note foolish errors I've 
> made, fix them, and restart.However, I want my users to be able to edit 
> their own Solr schema and config *most* of the time, at least on development 
> servers.They will not have command-line access to these servers, and I 
> want to avoid endless restarts.
>
> I'm not interested in fighting to maintain such a useless thing as a DTD/XSD 
> without community support; what I really want to know is whether Solr will 
> start and can index some sample documents.   I'm wondering whether I might be 
> able to build a tool to fire up an EmbeddedSolrServer and capture error 
> messages/exceptions in a reasonable way. This tool could then be run by 
> my users before they commit to git, and then again by the CI server before it 
> "publishes" the configuration to ZooKeeper/SolrCloud.
>
> Any suggestions?
>
> Dan Davis, Systems/Applications Architect (Contractor), Office of Computer 
> and Communications Systems, National Library of Medicine, NIH
>
>


Re: Blog Post: Integration Testing SOLR Index with Maven

2013-03-15 Thread Chantal Ackermann
Hi,

@Lance - thanks, it's a pleasure to give something back to the community. Even 
if it is comparatively small. :-)

@Paul - it's definitely not 15 min but rather 2 min. Actually, the testing part 
of this setup is very regular compared to other Maven projects. The copying of 
the WAR file and repackaging is not that time consuming. (This is still Maven - 
widely used and proven - it wouldn't be if it were not practical?)


Cheers,
Chantal

Blog Post: Integration Testing SOLR Index with Maven

2013-03-14 Thread Chantal Ackermann
Hi all,


this is not a question. I just wanted to announce that I've written a blog post 
on how to set up Maven for packaging and automatic testing of a SOLR index 
configuration.

http://blog.it-agenten.com/2013/03/integration-testing-your-solr-index-with-maven/

Feedback or comments appreciated!
And again, thanks for that great piece of software.

Chantal



Re: Blog Post: Integration Testing SOLR Index with Maven

2013-03-14 Thread David Philip
Informative. Useful. Thanks


On Thu, Mar 14, 2013 at 1:59 PM, Chantal Ackermann 
c.ackerm...@it-agenten.com wrote:

 Hi all,


 this is not a question. I just wanted to announce that I've written a blog
 post on how to set up Maven for packaging and automatic testing of a SOLR
 index configuration.


 http://blog.it-agenten.com/2013/03/integration-testing-your-solr-index-with-maven/

 Feedback or comments appreciated!
 And again, thanks for that great piece of software.

 Chantal




Re: Blog Post: Integration Testing SOLR Index with Maven

2013-03-14 Thread Paul Libbrecht
Nice,

Chantal can you indicate there or here what kind of speed for integration tests 
you've reached with this, from a bare source to a successfully tested 
application?
(e.g. with 100 documents)

thanks in advance

Paul


On 14 mars 2013, at 09:29, Chantal Ackermann wrote:

 Hi all,
 
 
 this is not a question. I just wanted to announce that I've written a blog 
 post on how to set up Maven for packaging and automatic testing of a SOLR 
 index configuration.
 
 http://blog.it-agenten.com/2013/03/integration-testing-your-solr-index-with-maven/
 
 Feedback or comments appreciated!
 And again, thanks for that great piece of software.
 
 Chantal
 



Re: Blog Post: Integration Testing SOLR Index with Maven

2013-03-14 Thread Chantal Ackermann
Hi Paul,

I'm sorry I cannot provide you with any numbers. I also doubt it would be wise 
to post any as I think the speed depends highly on what you are doing in your 
integration tests.

Say you have several request handlers that you want to test (on different 
cores), and some more complex use cases like using output from one request 
handler as input to others. You would also import test data that would be 
representative enough to test these request handlers and use cases.

The requests themselves, of course, only take as long as SolrJ takes to run and 
SOLR takes to answer them.
In addition, there is the overhead of Maven starting up, running all the 
plugins, importing the data, executing the tests. Well, Maven is certainly not 
the fastest tool to start up and get going…

If you are asking because you want to run rather a lot of requests and test their 
output - JMeter might be preferable?

Hope that was not too vague an answer,
Chantal
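
Roughly, such a test is plain SolrJ wrapped in JUnit and run by the build (e.g. via the
failsafe plugin), something along the lines of the sketch below; the URL, core name,
handler and query are only placeholders, and on newer SolrJ the client class is called
HttpSolrClient instead of HttpSolrServer:

    import static org.junit.Assert.assertTrue;

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.junit.Test;

    public class RequestHandlerIT {

        // assumes the build has started Solr with this core and imported the test data
        private final HttpSolrServer solr =
                new HttpSolrServer("http://localhost:8983/solr/mycore");

        @Test
        public void selectHandlerFindsSampleData() throws Exception {
            SolrQuery query = new SolrQuery("title:shakespeare");
            query.setRequestHandler("/select");   // or whatever custom handler is under test
            QueryResponse rsp = solr.query(query);
            assertTrue("expected at least one hit for the imported sample data",
                       rsp.getResults().getNumFound() > 0);
        }
    }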


Am 14.03.2013 um 09:51 schrieb Paul Libbrecht:

 Nice,
 
 Chantal can you indicate there or here what kind of speed for integration 
 tests you've reached with this, from a bare source to a successfully tested 
 application?
 (e.g. with 100 documents)
 
 thanks in advance
 
 Paul
 
 
 On 14 mars 2013, at 09:29, Chantal Ackermann wrote:
 
 Hi all,
 
 
 this is not a question. I just wanted to announce that I've written a blog 
 post on how to set up Maven for packaging and automatic testing of a SOLR 
 index configuration.
 
 http://blog.it-agenten.com/2013/03/integration-testing-your-solr-index-with-maven/
 
 Feedback or comments appreciated!
 And again, thanks for that great piece of software.
 
 Chantal
 
 



Re: Blog Post: Integration Testing SOLR Index with Maven

2013-03-14 Thread Paul Libbrecht
Chantal,

the goal is different: to get a general feeling for how practical it is to integrate 
this into the routine.
If you are able, on your contemporary machine, which I assume is not a 
supercomputer of some special sort, to run this whole process in a way that is useful 
for you in about 2 minutes, then I'll be very interested.

If, like quite many things where Maven starts and integration is measured from 
all facets, it takes more than 15 minutes to run this process once it is useful, 
then I will be less motivated.

I'm not asking for performance measurement and certainly not for that of solr 
which I trust largely and depends a lot on good caching. Yes, for this, jMeter 
or others are useful.

Paul


On 14 mars 2013, at 12:20, Chantal Ackermann wrote:

 Hi Paul,
 
 I'm sorry I cannot provide you with any numbers. I also doubt it would be 
 wise to post any as I think the speed depends highly on what you are doing in 
 your integration tests.
 
 Say you have several request handlers that you want to test (on different 
 cores), and some more complex use cases like using output from one request 
 handler as input to others. You would also import test data that would be 
 representative enough to test these request handlers and use cases.
 
 The requests themselves, of course, only take as long as SolrJ takes to run 
 and SOLR takes to answer them.
 In addition, there is the overhead of Maven starting up, running all the 
 plugins, importing the data, executing the tests. Well, Maven is certainly 
 not the fastest tool to start up and get going…
 
 If you are asking because you want to run rather a lot of requests and test 
 their output - JMeter might be preferable?
 
 Hope that was not too vague an answer,
 Chantal
 
 
 Am 14.03.2013 um 09:51 schrieb Paul Libbrecht:
 
 Nice,
 
 Chantal can you indicate there or here what kind of speed for integration 
 tests you've reached with this, from a bare source to a successfully tested 
 application?
 (e.g. with 100 documents)
 
 thanks in advance
 
 Paul
 
 
 On 14 mars 2013, at 09:29, Chantal Ackermann wrote:
 
 Hi all,
 
 
 this is not a question. I just wanted to announce that I've written a blog 
 post on how to set up Maven for packaging and automatic testing of a SOLR 
 index configuration.
 
 http://blog.it-agenten.com/2013/03/integration-testing-your-solr-index-with-maven/
 
 Feedback or comments appreciated!
 And again, thanks for that great piece of software.
 
 Chantal
 
 
 



Re: Blog Post: Integration Testing SOLR Index with Maven

2013-03-14 Thread Lance Norskog
Wow! That's great. And it's a lot of work, especially getting it all 
keyboard-complete. Thank you.


On 03/14/2013 01:29 AM, Chantal Ackermann wrote:

Hi all,


this is not a question. I just wanted to announce that I've written a blog post 
on how to set up Maven for packaging and automatic testing of a SOLR index 
configuration.

http://blog.it-agenten.com/2013/03/integration-testing-your-solr-index-with-maven/

Feedback or comments appreciated!
And again, thanks for that great piece of software.

Chantal





Re: stress testing Solr 4.x

2012-12-10 Thread Alain Rogister
Hi Mark,

Usually I was stopping them with ctrl-c but several times, one of the
servers was hung and had to be stopped with kill -9.

Thanks,

Alain

On Mon, Dec 10, 2012 at 5:09 AM, Mark Miller markrmil...@gmail.com wrote:

 Hmmm...EOF on the segments file is odd...

 How were you killing the nodes? Just stopping them or kill -9 or what?

 - Mark

 On Sun, Dec 9, 2012 at 1:37 PM, Alain Rogister alain.rogis...@gmail.com
 wrote:
  Hi,
 
  I have re-run my tests today after I updated Solr 4.1 to apply the patch.
 
  First, the good news : it works i.e. if I stop all three Solr servers and
  then restart one, it will try to find the other two for a while (about 3
  minutes I think) then give up, become the leader and start processing
  requests.
 
  Now, the not-so-good : I encountered several exceptions that seem to
  indicate 2 other issues. Here are the relevant bits.
 
  1) The ZK session expiry problem : not sure what caused it but I did a
 few
  Solr or ZK node restarts while the system was under load.
 
  SEVERE: There was a problem finding the leader in
  zk:org.apache.solr.common.SolrException: Could not get leader props
  at
 org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:732)
  at
 org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:696)
  at
 
 org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1095)
  at
 
 org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:265)
  at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
  at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184)
  at
 
 org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
  at
 
 org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
  at
 
 org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
  at
 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
  Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
  KeeperErrorCode = Session expired for
 /collections/adressage/leaders/shard1
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
  at
 
 org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:244)
  at
 
 org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:241)
  at
 
 org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
  at
 org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:241)
  at
 org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:710)
  ... 10 more
  SEVERE: :org.apache.zookeeper.KeeperException$SessionExpiredException:
  KeeperErrorCode = Session expired for /overseer/queue/qn-
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
  at
 
 org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:210)
  at
 
 org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:207)
  at
 
 org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
  at
 org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:207)
  at
 org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:229)
  at org.apache.solr.cloud.ZkController.publish(ZkController.java:824)
  at org.apache.solr.cloud.ZkController.publish(ZkController.java:797)
  at
 
 org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:258)
  at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
  at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184)
  at
 
 org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
  at
 
 org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
  at
 
 org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
  at
 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
 
  2) Data corruption of 1 core on 2 out of 3 Solr servers. This core failed
  to start due to the exceptions below and both servers went into a
 seemingly
  endless loop of exponential retries. The fix was to stop both faulty
  servers, remove the data directory of this core and restart : replication
  then took place correctly. As above, not sure what exactly caused this to
  happen; no updates were taking place, only searches.
 
  On server 1 :
 
  INFO: Closing
 
 

Re: stress testing Solr 4.x

2012-12-09 Thread Mark Miller
Hmmm...EOF on the segments file is odd...

How were you killing the nodes? Just stopping them or kill -9 or what?

- Mark

On Sun, Dec 9, 2012 at 1:37 PM, Alain Rogister alain.rogis...@gmail.com wrote:
 Hi,

 I have re-run my tests today after I updated Solr 4.1 to apply the patch.

 First, the good news : it works i.e. if I stop all three Solr servers and
 then restart one, it will try to find the other two for a while (about 3
 minutes I think) then give up, become the leader and start processing
 requests.

 Now, the not-so-good : I encountered several exceptions that seem to
 indicate 2 other issues. Here are the relevant bits.

 1) The ZK session expiry problem : not sure what caused it but I did a few
 Solr or ZK node restarts while the system was under load.

 SEVERE: There was a problem finding the leader in
 zk:org.apache.solr.common.SolrException: Could not get leader props
 at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:732)
 at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:696)
 at
 org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1095)
 at
 org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:265)
 at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
 at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184)
 at
 org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
 at
 org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
 at
 org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
 at
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
 Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
 KeeperErrorCode = Session expired for /collections/adressage/leaders/shard1
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
 at
 org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:244)
 at
 org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:241)
 at
 org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
 at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:241)
 at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:710)
 ... 10 more
 SEVERE: :org.apache.zookeeper.KeeperException$SessionExpiredException:
 KeeperErrorCode = Session expired for /overseer/queue/qn-
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
 at
 org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:210)
 at
 org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:207)
 at
 org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
 at org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:207)
 at org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:229)
 at org.apache.solr.cloud.ZkController.publish(ZkController.java:824)
 at org.apache.solr.cloud.ZkController.publish(ZkController.java:797)
 at
 org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:258)
 at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
 at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184)
 at
 org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
 at
 org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
 at
 org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
 at
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

 2) Data corruption of 1 core on 2 out of 3 Solr servers. This core failed
 to start due to the exceptions below and both servers went into a seemingly
 endless loop of exponential retries. The fix was to stop both faulty
 servers, remove the data directory of this core and restart : replication
 then took place correctly. As above, not sure what exactly caused this to
 happen; no updates were taking place, only searches.

 On server 1 :

 INFO: Closing
 directory:/Users/arogister/Dev/apache-solr-4.1-branch/solr/forem/solr/formabanque/data/index.20121209152525785
 Dec 09, 2012 3:25:25 PM org.apache.solr.common.SolrException log
 SEVERE: SnapPull failed :org.apache.solr.common.SolrException: Index fetch
 failed :
 at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:400)
 at
 

Re: stress testing Solr 4.x

2012-12-08 Thread Mark Miller
Hmm…I've tried to replicate what looked like a bug from your report (3 Solr 
servers stop/start ), but on 5x it works no problem for me. It shouldn't be any 
different on 4x, but I'll try that next.

In terms of starting up Solr without a working ZooKeeper ensemble - it won't 
work currently. Cores won't be able to register with ZooKeeper and will fail 
loading. It would probably be nicer to come up in search only mode and keep 
trying to reconnect to zookeeper - file a JIRA issue if you are interested.

On the zk data dir, see 
http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html#Ongoing+Data+Directory+Cleanup
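
The cleanup described on that page is ZooKeeper 3.4's built-in purge task, which also answers the zkdata-directory question quoted below; a sketch of the relevant zoo.cfg lines (the values are only illustrative):

# keep the 3 most recent snapshots, purge older snapshots/transaction logs every hour
autopurge.snapRetainCount=3
autopurge.purgeInterval=1

On older ZooKeeper releases the same effect is achieved by cron'ing the bundled bin/zkCleanup.sh script.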

- Mark

On Dec 7, 2012, at 10:22 PM, Mark Miller markrmil...@gmail.com wrote:

 Hey, I'll try and answer this tomorrow.
 
 There is a def an unreported bug in there that needs to be fixed for the 
 restarting the all nodes case.
 
 Also, a 404 one is generally when jetty is starting or stopping - there are 
 points where 404's can be returned. I'm not sure why else you'd see one. 
 Generally we do retries when that happens.
 
 - Mark
 
 On Dec 7, 2012, at 1:07 PM, Alain Rogister alain.rogis...@gmail.com wrote:
 
 I am reporting the results of my stress tests against Solr 4.x. As I was
 getting many error conditions with 4.0, I switched to the 4.1 trunk in the
 hope that some of the issues would be fixed already. Here is my setup :
 
 - Everything running on a single box (2 x 4-core CPUs, 8 GB RAM). I realize
 this is not representative of a production environment but it's a fine way
 to find out what happens under resource-constrained conditions.
 - 3 Solr servers, 3 cores (2 of which are very small, the third one has 410
 MB of data)
 - single shard
 - 3 Zookeeper instances
 - HAProxy load balancing requests across Solr servers
 - JMeter or ApacheBench running the tests : 5 thread pools of 20 threads
 each, sending search requests continuously (no updates)
 
 In nominal conditions, it all works fine, i.e. it can process a million
 requests, maxing out the CPUs at all times, without experiencing nasty
 failures. There are errors in the logs about replication failures though;
 they should be benign in this case as no updates are taking place, but it's
 hard to tell what is going on exactly. Example:
 
 Dec 07, 2012 7:50:37 PM org.apache.solr.update.PeerSync handleResponse
 WARNING: PeerSync: core=adressage url=http://192.168.0.101:8983/solr
 exception talking to
 http://192.168.0.101:8985/solr/adressage/, failed
 org.apache.solr.common.SolrException: Server at
 http://192.168.0.101:8985/solr/adressage returned non ok status:404,
 message:Not Found
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
 at
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
 at
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)
 
 Then I simulated various failure scenarios :
 
 - 1 Solr server stop/start
 - 2 Solr servers stop/start
 - 3 Solr servers stop/start : it seems that in this case, the Solr servers
 *cannot* be restarted : more exactly, the restarted server will consider
 that it is number 1 out of 4 and wait for the other 3 to come up. The only
 way out is to stop it again, then stop all Zookeeper instances *and* clean
 up their zkdata directory, start them, then start the Solr servers.
 
 I noticed that these zkdata directory had grown to 200 MB after a while.
 What exactly is in there besides the configuration data ? Does it stop
 growing ?
 
 Then I tried this :
 
 - kill 1 Zookeeper process
 - kill 2 Zookeeper processes
 - stop/start 1 Solr server
 
 When doing this, I experienced (many times) situations where the Solr
 servers could not reconnect and threw scary exceptions. The only way out
 was to restart the whole cluster.
 
 Q : when, if ever, is one supposed to clean up the zkdata directories ?
 
 Here are the errors I found in the logs. It seems that some of them have
 been reported in JIRA but 4.1-trunk seems to experience basically the same
 issues as 4.0 in my test scenarios.
 
 Dec 07, 2012 8:03:59 PM org.apache.solr.update.PeerSync handleResponse
 WARNING: PeerSync: core=cachede url=http://192.168.0.101:8983/solr
 couldn't connect to
 http://192.168.0.101:8984/solr/cachede/, counting as success
 Dec 07, 2012 8:03:59 PM org.apache.solr.common.SolrException log
 

Re: stress testing Solr 4.x

2012-12-08 Thread Mark Miller
After some more playing around on 5x I have duplicated the issue. I'll file a 
JIRA issue for you and fix it shortly.

- Mark

On Dec 8, 2012, at 8:43 AM, Mark Miller markrmil...@gmail.com wrote:

 Hmm…I've tried to replicate what looked like a bug from your report (3 Solr 
 servers stop/start ), but on 5x it works no problem for me. It shouldn't be 
 any different on 4x, but I'll try that next.
 
 In terms of starting up Solr without a working ZooKeeper ensemble - it won't 
 work currently. Cores won't be able to register with ZooKeeper and will fail 
 loading. It would probably be nicer to come up in search only mode and keep 
 trying to reconnect to zookeeper - file a JIRA issue if you are interested.
 
 On the zk data dir, see 
 http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html#Ongoing+Data+Directory+Cleanup
 
 - Mark
 
 On Dec 7, 2012, at 10:22 PM, Mark Miller markrmil...@gmail.com wrote:
 
 Hey, I'll try and answer this tomorrow.
 
 There is a def an unreported bug in there that needs to be fixed for the 
 restarting the all nodes case.
 
 Also, a 404 one is generally when jetty is starting or stopping - there are 
 points where 404's can be returned. I'm not sure why else you'd see one. 
 Generally we do retries when that happens.
 
 - Mark
 
 On Dec 7, 2012, at 1:07 PM, Alain Rogister alain.rogis...@gmail.com wrote:
 
 I am reporting the results of my stress tests against Solr 4.x. As I was
 getting many error conditions with 4.0, I switched to the 4.1 trunk in the
 hope that some of the issues would be fixed already. Here is my setup :
 
 - Everything running on a single box (2 x 4-core CPUs, 8 GB RAM). I realize
 this is not representative of a production environment but it's a fine way
 to find out what happens under resource-constrained conditions.
 - 3 Solr servers, 3 cores (2 of which are very small, the third one has 410
 MB of data)
 - single shard
 - 3 Zookeeper instances
 - HAProxy load balancing requests across Solr servers
 - JMeter or ApacheBench running the tests : 5 thread pools of 20 threads
 each, sending search requests continuously (no updates)
 
 In nominal conditions, it all works fine, i.e. it can process a million
 requests, maxing out the CPUs at all times, without experiencing nasty
 failures. There are errors in the logs about replication failures though;
 they should be benign in this case as no updates are taking place, but it's
 hard to tell what is going on exactly. Example:
 
 Dec 07, 2012 7:50:37 PM org.apache.solr.update.PeerSync handleResponse
 WARNING: PeerSync: core=adressage url=http://192.168.0.101:8983/solr
 exception talking to
 http://192.168.0.101:8985/solr/adressage/, failed
 org.apache.solr.common.SolrException: Server at
 http://192.168.0.101:8985/solr/adressage returned non ok status:404,
 message:Not Found
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
 at
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
 at
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)
 
 Then I simulated various failure scenarios :
 
 - 1 Solr server stop/start
 - 2 Solr servers stop/start
 - 3 Solr servers stop/start : it seems that in this case, the Solr servers
 *cannot* be restarted : more exactly, the restarted server will consider
 that it is number 1 out of 4 and wait for the other 3 to come up. The only
 way out is to stop it again, then stop all Zookeeper instances *and* clean
 up their zkdata directory, start them, then start the Solr servers.
 
 I noticed that these zkdata directory had grown to 200 MB after a while.
 What exactly is in there besides the configuration data ? Does it stop
 growing ?
 
 Then I tried this :
 
 - kill 1 Zookeeper process
 - kill 2 Zookeeper processes
 - stop/start 1 Solr server
 
 When doing this, I experienced (many times) situations where the Solr
 servers could not reconnect and threw scary exceptions. The only way out
 was to restart the whole cluster.
 
 Q : when, if ever, is one supposed to clean up the zkdata directories ?
 
 Here are the errors I found in the logs. It seems that some of them have
 been reported in JIRA but 4.1-trunk seems to experience basically the same
 issues as 4.0 in my test scenarios.
 
 Dec 07, 2012 8:03:59 PM org.apache.solr.update.PeerSync handleResponse
 

Re: stress testing Solr 4.x

2012-12-08 Thread Alain Rogister
Great, thanks Mark ! I'll test the fix and post my results.

Alain

On Saturday, December 8, 2012, Mark Miller wrote:

 After some more playing around on 5x I have duplicated the issue. I'll
 file a JIRA issue for you and fix it shortly.

 - Mark

 On Dec 8, 2012, at 8:43 AM, Mark Miller markrmil...@gmail.com wrote:

  Hmm…I've tried to replicate what looked like a bug from your report (3
 Solr servers stop/start ), but on 5x it works no problem for me. It
 shouldn't be any different on 4x, but I'll try that next.
 
  In terms of starting up Solr without a working ZooKeeper ensemble - it
 won't work currently. Cores won't be able to register with ZooKeeper and
 will fail loading. It would probably be nicer to come up in search only
 mode and keep trying to reconnect to zookeeper - file a JIRA issue if you
 are interested.
 
  On the zk data dir, see
 http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html#Ongoing+Data+Directory+Cleanup
 
  - Mark
 
  On Dec 7, 2012, at 10:22 PM, Mark Miller markrmil...@gmail.com wrote:
 
  Hey, I'll try and answer this tomorrow.
 
  There is a def an unreported bug in there that needs to be fixed for
 the restarting the all nodes case.
 
  Also, a 404 one is generally when jetty is starting or stopping - there
 are points where 404's can be returned. I'm not sure why else you'd see
 one. Generally we do retries when that happens.
 
  - Mark
 
  On Dec 7, 2012, at 1:07 PM, Alain Rogister alain.rogis...@gmail.com
 wrote:
 
  I am reporting the results of my stress tests against Solr 4.x. As I
 was
  getting many error conditions with 4.0, I switched to the 4.1 trunk in
 the
  hope that some of the issues would be fixed already. Here is my setup :
 
  - Everything running on a single box (2 x 4-core CPUs, 8 GB RAM). I
 realize
  this is not representative of a production environment but it's a fine
 way
  to find out what happens under resource-constrained conditions.
  - 3 Solr servers, 3 cores (2 of which are very small, the third one
 has 410
  MB of data)
  - single shard
  - 3 Zookeeper instances
  - HAProxy load balancing requests across Solr servers
  - JMeter or ApacheBench running the tests : 5 thread pools of 20
 threads
  each, sending search requests continuously (no updates)
 
  In nominal conditions, it all works fine, i.e. it can process a million
  requests, maxing out the CPUs at all times, without experiencing nasty
  failures. There are errors in the logs about replication failures though;
  they should be benign in this case as no updates are taking place, but it's
  hard to tell what is going on exactly. Example:
 
  Dec 07, 2012 7:50:37 PM org.apache.solr.update.PeerSync handleResponse
  WARNING: PeerSync: core=adressage url=http://192.168.0.101:8983/solr
  exception talking to
  http://192.168.0.101:8985/solr/adressage/, failed
  org.apache.solr.common.SolrException: Server at
  http://192.168.0.101:8985/solr/adressage returned non ok status:404,
  message:Not Found
  at
 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
  at
 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
  at
 
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
  at
 
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.


Re: stress testing Solr 4.x

2012-12-08 Thread Mark Miller
No problem!

Here is the JIRA issue: https://issues.apache.org/jira/browse/SOLR-4158

- Mark

On Sat, Dec 8, 2012 at 6:03 PM, Alain Rogister alain.rogis...@gmail.com wrote:
 Great, thanks Mark ! I'll test the fix and post my results.

 Alain

 On Saturday, December 8, 2012, Mark Miller wrote:

 After some more playing around on 5x I have duplicated the issue. I'll
 file a JIRA issue for you and fix it shortly.

 - Mark

 On Dec 8, 2012, at 8:43 AM, Mark Miller markrmil...@gmail.com wrote:

  Hmm…I've tried to replicate what looked like a bug from your report (3
 Solr servers stop/start ), but on 5x it works no problem for me. It
 shouldn't be any different on 4x, but I'll try that next.
 
  In terms of starting up Solr without a working ZooKeeper ensemble - it
 won't work currently. Cores won't be able to register with ZooKeeper and
 will fail loading. It would probably be nicer to come up in search only
 mode and keep trying to reconnect to zookeeper - file a JIRA issue if you
 are interested.
 
  On the zk data dir, see
 http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html#Ongoing+Data+Directory+Cleanup
 
  - Mark
 
  On Dec 7, 2012, at 10:22 PM, Mark Miller markrmil...@gmail.com wrote:
 
  Hey, I'll try and answer this tomorrow.
 
  There is a def an unreported bug in there that needs to be fixed for
 the restarting the all nodes case.
 
  Also, a 404 one is generally when jetty is starting or stopping - there
 are points where 404's can be returned. I'm not sure why else you'd see
 one. Generally we do retries when that happens.
 
  - Mark
 
  On Dec 7, 2012, at 1:07 PM, Alain Rogister alain.rogis...@gmail.com
 wrote:
 
  I am reporting the results of my stress tests against Solr 4.x. As I
 was
  getting many error conditions with 4.0, I switched to the 4.1 trunk in
 the
  hope that some of the issues would be fixed already. Here is my setup :
 
  - Everything running on a single box (2 x 4-core CPUs, 8 GB RAM). I
 realize
  this is not representative of a production environment but it's a fine
 way
  to find out what happens under resource-constrained conditions.
  - 3 Solr servers, 3 cores (2 of which are very small, the third one
 has 410
  MB of data)
  - single shard
  - 3 Zookeeper instances
  - HAProxy load balancing requests across Solr servers
  - JMeter or ApacheBench running the tests : 5 thread pools of 20
 threads
  each, sending search requests continuously (no updates)
 
  In nominal conditions, it all works fine, i.e. it can process a million
  requests, maxing out the CPUs at all times, without experiencing nasty
  failures. There are errors in the logs about replication failures though;
  they should be benign in this case as no updates are taking place, but it's
  hard to tell what is going on exactly. Example:
 
  Dec 07, 2012 7:50:37 PM org.apache.solr.update.PeerSync handleResponse
  WARNING: PeerSync: core=adressage url=http://192.168.0.101:8983/solr
  exception talking to
  http://192.168.0.101:8985/solr/adressage/, failed
  org.apache.solr.common.SolrException: Server at
  http://192.168.0.101:8985/solr/adressage returned non ok status:404,
  message:Not Found
  at
 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
  at
 
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
  at
 
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
  at
 
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.



-- 
- Mark


Re: stress testing Solr 4.x

2012-12-07 Thread Mark Miller
Hey, I'll try and answer this tomorrow.

There is def an unreported bug in there that needs to be fixed for the 
'restarting all the nodes' case.

Also, a 404 one is generally when jetty is starting or stopping - there are 
points where 404's can be returned. I'm not sure why else you'd see one. 
Generally we do retries when that happens.

- Mark

On Dec 7, 2012, at 1:07 PM, Alain Rogister alain.rogis...@gmail.com wrote:

 I am reporting the results of my stress tests against Solr 4.x. As I was
 getting many error conditions with 4.0, I switched to the 4.1 trunk in the
 hope that some of the issues would be fixed already. Here is my setup :
 
 - Everything running on a single box (2 x 4-core CPUs, 8 GB RAM). I realize
 this is not representative of a production environment but it's a fine way
 to find out what happens under resource-constrained conditions.
 - 3 Solr servers, 3 cores (2 of which are very small, the third one has 410
 MB of data)
 - single shard
 - 3 Zookeeper instances
 - HAProxy load balancing requests across Solr servers
 - JMeter or ApacheBench running the tests : 5 thread pools of 20 threads
 each, sending search requests continuously (no updates)
 
 In nominal conditions, it all works fine, i.e. it can process a million
 requests, maxing out the CPUs at all times, without experiencing nasty
 failures. There are errors in the logs about replication failures though;
 they should be benign in this case as no updates are taking place, but it's
 hard to tell what is going on exactly. Example:
 
 Dec 07, 2012 7:50:37 PM org.apache.solr.update.PeerSync handleResponse
 WARNING: PeerSync: core=adressage url=http://192.168.0.101:8983/solr
 exception talking to
 http://192.168.0.101:8985/solr/adressage/, failed
 org.apache.solr.common.SolrException: Server at
 http://192.168.0.101:8985/solr/adressage returned non ok status:404,
 message:Not Found
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
 at
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
 at
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)
 
 Then I simulated various failure scenarios :
 
 - 1 Solr server stop/start
 - 2 Solr servers stop/start
 - 3 Solr servers stop/start : it seems that in this case, the Solr servers
 *cannot* be restarted : more exactly, the restarted server will consider
 that it is number 1 out of 4 and wait for the other 3 to come up. The only
 way out is to stop it again, then stop all Zookeeper instances *and* clean
 up their zkdata directory, start them, then start the Solr servers.
 
 I noticed that these zkdata directory had grown to 200 MB after a while.
 What exactly is in there besides the configuration data ? Does it stop
 growing ?
 
 Then I tried this :
 
 - kill 1 Zookeeper process
 - kill 2 Zookeeper processes
 - stop/start 1 Solr server
 
 When doing this, I experienced (many times) situations where the Solr
 servers could not reconnect and threw scary exceptions. The only way out
 was to restart the whole cluster.
 
 Q : when, if ever, is one supposed to clean up the zkdata directories ?
 
 Here are the errors I found in the logs. It seems that some of them have
 been reported in JIRA but 4.1-trunk seems to experience basically the same
 issues as 4.0 in my test scenarios.
 
 Dec 07, 2012 8:03:59 PM org.apache.solr.update.PeerSync handleResponse
 WARNING: PeerSync: core=cachede url=http://192.168.0.101:8983/solr
 couldn't connect to
 http://192.168.0.101:8984/solr/cachede/, counting as success
 Dec 07, 2012 8:03:59 PM org.apache.solr.common.SolrException log
 SEVERE: Sync request error:
 org.apache.solr.client.solrj.SolrServerException: Server refused connection
 at: http://192.168.0.101:8984/solr/cachede
 Dec 07, 2012 8:03:59 PM org.apache.solr.common.SolrException log
 SEVERE: http://192.168.0.101:8983/solr/cachede/: Could not tell a replica
 to recover:org.apache.solr.client.solrj.SolrServerException: Server refused
 connection at: http://192.168.0.101:8984/solr
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:406)
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
 at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:293)
 at
 

Re: Testing Solr Cloud with ZooKeeper

2012-11-13 Thread darul
https://issues.apache.org/jira/browse/SOLR-3993 has been resolved.

Just a few questions: is it in trunk, I mean in the main distribution downloadable
from the main Solr site?

Because I have downloaded it and still get the same behaviour while running
the first instance... or the second shard.





Re: Testing Solr Cloud with ZooKeeper

2012-11-13 Thread darul
Looks like after the timeout has finished, the first Solr instance responds.



I was not waiting long enough. Is it possible to reduce this *timeout* value?

Thanks
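
The wait described above corresponds to SolrCloud's leader-vote wait, which defaults to a few minutes. Later 4.x releases expose it as the leaderVoteWait attribute (in milliseconds) on the cores element of the legacy solr.xml; whether your exact build has it is worth checking. A sketch, with an assumed 10-second value:

<solr>
  <cores adminPath="/admin/cores" host="${host:}" hostPort="${jetty.port:}"
         leaderVoteWait="10000">
    <core name="collection1" instanceDir="collection1"/>
  </cores>
</solr>

Lowering it makes a lone restarted node give up waiting for its peers sooner, at the cost of a higher risk of electing a leader with stale data.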





Re: Testing Solr Cloud with ZooKeeper

2012-11-09 Thread Erick Erickson
you have to have at least one node per shard running for SolrCloud to
function. So when you bring down all nodes and start one, then you have
some shards with no live nodes and SolrCloud goes into a wait state.
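
A quick way to see that wait state from the client side is a SolrJ query routed through ZooKeeper; a minimal sketch (the ZK address and collection name are examples):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;

public class ShardLivenessCheck {
    public static void main(String[] args) throws Exception {
        // Connects via ZooKeeper, so it sees the same cluster state as the nodes do.
        CloudSolrServer solr = new CloudSolrServer("localhost:2181");
        solr.setDefaultCollection("collection1");
        try {
            long n = solr.query(new SolrQuery("*:*")).getResults().getNumFound();
            System.out.println("all shards have a live replica, numFound=" + n);
        } catch (Exception e) {
            // Typically a "no servers hosting shard" style error while a shard has no live node.
            System.out.println("query failed: " + e.getMessage());
        } finally {
            solr.shutdown();
        }
    }
}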

Best
Erick


On Thu, Nov 8, 2012 at 6:17 PM, darul daru...@gmail.com wrote:

 Is it same issue as one detailed here

 http://lucene.472066.n3.nabble.com/SolrCloud-leader-election-on-single-node-td4015804.html






Re: Testing Solr Cloud with ZooKeeper

2012-11-09 Thread darul
- Shards : 2
- ZooKeeper Cluster : 3
- One collection.

Here is how I run it and my scenario case:

In first console, I get first Node (first Shard) running on port 8983:





In second console, I get second Node (second Shard) running on port 8984:





Here I get just 2 nodes for my 2 shards running.

 Then I decide to add 2 replicates for each shard node.


and


Now everything is fine, a robust collection with 2 shards, 2 replicates
running. 

Result expected is here:

http://lucene.472066.n3.nabble.com/file/n4019257/Solr_Admin_192.168.1.6_.png 

Then, I decide to stop the last 2 replicates, running on ports 7501/7502.

Results expected is here:
http://lucene.472066.n3.nabble.com/file/n4019257/2.png 

Then I now stop the 2 main instances running on ports 8983/8984.

Restart the first one 8983:

I get a lot of this dump in console:


Why not; I start the second one running on 8984, and get:



I do not understand why replicates are needed at this phase... when I started
the first time, there was no need for replicates. And now, I would like to
restart the 2 main instances, and maybe start the replicates later.

If I start both instances 7501/7502, everything is fine, but that is not what I
expected.

Any ideas,

Thanks again,

Jul





Re: Testing Solr Cloud with ZooKeeper

2012-11-09 Thread ku3ia
Hi, I have near the same problems with cloud state
see
http://lucene.472066.n3.nabble.com/Replicated-zookeeper-td4018984.html





Re: Testing Solr Cloud with ZooKeeper

2012-11-09 Thread darul
Yes ku3ia, I read your thread yesterday and it looks like we have the same issue.
I hope ApacheCon is nearly finished and an expert can resolve this.
Thanks again to the solr community,
Jul





Re: Testing Solr Cloud with ZooKeeper

2012-11-08 Thread darul
Hello again,

With the following config :

- a 2-node zookeeper ensemble
- 2 shards
- 2 main solr instances for the 2 shards
- I added 2, 3 replicates for fun.

While it is running and I stop one replicate, I see the admin UI graph update
(replicate disabled/inactivated)... normal.

But if I stop all the solr instances and restart the first main instance
on :8983, it always ends up waiting for some replicates... is that useful? Why do
the replicates need to be running? I cannot access the admin UI anymore.

The solution is to erase the zookeeper data and start again; do you have any
solution to avoid this:



What if my replicates are really down in production and I restart everything?

Another question: do 2 shards mean a 2-node zookeeper ensemble, and 3 shards a
3-node zookeeper ensemble?

Thanks,

Jul





Re: Testing Solr Cloud with ZooKeeper

2012-11-08 Thread darul
Thanks Otis, 

Indeed, the zoo doc here too
http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkMulitServerSetup
advises choosing an odd number of zk nodes, this way: "To create a
deployment that can tolerate the failure of F machines, you should count on
deploying 2xF+1 machines"...
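
(Worked out, that rule means: tolerating F = 1 failure takes 2x1 + 1 = 3 nodes, and tolerating F = 2 takes 5. A two-node ensemble still needs both nodes up to hold a majority, so it tolerates zero failures and is no more available than a single node.)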

Well, I just do not yet understand why, after adding replicates, I am not able
to restart the solr instances if the replicates are not running. (When I start
them, it is ok.)

Do I need to erase all the zookeeper config every time the solr servers are
restarted... I mean, send the conf again with bootstrap? It looks like I am not
doing it the right way ;)







Re: Testing Solr Cloud with ZooKeeper

2012-11-08 Thread darul
To illustrate:

http://lucene.472066.n3.nabble.com/file/n4019103/SolrAdmin.png 

Taking this example, 8983 and 8984 are the shard owners, 7501/7502 just
replicates.

If I stop all instances, then restart 8983 or 8984 first, they won't run and
ask for the replicates to be started...






Testing Solr Cloud with ZooKeeper

2012-11-07 Thread darul
Hello everyone,

Having used *Hadoop* (not in charge of deployment, just the Java code part) and
*Solr 3.6* (deployment and coding) this year, today I worked through the Solr
Cloud wiki.

Well, 

* I have deployed 2 zookeeper (not embedded) instances
* 2 solr instances with 2 shards (pointing to zookeeper nodes)
* 2 solr replicates

 ... successfully. Thank you for the new administration UI, graph and co,
nice.

But I am still confused by all these new amazing features (compared to
when I was using multicore and master/slave behaviour).

Here in cloud, I am lost (in translation too).

*Few questions:*
- Both my zookeeper instances have their own data directory, as usual, but I did
not see much change inside after indexing the example docs. Is data stored
there, or is just the /configuration (conf files)/ stored in the zookeeper
ensemble? Can you confirm whether /index data/ is also stored in the zookeeper
cluster, or not?
- In my solr instance directory tree, /solr/mycollection/, I sometimes have
an index or index.20121107185908378 directory and a tlog directory. What
are they used for? Could you explain why the index directory sometimes looks
like a snapshot? Zookeeper should not store the index, sorry to repeat myself,
or is it just a snapshot? What is the tlog directory for?
- Then, playing a little bit, I tested the following command
http://localhost:8983/solr/admin/collections?action=CREATE&name=myname&numShards=2&replicationFactor=1
and saw it update the core.xml configuration and create the data directory as
well, nice. But when I navigate to the admin UI and check the schema, for
instance, where does this configuration come from? I do not get any conf
directory for this core; does it take one by default?

I have so many questions to ask.

Thanks,

Julien






Re: Testing Solr Cloud with ZooKeeper

2012-11-07 Thread darul
I reply to myself :


darul wrote
 *Few questions:*
 - my both zookeeper have their own data directory, as usual, but I did not
 see so much change inside after indexing examples docs. Are data stored
 their or just /configuration (conf files)/ is stored in zookeeper ensemble ?
 Can you confirmed /index data/ are also stored in zookeeper cluster ? Or not ?

I read again and see "Solr embeds and uses Zookeeper as a repository for
cluster configuration and coordination", so that means just configuration, not
an index repository at all?





Re: Testing Solr Cloud with ZooKeeper

2012-11-07 Thread Erick Erickson
Right. Solr uses zookeeper only for configuration information. The index
resides on the machines running solr.
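
A quick way to convince yourself of that is to list the top-level znodes Solr creates; a rough sketch with the plain ZooKeeper Java client (the address is an example):

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ListSolrZnodes {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, new Watcher() {
            public void process(WatchedEvent event) { /* no-op watcher */ }
        });
        // Expect entries such as configs, clusterstate.json, live_nodes, collections
        // and overseer - cluster metadata only, never Lucene segment files.
        for (String child : zk.getChildren("/", false)) {
            System.out.println("/" + child);
        }
        zk.close();
    }
}

Everything under those znodes is configuration and cluster state; the Lucene index files stay in each core's local data directory.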

bq: In my solr instances directory tree,  /solr/mycollection/ sometimes I
have an index or index.20121107185908378

You can configure Solr to keep snapshots of indexes around under control of
an index deletion policy, which you can configure. I think what you're
seeing is this policy in action, you can check to see how it's set up in
your particular situation. This is independent of SolrCloud, it's local to
the solr node.
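
The policy Erick mentions is configured in solrconfig.xml; the stock block looks roughly like this (the values shown are just the example defaults, so treat the snippet as illustrative):

<indexConfig>
  <deletionPolicy class="solr.SolrDeletionPolicy">
    <!-- how many commit points (index snapshots) to keep around -->
    <str name="maxCommitsToKeep">1</str>
    <str name="maxOptimizedCommitsToKeep">0</str>
  </deletionPolicy>
</indexConfig>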

About CREATE. I'm not entirely sure where the config comes from, sorry I
can't help there... What does the solr.xml file show? Are there instanceDir
attribute to the newly-created core (or schema or config)?

Best
Erick


On Wed, Nov 7, 2012 at 3:52 PM, darul daru...@gmail.com wrote:

 I reply to myself :


 darul wrote
  *Few questions:*
  - my both zookeeper have their own data directory, as usual, but I did not
  see so much change inside after indexing examples docs. Are data stored
  their or just /configuration (conf files)/ is stored in zookeeper ensemble ?
  Can you confirmed /index data/ are also stored in zookeeper cluster ? Or not ?

 I read again and see Solr embeds and uses Zookeeper as a repository for
 cluster configuration and coordination, so meaning just configuration, not
 index repository at all ?






Re: Testing Solr Cloud with ZooKeeper

2012-11-07 Thread darul
Yes, the instanceDir attribute points to the newly created core (with no conf
dir), so it is strange...

but it looks like I have played too much:



when I start the main solr shard. I will try everything again tomorrow and give
you feedback.







Re: Testing Solr Cloud with ZooKeeper

2012-11-07 Thread Otis Gospodnetic
You didn't ask about this, but you'll want an odd number of zookeeper
nodes. Think voting.

Otis
--
Performance Monitoring - http://sematext.com/spm
On Nov 7, 2012 4:43 PM, darul daru...@gmail.com wrote:

 Yes instanceDir attribute point to new created core (with no conf dir) so
 it
 is stranged...

 but looks like I have played to much:



 when I start main solr shard. I try everything again tomorrow and give you
 feedback.








Jetty Error while testing Solr

2012-11-07 Thread deniz
              <New class="org.eclipse.jetty.server.handler.DefaultHandler"/>
            </Item>
            <Item>
              <New id="RequestLog" class="org.eclipse.jetty.server.handler.RequestLogHandler"/>
            </Item>
          </Array>
        </Set>
      </New>
    </Set>

    <Set name="stopAtShutdown">true</Set>
    <Set name="sendServerVersion">false</Set>
    <Set name="sendDateHeader">false</Set>
    <Set name="gracefulShutdown">1000</Set>
    <Set name="dumpAfterStart">false</Set>
    <Set name="dumpBeforeStop">false</Set>

    <Call name="addBean">
      <Arg>
        <New id="DeploymentManager" class="org.eclipse.jetty.deploy.DeploymentManager">
          <Set name="contexts">
            <Ref id="Contexts" />
          </Set>
          <Call name="setContextAttribute">
            <Arg>org.eclipse.jetty.server.webapp.ContainerIncludeJarPattern</Arg>
            <Arg>.*/servlet-api-[^/]*\.jar$</Arg>
          </Call>
        </New>
      </Arg>
    </Call>

    <Ref id="DeploymentManager">
      <Call name="addAppProvider">
        <Arg>
          <New class="org.eclipse.jetty.deploy.providers.ContextProvider">
            <Set name="monitoredDirName"><SystemProperty name="jetty.home" default="."/>/contexts</Set>
            <Set name="scanInterval">0</Set>
          </New>
        </Arg>
      </Call>
    </Ref>

</Configure>




Do I need to change something in the config, or is there a way to fix this
without dealing with the Jetty configs?



-
Zeki ama calismiyor... Calissa yapar...


Facing Problem while testing solr 3.6 with Tomcat 6

2012-05-16 Thread Amit Handa
hi All,

Kindly guide me in resolving the following issue, which comes up while
testing Apache Solr 3.6 with Tomcat 6 when trying to access
http://localhost:8080/solr-example/

HTTP Status 500 -
--

*type* Exception report

*message* **

*description* *The server encountered an internal error () that prevented
it from fulfilling this request.*

*exception*

javax.servlet.ServletException: java.lang.AbstractMethodError:
javax.servlet.jsp.JspFactory.getJspApplicationContext(Ljavax/servlet/ServletContext;)Ljavax/servlet/jsp/JspApplicationContext;
org.apache.jasper.servlet.JspServlet.service(JspServlet.java:268)
javax.servlet.http.HttpServlet.service(HttpServlet.java:717)

org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:306)

*root cause*

java.lang.AbstractMethodError:
javax.servlet.jsp.JspFactory.getJspApplicationContext(Ljavax/servlet/ServletContext;)Ljavax/servlet/jsp/JspApplicationContext;
org.apache.jsp.index_jsp._jspInit(index_jsp.java:24)
org.apache.jasper.runtime.HttpJspBase.init(HttpJspBase.java:52)

org.apache.jasper.servlet.JspServletWrapper.getServlet(JspServletWrapper.java:164)

org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:340)
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
javax.servlet.http.HttpServlet.service(HttpServlet.java:717)

org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:306)

*note* *The full stack trace of the root cause is available in the Apache
Tomcat/6.0.35 logs.*
--
Apache Tomcat/6.0.35

I followed the instructions given at http://wiki.apache.org/solr/SolrTomcat
and did the following:

1) First installed Tomcat 6
2) Downloaded the Solr 3.6 zip:
http://mirror.cc.columbia.edu/pub/software/apache/lucene/solr/3.6.0/apache-solr-3.6.0.zip
3) Followed the steps in *Installing Solr instances under Tomcat*
(http://wiki.apache.org/solr/SolrTomcat#Installing_Solr_instances_under_Tomcat)
4) Tried to open http://host:8080/solr-example/admin or
http://host:8080/solr-example
(while accessing these pages I am getting the above-mentioned error)

Kindly help me out in resolving this problem

Thanks in advance.

With Regards,
Amit Handa


Re: Testing Solr Search results

2011-09-05 Thread Marc SCHNEIDER
Hi,
It depends what you want to test. If you want to check that your fields
behave like they should (for example make sure that the content of a field
containing accents can be retrieved), then you can write unit tests using a
Solr client API like solrj. You insert sample data and then you
programmatically test that you get the expected results.
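
A bare-bones sketch of that kind of check with SolrJ (4.x-era API; the URL, field names and the accent-folding expectation are all assumptions about the local schema):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AccentFieldCheck {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "accent-test-1");
        doc.addField("title", "café résumé");
        solr.add(doc);
        solr.commit();

        // Query without the accents and check the document still comes back.
        long found = solr.query(new SolrQuery("title:cafe")).getResults().getNumFound();
        System.out.println(found > 0 ? "accent handling OK" : "accent handling FAILED");
        solr.shutdown();
    }
}

In a real suite the same thing would live in a JUnit test asserting on numFound rather than printing.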

Marc.

On Sun, Sep 4, 2011 at 2:26 AM, Erick Erickson erickerick...@gmail.comwrote:

 This is an unsolved problem in general. The TREC folks try this, see:
 http://trec.nist.gov/

 but in general I've found that each domain has such specific needs
 that correctness isn't an easy thing to pin down. Of course you
 can, with a known set of data, define the best response and try to
 tune Solr to return those, but that's a static snapshot not a
 general test of correctness.

 Best
 Erick

 On Fri, Sep 2, 2011 at 11:25 AM, Nemani, Raj raj.nem...@turner.com
 wrote:
  All,
 
  I was wondering if anybody has any information on approaches to testing
  and verifying search results from Solr.  Most of the time we end up
  manually verifying the results from a search, but the verification is not
  necessarily scientific.
 
  The main question is: what are we verifying these search results against?
  As an example, how can I be sure that the relevancy calculated at
  any given time for a given document in an index is accurate?
 
 
 
  Hope the question makes sense.
 
 
 
  Any feedback is really appreciated.
 
 
 
  Thanks
 
  Raj
 
 
 
 



Re: Testing Solr Search results

2011-09-03 Thread Erick Erickson
This is an unsolved problem in general. The TREC folks try this, see:
http://trec.nist.gov/

but in general I've found that each domain has such specific needs
that correctness isn't an easy thing to pin down. Of course you
can, with a known set of data, define the best response and try to
tune Solr to return those, but that's a static snapshot not a
general test of correctness.

Best
Erick

On Fri, Sep 2, 2011 at 11:25 AM, Nemani, Raj raj.nem...@turner.com wrote:
 All,

 I was wondering if anybody has any information on approaches to testing
 and verifying search results from Solr.  Most of the time we end up
 manually verifying the results from a search, but the verification is not
 necessarily scientific.

 The main question is: what are we verifying these search results against?
 As an example, how can I be sure that the relevancy calculated at
 any given time for a given document in an index is accurate?



 Hope the question makes sense.



 Any feedback is really appreciated.



 Thanks

 Raj






Testing Solr Search results

2011-09-02 Thread Nemani, Raj
All,

I was wondering if anybody has any information on approaches to testing
and verifying search results from Solr.  Most of the time we end up
manually verifying the results from a search, but the verification is not
necessarily scientific.

The main question is: what are we verifying these search results against?
As an example, how can I be sure that the relevancy calculated at
any given time for a given document in an index is accurate?

 

Hope the question makes sense.

 

Any feedback is really appreciated.

 

Thanks

Raj

 



Testing Solr

2010-12-16 Thread satya swaroop
Hi All,

 I built Solr successfully and I am thinking of testing it with nearly
300 PDF files, 300 Word docs, 300 Excel files, and so on, with roughly 300
files of each type.
 Is there any dummy data available for testing Solr, or do I need to
download each and every file individually?
Another question: are there any benchmarks for Solr?

Regards,
satya
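
If you do end up with a folder of sample PDF/Office files, a rough SolrJ loop to push them through the extracting handler could look like the sketch below (the directory name, the literal.id values and the /update/extract mapping are assumptions about the example config; needs 4.x-era SolrJ and the Solr Cell contrib on the server):

import java.io.File;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class IndexSampleFiles {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
        int id = 0;
        // sample-docs is a made-up local directory full of PDF/Word/Excel test files.
        for (File f : new File("sample-docs").listFiles()) {
            ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
            req.addFile(f, "application/octet-stream");      // let Tika sniff the real content type
            req.setParam("literal.id", "sample-" + (id++));  // every extracted doc needs a unique id
            solr.request(req);
        }
        solr.commit();
        System.out.println(id + " files sent for extraction");
        solr.shutdown();
    }
}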


Re: Testing Solr

2010-12-16 Thread Dennis Gearon
There are websites with data sets out there. 'Data sets' may not be the right
search term, but it's something like that.

Whether they have exactly what you want, I couldn't guess.
Dennis Gearon

Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


--- On Thu, 12/16/10, satya swaroop satya.yada...@gmail.com wrote:

 From: satya swaroop satya.yada...@gmail.com
 Subject: Testing Solr
 To: solr-user@lucene.apache.org
 Date: Thursday, December 16, 2010, 10:55 PM
 Hi All,
 
          I built solr
 successfully and i am thinking to test it  with nearly
 300 pdf files, 300 docs, 300 excel files,...and so on of
 each type with 300
 files nearly
  Is there any dummy data available to test for
 solr,Otherwise i need to
 download each and every file individually..??
 Another question is there any Benchmarks of solr...??
 
 Regards,
 satya



Re: Is there any stress test tool for testing Solr?

2010-08-30 Thread 朱炎詹
Thanks to both Gora & Amit. A little information for people following this
discussion: I found there's a SolrMeter open source project on Google Code -
http://code.google.com/p/solrmeter/ - it's specifically for load testing of
Solr.


I'll evaluate the following tools & pick one for my testing:

WebStress
Apache Bench
JMeter
SolrMeter

Oh, I'll correct a piece of wrong information in my post: we're building a
12-million-document newspaper index, rather than 1.2 million.


Scott
- Original Message - 
From: Gora Mohanty g...@srijan.in

To: solr-user@lucene.apache.org
Sent: Friday, August 27, 2010 2:22 AM
Subject: Re: Is there any strss test tool for testing Solr?



On Wed, 25 Aug 2010 19:58:36 -0700
Amit Nithian anith...@gmail.com wrote:


i recommend JMeter. We use that to do load testing on a search
server.

[...]

JMeter is certainly good, but we have also found Apache bench
to also be of much use. Maybe it is just us, and what we are
familiar with, but Apache bench seemed easier to automate. Also,
much easier to get up and running with, at least IMHO.


Be careful though.. as silly as this may sound.. do NOT just
issue random queries because that won't exercise your caches...

[...]

Conversely, we are still trying to figure out how to make real-life
measurements, without having the Solr cache coming into the picture.
For querying on a known keyword, every hit after the first, with
Apache bench, is strongly affected by the Solr cache. We tried using
random strings, but at least with Apache bench, the query string is
fixed for each invocation of Apache bench. Have to investigate
whether one can do otherwise with JMeter plugins. Also, a query
that returns no result (as a random query string typically would)
seems to be significantly faster than a real query. So, I think that
in the long run, the best way is to build information about
*typical* queries that your users run; using the Solr logs, and
then use a set of such queries for benchmarking.

Regards,
Gora












Re: Is there any stress test tool for testing Solr?

2010-08-26 Thread Gora Mohanty
On Wed, 25 Aug 2010 19:58:36 -0700
Amit Nithian anith...@gmail.com wrote:

 i recommend JMeter. We use that to do load testing on a search
 server.
[...]

JMeter is certainly good, but we have also found Apache bench
to also be of much use. Maybe it is just us, and what we are
familiar with, but Apache bench seemed easier to automate. Also,
much easier to get up and running with, at least IMHO.

 Be careful though.. as silly as this may sound.. do NOT just
 issue random queries because that won't exercise your caches...
[...]

Conversely, we are still trying to figure out how to make real-life
measurements, without having the Solr cache coming into the picture.
For querying on a known keyword, every hit after the first, with
Apache bench, is strongly affected by the Solr cache. We tried using
random strings, but at least with Apache bench, the query string is
fixed for each invocation of Apache bench. Have to investigate
whether one can do otherwise with JMeter plugins. Also, a query
that returns no result (as a random query string typically would)
seems to be significantly faster than a real query. So, I think that
in the long run, the best way is to build information about
*typical* queries that your users run; using the Solr logs, and
then use a set of such queries for benchmarking.

Regards,
Gora
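
A crude sketch of the replay approach Gora describes, using SolrJ (queries.txt, the URL and the single-threaded loop are all assumptions; JMeter or ab would normally drive the concurrency):

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class ReplayLoggedQueries {
    public static void main(String[] args) throws Exception {
        // queries.txt is assumed to hold one query string per line, scraped from the Solr logs.
        List<String> queries = Files.readAllLines(Paths.get("queries.txt"), StandardCharsets.UTF_8);
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        long start = System.currentTimeMillis();
        long hits = 0;
        for (String q : queries) {
            hits += solr.query(new SolrQuery(q)).getResults().getNumFound();
        }
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(queries.size() + " queries in " + elapsed + " ms, " + hits + " hits in total");
        solr.shutdown();
    }
}

Feeding the same file to several threads, or to JMeter/ab, gets closer to the production mix of queries that exercises the caches realistically.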


Re: Is there any stress test tool for testing Solr?

2010-08-26 Thread Chris Hostetter

: References: aanlktikmljxobdckycqsgfu8i47-0ec7bx7ujfkj2...@mail.gmail.com
: c7062fba00bb4ced8360a7e2dcf5c...@udc70634p002
: In-Reply-To: c7062fba00bb4ced8360a7e2dcf5c...@udc70634p002
: Subject: Is there any strss test tool for testing Solr?

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking



-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Is there any stress test tool for testing Solr?

2010-08-25 Thread 朱炎詹
We're currently building a Solr index with over 1.2 million documents. I
want to do a good stress test of it. Does anyone know if there's an
appropriate stress test tool for Solr? Or any good suggestions?


Best Regards,

Scott 



Re: Is there any stress test tool for testing Solr?

2010-08-25 Thread Amit Nithian
I recommend JMeter. We use that to do load testing on a search server. Of
course you have to provide a reasonable set of queries as input... if you
don't have any, then a reasonable estimation based on your expected traffic
should suffice. JMeter can be used for other load testing too.

Be careful though... as silly as this may sound, do NOT just issue random
queries, because that won't exercise your caches. We had a load test that
killed our servers because our caches kept getting blown out. Of course, the
traffic being generated was purely random and was not representative of
real-world traffic, which usually has more predictable behavior.

hope that helps!
Amit

On Wed, Aug 25, 2010 at 7:50 PM, scott chu (朱炎詹) scott@udngroup.comwrote:

 We're currently building a Solr index with over 1.2 million documents. I
 want to do a good stress test of it. Does anyone know if there's an
 appropriate stress test tool for Solr? Or any good suggestions?

 Best Regards,

 Scott