Re: just testing if my emails are reaching the mailing list

2020-10-14 Thread uyilmaz
Thank you!

On Wed, 14 Oct 2020 09:41:16 +0200
Szűcs Roland  wrote:

> Hi,
> I got it from the solr user list.
> 
> 
> Roland
> 
> uyilmaz wrote (on 14 Oct 2020, Wed, 9:39):
> 
> > Hello all,
> >
> > I have never gotten an answer to my questions on this mailing list, and
> > my mail client shows INVALID next to my mail address, so I thought I should
> > check whether my emails are reaching you.
> >
> > Can anyone reply?
> >
> > Regards
> >
> > --
> > uyilmaz 
> >


-- 
uyilmaz 


Re: just testing if my emails are reaching the mailing list

2020-10-14 Thread Szűcs Roland
Hi,
I got it from the solr user list.


Roland

uyilmaz wrote (on 14 Oct 2020, Wed, 9:39):

> Hello all,
>
> I have never gotten an answer to my questions on this mailing list, and
> my mail client shows INVALID next to my mail address, so I thought I should
> check whether my emails are reaching you.
>
> Can anyone reply?
>
> Regards
>
> --
> uyilmaz 
>


just testing if my emails are reaching the mailing list

2020-10-14 Thread uyilmaz
Hello all,

I have never gotten an answer to my questions on this mailing list, and my 
mail client shows INVALID next to my mail address, so I thought I should check 
whether my emails are reaching you.

Can anyone reply?

Regards

-- 
uyilmaz 


Re: Bug? Documents not visible after successful commit - chaos testing

2020-02-19 Thread Michael Frank
Hi Chris,
thanks for opening the ticket. I have found some possibly related issues:
Open:
https://issues.apache.org/jira/browse/SOLR-3888  - "need better handling of
external add/commit requests during tlog recovery"


Closed:
https://issues.apache.org/jira/browse/SOLR-12011
https://issues.apache.org/jira/browse/SOLR-9366

Cheers,
Michael

On Thu, 13 Feb 2020 at 19:19, Chris Hostetter <
hossman_luc...@fucit.org> wrote:

>
> : We think this is a bug (silently dropping commits even if the client
> : requested "waitForSearcher"), or at least a missing feature (commits
> being
> : the only UpdateRequests not reporting the achieved RF), which should be
> : worth a JIRA ticket.
>
> Thanks for your analysis Michael -- I agree something better should be
> done here, and have filed SOLR-14262 for subsequent discussion...
>
> https://issues.apache.org/jira/browse/SOLR-14262
>
> I believe the reason the local commit is ignored during replay is to
> ensure a consistent view of the index -- if the tlog being
> replayed contains COMMIT1,A,B,C,COMMIT2,D,... we should never open a new
> searcher containing just A or just A+B w/o C if a COMMIT3 comes along
> during replay -- but agree with you 100% that either commit should support
> 'rf' making it obvious that this commit didn't succeed (which would also
> be important & helpful if the node was still down when the client sends
> the commit) ... *AND* ... we should consider making the commit block until
> replay is finished.
>
> ...BUT... there are probably other nuances I don't understand ...
> hopefully other folks more familiar with the current implementation will
> chime in on the jira.
>
>
>
>
> -Hoss
> http://www.lucidworks.com/
>


Re: Bug? Documents not visible after successful commit - chaos testing

2020-02-13 Thread Chris Hostetter


: We think this is a bug (silently dropping commits even if the client
: requested "waitForSearcher"), or at least a missing feature (commits being
: the only UpdateRequests not reporting the achieved RF), which should be
: worth a JIRA ticket.

Thanks for your analysis Michael -- I agree something better should be 
done here, and have filed SOLR-14262 for subsequent discussion...

https://issues.apache.org/jira/browse/SOLR-14262

I believe the reason the local commit is ignored during replay is to 
ensure a consistent view of the index -- if the tlog being 
replayed contains COMMIT1,A,B,C,COMMIT2,D,... we should never open a new 
searcher containing just A or just A+B w/o C if a COMMIT3 comes along 
during replay -- but agree with you 100% that either commit should support 
'rf' making it obvious that this commit didn't succeed (which would also 
be important & helpful if the node was still down when the client sends 
the commit) ... *AND* ... we should consider making the commit block until 
replay is finished.

...BUT... there are probably other nuances I don't understand ... 
hopefully other folks more familiar with the current implementation will 
chime in on the jira.




-Hoss
http://www.lucidworks.com/


Re: Bug? Documents not visible after successful commit - chaos testing

2020-02-12 Thread Michael Frank
> The client that indexes new documents performs a hard commit with
> waitSearcher=true and after that was successful, we expect the documents to
> be visible on all Replicas.
> This seems to work as expected if the cluster is in a healthy state.
> If we shut down nodes while updating documents and committing we observe
> that commits somehow get lost.
> The documents are neither visible on the leader nor on any replica! Even
> after all nodes and replicas are up again.
> And we don't get any error or exception from the Solrj client.
> Is there any way to make sure that a commit is executed successfully on
> _every_ replica (and fail if the replica is currently down or recovering)?
> Or to get notified that the commit could not be executed because the
> cluster is in an unhealthy state?
> If we can confirm and verify this in our Indexing client, we could detect
> failures and recover.
>
> I don't think the /get request handler is an option for us because it
> only accepts document IDs and no search queries, which we rely on heavily.
> Is that correct?
>
>
> : FYI: there is no need to send a softCommit after a hardCommit
> Agreed, that was just us experimenting and trying stuff.
>
> : So to be clear: 'rf=2' means a total of 2 replicas confirmed the update
> -- that includes the leader replica.  'rf=1' means the leader accepted the
> doc, but all other replicas are down.
> : if you want to be 100% certain that every replica received the update,
> then you should be confirming rf=3
> Agreed, should have been clearer. We have multiple test scenarios. Some
> with 2 replicas (1 leader, 1 replica) and some with 3 (1 leader, 2 replicas). In
> the first mail I just picked the simplest test setup that failed,
> consisting of one leader and one replica - so technically we could
> reproduce the error in a two node cluster.
>
> Cheers,
> Michael
>
> On Thu, 6 Feb 2020 at 01:42, Chris Hostetter <
> hossman_luc...@fucit.org> wrote:
>
>>
>> I may be misunderstanding something in your setup, and/or I may be
>> misremembering things about Solr, but I think the behavior you are
>> seeing is because *search* in solr is "eventually consistent" -- while
>> "RTG" (ie: using the /get" handler) is (IIRC) "strongly consistent"
>>
>> ie: there's a reason it's called "Near Real Time Searching" and "NRT
>> Replica" ... not "RT Replica"
>>
>> When you kill a node hosting a replica, then send an update which a
>> leader
>> accepts but can't send to that replica, that replica is now "out of sync"
>> and will continue to be out of sync when it comes back online and starts
>> responding to search requests as it recovers from the leader/tlog --
>> eventually the search will have consistent results across all replicas,
>> but during the recovery period this isn't guaranteed.
>>
>> If however you use the /get request handler, then it (again, IIRC)
>> consults the tlog for the latest version of the doc even if it's
>> mid-recovery and the index itself isn't yet up to date.
>>
>> So for the purposes of testing solr as a "strongly consistent" document
>> store, using /get?id=foo to check the "current" data in the document is
>> more appropriate than /select?q=id:foo
>>
>> Some more info here...
>>
>> https://lucene.apache.org/solr/guide/8_4/solrcloud-resilience.html
>> https://lucene.apache.org/solr/guide/8_4/realtime-get.html
>>
>>
>> A few other things that jumped out at me in your email that seemed weird
>> or worthy of comment...
>>
>> : According to Solr's documentation, a commit with openSearcher=true and
>> : waitSearcher=true and waitFlush=true only returns once everything is
>> : persisted AND the new searcher is visible.
>> :
>> : To me this sounds like that any subsequent request after a successful
>> : commit MUST hit the new searcher and is guaranteed to see the commit
>> : changes, regardless of node failures or restarts.
>>
>> that is true for *single* node solr, or a "healthy" cluster but as I
>> mentioned if a node is down when the "commit" happens it won't have the
>> document yet -- nor is it alive to process the commit.  the document
>> update -- and the commit -- are in the tlog that still needs to replay
>> when the replica comes back online
>>
>> :- A test-collection with 1 Shard and 2 NRT Replicas.
>>
>> I'm guessing since you said you were using 3 nodes, that what you
>> mean here is a single shard with a total of 3 replicas which are all NRT
>> -- remember the "leader" is still itself an NRT replica.

Re: Bug? Documents not visible after successful commit - chaos testing

2020-02-06 Thread Michael Frank
Hi Chris,
thank you for your detailed answer!

We are aware that Solr Cloud is eventually consistent and in our
application that's fine in most cases.
However, what is really important for us is that we get a "Read Your
Writes" for a clear point in time - which in our understanding should be after
hard commits with waitSearcher=true return successfully from all replicas. Is
that correct?
The client that indexes new documents performs a hard commit with
waitSearcher=true and after that was successful, we expect the documents to
be visible on all Replicas.
This seems to work as expected if the cluster is in a healthy state.
If we shut down nodes while updating documents and committing we observe
that commits somehow get lost.
The documents are neither visible on the leader nor on any replica! Even
after all nodes and replicas are up again.
And we don't get any error or exception from the Solrj client.
Is there any way to make sure that a commit is executed successfully on
_every_ replica (and fail if the replica is currently down or recovering)?
Or to get notified that the commit could not be executed because the
cluster is in an unhealthy state?
If we can confirm and verify this in our Indexing client, we could detect
failures and recover.

I don't think the /get request handler is an option for us because it
only accepts document IDs and no search queries, which we rely on heavily.
Is that correct?


: FYI: there is no need to send a softCommit after a hardCommit
Agreed, that was just us experimenting and trying stuff.

: So to be clear: 'rf=2' means a total of 2 replicas confirmed the update
-- that includes the leader replica.  'rf=1' means the leader accepted the
doc, but all other replicas are down.
: if you want to be 100% certain that every replica received the update,
then you should be confirming rf=3
Agreed, should have been clearer. We have multiple test scenarios. Some
with 2 replicas (1 leader, 1 replica) and some with 3 (1 leader, 2 replicas). In
the first mail I just picked the simplest test setup that failed,
consisting of one leader and one replica - so technically we could
reproduce the error in a two node cluster.

Cheers,
Michael

On Thu, 6 Feb 2020 at 01:42, Chris Hostetter <
hossman_luc...@fucit.org> wrote:

>
> I may be misunderstanding something in your setup, and/or I may be
> misremembering things about Solr, but I think the behavior you are
> seeing is because *search* in solr is "eventually consistent" -- while
> "RTG" (ie: using the /get" handler) is (IIRC) "strongly consistent"
>
> ie: there's a reason it's called "Near Real Time Searching" and "NRT
> Replica" ... not "RT Replica"
>
> When you kill a node hosting a replica, then send an update which a leader
> accepts but can't send to that replica, that replica is now "out of sync"
> and will continue to be out of sync when it comes back online and starts
> responding to search requests as it recovers from the leader/tlog --
> eventually the search will have consistent results across all replicas,
> but during the recovery period this isn't guaranteed.
>
> If however you use the /get request handler, then it (again, IIRC)
> consults the tlog for the latest version of the doc even if it's
> mid-recovery and the index itself isn't yet up to date.
>
> So for the purposes of testing solr as a "strongly consistent" document
> store, using /get?id=foo to check the "current" data in the document is
> more appropriate than /select?q=id:foo
>
> Some more info here...
>
> https://lucene.apache.org/solr/guide/8_4/solrcloud-resilience.html
> https://lucene.apache.org/solr/guide/8_4/realtime-get.html
>
>
> A few other things that jumped out at me in your email that seemed weird
> or worthy of comment...
>
> : According to Solr's documentation, a commit with openSearcher=true and
> : waitSearcher=true and waitFlush=true only returns once everything is
> : persisted AND the new searcher is visible.
> :
> : To me this sounds like that any subsequent request after a successful
> : commit MUST hit the new searcher and is guaranteed to see the commit
> : changes, regardless of node failures or restarts.
>
> that is true for *single* node solr, or a "healthy" cluster but as I
> mentioned if a node is down when the "commit" happens it won't have the
> document yet -- nor is it alive to process the commit.  the document
> update -- and the commit -- are in the tlog that still needs to replay
> when the replica comes back online
>
> :- A test-collection with 1 Shard and 2 NRT Replicas.
>
> I'm guessing since you said you were using 3 nodes, that what you
> mean here is a single shard with a total of 3 replicas which are all NRT
> -- remember the "leader" is still itself an NRT replica.

Re: Bug? Documents not visible after successful commit - chaos testing

2020-02-05 Thread Chris Hostetter


I may be misunderstanding something in your setup, and/or I may be 
misremembering things about Solr, but I think the behavior you are 
seeing is because *search* in solr is "eventually consistent" -- while 
"RTG" (ie: using the /get" handler) is (IIRC) "strongly consistent"

ie: there's a reason it's called "Near Real Time Searching" and "NRT 
Replica" ... not "RT Replica"

When you kill a node hosting a replica, then send an update which a leader 
accepts but can't send to that replica, that replica is now "out of sync" 
and will continue to be out of sync when it comes back online and starts 
responding to search requests as it recovers from the leader/tlog -- 
eventually the search will have consistent results across all replicas, 
but during the recovery period this isn't guaranteed.

If however you use the /get request handler, then it (again, IIRC) 
consults the tlog for the latest version of the doc even if it's 
mid-recovery and the index itself isn't yet up to date.

So for the purposes of testing solr as a "strongly consistent" document 
store, using /get?id=foo to check the "current" data in the document is 
more appropriate than /select?q=id:foo

Some more info here...

https://lucene.apache.org/solr/guide/8_4/solrcloud-resilience.html
https://lucene.apache.org/solr/guide/8_4/realtime-get.html


A few other things that jumped out at me in your email that seemed weird 
or worthy of comment...

: According to Solr's documentation, a commit with openSearcher=true and
: waitSearcher=true and waitFlush=true only returns once everything is
: persisted AND the new searcher is visible.
: 
: To me this sounds like that any subsequent request after a successful
: commit MUST hit the new searcher and is guaranteed to see the commit
: changes, regardless of node failures or restarts.

that is true for *single* node solr, or a "healthy" cluster but as I 
mentioned if a node is down when the "commit" happens it won't have the 
document yet -- nor is it alive to process the commit.  the document 
update -- and the commit -- are in the tlog that still needs to replay 
when the replica comes back online

:- A test-collection with 1 Shard and 2 NRT Replicas.

I'm guessing since you said you were using 3 nodes, that what you 
mean here is a single shard with a total of 3 replicas which are all NRT 
-- remember the "leader" is still itself an NRT replica.

(I know, I know ... I hate the terminology) 

This is a really important point to clarify in your testing because of how 
you are using 'rf' ... seeing exactly how you create your collection is 
important to make sure we're talking about the same thing.

: Each "transaction" adds, modifys and deletes documents and we ensure that
: each response has a "rf=2" (achieved replication factor=2) attribute.

So to be clear: 'rf=2' means a total of 2 replicas confirmed the update -- 
that includes the leader replica.  'rf=1' means the leader accepted the 
doc, but all other replicas are down.

if you want to be 100% certain that every replica received the update, 
then you should be confirming rf=3
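The rf rule just described (the reported factor includes the leader, so full replication across N replicas means rf=N) can be sketched as a tiny helper. This is an illustration of the check, not SolrJ API; the class and method names are made up for the example:

```java
// Sketch of the rf check described above: "rf" is the achieved replication
// factor SolrCloud reports for an update, and it *includes* the leader.
// To be certain all N replicas of a shard saw the update, require rf == N
// (e.g. rf=3 for 1 leader + 2 followers). Names here are illustrative.
public class RfCheck {
    /** Returns true only if every one of totalReplicas confirmed the update. */
    static boolean allReplicasConfirmed(Integer achievedRf, int totalReplicas) {
        // A missing rf is treated as "not confirmed" rather than assumed OK.
        return achievedRf != null && achievedRf >= totalReplicas;
    }

    public static void main(String[] args) {
        // 1 leader + 2 NRT followers:
        System.out.println(allReplicasConfirmed(2, 3)); // a follower missed it
        System.out.println(allReplicasConfirmed(3, 3)); // fully replicated
        System.out.println(allReplicasConfirmed(1, 3)); // only the leader
    }
}
```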

: After a "transaction" was performed without errors we send first a
: hardCommit and then a softCommit, both with waitFlush=true,
: waitSearcher=true and ensure they both return without errors.

FYI: there is no need to send a softCommit after a hardCommit -- a hard 
commit with openSearcher=true (the default) is a super-set of a soft 
commit.



-Hoss
http://www.lucidworks.com/
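The /get-vs-/select distinction explained in the email above can be captured in a toy model: a recovering replica's searchable index lags behind its tlog, and real-time get consults the tlog first. This is a simplified illustration of the behavior described, not Solr code:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the point above: during tlog replay, the searchable index
// lags the tlog. "/select"-style lookups read the index; "/get"-style (RTG)
// lookups consult the tlog first, so they see the newest version sooner.
public class ReplicaModel {
    final Map<String, String> index = new HashMap<>(); // visible to searches
    final Map<String, String> tlog = new HashMap<>();  // durable update log

    void receiveUpdate(String id, String doc) { tlog.put(id, doc); }

    // Replay + commit makes the tlog contents searchable.
    void replayAndCommit() { index.putAll(tlog); }

    String select(String id) { return index.get(id); }   // eventually consistent
    String realtimeGet(String id) {                      // strongly consistent
        return tlog.containsKey(id) ? tlog.get(id) : index.get(id);
    }

    public static void main(String[] args) {
        ReplicaModel r = new ReplicaModel();
        r.receiveUpdate("foo", "v2");             // accepted, replay not done yet
        System.out.println(r.select("foo"));      // null: search can't see it
        System.out.println(r.realtimeGet("foo")); // v2: RTG consults the tlog
        r.replayAndCommit();
        System.out.println(r.select("foo"));      // v2: now searchable
    }
}
```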


Bug? Documents not visible after successful commit - chaos testing

2020-02-05 Thread Michael Frank
Hi All,

In our Solr Cloud cluster (8.4.1), committed documents are sometimes not
visible to subsequent requests sent after an apparently successful
commit (waitFlush=true, waitSearcher=true). This behaviour does not happen
if all nodes are stable, but will happen eventually if we kill off random
nodes using a chaosMonkey script.

According to Solr's documentation, a commit with openSearcher=true and
waitSearcher=true and waitFlush=true only returns once everything is
persisted AND the new searcher is visible.

To me this sounds like any subsequent request after a successful
commit MUST hit the new searcher and is guaranteed to see the commit
changes, regardless of node failures or restarts.

Is this assumption on strong-consistency for commits with
openSearcher=true, waitSearcher=true and waitFlush=true correct?

If so, we discovered a bug.

TestSetup:

Infrastructure

   - 3 Solr (8.4.1) nodes in Docker containers
   - Each Solr node on its own host (the same hosts run the 3 Zookeeper nodes)
   - Persistent host volume is mounted inside the Docker container
   - Solr instances are pinned to hosts.
   - A test-collection with 1 Shard and 2 NRT Replicas.
   - Using Solrj (8.4.1) and CloudSolrClient for communication.
   - Containers are automatically restarted on errors
   - autoCommit: maxDocs=1, openSearcher=false
   - autoSoftCommit: -never-
   - (We fairly often commit ourselves)
   - the solrconfig.xml 


Scenario
After adding an initial batch of documents we perform multiple
"transactions".
Each "transaction" adds, modifies and deletes documents and we ensure that
each response has a "rf=2" (achieved replication factor=2) attribute.
A transaction has to become visible atomically, or not at all.
We achieve this by storing a CurrentVersion counter attribute in each
document.
This makes our life easier verifying this corner case, as we can search
and count all documents having a specific transaction-id-counter value.
After a "transaction" was performed without errors we send first a
hardCommit and then a softCommit, both with waitFlush=true,
waitSearcher=true and ensure they both return without errors.
Only after everything happened without errors do we start to verify
visibility and correctness of the committed "transaction" by sending
counting queries against solr, filtering on our transaction-id-counter.
This works fine, as long as all nodes are stable. However ..
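The all-or-nothing invariant described in this scenario can be expressed as a small predicate: after a committed transaction, a count filtered on its version must equal the expected document count, and anything else (zero or partial hits) is a violation. The counting query is simulated below with an in-memory list, since the actual client code isn't in the email:

```java
import java.util.List;
import java.util.Map;

// Sketch of the verification invariant above: after a committed
// "transaction", a counting query filtered on the transaction's
// CurrentVersion must return exactly the expected number of documents --
// zero hits (lost commit) or a partial count both violate atomic visibility.
// "Visible docs" are simulated here with an in-memory list of field maps.
public class TransactionVisibilityCheck {
    static long countByVersion(List<Map<String, Object>> visibleDocs, long version) {
        // Stand-in for a rows=0 counting query on the version field (numFound).
        return visibleDocs.stream()
                .filter(d -> Long.valueOf(version).equals(d.get("currentVersion")))
                .count();
    }

    static boolean transactionVisible(List<Map<String, Object>> visibleDocs,
                                      long version, long expectedCount) {
        return countByVersion(visibleDocs, version) == expectedCount;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
                Map.of("id", "a", "currentVersion", 7L),
                Map.of("id", "b", "currentVersion", 7L),
                Map.of("id", "c", "currentVersion", 6L));
        System.out.println(transactionVisible(docs, 7L, 2)); // all of txn 7 visible
        System.out.println(transactionVisible(docs, 7L, 3)); // a doc went missing
    }
}
```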

ErrorCase
If we periodically kill (SIGTERM) random Solr nodes every 30, eventually
the aforementioned visibility guarantees after commit(waitFlush=true,
waitSearcher=true) break and documents that should be there/visible are not.
Sometimes this happens after minutes, sometimes it takes hours to hit this
case.

In the error case the verification counting queries return with ZERO hits.

We suspect that commits do not reach all replicas or that commits are
lost/ignored.
Unfortunately, commit requests do not include the "rf" attribute in their
response, which would allow us to assert the achieved replication factor.

We hope someone has an idea or clue how to fix this or why this happens, as
this is a showstopper for us and we require strong-consistency guarantees.
(Once a commit was successful at time T, ALL subsequent requests after T
MUST see the new documents)


Some notes:

   - the observed errors can be reproduced regardless of these settings in
   the solrconfig.xml 
  - useColdSearcher=true/false
  - caches' autowarmCount=0 or any other value
  - Errors appear to happen more frequently if we have more load (more
   collections with the same test)


Cheers,
Michael


Re: Quepid, the relevance testing tool for Solr, released as open source

2019-07-26 Thread Doug Turnbull
Quepid has been really powerful for helping teams just getting started
bootstrap on relevance problems, and for allowing the product & tech
teams to collaborate.

Often we start with a dozen or so queries, get some ratings, create a bit
of success criteria and tune in Quepid. You can do this in a 'sprint'-like
structure, where in the next 'sprint' you tackle the next dozen or so queries,
ensure the stuff you fixed before is still working, and make progress on
the next set of use cases. During the process, working in a relevance tuning
sandbox, you get a lot of feedback you can give to the product team about
'why' results are ranked the way they are.

I would contrast this with RRE, which you might have heard of and which has
different strengths: for example, if I wanted to automate or do a bit more
"CI"-style relevance evaluation without the interactive "IDE" that
Quepid provides.

Quaerite is another tool in the genre, from Tim Allison. I think it also
adds features for genetic optimization of relevance parameters.

Very cool to see the ecosystem of relevance tuning tools growing!

https://github.com/SeaseLtd/rated-ranking-evaluator

https://github.com/mitre/quaerite

-Doug

On Fri, Jul 26, 2019 at 8:03 AM Charlie Hull  wrote:

> Hi all,
>
> We've finally made Quepid, the relevance testing tool, open source.
> There's also a free hosted version at www.quepid.com . Looking forward
> to contributions driving the project forward! Quepid is a way to record
> human relevance judgements, and then to experiment with query tuning and
> see the results in real time.
>
> More details at
>
> https://opensourceconnections.com/blog/2019/07/25/2019-07-22-quepid-is-now-open-source/
>
> (also particularly pleased to see Luwak, the stored query engine we
> built at Flax become part of Lucene - it's a great day for open source!)
>
> Cheers
>
> Charlie
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
>
>

-- 
*Doug Turnbull **| CTO* | OpenSource Connections
<http://opensourceconnections.com>, LLC | 240.476.9983
Author: Relevant Search <http://manning.com/turnbull>


Quepid, the relevance testing tool for Solr, released as open source

2019-07-26 Thread Charlie Hull

Hi all,

We've finally made Quepid, the relevance testing tool, open source. 
There's also a free hosted version at www.quepid.com . Looking forward 
to contributions driving the project forward! Quepid is a way to record 
human relevance judgements, and then to experiment with query tuning and 
see the results in real time.


More details at 
https://opensourceconnections.com/blog/2019/07/25/2019-07-22-quepid-is-now-open-source/


(also particularly pleased to see Luwak, the stored query engine we 
built at Flax become part of Lucene - it's a great day for open source!)


Cheers

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk



Re: solr in memory testing

2017-08-08 Thread Xie, Sean
There is MiniSolrCloudCluster, which you can use for testing. It comes from the 
solr-test-framework: 
https://github.com/apache/lucene-solr/tree/master/solr/test-framework.
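A pseudocode-level sketch of typical MiniSolrCloudCluster usage follows. It is not runnable here (it assumes org.apache.solr:solr-test-framework on the classpath, plus a configset directory), and the method names are from the newer test framework, so double-check them against your Solr version:

```java
// Sketch only -- assumes solr-test-framework on the classpath; verify the
// API against your Solr version.
Path baseDir = Files.createTempDirectory("mini-solr");
MiniSolrCloudCluster cluster =
        new MiniSolrCloudCluster(1, baseDir, JettyConfig.builder().build());
try {
    // "conf" and the configset path are placeholders for your own config.
    cluster.uploadConfigSet(Paths.get("src/test/resources/configs/conf"), "conf");
    CollectionAdminRequest.createCollection("test", "conf", 1, 1)
            .process(cluster.getSolrClient());

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    cluster.getSolrClient().add("test", doc);
    cluster.getSolrClient().commit("test");

    QueryResponse rsp = cluster.getSolrClient().query("test", new SolrQuery("id:1"));
    // expect rsp.getResults().getNumFound() == 1
} finally {
    cluster.shutdown(); // tears down the embedded Jetty instances
}
```

The cluster runs real Jetty instances on real ports, so unlike EmbeddedSolrServer it also exercises HTTP-level client code.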

On 8/8/17, 7:54 AM, "Thaer Sammar" <t.sam...@geophy.com> wrote:

Hi,

We are using solr 6.6, and we are looking for guidance documentation or a 
java example of how to create a solr core in memory for the purpose of testing 
using solrj. We found 
https://wiki.searchtechnologies.com/index.php/Unit_Testing_with_Embedded_Solr 
but this works for solr v.4 and earlier versions.

regards,




solr in memory testing

2017-08-08 Thread Thaer Sammar
Hi,

We are using solr 6.6, and we are looking for guidance documentation or a java 
example of how to create a solr core in memory for the purpose of testing using 
solrj. We found 
https://wiki.searchtechnologies.com/index.php/Unit_Testing_with_Embedded_Solr 
but this works for solr v.4 and earlier versions.

regards,

Re: Testing an ingest framework that uses Apache Tika

2017-02-16 Thread Mattmann, Chris A (3010)
++1 awesome job

++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++
 

On 2/16/17, 5:28 AM, "Luís Filipe Nassif"  wrote:

Excellent, Tim! Thank you for all your great work on Apache Tika!

2017-02-16 11:23 GMT-02:00 Konstantin Gribov :

> Tim,
>
> it's an awesome feature for downstream projects' integration tests. Thanks
> for implementing it!
>
> On Thu, 16 Feb 2017 at 16:17, Allison, Timothy B. wrote:
>
> > All,
> >
> > I finally got around to documenting Apache Tika's MockParser[1].  As of
> > Tika 1.15 (unreleased), add tika-core-tests.jar to your class path, and
> you
> > can simulate:
> >
> > 1. Regular catchable exceptions
> > 2. OOMs
> > 3. Permanent hangs
> >
> > This will allow you to determine if your ingest framework is robust
> > against these issues.
> >
> > As always, we fix Tika when we can, but if history is any indicator,
> > you'll want to make sure your ingest code can handle these issues if you
> > are handling millions/billions of files from the wild.
> >
> > Cheers,
> >
> > Tim
> >
> >
> > [1] https://wiki.apache.org/tika/MockParser
> >
> --
>
> Best regards,
> Konstantin Gribov
>




Re: Testing an ingest framework that uses Apache Tika

2017-02-16 Thread Luís Filipe Nassif
Excellent, Tim! Thank you for all your great work on Apache Tika!

2017-02-16 11:23 GMT-02:00 Konstantin Gribov :

> Tim,
>
> it's an awesome feature for downstream projects' integration tests. Thanks
> for implementing it!
>
On Thu, 16 Feb 2017 at 16:17, Allison, Timothy B. wrote:
>
> > All,
> >
> > I finally got around to documenting Apache Tika's MockParser[1].  As of
> > Tika 1.15 (unreleased), add tika-core-tests.jar to your class path, and
> you
> > can simulate:
> >
> > 1. Regular catchable exceptions
> > 2. OOMs
> > 3. Permanent hangs
> >
> > This will allow you to determine if your ingest framework is robust
> > against these issues.
> >
> > As always, we fix Tika when we can, but if history is any indicator,
> > you'll want to make sure your ingest code can handle these issues if you
> > are handling millions/billions of files from the wild.
> >
> > Cheers,
> >
> > Tim
> >
> >
> > [1] https://wiki.apache.org/tika/MockParser
> >
> --
>
> Best regards,
> Konstantin Gribov
>


Re: Testing an ingest framework that uses Apache Tika

2017-02-16 Thread Konstantin Gribov
Tim,

it's an awesome feature for downstream projects' integration tests. Thanks
for implementing it!

On Thu, 16 Feb 2017 at 16:17, Allison, Timothy B. wrote:

> All,
>
> I finally got around to documenting Apache Tika's MockParser[1].  As of
> Tika 1.15 (unreleased), add tika-core-tests.jar to your class path, and you
> can simulate:
>
> 1. Regular catchable exceptions
> 2. OOMs
> 3. Permanent hangs
>
> This will allow you to determine if your ingest framework is robust
> against these issues.
>
> As always, we fix Tika when we can, but if history is any indicator,
> you'll want to make sure your ingest code can handle these issues if you
> are handling millions/billions of files from the wild.
>
> Cheers,
>
> Tim
>
>
> [1] https://wiki.apache.org/tika/MockParser
>
-- 

Best regards,
Konstantin Gribov


Testing an ingest framework that uses Apache Tika

2017-02-16 Thread Allison, Timothy B.
All,

I finally got around to documenting Apache Tika's MockParser[1].  As of Tika 
1.15 (unreleased), add tika-core-tests.jar to your class path, and you can 
simulate:

1. Regular catchable exceptions
2. OOMs
3. Permanent hangs

This will allow you to determine if your ingest framework is robust against 
these issues.

As always, we fix Tika when we can, but if history is any indicator, you'll 
want to make sure your ingest code can handle these issues if you are handling 
millions/billions of files from the wild.

Cheers,

Tim


[1] https://wiki.apache.org/tika/MockParser
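The three failure modes Tim lists (catchable exceptions, OOMs, permanent hangs) are exactly what an ingest loop has to contain. A common defensive pattern, independent of Tika's own API, is to run each parse on a watchdog thread with a timeout and treat any Throwable as a per-document failure; the parse task below is a stand-in for a real Tika call:

```java
import java.util.concurrent.*;

// Defensive ingest wrapper for the failure modes listed above: run each
// "parse" on a separate thread with a timeout, and treat exceptions, Errors
// (e.g. a simulated OOM), and hangs as per-document failures instead of
// process killers. parseTask is a stand-in for a real Tika parse call.
// (A genuine OOM can still leave the JVM unhealthy; full isolation needs
// a separate process, which this sketch does not show.)
public class RobustIngest {
    static String parseWithTimeout(ExecutorService pool,
                                   Callable<String> parseTask,
                                   long timeoutMillis) {
        Future<String> future = pool.submit(parseTask);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);              // permanent hang: abandon the task
            return "FAILED: timeout";
        } catch (ExecutionException e) {
            // Catches both Exceptions and Errors thrown by the parse task.
            return "FAILED: " + e.getCause().getClass().getSimpleName();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "FAILED: interrupted";
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool();
        System.out.println(parseWithTimeout(pool, () -> "parsed ok", 1000));
        System.out.println(parseWithTimeout(pool,
                () -> { throw new OutOfMemoryError("simulated"); }, 1000));
        System.out.println(parseWithTimeout(pool,
                () -> { Thread.sleep(60_000); return "never"; }, 200));
        pool.shutdownNow();
    }
}
```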


Unit testing HttpPost With an Embedded Solr Server

2016-08-17 Thread Jennifer Coston


Hello,

I have written a data service to send an HttpPost command to post JSON to
Solr. The code is working, but now I want to switch to using an embedded
Solr server for just the unit tests. The problem is that the embedded Solr
server doesn't seem to start a server with a port, so I'm
at a loss on how to test this. I guess I have two questions. (1) How do I
unit test my post command with an embedded Solr server? (2) If it isn't
possible to use the embedded Solr server, I believe I read somewhere that
Solr uses a Jetty server. Is it possible to convert an embedded Jetty
server (with a port I can access) to a Solr server?

Here is the class I am trying to test:

public class SolrDataServiceClient {

    private String urlString;
    private HttpClient httpClient;
    private final Logger LOGGER =
            LoggerFactory.getLogger(SolrDataServiceClient.class);

    /**
     * Constructor for connecting to the Solr server.
     *
     * @param solrCore
     * @param serverName
     * @param portNumber
     */
    public SolrDataServiceClient(String solrCore, String serverName,
            String portNumber) {
        LOGGER.info("Initializing new Http Client to Connect To Solr");
        urlString = serverName + ":" + portNumber + "/solr/" + solrCore;

        if (httpClient == null) {
            httpClient = new HttpClient();
        }
    }

    /**
     * Post the provided JSON to Solr.
     */
    public CloseableHttpResponse postJSON(String jsonToAdd) {
        CloseableHttpResponse response = null;
        try {
            CloseableHttpClient client = HttpClients.createDefault();
            HttpPost httpPost = new HttpPost(urlString + "/update/json/docs");
            HttpEntity entity = new ByteArrayEntity(jsonToAdd.getBytes("UTF-8"));
            httpPost.setEntity(entity);
            httpPost.setHeader("Content-type", "application/json");
            LOGGER.debug("httpPost = " + httpPost.toString());
            response = client.execute(httpPost);
            String result = EntityUtils.toString(response.getEntity());
            LOGGER.debug("result = " + result);
            client.close();
        } catch (IOException e) {
            LOGGER.error("IOException", e);
        }

        return response;
    }
}


Here is my JUnit test:

public class SolrDataServiceClientTest {

    private static EmbeddedSolrServer embeddedServer;
    private static SolrDataServiceClient solrDataServiceClient;

    @BeforeClass
    public static void setUpBeforeClass() throws Exception {
        System.setProperty("solr.solr.home", "solr/conf");
        System.setProperty("solr.data.dir",
                new File("target/solr-embedded-data").getAbsolutePath());
        CoreContainer coreContainer = new CoreContainer("solr/conf");
        coreContainer.load();

        CoreDescriptor cd = new CoreDescriptor(coreContainer, "myCoreName",
                new File("solr").getAbsolutePath());
        coreContainer.create(cd);

        embeddedServer = new EmbeddedSolrServer(coreContainer, "myCoreName");

        solrDataServiceClient = new SolrDataServiceClient("myCoreName",
                "http://localhost", "8983"); // I'm not sure what should go here
    }

    @Test
    public void testPostJson() {
        String testJson = "{" +
                "\"observationId\": \"12345c\"," +
                "\"observationType\": \"image\"," +
                "\"locationLat\": 38.9215," +
                "\"locationLon\": -77.235" +
                "}";
        CloseableHttpResponse response = solrDataServiceClient.postJSON(testJson);
        assertEquals(200, response.getStatusLine().getStatusCode());
    }
}
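One way to unit test the POST path without any Solr at all is to stand up the
JDK's built-in `com.sun.net.httpserver.HttpServer` on an ephemeral port and
point the client at it. This only exercises the HTTP plumbing, not Solr's
actual behavior, and the handler below is a made-up stub of the
`/update/json/docs` response, not the real thing:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

public class FakeSolrPostTest {

    // Start a stub endpoint, POST one JSON doc to it, and return the HTTP status.
    static int postJsonToStub(String json) throws Exception {
        // Bind to port 0 so the OS picks a free ephemeral port -- no fixed 8983.
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/solr/myCoreName/update/json/docs", exchange -> {
            try (InputStream in = exchange.getRequestBody()) {
                in.readAllBytes();                       // consume the posted JSON
            }
            byte[] body = "{\"responseHeader\":{\"status\":0}}".getBytes("UTF-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            URL url = new URL("http://localhost:" + port
                    + "/solr/myCoreName/update/json/docs");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-type", "application/json");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(json.getBytes("UTF-8"));
            }
            return conn.getResponseCode();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(postJsonToStub("{\"observationId\":\"12345c\"}")); // prints 200
    }
}
```

In a real test you would construct the client with `"http://localhost"` and
the stub's port instead of 8983, then assert on the returned status.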

Thank you!

Jennifer

RE: SolrCloud App Unit Testing

2016-03-20 Thread Davis, Daniel (NIH/NLM) [C]
MiniSolrCloudCluster is intended for building unit tests for cloud commands 
within Solr itself.

What most people do to test applications based on Solr (and their Solr 
configurations) is to start solr either on their CI server or in the cloud 
(more likely the latter), and then point their application at that Solr instance 
through configuration for the unit tests.   They may also have separate tests 
to test the Solr collection/core configuration itself.

You can have your CI tool (Travis/etc.) or unit test scripts start-up Solr 
locally, or in the cloud, using various tools and concoctions.   Part of the 
core of that is the solr command-line in SOLR_HOME/bin, post tool in 
SOLR_HOME/bin, and zkcli in SOLR_HOME/server/scripts/cloud-scripts.

To start Solr in the cloud, you should look towards something that exists:
https://github.com/lucidworks/solr-scale-tk 
https://github.com/vkhatri/chef-solrcloud

Hope this helps,

-Dan

-Original Message-
From: Madhire, Naveen [mailto:naveen.madh...@capitalone.com] 
Sent: Thursday, March 17, 2016 11:24 AM
To: solr-user@lucene.apache.org
Subject: FW: SolrCloud App Unit Testing


Hi,

I am writing a Solr Application, can anyone please let me know how to Unit test 
the application?

I see we have MiniSolrCloudCluster class available in Solr, but I am confused 
about how to use that for Unit testing.

How should I create an embedded server for unit testing?



Thanks,
Naveen


The information contained in this e-mail is confidential and/or proprietary to 
Capital One and/or its affiliates and may only be used solely in performance of 
work or services for Capital One. The information transmitted herewith is 
intended only for use by the individual or entity to which it is addressed. If 
the reader of this message is not the intended recipient, you are hereby 
notified that any review, retransmission, dissemination, distribution, copying 
or other use of, or taking of any action in reliance upon this information is 
strictly prohibited. If you have received this communication in error, please 
contact the sender and delete the material from your computer.


Re: FW: SolrCloud App Unit Testing

2016-03-19 Thread Shawn Heisey
On 3/19/2016 7:11 AM, GW wrote:
> I think the easiest way to write apps for Solr is with some kind of
> programming language and the REST API. Don't bother with the PHP or Perl
> modules. They are deprecated and beyond useless. just use the HTTP call
> that you see in Solr Admin. Mind the URL encoding when putting together
> your server calls.

The problem with using the REST-like API directly is that you have to
understand the API completely and construct every URL parameter
yourself.  You also have to understand the response format and write
code to extract the info you need from the response.
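To make that concrete, even a single hand-built query means doing the URL
encoding yourself. A minimal sketch (field name, row count, and host are
illustrative, not from any particular schema):

```java
import java.net.URLEncoder;

public class RawSolrUrl {

    // Hand-build a /select URL, encoding the q parameter ourselves --
    // exactly the boilerplate a pre-made client takes care of.
    static String buildQueryUrl(String baseUrl, String userInput) throws Exception {
        String q = URLEncoder.encode("name:\"" + userInput + "\"", "UTF-8");
        return baseUrl + "/select?q=" + q + "&rows=10&wt=json";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildQueryUrl("http://localhost:8983/solr/products",
                                         "blue widgets"));
        // prints http://localhost:8983/solr/products/select?q=name%3A%22blue+widgets%22&rows=10&wt=json
    }
}
```

And that is only the request half; parsing the JSON or XML that comes back is
a second pile of hand-written code.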

Using a pre-made client makes it so you don't have to do ANY of that. 
The request is built up from easily understood objects/methods, and all
the useful information from the response is loaded into data structures
that are fairly easy to understand if you know the language you're
writing in.

There are a LOT of php clients, and some of them have seen new releases
about three months ago.  I wouldn't call that deprecated.

https://wiki.apache.org/solr/IntegratingSolr#PHP

There aren't as many clients for Perl.  I haven't checked the last
update for these yet:

https://wiki.apache.org/solr/IntegratingSolr#Perl

Thanks,
Shawn



Re: FW: SolrCloud App Unit Testing

2016-03-19 Thread GW
I think the easiest way to write apps for Solr is with some kind of
programming language and the REST API. Don't bother with the PHP or Perl
modules. They are deprecated and beyond useless. just use the HTTP call
that you see in Solr Admin. Mind the URL encoding when putting together
your server calls.

I've used Perl and PHP with Curl to create Solr Apps

PHP:

function fetchContent($URL){
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $URL);
$data = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if ($httpCode == "404") {
$data="nin";
}
return $data;
}


switch($filters){

case "":
$url = "http://localhost:8983/solr/products/query?q=name
:".$urlsearch."^20+OR+short_description:".$urlsearch."~6=13=".$start."=*,score=json";
break;

case "clothing":
$url = "http://localhost:8983/solr/products/query
?q=name:%22".$urlsearch."%22^20+OR
+short_description:%22".$urlsearch."%22~6=13=".$start."=*,score=json";
break;

case "beauty cosmetics":

$url = "http://localhost:8983/solr/products/query?q=name
:".$urlsearch."^20+OR+short_description:".$urlsearch."~6=13=".$start."=*,score=json";


break;


}


$my_data = fetchContent($url);


Data goes into the $my_data as a JSON string in this case.


Your forward-facing app can sit behind Apache round-robin in
front of a Solr system. This gives you insane scalability in the client app
and the Solr service.


Hope that helps.

GW


On 17 March 2016 at 11:23, Madhire, Naveen <naveen.madh...@capitalone.com>
wrote:

>
> Hi,
>
> I am writing a Solr Application, can anyone please let me know how to Unit
> test the application?
>
> I see we have MiniSolrCloudCluster class available in Solr, but I am
> confused about how to use that for Unit testing.
>
> How should I create an embedded server for unit testing?
>
>
>
> Thanks,
> Naveen
> 
>
>


Re: SolrCloud App Unit Testing

2016-03-19 Thread Steve Davids
Naveen,

The Solr codebase generally uses the base “SolrTestCaseJ4” class and sometimes 
mixes in the cloud cluster. I personally write a generic abstract base test 
class to fit my needs and have an abstract `getSolrServer` method with an 
EmbeddedSolrServer implementation along with a separate implementation for the 
CloudSolrServer. I use the EmbeddedSolrServer for almost all of my test cases 
since it is a lot faster to setup, I’ll pull in the Cloud implementation if 
there is some distributed logic that is necessary for testing. Here is a simple 
example project (https://gitlab.com/bti360/solr-exercise/tree/example-solution 
<https://gitlab.com/bti360/solr-exercise/tree/example-solution>) which has a 
base test 
<https://gitlab.com/bti360/solr-exercise/blob/example-solution/src/test/java/com/bti360/gt/search/BaseSolrTestCase.java>
 which piggy-backs off the SolrTestCase class. If you don’t want to complete 
the “exercise”, switch over to the example-solution branch.

Hopefully that points you in the right direction,

-Steve


> On Mar 17, 2016, at 1:03 PM, Davis, Daniel (NIH/NLM) [C] 
> <daniel.da...@nih.gov> wrote:
> 
> MiniSolrCloudCluster is intended for building unit tests for cloud commands 
> within Solr itself.
> 
> What most people do to test applications based on Solr (and their Solr 
> configurations) is to start solr either on their CI server or in the cloud 
> (more likely the later), and then point their application at that Solr 
> instance through configuration for the unit tests.   They may also have 
> separate tests to test the Solr collection/core configuration itself.
> 
> You can have your CI tool (Travis/etc.) or unit test scripts start-up Solr 
> locally, or in the cloud, using various tools and concoctions.   Part of the 
> core of that is the solr command-line in SOLR_HOME/bin, post tool in 
> SOLR_HOME/bin, and zkcli in SOLR_HOME/server/scripts/cloud-scripts.
> 
> To start Solr in the cloud, you should look towards something that exists:
>   https://github.com/lucidworks/solr-scale-tk 
>   https://github.com/vkhatri/chef-solrcloud
> 
> Hope this helps,
> 
> -Dan
> 
> -Original Message-
> From: Madhire, Naveen [mailto:naveen.madh...@capitalone.com] 
> Sent: Thursday, March 17, 2016 11:24 AM
> To: solr-user@lucene.apache.org
> Subject: FW: SolrCloud App Unit Testing
> 
> 
> Hi,
> 
> I am writing a Solr Application, can anyone please let me know how to Unit 
> test the application?
> 
> I see we have MiniSolrCloudCluster class available in Solr, but I am confused 
> about how to use that for Unit testing.
> 
> How should I create an embedded server for unit testing?
> 
> 
> 
> Thanks,
> Naveen
> 
> 



FW: SolrCloud App Unit Testing

2016-03-18 Thread Madhire, Naveen

Hi,

I am writing a Solr Application, can anyone please let me know how to Unit test 
the application?

I see we have MiniSolrCloudCluster class available in Solr, but I am confused 
about how to use that for Unit testing.

How should I create an embedded server for unit testing?



Thanks,
Naveen




Re: Testing Solr configuration, schema, and other fields

2015-12-31 Thread Erik Hatcher
Dan - I’m a fan of the idea of using EmbeddedSolrServer for the type of thing 
you mention, but since you’re already using SolrCloud how about simply 
upconfig’ing the configuration from the Git repo, create a temporary collection 
using that configset and smoke test it before making it ready for end 
client/customer/user use?   Maybe the configset and collection created for 
smoke testing are just temporary in order to validate it.

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com <http://www.lucidworks.com/>



> On Dec 30, 2015, at 3:09 PM, Davis, Daniel (NIH/NLM) [C] 
> <daniel.da...@nih.gov> wrote:
> 
> At my organization, I want to create a tool that allows users to keep a solr 
> configuration as a Git repository.   Then, I want my Continuous Integration 
> environment to take some branch of the git repository and "publish" it into 
> ZooKeeper/SolrCloud.
> 
> Working on my own, it is only a very small pain to note foolish errors I've 
> made, fix them, and restart.However, I want my users to be able to edit 
> their own Solr schema and config *most* of the time, at least on development 
> servers.They will not have command-line access to these servers, and I 
> want to avoid endless restarts.
> 
> I'm not interested in fighting to maintain such a useless thing as a DTD/XSD 
> without community support; what I really want to know is whether Solr will 
> start and can index some sample documents.   I'm wondering whether I might be 
> able to build a tool to fire up an EmbeddedSolrServer and capture error 
> messages/exceptions in a reasonable way. This tool could then be run by 
> my users before they commit to git, and then again by the CI server before it 
> "publishes" the configuration to ZooKeeper/SolrCloud.
> 
> Any suggestions?
> 
> Dan Davis, Systems/Applications Architect (Contractor),
> Office of Computer and Communications Systems,
> National Library of Medicine, NIH
> 
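One cheap gate worth having regardless of the smoke-test approach: many of the
"foolish errors" in hand-edited configs are plain XML well-formedness problems,
which the JDK parser can flag without starting Solr at all (this is distinct
from DTD/XSD validation, and Solr can of course still reject a file that
passes it). A minimal sketch, with the inline snippets standing in for files
from the Git repo being validated:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import java.io.InputStream;

public class ConfigLint {

    // Return null if the stream holds well-formed XML, else the parser's message.
    static String check(InputStream in) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            return null;
        } catch (Exception e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Self-demo on inline snippets; in real use, open conf/*.xml instead.
        System.out.println(check(new ByteArrayInputStream(
                "<config/>".getBytes("UTF-8"))));                  // prints null
        System.out.println(check(new ByteArrayInputStream(
                "<config><oops></config>".getBytes("UTF-8"))));    // prints a parse error
    }
}
```

Running something like this before every upconfig catches the restart-and-read-
the-log cycle's most common cause in milliseconds.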



RE: Testing Solr configuration, schema, and other fields

2015-12-31 Thread Alexandre Rafalovitch
Well, I guess NIH stands for Not Invented Here. No idea what NLM is for.

P.s. sorry, could not resist. I worked for orgs like that too :-(
On 1 Jan 2016 12:03 am, "Davis, Daniel (NIH/NLM) [C]" <daniel.da...@nih.gov>
wrote:

> That's incredibly cool.   Much easier than the chef/puppet scripts and
> stuff I've seen.I'm certain to play with this and get under the hood;
> however, we locally don't have a permission to use AWS EC2 in this corner
> of NLM.There's some limited use of S3 and Glacier.   Maybe we'll
> negotiate EC2 for dev later this year, maybe not.
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Thursday, December 31, 2015 11:40 AM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Testing Solr configuration, schema, and other fields
>
> Makes sense.
>
> Answering the answer email in this thread, did you look at Solr Scale?
> Maybe it has the base infrastructure you need:
> https://github.com/LucidWorks/solr-scale-tk
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 31 December 2015 at 23:37, Davis, Daniel (NIH/NLM) [C] <
> daniel.da...@nih.gov> wrote:
> >> What is the next step you are stuck on?
> >>
> >> Regards,
> >>Alex
> >
> > I'm not really stuck.   My question has been about the best practices.
>  I am trying to work against "not-invented-here" syndrome,
> "only-useful-here" syndrome, and "boil-the-ocean" syndrome.I have to
> make the solution work with a Continuous Integration (CI) environment that
> will not be creating either docker images or VMs for each project, and so
> I've been seeking the wisdom of the crowd.
> >
> > -Original Message-
> > From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> > Sent: Thursday, December 31, 2015 12:42 AM
> > To: solr-user <solr-user@lucene.apache.org>
> > Subject: Re: Testing Solr configuration, schema, and other fields
> >
> > I might be just confused here, but I am not sure what your bottleneck
> > actually is. You seem to know your critical path already, so how can we
> > help?
> >
> > Starting new solr core from given configuration directory is easy.
> > Catching hard errors from that is probably just grepping logs or a custom
> > logger.
> >
> > And you don't seem to be talking about lint style soft sanity checks,
> > but rather the initialization stopping hard checks.
> >
> > What is the next step you are stuck on?
> >
> > Regards,
> >Alex
> > On 31 Dec 2015 3:09 am, "Davis, Daniel (NIH/NLM) [C]"
> > <daniel.da...@nih.gov>
> > wrote:
> >
> >> At my organization, I want to create a tool that allows users to keep a
> >> solr configuration as a Git repository.   Then, I want my Continuous
> >> Integration environment to take some branch of the git repository and
> >> "publish" it into ZooKeeper/SolrCloud.
> >>
> >> Working on my own, it is only a very small pain to note foolish errors
> >> I've made, fix them, and restart.However, I want my users to be
> able to
> >> edit their own Solr schema and config *most* of the time, at least on
> >> development servers.They will not have command-line access to these
> >> servers, and I want to avoid endless restarts.
> >>
> >> I'm not interested in fighting to maintain such a useless thing as a
> >> DTD/XSD without community support; what I really want to know is whether
> >> Solr will start and can index some sample documents.   I'm wondering
> >> whether I might be able to build a tool to fire up an EmbeddedSolrServer
> >> and capture error messages/exceptions in a reasonable way. This tool
> >> could then be run by my users before they commit to git, and then
> >> again by the CI server before it "publishes" the configuration to
> >> ZooKeeper/SolrCloud.
> >>
> >> Any suggestions?
> >>
> >> Dan Davis, Systems/Applications Architect (Contractor), Office of
> >> Computer and Communications Systems, National Library of Medicine,
> >> NIH
> >>
> >>
>


Re: Testing Solr configuration, schema, and other fields

2015-12-31 Thread Alexandre Rafalovitch
Makes sense.

Answering the answer email in this thread, did you look at Solr Scale?
Maybe it has the base infrastructure you need:
https://github.com/LucidWorks/solr-scale-tk

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 31 December 2015 at 23:37, Davis, Daniel (NIH/NLM) [C]
<daniel.da...@nih.gov> wrote:
>> What is the next step you are stuck on?
>>
>> Regards,
>>Alex
>
> I'm not really stuck.   My question has been about the best practices.   I am 
> trying to work against "not-invented-here" syndrome, "only-useful-here" 
> syndrome, and "boil-the-ocean" syndrome.I have to make the solution work 
> with a Continuous Integration (CI) environment that will not be creating 
> either docker images or VMs for each project, and so I've been seeking the 
> wisdom of the crowd.
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Thursday, December 31, 2015 12:42 AM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Testing Solr configuration, schema, and other fields
>
> I might be just confused here, but I am not sure what your bottleneck 
> actually is. You seem to know your critical path already, so how can we help?
>
> Starting new solr core from given configuration directory is easy. Catching 
> hard errors from that is probably just grepping logs or a custom logger.
>
> And you don't seem to be talking about lint style soft sanity checks, but 
> rather the initialization stopping hard checks.
>
> What is the next step you are stuck on?
>
> Regards,
>Alex
> On 31 Dec 2015 3:09 am, "Davis, Daniel (NIH/NLM) [C]" <daniel.da...@nih.gov>
> wrote:
>
>> At my organization, I want to create a tool that allows users to keep a
>> solr configuration as a Git repository.   Then, I want my Continuous
>> Integration environment to take some branch of the git repository and
>> "publish" it into ZooKeeper/SolrCloud.
>>
>> Working on my own, it is only a very small pain to note foolish errors
>> I've made, fix them, and restart.However, I want my users to be able to
>> edit their own Solr schema and config *most* of the time, at least on
>> development servers.They will not have command-line access to these
>> servers, and I want to avoid endless restarts.
>>
>> I'm not interested in fighting to maintain such a useless thing as a
>> DTD/XSD without community support; what I really want to know is whether
>> Solr will start and can index some sample documents.   I'm wondering
>> whether I might be able to build a tool to fire up an EmbeddedSolrServer
>> and capture error messages/exceptions in a reasonable way. This tool
>> could then be run by my users before they commit to git, and then
>> again by the CI server before it "publishes" the configuration to
>> ZooKeeper/SolrCloud.
>>
>> Any suggestions?
>>
>> Dan Davis, Systems/Applications Architect (Contractor), Office of
>> Computer and Communications Systems, National Library of Medicine, NIH
>>
>>


RE: Testing Solr configuration, schema, and other fields

2015-12-31 Thread Davis, Daniel (NIH/NLM) [C]
That's incredibly cool.   Much easier than the chef/puppet scripts and stuff 
I've seen.I'm certain to play with this and get under the hood; however, we 
locally don't have a permission to use AWS EC2 in this corner of NLM.
There's some limited use of S3 and Glacier.   Maybe we'll negotiate EC2 for dev 
later this year, maybe not.
 
-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Thursday, December 31, 2015 11:40 AM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Testing Solr configuration, schema, and other fields

Makes sense.

Answering the answer email in this thread, did you look at Solr Scale?
Maybe it has the base infrastructure you need:
https://github.com/LucidWorks/solr-scale-tk

Regards,
   Alex.

Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 31 December 2015 at 23:37, Davis, Daniel (NIH/NLM) [C] 
<daniel.da...@nih.gov> wrote:
>> What is the next step you are stuck on?
>>
>> Regards,
>>Alex
>
> I'm not really stuck.   My question has been about the best practices.   I am 
> trying to work against "not-invented-here" syndrome, "only-useful-here" 
> syndrome, and "boil-the-ocean" syndrome.I have to make the solution work 
> with a Continuous Integration (CI) environment that will not be creating 
> either docker images or VMs for each project, and so I've been seeking the 
> wisdom of the crowd.
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Thursday, December 31, 2015 12:42 AM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Testing Solr configuration, schema, and other fields
>
> I might be just confused here, but I am not sure what your bottleneck 
> actually is. You seem to know your critical path already, so how can we help?
>
> Starting new solr core from given configuration directory is easy. Catching 
> hard errors from that is probably just grepping logs or a custom logger.
>
> And you don't seem to be talking about lint style soft sanity checks, but 
> rather the initialization stopping hard checks.
>
> What is the next step you are stuck on?
>
> Regards,
>Alex
> On 31 Dec 2015 3:09 am, "Davis, Daniel (NIH/NLM) [C]" 
> <daniel.da...@nih.gov>
> wrote:
>
>> At my organization, I want to create a tool that allows users to keep a
>> solr configuration as a Git repository.   Then, I want my Continuous
>> Integration environment to take some branch of the git repository and 
>> "publish" it into ZooKeeper/SolrCloud.
>>
>> Working on my own, it is only a very small pain to note foolish errors
>> I've made, fix them, and restart.However, I want my users to be able to
>> edit their own Solr schema and config *most* of the time, at least on
>> development servers.They will not have command-line access to these
>> servers, and I want to avoid endless restarts.
>>
>> I'm not interested in fighting to maintain such a useless thing as a 
>> DTD/XSD without community support; what I really want to know is whether
>> Solr will start and can index some sample documents.   I'm wondering
>> whether I might be able to build a tool to fire up an EmbeddedSolrServer
>> and capture error messages/exceptions in a reasonable way. This tool
>> could then be run by my users before they commit to git, and then 
>> again by the CI server before it "publishes" the configuration to 
>> ZooKeeper/SolrCloud.
>>
>> Any suggestions?
>>
>> Dan Davis, Systems/Applications Architect (Contractor), Office of 
>> Computer and Communications Systems, National Library of Medicine, 
>> NIH
>>
>>


Re: Testing Solr configuration, schema, and other fields

2015-12-31 Thread Erick Erickson
Hmmm, a couple of things:

the bin/solr script could be used as a model in this scenario for
how to automate a lot of this. I'm thinking you can skip all the
argument parsing and that and just see how the SolrCLI jar file
is used to spin up collections, upload configs and the like. In fact,
assuming a unique collection name per developer you could
use a common dev SolrCloud setup for this.

Or heck, perhaps just use the bin/solr script for all of that...

The other thing I was assuming is that you don't _really_ care
about starting/stopping Solr, it's more the requirement for your
devs to upload the configs, reload a collection, find out whether
the collection is running or not, and if not, find the log files and see why.
That's the cycle you'd like to shorten.

FWIW,
Erick

On Thu, Dec 31, 2015 at 8:31 AM, Davis, Daniel (NIH/NLM) [C]
<daniel.da...@nih.gov> wrote:
> Erik, that suggests an additional approach that seems to have "legs":
>
> * A webapp that acts as a sort of Cloud IDE for Solr configsets.   It 
> supports multiple projects and a single SolrCloud cluster.   For each 
> project, it upconfigs a git repository local to the webapp, and has the 
> ability to define tests that run against a "temporary" collection to verify 
> the configuration.
>
> * A command-line utility that upconfigs the configuration a local directory, 
> creates a temporary collection, and supports an optional "tests" by applying 
> an update query.
>
> Since the webapp would be based on something like the command-line utility 
> (maybe in library form), I think I'm still going to target the command-line 
> utility as my "minimum viable product".   I'll support SolrCloud first, and 
> then see about EmbeddedSolrServer.
>
> -Original Message-
> From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
> Sent: Thursday, December 31, 2015 10:00 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Testing Solr configuration, schema, and other fields
>
> Dan - I’m a fan of the idea of using EmbeddedSolrServer for the type of thing 
> you mention, but since you’re already using SolrCloud how about simply 
> upconfig’ing the configuration from the Git repo, create a temporary 
> collection using that configset and smoke test it before making it ready for 
> end client/customer/user use?   Maybe the configset and collection created 
> for smoke testing are just temporary in order to validate it.
>
> —
> Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com 
> <http://www.lucidworks.com/>
>
>
>
>> On Dec 30, 2015, at 3:09 PM, Davis, Daniel (NIH/NLM) [C] 
>> <daniel.da...@nih.gov> wrote:
>>
>> At my organization, I want to create a tool that allows users to keep a solr 
>> configuration as a Git repository.   Then, I want my Continuous Integration 
>> environment to take some branch of the git repository and "publish" it into 
>> ZooKeeper/SolrCloud.
>>
>> Working on my own, it is only a very small pain to note foolish errors I've 
>> made, fix them, and restart.However, I want my users to be able to edit 
>> their own Solr schema and config *most* of the time, at least on development 
>> servers.They will not have command-line access to these servers, and I 
>> want to avoid endless restarts.
>>
>> I'm not interested in fighting to maintain such a useless thing as a DTD/XSD 
>> without community support; what I really want to know is whether Solr will 
>> start and can index some sample documents.   I'm wondering whether I might 
>> be able to build a tool to fire up an EmbeddedSolrServer and capture error 
>> messages/exceptions in a reasonable way. This tool could then be run by 
>> my users before they commit to git, and then again by the CI server before 
>> it "publishes" the configuration to ZooKeeper/SolrCloud.
>>
>> Any suggestions?
>>
>> Dan Davis, Systems/Applications Architect (Contractor), Office of
>> Computer and Communications Systems, National Library of Medicine, NIH
>>
>


RE: Testing Solr configuration, schema, and other fields

2015-12-31 Thread Davis, Daniel (NIH/NLM) [C]
> What is the next step you are stuck on?
> 
> Regards,
>Alex

I'm not really stuck.   My question has been about the best practices.   I am 
trying to work against "not-invented-here" syndrome, "only-useful-here" 
syndrome, and "boil-the-ocean" syndrome.I have to make the solution work 
with a Continuous Integration (CI) environment that will not be creating either 
docker images or VMs for each project, and so I've been seeking the wisdom of 
the crowd.

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Thursday, December 31, 2015 12:42 AM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Testing Solr configuration, schema, and other fields

I might be just confused here, but I am not sure what your bottleneck actually 
is. You seem to know your critical path already, so how can we help?

Starting new solr core from given configuration directory is easy. Catching 
hard errors from that is probably just grepping logs or a custom logger.

And you don't seem to be talking about lint style soft sanity checks, but 
rather the initialization stopping hard checks.

What is the next step you are stuck on?

Regards,
   Alex
On 31 Dec 2015 3:09 am, "Davis, Daniel (NIH/NLM) [C]" <daniel.da...@nih.gov>
wrote:

> At my organization, I want to create a tool that allows users to keep a
> solr configuration as a Git repository.   Then, I want my Continuous
> Integration environment to take some branch of the git repository and 
> "publish" it into ZooKeeper/SolrCloud.
>
> Working on my own, it is only a very small pain to note foolish errors
> I've made, fix them, and restart.However, I want my users to be able to
> edit their own Solr schema and config *most* of the time, at least on
> development servers.They will not have command-line access to these
> servers, and I want to avoid endless restarts.
>
> I'm not interested in fighting to maintain such a useless thing as a 
> DTD/XSD without community support; what I really want to know is whether
> Solr will start and can index some sample documents.   I'm wondering
> whether I might be able to build a tool to fire up an EmbeddedSolrServer
> and capture error messages/exceptions in a reasonable way. This tool
> could then be run by my users before they commit to git, and then 
> again by the CI server before it "publishes" the configuration to 
> ZooKeeper/SolrCloud.
>
> Any suggestions?
>
> Dan Davis, Systems/Applications Architect (Contractor), Office of 
> Computer and Communications Systems, National Library of Medicine, NIH
>
>


RE: Testing Solr configuration, schema, and other fields

2015-12-31 Thread Davis, Daniel (NIH/NLM) [C]
Heh

National Library of Medicine (NLM) is all over the map in terms of 
"not-invented-here", being a large organization within a large organization.  
It's my personal tendency towards "not-invented-here" that concerns me.

-Original Message-
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Thursday, December 31, 2015 12:24 PM
To: solr-user <solr-user@lucene.apache.org>
Subject: RE: Testing Solr configuration, schema, and other fields

Well, I guess NIH stands for Not Invented Here. No idea what NLM is for.

P.s. sorry, could not resist. I worked for orgs like that too :-( On 1 Jan 2016 
12:03 am, "Davis, Daniel (NIH/NLM) [C]" <daniel.da...@nih.gov>
wrote:

> That's incredibly cool.   Much easier than the chef/puppet scripts and
> stuff I've seen.   I'm certain to play with this and get under the hood;
> however, we locally don't have permission to use AWS EC2 in this corner
> of NLM.   There's some limited use of S3 and Glacier.   Maybe we'll
> negotiate EC2 for dev later this year, maybe not.
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Thursday, December 31, 2015 11:40 AM
> To: solr-user <solr-user@lucene.apache.org>
> Subject: Re: Testing Solr configuration, schema, and other fields
>
> Makes sense.
>
> Answering the answer email in this thread, did you look at Solr Scale?
> Maybe it has the base infrastructure you need:
> https://github.com/LucidWorks/solr-scale-tk
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 31 December 2015 at 23:37, Davis, Daniel (NIH/NLM) [C] < 
> daniel.da...@nih.gov> wrote:
> >> What is the next step you are stuck on?
> >>
> >> Regards,
> >>Alex
> >
> > I'm not really stuck.   My question has been about the best practices.
>  I am trying to work against "not-invented-here" syndrome,
> "only-useful-here" syndrome, and "boil-the-ocean" syndrome.I have to
> make the solution work with a Continuous Integration (CI) environment 
> that will not be creating either docker images or VMs for each 
> project, and so I've been seeking the wisdom of the crowd.
> >
> > -Original Message-
> > From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> > Sent: Thursday, December 31, 2015 12:42 AM
> > To: solr-user <solr-user@lucene.apache.org>
> > Subject: Re: Testing Solr configuration, schema, and other fields
> >
> > I might be just confused here, but I am not sure what your bottleneck
> > actually is. You seem to know your critical path already, so how can we
> > help?
> >
> > Starting a new Solr core from a given configuration directory is easy.
> > Catching hard errors from that is probably just grepping logs or adding
> > a custom logger.
> >
> > And you don't seem to be talking about lint-style soft sanity checks,
> > but rather hard checks that stop initialization.
> >
> > What is the next step you are stuck on?
> >
> > Regards,
> >Alex
> > On 31 Dec 2015 3:09 am, "Davis, Daniel (NIH/NLM) [C]"
> > <daniel.da...@nih.gov>
> > wrote:
> >
> >> At my organization, I want to create a tool that allows users to keep a
> >> solr configuration as a Git repository.   Then, I want my Continuous
> >> Integration environment to take some branch of the git repository 
> >> and "publish" it into ZooKeeper/SolrCloud.
> >>
> >> Working on my own, it is only a very small pain to note foolish errors
> >> I've made, fix them, and restart.However, I want my users to be
> able to
> >> edit their own Solr schema and config *most* of the time, at least on
> >> development servers.They will not have command-line access to these
> >> servers, and I want to avoid endless restarts.
> >>
> >> I'm not interested in fighting to maintain such a useless thing as 
> >> a DTD/XSD without community support; what I really want to know is whether
> >> Solr will start and can index some sample documents.   I'm wondering
> >> whether I might be able to build a tool to fire up an EmbeddedSolrServer
> >> and capture error messages/exceptions in a reasonable way. This tool
> >> could then be run by my users before they commit to git, and then 
> >> again by the CI server before it "publishes" the configuration to 
> >> ZooKeeper/SolrCloud.
> >>
> >> Any suggestions?
> >>
> >> Dan Davis, Systems/Applications Architect (Contractor), Office of 
> >> Computer and Communications Systems, National Library of Medicine, 
> >> NIH
> >>
> >>
>


RE: Testing Solr configuration, schema, and other fields

2015-12-31 Thread Davis, Daniel (NIH/NLM) [C]
Erik, that suggests an additional approach that seems to have "legs":
 
* A webapp that acts as a sort of Cloud IDE for Solr configsets.   It supports 
multiple projects and a single SolrCloud cluster.   For each project, it 
upconfigs a git repository local to the webapp, and has the ability to define 
tests that run against a "temporary" collection to verify the configuration.

* A command-line utility that upconfigs the configuration from a local directory, 
creates a temporary collection, and supports optional "tests" by applying an 
update query.

Since the webapp would be based on something like the command-line utility 
(maybe in library form), I think I'm still going to target the command-line 
utility as my "minimum viable product".   I'll support SolrCloud first, and 
then see about EmbeddedSolrServer.

-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com] 
Sent: Thursday, December 31, 2015 10:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Testing Solr configuration, schema, and other fields

Dan - I’m a fan of the idea of using EmbeddedSolrServer for the type of thing 
you mention, but since you’re already using SolrCloud how about simply 
upconfig’ing the configuration from the Git repo, create a temporary collection 
using that configset and smoke test it before making it ready for end 
client/customer/user use?   Maybe the configset and collection created for 
smoke testing are just temporary in order to validate it.

—
Erik Hatcher, Senior Solutions Architect http://www.lucidworks.com 
<http://www.lucidworks.com/>



> On Dec 30, 2015, at 3:09 PM, Davis, Daniel (NIH/NLM) [C] 
> <daniel.da...@nih.gov> wrote:
> 
> At my organization, I want to create a tool that allows users to keep a solr 
> configuration as a Git repository.   Then, I want my Continuous Integration 
> environment to take some branch of the git repository and "publish" it into 
> ZooKeeper/SolrCloud.
> 
> Working on my own, it is only a very small pain to note foolish errors I've 
> made, fix them, and restart.However, I want my users to be able to edit 
> their own Solr schema and config *most* of the time, at least on development 
> servers.They will not have command-line access to these servers, and I 
> want to avoid endless restarts.
> 
> I'm not interested in fighting to maintain such a useless thing as a DTD/XSD 
> without community support; what I really want to know is whether Solr will 
> start and can index some sample documents.   I'm wondering whether I might be 
> able to build a tool to fire up an EmbeddedSolrServer and capture error 
> messages/exceptions in a reasonable way. This tool could then be run by 
> my users before they commit to git, and then again by the CI server before it 
> "publishes" the configuration to ZooKeeper/SolrCloud.
> 
> Any suggestions?
> 
> Dan Davis, Systems/Applications Architect (Contractor), Office of 
> Computer and Communications Systems, National Library of Medicine, NIH
> 
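
The upconfig-plus-temporary-collection smoke test Erik describes maps onto three
Collections/select API calls. Below is a hedged Python sketch of what such a
command-line utility's core could look like; the base URL, configset name, and
temporary collection name are placeholder assumptions, and it only builds the
URLs (swap the print for urllib.request.urlopen against a live cluster):

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumption: a local SolrCloud node


def collections_api(action, **params):
    """Build a Collections API URL for the given action."""
    return f"{SOLR}/admin/collections?" + urlencode({"action": action, **params})


def smoke_test_urls(configset, collection="smoke-test-tmp"):
    """The three calls of a minimal smoke test: create a temporary collection
    from the configset, run a sanity query, then drop the collection."""
    return [
        collections_api("CREATE", name=collection,
                        numShards=1, replicationFactor=1,
                        **{"collection.configName": configset}),
        f"{SOLR}/{collection}/select?" + urlencode({"q": "*:*", "rows": 0}),
        collections_api("DELETE", name=collection),
    ]


if __name__ == "__main__":
    for url in smoke_test_urls("my-configset"):
        print(url)  # replace with urllib.request.urlopen(url) for real use
```

A CI job could run these in order after `solr zk upconfig`, failing the build on
any non-200 response before the configset is published for real collections.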



Re: Testing Solr configuration, schema, and other fields

2015-12-30 Thread Alexandre Rafalovitch
I might be just confused here, but I am not sure what your bottle neck
actually is. You seem to know your critical path already, so how can we
help?

Starting new solr core from given configuration directory is easy. Catching
hard errors from that is probably just gripping logs or a custom logger.

And you don't seem to be talking about lint style soft sanity checks, but
rather the initialization stopping hard checks.

What is the next step you are stuck on?

Regards,
   Alex
On 31 Dec 2015 3:09 am, "Davis, Daniel (NIH/NLM) [C]" 
wrote:

> At my organization, I want to create a tool that allows users to keep a
> solr configuration as a Git repository.   Then, I want my Continuous
> Integration environment to take some branch of the git repository and
> "publish" it into ZooKeeper/SolrCloud.
>
> Working on my own, it is only a very small pain to note foolish errors
> I've made, fix them, and restart.However, I want my users to be able to
> edit their own Solr schema and config *most* of the time, at least on
> development servers.They will not have command-line access to these
> servers, and I want to avoid endless restarts.
>
> I'm not interested in fighting to maintain such a useless thing as a
> DTD/XSD without community support; what I really want to know is whether
> Solr will start and can index some sample documents.   I'm wondering
> whether I might be able to build a tool to fire up an EmbeddedSolrServer
> and capture error messages/exceptions in a reasonable way. This tool
> could then be run by my users before they commit to git, and then again by
> the CI server before it "publishes" the configuration to
> ZooKeeper/SolrCloud.
>
> Any suggestions?
>
> Dan Davis, Systems/Applications Architect (Contractor),
> Office of Computer and Communications Systems,
> National Library of Medicine, NIH
>
>


Testing Solr configuration, schema, and other fields

2015-12-30 Thread Davis, Daniel (NIH/NLM) [C]
At my organization, I want to create a tool that allows users to keep a solr 
configuration as a Git repository.   Then, I want my Continuous Integration 
environment to take some branch of the git repository and "publish" it into 
ZooKeeper/SolrCloud.

Working on my own, it is only a very small pain to note foolish errors I've 
made, fix them, and restart.   However, I want my users to be able to edit 
their own Solr schema and config *most* of the time, at least on development 
servers.   They will not have command-line access to these servers, and I want 
to avoid endless restarts.

I'm not interested in fighting to maintain such a useless thing as a DTD/XSD 
without community support; what I really want to know is whether Solr will 
start and can index some sample documents.   I'm wondering whether I might be 
able to build a tool to fire up an EmbeddedSolrServer and capture error 
messages/exceptions in a reasonable way. This tool could then be run by my 
users before they commit to git, and then again by the CI server before it 
"publishes" the configuration to ZooKeeper/SolrCloud.

Any suggestions?

Dan Davis, Systems/Applications Architect (Contractor),
Office of Computer and Communications Systems,
National Library of Medicine, NIH
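
Before reaching for EmbeddedSolrServer, a pre-commit tool along these lines
could start with something very small: checking that every XML file in the
configset at least parses. This is only a sketch under the assumption that the
configset lives in a `conf/` directory; it deliberately cannot catch the
schema/solrconfig inconsistencies that actually starting a core would.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def check_configset(conf_dir):
    """Return a list of (filename, error) pairs for XML files in conf_dir
    that fail to parse. A cheap first gate before a full core startup."""
    errors = []
    for path in Path(conf_dir).glob("*.xml"):
        try:
            ET.parse(path)
        except ET.ParseError as e:
            errors.append((path.name, str(e)))
    return errors


if __name__ == "__main__":
    conf = sys.argv[1] if len(sys.argv) > 1 else "conf"
    for name, err in check_configset(conf):
        print(f"{name}: {err}")
```

Users could run this before committing to Git, and the CI server could run it
again before publishing to ZooKeeper/SolrCloud.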



RE: Testing Solr configuration, schema, and other fields

2015-12-30 Thread Davis, Daniel (NIH/NLM) [C]
Your bottom line point is that EmbeddedSolrServer is different, and some 
configurations will not work on it where they would work on a SolrCloud.   This 
is well taken.   Maybe creating a new collection on existing dev nodes could be 
done.

As far as VDI and Puppet.   My requirements are different because my 
organization is different.   I would prefer not to go into how different.   I 
have written puppet modules for other system configurations, tested them on AWS 
EC2, and yet those modules have not been adopted by my organization.


-Original Message-
From: Mark Horninger [mailto:mhornin...@grayhairsoftware.com] 
Sent: Wednesday, December 30, 2015 3:25 PM
To: solr-user@lucene.apache.org
Subject: RE: Testing Solr configuration, schema, and other fields

Daniel,


Sounds almost like you're reinventing the wheel.  Could you possibly automate 
this through puppet or Chef?  With a VDI environment, then all you would need 
to do is build a new VM Node based on original setup.  Then you can just roll 
out the node as one of the zk nodes.

Just a thought on that subject.

v/r,

-Mark H.

-Original Message-
From: Davis, Daniel (NIH/NLM) [C] [mailto:daniel.da...@nih.gov]
Sent: Wednesday, December 30, 2015 3:10 PM
To: solr-user@lucene.apache.org
Subject: Testing Solr configuration, schema, and other fields

At my organization, I want to create a tool that allows users to keep a solr 
configuration as a Git repository.   Then, I want my Continuous Integration 
environment to take some branch of the git repository and "publish" it into 
ZooKeeper/SolrCloud.

Working on my own, it is only a very small pain to note foolish errors I've 
made, fix them, and restart.   However, I want my users to be able to edit 
their own Solr schema and config *most* of the time, at least on development 
servers.   They will not have command-line access to these servers, and I want 
to avoid endless restarts.

I'm not interested in fighting to maintain such a useless thing as a DTD/XSD 
without community support; what I really want to know is whether Solr will 
start and can index some sample documents.   I'm wondering whether I might be 
able to build a tool to fire up an EmbeddedSolrServer and capture error 
messages/exceptions in a reasonable way. This tool could then be run by my 
users before they commit to git, and then again by the CI server before it 
"publishes" the configuration to ZooKeeper/SolrCloud.

Any suggestions?

Dan Davis, Systems/Applications Architect (Contractor), Office of Computer and 
Communications Systems, National Library of Medicine, NIH

[GrayHair]
GHS Confidentiality Notice

This e-mail message, including any attachments, is for the sole use of the 
intended recipient(s) and may contain confidential and privileged information. 
Any unauthorized review, use, disclosure or distribution of this information is 
prohibited, and may be punishable by law. If this was sent to you in error, 
please notify the sender by reply e-mail and destroy all copies of the original 
message.

GrayHair Software <http://www.grayhairSoftware.com>



RE: Testing Solr configuration, schema, and other fields

2015-12-30 Thread Mark Horninger
Daniel,


Sounds almost like you're reinventing the wheel.  Could you possibly automate 
this through puppet or Chef?  With a VDI environment, then all you would need 
to do is build a new VM Node based on original setup.  Then you can just roll 
out the node as one of the zk nodes.

Just a thought on that subject.

v/r,

-Mark H.

-Original Message-
From: Davis, Daniel (NIH/NLM) [C] [mailto:daniel.da...@nih.gov]
Sent: Wednesday, December 30, 2015 3:10 PM
To: solr-user@lucene.apache.org
Subject: Testing Solr configuration, schema, and other fields

At my organization, I want to create a tool that allows users to keep a solr 
configuration as a Git repository.   Then, I want my Continuous Integration 
environment to take some branch of the git repository and "publish" it into 
ZooKeeper/SolrCloud.

Working on my own, it is only a very small pain to note foolish errors I've 
made, fix them, and restart.   However, I want my users to be able to edit 
their own Solr schema and config *most* of the time, at least on development 
servers.   They will not have command-line access to these servers, and I want 
to avoid endless restarts.

I'm not interested in fighting to maintain such a useless thing as a DTD/XSD 
without community support; what I really want to know is whether Solr will 
start and can index some sample documents.   I'm wondering whether I might be 
able to build a tool to fire up an EmbeddedSolrServer and capture error 
messages/exceptions in a reasonable way. This tool could then be run by my 
users before they commit to git, and then again by the CI server before it 
"publishes" the configuration to ZooKeeper/SolrCloud.

Any suggestions?

Dan Davis, Systems/Applications Architect (Contractor), Office of Computer and 
Communications Systems, National Library of Medicine, NIH




RE: Testing Solr configuration, schema, and other fields

2015-12-30 Thread Davis, Daniel (NIH/NLM) [C]
I think of enterprise search as very similar to RDBMS:

- It belongs in the backend behind your app.
- Each project ought to control its own schema and data.

So, I want the configset for each team's Solr collections to be stored in our 
Git server just as the RDBMS schema is if a developer is using a framework or a 
couple of SQL files, scripts, and a VERSION table.It ought to be that easy.


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, December 30, 2015 5:37 PM
To: solr-user <solr-user@lucene.apache.org>
Subject: Re: Testing Solr configuration, schema, and other fields

Yeah, the notion of DTDs has gone around several times but always founders on 
the fact that you can, say, define your own Filter with its own set of 
parameters, etc. Sure, you can make a generic DTD that accommodates this, but 
then it becomes so general as to be little more than a syntax checker.

The managed schema stuff allows modifications of the schema via REST calls and 
there is some equivalent functionality for solrconfig.xml, but the interesting 
bit about that is that then your VCS is not the "one true source" of the 
configs, it almost goes backwards: Modify the configs in Zookeeper then check 
in to Git.
And even that doesn't really solve, say, putting default search fields in 
solrconfig.xml that do not exist in the schema file.

Frankly what I usually do when heavily editing either one is just do it on my 
local laptop, either stand alone or SolrCloud, _then_ check it in and/or test 
it on my cloud setup. So I guess the take-away is that I don't have any very 
good solution here.

Best,
Erick


On Wed, Dec 30, 2015 at 1:10 PM, Davis, Daniel (NIH/NLM) [C] 
<daniel.da...@nih.gov> wrote:
> Your bottom line point is that EmbeddedSolrServer is different, and some 
> configurations will not work on it where they would work on a SolrCloud.   
> This is well taken.   Maybe creating a new collection on existing dev nodes 
> could be done.
>
> As far as VDI and Puppet.   My requirements are different because my 
> organization is different.   I would prefer not to go into how different.   I 
> have written puppet modules for other system configurations, tested them on 
> AWS EC2, and yet those modules have not been adopted by my organization.
>
>
> -Original Message-
> From: Mark Horninger [mailto:mhornin...@grayhairsoftware.com]
> Sent: Wednesday, December 30, 2015 3:25 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Testing Solr configuration, schema, and other fields
>
> Daniel,
>
>
> Sounds almost like you're reinventing the wheel.  Could you possibly automate 
> this through puppet or Chef?  With a VDI environment, then all you would need 
> to do is build a new VM Node based on original setup.  Then you can just roll 
> out the node as one of the zk nodes.
>
> Just a thought on that subject.
>
> v/r,
>
> -Mark H.
>
> -Original Message-
> From: Davis, Daniel (NIH/NLM) [C] [mailto:daniel.da...@nih.gov]
> Sent: Wednesday, December 30, 2015 3:10 PM
> To: solr-user@lucene.apache.org
> Subject: Testing Solr configuration, schema, and other fields
>
> At my organization, I want to create a tool that allows users to keep a solr 
> configuration as a Git repository.   Then, I want my Continuous Integration 
> environment to take some branch of the git repository and "publish" it into 
> ZooKeeper/SolrCloud.
>
> Working on my own, it is only a very small pain to note foolish errors I've 
> made, fix them, and restart.However, I want my users to be able to edit 
> their own Solr schema and config *most* of the time, at least on development 
> servers.They will not have command-line access to these servers, and I 
> want to avoid endless restarts.
>
> I'm not interested in fighting to maintain such a useless thing as a DTD/XSD 
> without community support; what I really want to know is whether Solr will 
> start and can index some sample documents.   I'm wondering whether I might be 
> able to build a tool to fire up an EmbeddedSolrServer and capture error 
> messages/exceptions in a reasonable way. This tool could then be run by 
> my users before they commit to git, and then again by the CI server before it 
> "publishes" the configuration to ZooKeeper/SolrCloud.
>
> Any suggestions?
>
> Dan Davis, Systems/Applications Architect (Contractor), Office of 
> Computer and Communications Systems, National Library of Medicine, NIH
>
>


Re: Testing Solr configuration, schema, and other fields

2015-12-30 Thread Erick Erickson
Yeah, the notion of DTDs has gone around several times but always founders
on the fact that you can, say, define your own Filter with its own set of
parameters, etc. Sure, you can make a generic DTD that accommodates
this, but then it becomes so general as to be little more than a syntax checker.

The managed schema stuff allows modifications of the schema via REST calls
and there is some equivalent functionality for solrconfig.xml, but the
interesting
bit about that is that then your VCS is not the "one true source" of
the configs,
it almost goes backwards: Modify the configs in Zookeeper then check in to Git.
And even that doesn't really solve, say, putting default search fields in
solrconfig.xml that do not exist in the schema file.

Frankly what I usually do when heavily editing either one is just do
it on my local
laptop, either stand alone or SolrCloud, _then_ check it in and/or test it on
my cloud setup. So I guess the take-away is that I don't have any very good
solution here.

Best,
Erick


On Wed, Dec 30, 2015 at 1:10 PM, Davis, Daniel (NIH/NLM) [C]
<daniel.da...@nih.gov> wrote:
> Your bottom line point is that EmbeddedSolrServer is different, and some 
> configurations will not work on it where they would work on a SolrCloud.   
> This is well taken.   Maybe creating a new collection on existing dev nodes 
> could be done.
>
> As far as VDI and Puppet.   My requirements are different because my 
> organization is different.   I would prefer not to go into how different.   I 
> have written puppet modules for other system configurations, tested them on 
> AWS EC2, and yet those modules have not been adopted by my organization.
>
>
> -Original Message-
> From: Mark Horninger [mailto:mhornin...@grayhairsoftware.com]
> Sent: Wednesday, December 30, 2015 3:25 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Testing Solr configuration, schema, and other fields
>
> Daniel,
>
>
> Sounds almost like you're reinventing the wheel.  Could you possibly automate 
> this through puppet or Chef?  With a VDI environment, then all you would need 
> to do is build a new VM Node based on original setup.  Then you can just roll 
> out the node as one of the zk nodes.
>
> Just a thought on that subject.
>
> v/r,
>
> -Mark H.
>
> -Original Message-
> From: Davis, Daniel (NIH/NLM) [C] [mailto:daniel.da...@nih.gov]
> Sent: Wednesday, December 30, 2015 3:10 PM
> To: solr-user@lucene.apache.org
> Subject: Testing Solr configuration, schema, and other fields
>
> At my organization, I want to create a tool that allows users to keep a solr 
> configuration as a Git repository.   Then, I want my Continuous Integration 
> environment to take some branch of the git repository and "publish" it into 
> ZooKeeper/SolrCloud.
>
> Working on my own, it is only a very small pain to note foolish errors I've 
> made, fix them, and restart.However, I want my users to be able to edit 
> their own Solr schema and config *most* of the time, at least on development 
> servers.They will not have command-line access to these servers, and I 
> want to avoid endless restarts.
>
> I'm not interested in fighting to maintain such a useless thing as a DTD/XSD 
> without community support; what I really want to know is whether Solr will 
> start and can index some sample documents.   I'm wondering whether I might be 
> able to build a tool to fire up an EmbeddedSolrServer and capture error 
> messages/exceptions in a reasonable way. This tool could then be run by 
> my users before they commit to git, and then again by the CI server before it 
> "publishes" the configuration to ZooKeeper/SolrCloud.
>
> Any suggestions?
>
> Dan Davis, Systems/Applications Architect (Contractor), Office of Computer 
> and Communications Systems, National Library of Medicine, NIH
>
>


Re: Performance testing on SOLR cloud

2015-11-18 Thread Emir Arnautovic

Hi Aswath,
It is not common to test only QPS unless the index is static most of the 
time. Usually you have to test and tune the worst-case scenario - the max 
expected indexing rate plus queries. You can get more QPS by reducing query 
latency or by increasing the number of replicas. You manage latency by 
tuning Solr/JVM/queries and/or by sharding the index. You first tune the index 
without replication and, once you are sure it is the best a single index can 
provide, you introduce replicas to achieve the required throughput.


The hard part is tuning Solr. You can do it without specialized tools, but 
tools help a lot. One such tool is Sematext's SPM - 
https://sematext.com/spm/index.html - where you can see all the 
Solr/JVM/OS metrics needed to tune Solr. It also provides a QPS graph.


With an index your size, unless the documents are really big, you can start 
without sharding. After tuning, if you are not satisfied with query latency, 
you can try splitting into two shards.


Thanks,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On 17.11.2015 23:45, Aswath Srinivasan (TMS) wrote:

Hi fellow developers,

Please share your experience, on how you did performance testing on SOLR? What 
I'm trying to do is have SOLR cloud on 3 Linux servers with 16 GB RAM and index 
a total of 2.2 million. Yet to decide how many shards and replicas to have (Any 
hint on this is welcome too, basically 'only' performance testing, so suggest 
the number of shards and replicas if you can). Ultimately, I'm trying to find 
the QPS that this SOLR cloud set up can handle.

To summarize,

1.   Find the QPS that my solr cloud set up can support

2.   Using 5.3.1 version with external zookeeper

3.   3 Linux servers with 16 GB RAM and index a total of 2.2 million documents

4.   Yet to decide number of shards and replicas

5.   Not using any custom search application (performance testing for SOLR and 
not for Search portal)

Thank you





Performance testing on SOLR cloud

2015-11-17 Thread Aswath Srinivasan (TMS)
Hi fellow developers,

Please share your experience on how you did performance testing on SOLR. What 
I'm trying to do is run SOLR cloud on 3 Linux servers with 16 GB RAM and index 
a total of 2.2 million documents. I have yet to decide how many shards and 
replicas to have (any hint on this is welcome too - this is basically 'only' 
performance testing, so suggest the number of shards and replicas if you can). 
Ultimately, I'm trying to find the QPS that this SOLR cloud setup can handle.

To summarize,

1.   Find the QPS that my solr cloud set up can support

2.   Using 5.3.1 version with external zookeeper

3.   3 Linux servers with 16 GB RAM and index a total of 2.2 million documents

4.   Yet to decide number of shards and replicas

5.   Not using any custom search application (performance testing for SOLR and 
not for Search portal)

Thank you


RE: Performance testing on SOLR cloud

2015-11-17 Thread Markus Jelsma
Hi - we use the Siege load testing program. It can take a seed list of URL's, 
taken from actual user input, and can put load in parallel. It won't reuse 
common queries unless you prepare your seed list appropriately. If your setup 
achieves the goal your client anticipates, then you are fine. Siege is not a 
good tool to test extreme QPS due to obvious single machine and network 
limitations.

Assuming your JVM heap settings and Solr cache settings are optimal, and your 
only question is how many shards, then increase the number of shards. 
Oversharding can be beneficial because more threads each process less data. A 
search within a single core is single-threaded, so oversharding on the same 
hardware makes sense, and it seems to pay off.

Make sure you run multiple long stress tests and restart the JVMs in between, 
because a) query times and load tend to regress to the mean, and b) HotSpot 
needs to 'warm up', so short tests make less sense.

M.

 
 
-Original message-
> From:Aswath Srinivasan (TMS) <aswath.sriniva...@toyota.com>
> Sent: Tuesday 17th November 2015 23:46
> To: solr-user@lucene.apache.org
> Subject: Performance testing on SOLR cloud
> 
> Hi fellow developers,
> 
> Please share your experience, on how you did performance testing on SOLR? 
> What I'm trying to do is have SOLR cloud on 3 Linux servers with 16 GB RAM 
> and index a total of 2.2 million. Yet to decide how many shards and replicas 
> to have (Any hint on this is welcome too, basically 'only' performance 
> testing, so suggest the number of shards and replicas if you can). 
> Ultimately, I'm trying to find the QPS that this SOLR cloud set up can handle.
> 
> To summarize,
> 
> 1.   Find the QPS that my solr cloud set up can support
> 
> 2.   Using 5.3.1 version with external zookeeper
> 
> 3.   3 Linux servers with 16 GB RAM and index a total of 2.2 million documents
> 
> 4.   Yet to decide number of shards and replicas
> 
> 5.   Not using any custom search application (performance testing for SOLR 
> and not for Search portal)
> 
> Thank you
> 
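
Preparing the kind of seed list Markus mentions - URLs taken from actual user
input - can be scripted from a query log. A hedged Python sketch; the base
select URL, core name, and output path are placeholder assumptions:

```python
from urllib.parse import urlencode

# assumption: a single local Solr core named "mycore"
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def build_seed_list(queries, out_path="urls.txt"):
    """Turn raw user query strings into one fully URL-encoded Solr select
    URL per line - the file format Siege reads via its -f/--file option."""
    with open(out_path, "w") as f:
        for q in queries:
            f.write(SOLR_SELECT + "?" + urlencode({"q": q, "rows": 10}) + "\n")
    return out_path


if __name__ == "__main__":
    # e.g. queries pulled from access logs or search analytics
    build_seed_list(["toyota camry", "price:[100 TO 200]"])
    # then run something like: siege -f urls.txt -c 25 -t 5M
```

Feeding Siege real, varied queries avoids the cache-only hit rates you would
get from hammering a single hand-written URL.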


Re: Performance testing on SOLR cloud

2015-11-17 Thread Erick Erickson
I wouldn't bother to shard either. YMMV of course, but 2.2M documents
is actually a pretty small number unless the docs themselves are huge.
Sharding introduces inevitable overhead, so it's usually the last
thing you resort to.

As far as the number of replicas is concerned, that's strictly a
function of what QPS you need. Let's say you do not shard and have a
query rate of 20 queries-per-second. If you need to support 100 QPS,
just add 4 more replicas, this can be done any time.

Best,
Erick

On Tue, Nov 17, 2015 at 3:38 PM, Markus Jelsma
<markus.jel...@openindex.io> wrote:
> Hi - we use the Siege load testing program. It can take a seed list of URL's, 
> taken from actual user input, and can put load in parallel. It won't reuse 
> common queries unless you prepare your seed list appropriately. If your setup 
> achieves the goal your client anticipates, then you are fine. Siege is not a 
> good tool to test extreme QPS due to obvious single machine and network 
> limitations.
>
> Assuming your JVM heap settings and Solr cache settings are optimal, and your 
> only question is how many shards, then increase the number of shards. 
> Oversharding can be beneficial because more threads process less data. Every 
> single core search is single threaded, so oversharding on the same hardware 
> makes sense, and it seems to pay off.
>
> Make sure you run multiple long stress tests and restart JVM's in between 
> because a) query times and load tend to regress to the mean and b) because 
> HotSpot needs to 'warm up' so short tests make less sense.
>
> M.
>
>
>
> -Original message-
>> From:Aswath Srinivasan (TMS) <aswath.sriniva...@toyota.com>
>> Sent: Tuesday 17th November 2015 23:46
>> To: solr-user@lucene.apache.org
>> Subject: Performance testing on SOLR cloud
>>
>> Hi fellow developers,
>>
>> Please share your experience, on how you did performance testing on SOLR? 
>> What I'm trying to do is have SOLR cloud on 3 Linux servers with 16 GB RAM 
>> and index a total of 2.2 million. Yet to decide how many shards and replicas 
>> to have (Any hint on this is welcome too, basically 'only' performance 
>> testing, so suggest the number of shards and replicas if you can). 
>> Ultimately, I'm trying to find the QPS that this SOLR cloud set up can 
>> handle.
>>
>> To summarize,
>>
>> 1.   Find the QPS that my solr cloud set up can support
>>
>> 2.   Using 5.3.1 version with external zookeeper
>>
>> 3.   3 Linux servers with 16 GB RAM and index a total of 2.2 million 
>> documents
>>
>> 4.   Yet to decide number of shards and replicas
>>
>> 5.   Not using any custom search application (performance testing for SOLR 
>> and not for Search portal)
>>
>> Thank you
>>


Re: Performance testing on SOLR cloud

2015-11-17 Thread Keith L
To add to Erick's point:

It's also highly dependent on the types of queries you expect (sorting,
faceting, fq, q, size of documents) and how many concurrent updates you
expect. If most queries are going to be similar and you are not going to be
updating very often, you can expect most of your index to be loaded into the
page cache and many of your queries to be served from the document or query
cache (especially if you can optimize your fq clauses to be similar, versus
using q, which introduces scoring overhead). Adding more replicas will help
distribute the load. Adding shards will let you parallelize things but adds
some memory and latency overhead, because results still need to be merged.
If your shards are across multiple machines, you now introduce network
latency. I've seen good success with using many shards in the same JVM, but
that is with collections with billions of documents.

On Tue, Nov 17, 2015 at 9:07 PM Erick Erickson <erickerick...@gmail.com>
wrote:

> I wouldn't bother to shard either. YMMV of course, but 2.2M documents
> is actually a pretty small number unless the docs themselves are huge.
> Sharding introduces inevitable overhead, so it's usually the last
> thing you resort to.
>
> As far as the number of replicas is concerned, that's strictly a
> function of what QPS you need. Let's say you do not shard and have a
> query rate of 20 queries-per-second. If you need to support 100 QPS,
> just add 4 more replicas, this can be done any time.
>
> Best,
> Erick
>
> On Tue, Nov 17, 2015 at 3:38 PM, Markus Jelsma
> <markus.jel...@openindex.io> wrote:
> > Hi - we use the Siege load testing program. It can take a seed list of
> URL's, taken from actual user input, and can put load in parallel. It won't
> reuse common queries unless you prepare your seed list appropriately. If
> your setup achieves the goal your client anticipates, then you are fine.
> Siege is not a good tool to test extreme QPS due to obvious single machine
> and network limitations.
> >
> > Assuming your JVM heap settings and Solr cache settings are optimal, and
> your only question is how many shards, then increase the number of shards.
> Oversharding can be beneficial because more threads process less data.
> Every single core search is single threaded, so oversharding on the same
> hardware makes sense, and it seems to pay off.
> >
> > Make sure you run multiple long stress tests and restart JVM's in
> between because a) query times and load tend to regress to the mean and b)
> because HotSpot needs to 'warm up' so short tests make less sense.
> >
> > M.
> >
> >
> >
> > -Original message-
> >> From:Aswath Srinivasan (TMS) <aswath.sriniva...@toyota.com>
> >> Sent: Tuesday 17th November 2015 23:46
> >> To: solr-user@lucene.apache.org
> >> Subject: Performance testing on SOLR cloud
> >>
> >> Hi fellow developers,
> >>
> >> Please share your experience, on how you did performance testing on
> SOLR? What I'm trying to do is have SOLR cloud on 3 Linux servers with 16
> GB RAM and index a total of 2.2 million. Yet to decide how many shards and
> replicas to have (Any hint on this is welcome too, basically 'only'
> performance testing, so suggest the number of shards and replicas if you
> can). Ultimately, I'm trying to find the QPS that this SOLR cloud set up
> can handle.
> >>
> >> To summarize,
> >>
> >> 1.   Find the QPS that my solr cloud set up can support
> >>
> >> 2.   Using 5.3.1 version with external zookeeper
> >>
> >> 3.   3 Linux servers with 16 GB RAM and index a total of 2.2 million
> documents
> >>
> >> 4.   Yet to decide number of shards and replicas
> >>
> >> 5.   Not using any custom search application (performance testing for
> SOLR and not for Search portal)
> >>
> >> Thank you
> >>
>


RE: testing with EmbeddedSolrServer

2015-09-01 Thread Moen Endre
Mikhail,

The purpose of using EmbeddedSolrServer is for testing, not for running as 
main().

Is there a best practice for doing integration-testing of solr? Or of 
validating that queries to solr returns the expected result?

E.g. I have this bit of production code:
private String getStartAndStopDateIntersectsRange( Date beginDate, Date 
EndDate) {
...
  dateQuery = "( (Start_Date:[* TO "+ endDate +"] AND Stop_Date:["+beginDate+" 
TO *])"+
   " OR (Start_Date:[* TO "+ endDate +"] AND !Stop_Date:[* TO *])" +
   " OR (!Start_Date:[* TO *] AND Stop_Date:["+beginDate+" TO *]) )";
..
}

And I would like to write a test case that returns only the records that 
intersect a given date range.
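
One way to make this kind of logic testable without a live Solr instance is to extract the query-building into a pure function and assert on its output directly (a sketch; the field names are taken from the snippet above, and dates are passed as pre-formatted strings):

```java
public class DateRangeQuery {
    // Matches records whose [Start_Date, Stop_Date] interval intersects
    // [begin, end]; a missing bound is treated as open-ended.
    static String intersects(String begin, String end) {
        return "( (Start_Date:[* TO " + end + "] AND Stop_Date:[" + begin + " TO *])"
             + " OR (Start_Date:[* TO " + end + "] AND !Stop_Date:[* TO *])"
             + " OR (!Start_Date:[* TO *] AND Stop_Date:[" + begin + " TO *]) )";
    }

    public static void main(String[] args) {
        System.out.println(intersects("2015-01-01T00:00:00Z", "2015-12-31T23:59:59Z"));
    }
}
```

Assertions on the generated string catch mistakes in the boolean structure before any integration test runs.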


Cheers
Endre




-Original Message-
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] 
Sent: 31. august 2015 15:02
To: solr-user
Subject: Re: testing with EmbeddedSolrServer

Endre,

As I suggested before, consider to avoid test framework, just put all code 
interacting with EmbeddedSolrServer into main() method.

On Mon, Aug 31, 2015 at 12:15 PM, Moen Endre <endre.m...@imr.no> wrote:

> Hi Mikhail,
>
> Im trying to read 7-8 xml files of data that contain realistic data 
> from our production server. Then I would like to read this data into 
> EmbeddedSolrServer to test for edge cases for our custom date search. 
> The use of EmbeddedSolrServer is purely to separate the data testing 
> from any environment that might change over time.
>
> I would also like to avoid writing plumbing-code to import each field 
> from the xml since I already have a working DIH.
>
> I tried adding synchronous=true but it doesn’t look like it makes solr 
> complete the import before doing a search.
>
> Looking at the log it doesn’t seem process the import request:
> [searcherExecutor-6-thread-1-processing-{core=nmdc}] DEBUG 
> o.apache.solr.core.SolrCore.Request - [nmdc] webapp=null path=null 
> params={q=static+firstSearcher+warming+in+solrconfig.xml=false
> =firstSearcher}
> ...
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> 20DD5CE]] INFO  org.apache.solr.core.CoreContainer - registering core: 
> nmdc
> 10:48:31.613
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> 20DD5CE]] INFO  o.apache.solr.core.SolrCore.Request - [nmdc] 
> webapp=null
> path=/dataimport2
> params={qt=%2Fdataimport2=full-import%26clean%3Dtrue%26synchro
> nous%3Dtrue}
> status=0 QTime=1
>
> {responseHeader={status=0,QTime=1},initArgs={defaults={config=dih-conf
> ig.xml}},command=full-import=true=true,status=idle,i
> mportResponse=,statusMessages={}} 
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> 20DD5CE]] DEBUG o.apache.solr.core.SolrCore.Request - [nmdc] 
> webapp=null path=/select params={q=*%3A*} 
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> 20DD5CE]] DEBUG o.a.s.h.component.QueryComponent - process:
> q=*:*=text=10=explicit
> [searcherExecutor-6-thread-1-processing-{core=nmdc}] DEBUG 
> o.a.s.h.component.QueryComponent - process:
> q=static+firstSearcher+warming+in+solrconfig.xml=false=text
> =firstSearcher=10=explicit
> [searcherExecutor-6-thread-1-processing-{core=nmdc}] DEBUG 
> o.a.s.search.stats.LocalStatsCache - ## GET 
> {q=static+firstSearcher+warming+in+solrconfig.xml=false=tex
> t=firstSearcher=10=explicit}
> [searcherExecutor-6-thread-1-processing-{core=nmdc}] INFO 
> o.apache.solr.core.SolrCore.Request - [nmdc] webapp=null path=null 
> params={q=static+firstSearcher+warming+in+solrconfig.xml=false
> =firstSearcher}
> hits=0 status=0 QTime=36
> [searcherExecutor-6-thread-1-processing-{core=nmdc}] INFO 
> org.apache.solr.core.SolrCore - QuerySenderListener done.
> [searcherExecutor-6-thread-1-processing-{core=nmdc}] INFO 
> org.apache.solr.core.SolrCore - [nmdc] Registered new searcher 
> Searcher@28be2785[nmdc] 
> main{ExitableDirectoryReader(UninvertingDirectoryReader())}
> ...
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> 20DD5CE]] INFO  org.apache.solr.update.SolrCoreState - Closing 
> SolrCoreState 
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> 20DD5CE]] INFO  o.a.solr.update.DefaultSolrCoreState - SolrCoreState 
> ref count has reached 0 - closing IndexWriter 
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> 20DD5CE]] INFO  o.a.solr.update.DefaultSolrCoreState - closing 
> IndexWriter with IndexWriterCloser 
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> 20DD5CE]] DEBUG o.apache.solr.update.SolrIndexWriter - Closing Writer
> DirectUpdateHandler2
>
> Cheers
> Endre
>
> -Original Message-
> From

Re: testing with EmbeddedSolrServer

2015-09-01 Thread Mikhail Khludnev
Endre,
Here is the problem: SolrTestCaseJ4 already brings up a Solr core/container
and a sort of server, orchestrated by a complex harness. Thus, adding
EmbeddedSolrServer makes things quite complicated; it's challenging to
understand which one misbehaves. Given that you need to debug a DIH config,
I suggest you look at the short
org.apache.solr.handler.dataimport.TestNestedChildren and use it as a
sample to start from.


On Tue, Sep 1, 2015 at 11:54 AM, Moen Endre <endre.m...@imr.no> wrote:

> Mikhail,
>
> The purpose of using EmbeddedSolrServer is for testing, not for running as
> main().
>
> Is there a best practice for doing integration-testing of solr? Or of
> validating that queries to solr returns the expected result?
>
> E.g. I have this bit of production code:
> private String getStartAndStopDateIntersectsRange( Date beginDate, Date
> EndDate) {
> ...
>   dateQuery = "( (Start_Date:[* TO "+ endDate +"] AND
> Stop_Date:["+beginDate+" TO *])"+
>" OR (Start_Date:[* TO "+ endDate +"] AND !Stop_Date:[* TO *])" +
>" OR (!Start_Date:[* TO *] AND Stop_Date:["+beginDate+" TO *]) )";
> ..
> }
>
> And I would like to write a test-case that only returns the records that
> intersects a given daterange.
>
>
> Cheers
> Endre
>
>
>
>
> -Original Message-
> From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
> Sent: 31. august 2015 15:02
> To: solr-user
> Subject: Re: testing with EmbeddedSolrServer
>
> Endre,
>
> As I suggested before, consider to avoid test framework, just put all code
> interacting with EmbeddedSolrServer into main() method.
>
> On Mon, Aug 31, 2015 at 12:15 PM, Moen Endre <endre.m...@imr.no> wrote:
>
> > Hi Mikhail,
> >
> > Im trying to read 7-8 xml files of data that contain realistic data
> > from our production server. Then I would like to read this data into
> > EmbeddedSolrServer to test for edge cases for our custom date search.
> > The use of EmbeddedSolrServer is purely to separate the data testing
> > from any environment that might change over time.
> >
> > I would also like to avoid writing plumbing-code to import each field
> > from the xml since I already have a working DIH.
> >
> > I tried adding synchronous=true but it doesn’t look like it makes solr
> > complete the import before doing a search.
> >
> > Looking at the log it doesn’t seem process the import request:
> > [searcherExecutor-6-thread-1-processing-{core=nmdc}] DEBUG
> > o.apache.solr.core.SolrCore.Request - [nmdc] webapp=null path=null
> > params={q=static+firstSearcher+warming+in+solrconfig.xml=false
> > =firstSearcher}
> > ...
> > [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> > 20DD5CE]] INFO  org.apache.solr.core.CoreContainer - registering core:
> > nmdc
> > 10:48:31.613
> > [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> > 20DD5CE]] INFO  o.apache.solr.core.SolrCore.Request - [nmdc]
> > webapp=null
> > path=/dataimport2
> > params={qt=%2Fdataimport2=full-import%26clean%3Dtrue%26synchro
> > nous%3Dtrue}
> > status=0 QTime=1
> >
> > {responseHeader={status=0,QTime=1},initArgs={defaults={config=dih-conf
> > ig.xml}},command=full-import=true=true,status=idle,i
> > mportResponse=,statusMessages={}}
> > [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> > 20DD5CE]] DEBUG o.apache.solr.core.SolrCore.Request - [nmdc]
> > webapp=null path=/select params={q=*%3A*}
> > [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE
> > 20DD5CE]] DEBUG o.a.s.h.component.QueryComponent - process:
> > q=*:*=text=10=explicit
> > [searcherExecutor-6-thread-1-processing-{core=nmdc}] DEBUG
> > o.a.s.h.component.QueryComponent - process:
> > q=static+firstSearcher+warming+in+solrconfig.xml=false=text
> > =firstSearcher=10=explicit
> > [searcherExecutor-6-thread-1-processing-{core=nmdc}] DEBUG
> > o.a.s.search.stats.LocalStatsCache - ## GET
> > {q=static+firstSearcher+warming+in+solrconfig.xml=false=tex
> > t=firstSearcher=10=explicit}
> > [searcherExecutor-6-thread-1-processing-{core=nmdc}] INFO
> > o.apache.solr.core.SolrCore.Request - [nmdc] webapp=null path=null
> > params={q=static+firstSearcher+warming+in+solrconfig.xml=false
> > =firstSearcher}
> > hits=0 status=0 QTime=36
> > [searcherExecutor-6-thread-1-processing-{core=nmdc}] INFO
> > org.apache.solr.core.SolrCore - QuerySenderListener done.
> > [searcherExecutor-6-thread-1-processing-{c

RE: testing with EmbeddedSolrServer

2015-08-31 Thread Moen Endre
Hi Mikhail,

I'm trying to read 7-8 XML files that contain realistic data from our 
production server. Then I would like to read this data into EmbeddedSolrServer 
to test edge cases for our custom date search. The use of 
EmbeddedSolrServer is purely to separate the data testing from any environment 
that might change over time.

I would also like to avoid writing plumbing code to import each field from the 
XML, since I already have a working DIH.

I tried adding synchronous=true, but it doesn't look like it makes Solr 
complete the import before doing a search.

Looking at the log, it doesn't seem to process the import request:
[searcherExecutor-6-thread-1-processing-{core=nmdc}] DEBUG 
o.apache.solr.core.SolrCore.Request - [nmdc] webapp=null path=null 
params={q=static+firstSearcher+warming+in+solrconfig.xml=false=firstSearcher}
 
...
[TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE20DD5CE]] 
INFO  org.apache.solr.core.CoreContainer - registering core: nmdc
10:48:31.613 
[TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE20DD5CE]] 
INFO  o.apache.solr.core.SolrCore.Request - [nmdc] webapp=null 
path=/dataimport2 
params={qt=%2Fdataimport2=full-import%26clean%3Dtrue%26synchronous%3Dtrue}
 status=0 QTime=1 
{responseHeader={status=0,QTime=1},initArgs={defaults={config=dih-config.xml}},command=full-import=true=true,status=idle,importResponse=,statusMessages={}}
[TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE20DD5CE]] 
DEBUG o.apache.solr.core.SolrCore.Request - [nmdc] webapp=null path=/select 
params={q=*%3A*} 
[TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE20DD5CE]] 
DEBUG o.a.s.h.component.QueryComponent - process: 
q=*:*=text=10=explicit
[searcherExecutor-6-thread-1-processing-{core=nmdc}] DEBUG 
o.a.s.h.component.QueryComponent - process: 
q=static+firstSearcher+warming+in+solrconfig.xml=false=text=firstSearcher=10=explicit
[searcherExecutor-6-thread-1-processing-{core=nmdc}] DEBUG 
o.a.s.search.stats.LocalStatsCache - ## GET 
{q=static+firstSearcher+warming+in+solrconfig.xml=false=text=firstSearcher=10=explicit}
[searcherExecutor-6-thread-1-processing-{core=nmdc}] INFO  
o.apache.solr.core.SolrCore.Request - [nmdc] webapp=null path=null 
params={q=static+firstSearcher+warming+in+solrconfig.xml=false=firstSearcher}
 hits=0 status=0 QTime=36 
[searcherExecutor-6-thread-1-processing-{core=nmdc}] INFO  
org.apache.solr.core.SolrCore - QuerySenderListener done.
[searcherExecutor-6-thread-1-processing-{core=nmdc}] INFO  
org.apache.solr.core.SolrCore - [nmdc] Registered new searcher 
Searcher@28be2785[nmdc] 
main{ExitableDirectoryReader(UninvertingDirectoryReader())}
...
[TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE20DD5CE]] 
INFO  org.apache.solr.update.SolrCoreState - Closing SolrCoreState
[TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE20DD5CE]] 
INFO  o.a.solr.update.DefaultSolrCoreState - SolrCoreState ref count has 
reached 0 - closing IndexWriter
[TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE20DD5CE]] 
INFO  o.a.solr.update.DefaultSolrCoreState - closing IndexWriter with 
IndexWriterCloser
[TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE20DD5CE]] 
DEBUG o.apache.solr.update.SolrIndexWriter - Closing Writer DirectUpdateHandler2

Cheers
Endre

-Original Message-
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] 
Sent: 25. august 2015 19:43
To: solr-user
Subject: Re: testing with EmbeddedSolrServer

Hello,

I'm trying to guess what are you doing. It's not clear so far.
I found http://stackoverflow.com/questions/11951695/embedded-solr-dih
My conclusion, if you play with DIH and EmbeddedSolrServer you'd better to 
avoid the third beast, you don't need to bother with tests.
I guess that main() is over while DIH runs in background thread. You need to 
loop status command until import is over. or add synchronous=true parameter to 
full-import command it should switch to synchronous mode:
https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DataImportHandler.java#L199

Take care


On Tue, Aug 25, 2015 at 4:41 PM, Moen Endre <endre.m...@imr.no> wrote:

> Is there an example of integration-testing with EmbeddedSolrServer 
> that loads data from a data importhandler - then queries the data? Ive 
> tried doing this based on 
> org.apache.solr.client.solrj.embedded.TestEmbeddedSolrServerConstructors.
>
> But no data is being imported.  Here is the test-class ive tried:
> https://gist.github.com/emoen/5d0a28df91c4c1127238
>
> Ive also tried writing a test by extending AbstractSolrTestCase - but 
> havnt got this working. Ive documented some of the log output here:
> http://stackoverflow.com/questions/32052642/solrcorestate-already-clos
> ed-with-unit-test-using-embeddedsolrserver-v-5-2-

Re: testing with EmbeddedSolrServer

2015-08-31 Thread Mikhail Khludnev
Endre,

As I suggested before, consider avoiding the test framework; just put all
code interacting with EmbeddedSolrServer into a main() method.

On Mon, Aug 31, 2015 at 12:15 PM, Moen Endre <endre.m...@imr.no> wrote:

> Hi Mikhail,
>
> Im trying to read 7-8 xml files of data that contain realistic data from
> our production server. Then I would like to read this data into
> EmbeddedSolrServer to test for edge cases for our custom date search. The
> use of EmbeddedSolrServer is purely to separate the data testing from any
> environment that might change over time.
>
> I would also like to avoid writing plumbing-code to import each field from
> the xml since I already have a working DIH.
>
> I tried adding synchronous=true but it doesn’t look like it makes solr
> complete the import before doing a search.
>
> Looking at the log it doesn’t seem process the import request:
> [searcherExecutor-6-thread-1-processing-{core=nmdc}] DEBUG
> o.apache.solr.core.SolrCore.Request - [nmdc] webapp=null path=null
> params={q=static+firstSearcher+warming+in+solrconfig.xml=false=firstSearcher}
> ...
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE20DD5CE]]
> INFO  org.apache.solr.core.CoreContainer - registering core: nmdc
> 10:48:31.613
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE20DD5CE]]
> INFO  o.apache.solr.core.SolrCore.Request - [nmdc] webapp=null
> path=/dataimport2
> params={qt=%2Fdataimport2=full-import%26clean%3Dtrue%26synchronous%3Dtrue}
> status=0 QTime=1
>
> {responseHeader={status=0,QTime=1},initArgs={defaults={config=dih-config.xml}},command=full-import=true=true,status=idle,importResponse=,statusMessages={}}
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE20DD5CE]]
> DEBUG o.apache.solr.core.SolrCore.Request - [nmdc] webapp=null path=/select
> params={q=*%3A*}
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE20DD5CE]]
> DEBUG o.a.s.h.component.QueryComponent - process:
> q=*:*=text=10=explicit
> [searcherExecutor-6-thread-1-processing-{core=nmdc}] DEBUG
> o.a.s.h.component.QueryComponent - process:
> q=static+firstSearcher+warming+in+solrconfig.xml=false=text=firstSearcher=10=explicit
> [searcherExecutor-6-thread-1-processing-{core=nmdc}] DEBUG
> o.a.s.search.stats.LocalStatsCache - ## GET
> {q=static+firstSearcher+warming+in+solrconfig.xml=false=text=firstSearcher=10=explicit}
> [searcherExecutor-6-thread-1-processing-{core=nmdc}] INFO
> o.apache.solr.core.SolrCore.Request - [nmdc] webapp=null path=null
> params={q=static+firstSearcher+warming+in+solrconfig.xml=false=firstSearcher}
> hits=0 status=0 QTime=36
> [searcherExecutor-6-thread-1-processing-{core=nmdc}] INFO
> org.apache.solr.core.SolrCore - QuerySenderListener done.
> [searcherExecutor-6-thread-1-processing-{core=nmdc}] INFO
> org.apache.solr.core.SolrCore - [nmdc] Registered new searcher
> Searcher@28be2785[nmdc]
> main{ExitableDirectoryReader(UninvertingDirectoryReader())}
> ...
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE20DD5CE]]
> INFO  org.apache.solr.update.SolrCoreState - Closing SolrCoreState
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE20DD5CE]]
> INFO  o.a.solr.update.DefaultSolrCoreState - SolrCoreState ref count has
> reached 0 - closing IndexWriter
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE20DD5CE]]
> INFO  o.a.solr.update.DefaultSolrCoreState - closing IndexWriter with
> IndexWriterCloser
> [TEST-TestSolrEmbeddedServer.testNodeConfigConstructor-seed#[41C3C11DE20DD5CE]]
> DEBUG o.apache.solr.update.SolrIndexWriter - Closing Writer
> DirectUpdateHandler2
>
> Cheers
> Endre
>
> -Original Message-
> From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
> Sent: 25. august 2015 19:43
> To: solr-user
> Subject: Re: testing with EmbeddedSolrServer
>
> Hello,
>
> I'm trying to guess what are you doing. It's not clear so far.
> I found http://stackoverflow.com/questions/11951695/embedded-solr-dih
> My conclusion, if you play with DIH and EmbeddedSolrServer you'd better to
> avoid the third beast, you don't need to bother with tests.
> I guess that main() is over while DIH runs in background thread. You need
> to loop status command until import is over. or add synchronous=true
> parameter to full-import command it should switch to synchronous mode:
>
> https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DataImportHandler.java#L199
>
> Take care
>
>
> On Tue, Aug 25, 2015 at 4:41 PM, Moen Endre <endre.m...@imr.no> wrote:
>
> > Is there an example of integration-testing with EmbeddedSolrServer

Re: testing with EmbeddedSolrServer

2015-08-25 Thread Mikhail Khludnev
Hello,

I'm trying to guess what you are doing. It's not clear so far.
I found http://stackoverflow.com/questions/11951695/embedded-solr-dih
My conclusion: if you play with DIH and EmbeddedSolrServer, you'd better
avoid the third beast; you don't need to bother with tests.
I guess that main() exits while DIH is still running in a background thread.
You need to loop the status command until the import is over, or add a
synchronous=true parameter to the full-import command; it should switch to
synchronous mode:
https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/dataimporthandler/src/java/org/apache/solr/handler/dataimport/DataImportHandler.java#L199

Take care
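
The status-loop idea above can be sketched in plain Java; here the HTTP call to the DIH status endpoint is stubbed out with a Supplier, since the exact response parsing depends on your setup:

```java
import java.util.function.Supplier;

public class DihStatusPoller {
    // Poll a status source until DIH reports "idle", with a timeout.
    // In real code the Supplier would issue an HTTP GET to
    // /dataimport?command=status and extract the "status" field.
    static boolean waitUntilIdle(Supplier<String> status, long timeoutMillis,
                                 long pollMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if ("idle".equals(status.get())) {
                return true;   // import finished
            }
            Thread.sleep(pollMillis);
        }
        return false;          // timed out while still busy
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated DIH that stays "busy" for the first two polls.
        int[] calls = {0};
        Supplier<String> fake = () -> ++calls[0] < 3 ? "busy" : "idle";
        System.out.println(waitUntilIdle(fake, 1000, 10));  // prints true
    }
}
```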


On Tue, Aug 25, 2015 at 4:41 PM, Moen Endre endre.m...@imr.no wrote:

 Is there an example of integration-testing with EmbeddedSolrServer that
 loads data from a data importhandler - then queries the data? Ive tried
 doing this based on
 org.apache.solr.client.solrj.embedded.TestEmbeddedSolrServerConstructors.

 But no data is being imported.  Here is the test-class ive tried:
 https://gist.github.com/emoen/5d0a28df91c4c1127238

 Ive also tried writing a test by extending AbstractSolrTestCase - but
 havnt got this working. Ive documented some of the log output here:
 http://stackoverflow.com/questions/32052642/solrcorestate-already-closed-with-unit-test-using-embeddedsolrserver-v-5-2-1

 Should I extend AbstractSolrTestCase or SolrTestCaseJ4 when writing tests?

 Cheers
 Endre




-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


testing with EmbeddedSolrServer

2015-08-25 Thread Moen Endre
Is there an example of integration testing with EmbeddedSolrServer that loads 
data from a DataImportHandler and then queries the data? I've tried doing this 
based on 
org.apache.solr.client.solrj.embedded.TestEmbeddedSolrServerConstructors.

But no data is being imported. Here is the test class I've tried: 
https://gist.github.com/emoen/5d0a28df91c4c1127238

I've also tried writing a test by extending AbstractSolrTestCase, but haven't 
got it working. I've documented some of the log output here: 
http://stackoverflow.com/questions/32052642/solrcorestate-already-closed-with-unit-test-using-embeddedsolrserver-v-5-2-1

Should I extend AbstractSolrTestCase or SolrTestCaseJ4 when writing tests?

Cheers
Endre


Re: weird drastic query latency during performance testing and DIH import delay after performance testing

2014-07-17 Thread Erick Erickson
This is very strange. I have no idea why DIH is
taking so long. What happens if you execute the DIH
query in some SQL front end? It's possible that DIH
is just taking forever to execute the SQL query because
of how it's written.

I'm having trouble following the query results, but again
this is super-slow. How many docs do you have in
your index? How much memory have you allocated
for your JVM? Your query is actually pretty simple, so I have
no clue whatsoever why your response times are
so slow. But this is such bad performance that something
major is wrong.

I'd approach it one problem at a time. Understand what's
happening with your queries, _then_ tackle DIH (or the other
way around)...

Best,
Erick


On Wed, Jul 16, 2014 at 2:03 AM, YouPeng Yang yypvsxf19870...@gmail.com
wrote:

 Hi
   I build my SolrCloud using Solr 4.6.0 (java version:1.7.0_45). In my
 cloud,I have a collection with 30 shard,and each shard has one replica.
 each core of the shard contains nearly  50 million docs  that is 15GB in
 size,so does the replica.
   Before applying my cloud in the real world,I do a performance test with
 JMeter 2.11.
   The scenario of the my test is simple:100 threads sending requests for 20
 seconds ,and these requests are only sent to  a specific core of a specific
 shard.the request is similar to the following :
 http://IP:port/solr/tv_201407/select?q=*:*&fq=BEGINTIME:[2014-06-01
 00:00:00+TO+*]+AND+(CONTACT:${user})+AND+(TV_STATE:00)&shards=tv_201407
 &rows=2000&sort=BEGINTIME+desc.

   I encountered the drastic  query latency during performance testing and
 DIH import delay after performance testing.Please help me. I have tested
  several times and get the same problem and can not handle it by myself.Any
 suggestion will be apprecaited.

  The following steps describes what I have done .

 Step 1: Before the test,the DIH import job is very fast.As the statistics
 [1], the DIH importing takes only 1s for 10 docs.
 [1]---
 Indexing completed. Added/Updated: 10 documents. Deleted 0 documents.
 (Duration: 01s)
 Requests: 1 (1/s), Fetched: 10 (10/s), Skipped: 0, Processed: 10 (10/s)
 Started: less than a minute ago
 ---

 Step 2:  Then ,Doing the test under the caches are cleaned. The summery
 statistics data is as [2]. Although I have clean the caches,I never think
 the query latency becomes so drastic that it cannot be acceptable in my
 real application.
   The red font describes the latency of the query performance test on the
 core tv_201407 of the shard tv_201407 .

   So would you experts can give some hints about the drastic  query latency
 ?

 [2]---
 [solr@solr2 test]$ ../bin/jmeter.sh  -n -t solrCoudKala20140401.jmx  -l
 logfile_solrCloud_20.jtl
 Creating summariser aggregate
 Created the tree successfully using solrCoudKala20140401.jmx
 Starting the test @ Wed Jul 16 15:59:28 CST 2014 (1405497568104)
 Waiting for possible shutdown message on port 4445
 aggregate +  1 in   8.1s =0.1/s Avg:  8070 Min:  8070 Max:  8070
 Err:
 0 (0.00%) Active: 100 Started: 100 Finished: 0
 aggregate +103 in  13.4s =7.7/s Avg:  8027 Min:  4191 Max:  8434
 Err:
 0 (0.00%) Active: 97 Started: 100 Finished: 3
 aggregate =104 in  13.4s =7.7/s Avg:  8027 Min:  4191 Max:  8434
 Err:
 0 (0.00%)
 aggregate + 96 in 7s =   14.5/s Avg:  6160 Min:  5295 Max:  6625
 Err:
 0 (0.00%) Active: 0 Started: 100 Finished: 100
 aggregate =200 in15s =   13.6/s Avg:  7131 Min:  4191 Max:  8434
 Err:
 0 (0.00%)
 Tidying up ...@ Wed Jul 16 15:59:43 CST 2014 (1405497583461)
 ... end of run
 [solr@solr2 test]$
 ---
 Step 3:To be continued,after the test,I do the DIH importing job again
 using  the same import expresion.However the performance of the DIH becomes
 so unacceptable.
 to import  the 10 docs takes 2 m 15 s [3]!
   Having noticing that ,solr can fetched the 10 docs fast,the processing is
 slow.

 [3]---
 *Indexing completed. Added/Updated: 10 documents. Deleted 0 documents.
 (Duration: 2m 15s)*
 Requests: 1 (0/s), Fetched: 10 (0/s), Skipped: 0, Processed: 10 (0/s)
 Started: about an hour ago
 ---

  By the way. jvm gc goes normal,and there is no long full gc during the
 test. the load of my system(rhel 6.5) are also normal.

 Regards



weird drastic query latency during performance testing and DIH import delay after performance testing

2014-07-16 Thread YouPeng Yang
Hi
  I built my SolrCloud using Solr 4.6.0 (Java version 1.7.0_45). In my
cloud, I have a collection with 30 shards, and each shard has one replica.
Each core of a shard contains nearly 50 million docs, which is 15 GB in
size, as does the replica.
  Before applying my cloud in the real world, I am doing a performance test
with JMeter 2.11.
  The scenario of my test is simple: 100 threads sending requests for 20
seconds, and these requests are only sent to a specific core of a specific
shard. The request is similar to the following:
 http://IP:port/solr/tv_201407/select?q=*:*&fq=BEGINTIME:[2014-06-01
00:00:00+TO+*]+AND+(CONTACT:${user})+AND+(TV_STATE:00)&shards=tv_201407
&rows=2000&sort=BEGINTIME+desc.

  I encountered drastic query latency during performance testing and a DIH
import delay after performance testing. Please help me. I have tested
several times, got the same problem each time, and cannot handle it by
myself. Any suggestion will be appreciated.

 The following steps describes what I have done .

Step 1: Before the test, the DIH import job is very fast. As the statistics
in [1] show, the DIH import takes only 1s for 10 docs.
[1]---
Indexing completed. Added/Updated: 10 documents. Deleted 0 documents.
(Duration: 01s)
Requests: 1 (1/s), Fetched: 10 (10/s), Skipped: 0, Processed: 10 (10/s)
Started: less than a minute ago
---

Step 2: Then I run the test with the caches cleaned. The summary
statistics are shown in [2]. Even though I had cleaned the caches, I never expected
the query latency to become so drastic that it is unacceptable for my
real application.
  The figures below show the latency of the query performance test on the
core tv_201407 of the shard tv_201407.

  Could you experts give some hints about the drastic query latency?

[2]---
[solr@solr2 test]$ ../bin/jmeter.sh  -n -t solrCoudKala20140401.jmx  -l
logfile_solrCloud_20.jtl
Creating summariser aggregate
Created the tree successfully using solrCoudKala20140401.jmx
Starting the test @ Wed Jul 16 15:59:28 CST 2014 (1405497568104)
Waiting for possible shutdown message on port 4445
aggregate +  1 in   8.1s =0.1/s Avg:  8070 Min:  8070 Max:  8070 Err:
0 (0.00%) Active: 100 Started: 100 Finished: 0
aggregate +103 in  13.4s =7.7/s Avg:  8027 Min:  4191 Max:  8434 Err:
0 (0.00%) Active: 97 Started: 100 Finished: 3
aggregate =104 in  13.4s =7.7/s Avg:  8027 Min:  4191 Max:  8434 Err:
0 (0.00%)
aggregate + 96 in 7s =   14.5/s Avg:  6160 Min:  5295 Max:  6625 Err:
0 (0.00%) Active: 0 Started: 100 Finished: 100
aggregate =200 in15s =   13.6/s Avg:  7131 Min:  4191 Max:  8434 Err:
0 (0.00%)
Tidying up ...@ Wed Jul 16 15:59:43 CST 2014 (1405497583461)
... end of run
[solr@solr2 test]$
---
Step 3: After the test, I run the DIH import job again
using the same import expression. However, the DIH performance becomes
unacceptable: importing the 10 docs takes 2m 15s [3]!
  Note that Solr still fetched the 10 docs quickly; it is the processing that is
slow.

[3]---
*Indexing completed. Added/Updated: 10 documents. Deleted 0 documents.
(Duration: 2m 15s)*
Requests: 1 (0/s), Fetched: 10 (0/s), Skipped: 0, Processed: 10 (0/s)
Started: about an hour ago
---

 By the way, JVM GC looks normal and there is no long full GC during the
test. The load on my system (RHEL 6.5) is also normal.

Regards


Independent/Selfcontained Solr Unit testing with JUnit

2014-05-13 Thread Vijay Balakrishnan
Hi,

Is there any way to run self-contained JUnit tests for, say, a Solr-dependent
class, where the test doesn't depend on Solr being up and running at
localhost:8983? I have a collection etc. set up on the Solr server.

Is it possible to mock it with an EmbeddedSolr easily, with a @Before or
@BeforeClass annotation in JUnit4?

Any pointers to examples would be awesome (I am also trying to look in the
source).

 TIA,

Vijay


Re: Independent/Selfcontained Solr Unit testing with JUnit

2014-05-13 Thread Shawn Heisey
On 5/13/2014 12:46 PM, Vijay Balakrishnan wrote:
 Is there any way to run self-contained JUnit tests for say a Solr dependent
 class where it doesn't depend on Solr being up and running at
 localhost:8983 ? I have a collection etc. setup on the Solr server.
 
 Is it possible to mockit with an EmbeddedSolr easily  with a @Before or
 @BeforeClass annotation in JUnit4 ?
 
 Any pointers to examples would be awesome(I am also trying to look in the
 source).

An example of a Solr unit test that fires up Jetty (actually, more than
one instance of Jetty) before testing is located here in the source
download or checkout:

solr/solrj/src/test/org/apache/solr/client/solrj/TestLBHttpSolrServer.java

Thanks,
Shawn
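For a fully in-process alternative (no Jetty, no network), EmbeddedSolrServer can be started from a @BeforeClass method, roughly as asked above. The following is a sketch only, assuming a Solr 4.x dependency on the test classpath and a Solr home directory under src/test/resources containing solr.xml plus a collection1 core config; the paths, core name, and test class name are illustrative, not from this thread:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.CoreContainer;
import org.junit.AfterClass;
import org.junit.Assert;
import org.junit.BeforeClass;
import org.junit.Test;

public class EmbeddedSolrIT {
    private static CoreContainer container;
    private static SolrServer solr;

    @BeforeClass
    public static void startEmbeddedSolr() throws Exception {
        // Solr home holds solr.xml plus collection1/conf/{solrconfig.xml,schema.xml}
        container = new CoreContainer("src/test/resources/solr");
        container.load();
        solr = new EmbeddedSolrServer(container, "collection1");
    }

    @AfterClass
    public static void stopEmbeddedSolr() {
        container.shutdown();
    }

    @Test
    public void indexAndQuery() throws Exception {
        // index one document through the embedded server, then query it back
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        solr.add(doc);
        solr.commit();
        Assert.assertEquals(1,
            solr.query(new SolrQuery("id:1")).getResults().getNumFound());
    }
}
```

The Jetty-based approach Shawn points to exercises the HTTP layer as well; the embedded approach above trades that coverage for faster, dependency-free test startup.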



Re: Solr Load Testing Issues

2014-02-17 Thread Annette Newton
Sorry I didn't make myself clear.  I have 20 machines in the configuration;
each shard/replica is on its own machine.


On 14 February 2014 19:44, Shawn Heisey s...@elyograg.org wrote:

 On 2/14/2014 5:28 AM, Annette Newton wrote:
  Solr Version: 4.3.1
  Number Shards: 10
  Replicas: 1
  Heap size: 15GB
  Machine RAM: 30GB
  Zookeeper timeout: 45 seconds
 
  We are continuing the fight to keep our solr setup functioning.  As a
  result of this we have made significant changes to our schema to reduce
 the
  amount of data we write.  I setup a new cluster to reindex our data,
  initially I ran the import with no replicas, and achieved quite
 impressive
  results.  Our peak was 60,000 new documents per minute, no shard loses,
 no
  outages due to garbage collection (which is an issue we see in
 production),
  at the end of the load the index stood at 97,000,000 documents and 20GB
 per
  shard.  During the highest insertion rate I would say that querying
  suffered, but that is not of concern right now.

 Solr 4.3.1 has a number of problems when it comes to large clouds.
 Upgrading to 4.6.1 would be strongly advisable, but that's only
 something to try after looking into the rest of what I have to say.

 If I read what you've written correctly, you are running all this on one
 machine.  To put it bluntly, this isn't going to work well unless you
 put a LOT more memory into that machine.

 For good performance, Solr relies on the OS disk cache, because reading
 from the disk is VERY expensive in terms of time.  The OS will
 automatically use RAM that's not being used for other purposes for the
 disk cache, so that it can avoid reading off the disk as much as possible.

 http://wiki.apache.org/solr/SolrPerformanceProblems

 Below is a summary of what that Wiki page says, with your numbers as I
 understand them.  If I am misunderstanding your numbers, then this
 advice may need adjustment.  Note that when I see one replica I take
 that to mean replicationFactor=1, so there is only one copy of the
 index.  If you actually mean that you have *two* copies, then you have
 twice as much data as I've indicated below, and your requirements will
 be even larger:

 With ten shards that are each 20GB in size, your total index size is
 200GB.  With 15 GB of heap, your ideal memory size for that server would
 be 215GB -- the 15GB heap plus enough extra to fit the entire 200GB
 index into RAM.

 In reality you probably don't need that much, but it's likely that you
 would need at least half the index to fit into RAM at any one moment,
 which adds up to 115GB.  If you're prepared to deal with
 moderate-to-severe performance problems, you **MIGHT** be able to get
 away with only 25% of the index fitting into RAM, which still requires
 65GB of RAM, but with SolrCloud, such performance problems usually mean
 that the cloud won't be stable, so it's not advisable to even try it.

 One of the bits of advice on the wiki page is to split your index into
 shards and put it on more machines, which drops the memory requirements
 for each machine.  You're already using a multi-shard SolrCloud, so you
 probably just need more hardware.  If you had one 20GB shard on a
 machine with 30GB of RAM, you could probably use a heap size of 4-8GB
 per machine and have plenty of RAM left over to cache the index very
 well.  You could most likely add another 50% to the index size and still
 be OK.

 Thanks,
 Shawn




-- 

Annette Newton

Database Administrator

ServiceTick Ltd



T:+44(0)1603 618326



Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ

www.servicetick.com

*www.sessioncam.com http://www.sessioncam.com*

-- 
*This message is confidential and is intended to be read solely by the 
addressee. The contents should not be disclosed to any other person or 
copies taken unless authorised to do so. If you are not the intended 
recipient, please notify the sender and permanently delete this message. As 
Internet communications are not secure ServiceTick accepts neither legal 
responsibility for the contents of this message nor responsibility for any 
change made to this message after it was forwarded by the original author.*


Solr Load Testing Issues

2014-02-14 Thread Annette Newton
Solr Version: 4.3.1
Number Shards: 10
Replicas: 1
Heap size: 15GB
Machine RAM: 30GB
Zookeeper timeout: 45 seconds

We are continuing the fight to keep our Solr setup functioning.  As a
result of this we have made significant changes to our schema to reduce the
amount of data we write.  I set up a new cluster to reindex our data;
initially I ran the import with no replicas and achieved quite impressive
results.  Our peak was 60,000 new documents per minute, no shard losses, no
outages due to garbage collection (which is an issue we see in production);
at the end of the load the index stood at 97,000,000 documents and 20GB per
shard.  During the highest insertion rate I would say that querying
suffered, but that is not of concern right now.

I have now added 1 replica for each shard; indexing time has doubled -
not surprising - and as it was so good to start with, not a problem.  I
continue to write only to the leaders, and the issue is that the replicas
are continually going into recovery.

The leaders show:


ERROR - 2014-02-14 11:47:45.757; org.apache.solr.common.SolrException;
shard update error StdNode:
http://10.35.133.176:8983/solr/sessionfilterset/:org.apache.solr.client.solrj.SolrServerException:
IOException occured when talking to server at:
http://10.35.133.176:8983/solr/sessionfilterset
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.http.NoHttpResponseException: The target server
failed to respond
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:95)
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
at
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
at
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
at
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:717)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:522)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
... 11 more

The replica is not busy garbage collecting, as it doesn't coincide with a
full gc and the collection times are low.  The replica appears to be
accepting adds milliseconds before this appears in the log:

INFO  - 2014-02-14 11:59:54.366;
org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that
we recover

I have reduced the load down to 5,000 documents per minute and they appear
to only stay up for a couple of minutes, I would like to be confident that
we could handle more than this during our peak times.

Initially I was getting connection reset errors on the leaders, but I
changed the jetty connector to the nio one and now the above message is
what I have received.  I have also upped the header request and response
sizes.

Any ideas - other than not using replicas as proposed by a colleague?

Thanks very much in advance.


-- 

Annette Newton

Database Administrator

ServiceTick Ltd



T:+44(0)1603 618326



Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ

www.servicetick.com

*www.sessioncam.com http://www.sessioncam.com*


Re: Solr Load Testing Issues

2014-02-14 Thread Shawn Heisey
On 2/14/2014 5:28 AM, Annette Newton wrote:
 Solr Version: 4.3.1
 Number Shards: 10
 Replicas: 1
 Heap size: 15GB
 Machine RAM: 30GB
 Zookeeper timeout: 45 seconds
 
 We are continuing the fight to keep our solr setup functioning.  As a
 result of this we have made significant changes to our schema to reduce the
 amount of data we write.  I setup a new cluster to reindex our data,
 initially I ran the import with no replicas, and achieved quite impressive
 results.  Our peak was 60,000 new documents per minute, no shard loses, no
 outages due to garbage collection (which is an issue we see in production),
 at the end of the load the index stood at 97,000,000 documents and 20GB per
 shard.  During the highest insertion rate I would say that querying
 suffered, but that is not of concern right now.

Solr 4.3.1 has a number of problems when it comes to large clouds.
Upgrading to 4.6.1 would be strongly advisable, but that's only
something to try after looking into the rest of what I have to say.

If I read what you've written correctly, you are running all this on one
machine.  To put it bluntly, this isn't going to work well unless you
put a LOT more memory into that machine.

For good performance, Solr relies on the OS disk cache, because reading
from the disk is VERY expensive in terms of time.  The OS will
automatically use RAM that's not being used for other purposes for the
disk cache, so that it can avoid reading off the disk as much as possible.

http://wiki.apache.org/solr/SolrPerformanceProblems

Below is a summary of what that Wiki page says, with your numbers as I
understand them.  If I am misunderstanding your numbers, then this
advice may need adjustment.  Note that when I see one replica I take
that to mean replicationFactor=1, so there is only one copy of the
index.  If you actually mean that you have *two* copies, then you have
twice as much data as I've indicated below, and your requirements will
be even larger:

With ten shards that are each 20GB in size, your total index size is
200GB.  With 15 GB of heap, your ideal memory size for that server would
be 215GB -- the 15GB heap plus enough extra to fit the entire 200GB
index into RAM.

In reality you probably don't need that much, but it's likely that you
would need at least half the index to fit into RAM at any one moment,
which adds up to 115GB.  If you're prepared to deal with
moderate-to-severe performance problems, you **MIGHT** be able to get
away with only 25% of the index fitting into RAM, which still requires
65GB of RAM, but with SolrCloud, such performance problems usually mean
that the cloud won't be stable, so it's not advisable to even try it.

One of the bits of advice on the wiki page is to split your index into
shards and put it on more machines, which drops the memory requirements
for each machine.  You're already using a multi-shard SolrCloud, so you
probably just need more hardware.  If you had one 20GB shard on a
machine with 30GB of RAM, you could probably use a heap size of 4-8GB
per machine and have plenty of RAM left over to cache the index very
well.  You could most likely add another 50% to the index size and still
be OK.

Thanks,
Shawn
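The sizing arithmetic in the reply above can be written out explicitly. A small illustrative helper, using the numbers from this thread; the class and method names are mine, not Solr's:

```java
/** Rough SolrCloud RAM sizing per the rules of thumb above (illustrative only). */
public class SolrRamSizing {

    /** Heap plus the whole index cached by the OS disk cache: the ideal case. */
    static double idealGb(int shards, double shardGb, double heapGb) {
        return shards * shardGb + heapGb;
    }

    /** Heap plus half the index cached: usually workable. */
    static double workableGb(int shards, double shardGb, double heapGb) {
        return shards * shardGb / 2 + heapGb;
    }

    /** Heap plus a quarter of the index cached: risky minimum for SolrCloud. */
    static double riskyMinimumGb(int shards, double shardGb, double heapGb) {
        return shards * shardGb / 4 + heapGb;
    }

    public static void main(String[] args) {
        // 10 shards x 20GB each, 15GB heap -- the numbers in this thread
        System.out.println(idealGb(10, 20, 15));        // 215.0
        System.out.println(workableGb(10, 20, 15));     // 115.0
        System.out.println(riskyMinimumGb(10, 20, 15)); // 65.0
    }
}
```

The three figures match the 215GB / 115GB / 65GB totals quoted in the reply; the per-machine version of the same arithmetic (1 shard of 20GB, 4-8GB heap) is what motivates the "more hardware" advice.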



Unit testing custom update request processor

2014-01-07 Thread Jorge Luis Betancourt Gonzalez
Happy new year!

I’ve developed some custom update request processors to accomplish some custom
logic needed in certain use cases. I’m trying to write tests for these processors,
and I’d like to test them in a way very similar to how the built-in processors are
tested in the Solr source code. Is there any advice on how to accomplish this, or
some experience that someone more experienced could share?

Greetings!
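One common pattern mirroring how the built-in processors are tested: extend SolrTestCaseJ4 from Solr's test framework (the solr-test-framework artifact), point it at a solrconfig.xml that declares your processor chain as the default update chain, and index through it. This is a rough sketch from memory; the config file names and the field the processor fills in are assumptions, and there is also an UpdateProcessorTestBase in the Solr sources worth comparing against:

```java
import org.apache.solr.SolrTestCaseJ4;
import org.junit.BeforeClass;
import org.junit.Test;

public class MyProcessorTest extends SolrTestCaseJ4 {

    @BeforeClass
    public static void beforeClass() throws Exception {
        // solrconfig-myprocessor.xml declares the custom
        // updateRequestProcessorChain as the default chain
        initCore("solrconfig-myprocessor.xml", "schema.xml");
    }

    @Test
    public void testProcessorRuns() {
        // add a doc through the chain, commit, then assert on the result
        assertU(adoc("id", "1"));
        assertU(commit());
        assertQ(req("q", "id:1"), "//result[@numFound='1']");
    }
}
```

The assertU/assertQ helpers come from SolrTestCaseJ4 and run entirely in-process, so no standalone Solr instance is needed.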


III International Winter School at UCI, February 17-28,
2014. See www.uci.cu


How to decide proper cache size at load testing?

2013-05-02 Thread Furkan KAMACI
I read this on the wiki:

Sometimes a smaller cache size will help avoid full garbage collections at
the cost of more evictions. Load testing should be used to help determine
proper cache sizes throughout the searching/indexing lifecycle.

Could anybody give me an example scenario of how I can run such a test - what
should I do to find a proper cache size during load testing?


Re: How to decide proper cache size at load testing?

2013-05-02 Thread Otis Gospodnetic
You simply need to monitor and adjust. Both during testing and in
production because search patterns change over time. Hook up alerting to it
to get notified of high evictions and low cache hit rate so you don't have
to actively look at stats all day.

Here is the graph of Query Cache metrics for http://search-lucene.com/ for
example:

https://apps.sematext.com/spm-reports/s.do?k=eDcirzHG7i


Otis
Solr & ElasticSearch Support
http://sematext.com/

On May 2, 2013 5:14 PM, Furkan KAMACI furkankam...@gmail.com wrote:

 I read that at wiki:

 Sometimes a smaller cache size will help avoid full garbage collections at
 the cost of more evictions. Load testing should be used to help determine
 proper cache sizes throughout the searching/indexing lifecycle.

 Could anybody give me an example scenario of how can I make a test, what
 should I do and find a proper cache size at load testing?
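The monitor-and-adjust loop described above can be reduced to a tiny decision rule over a cache's cumulative counters. In practice you would read lookups/hits/evictions from the cache statistics in the Solr admin UI or over JMX; the class name and the thresholds below are made up for illustration, not Solr's API:

```java
/** Illustrative sizing heuristic from cumulative Solr cache counters. */
public class CacheAdvisor {

    static double hitRatio(long lookups, long hits) {
        return lookups == 0 ? 0.0 : (double) hits / lookups;
    }

    /** Very rough advice; the 0.5 and 0.9 thresholds are assumptions. */
    static String advise(long lookups, long hits, long evictions) {
        double ratio = hitRatio(lookups, hits);
        if (evictions > 0 && ratio < 0.5) {
            return "grow";           // heavy churn and poor reuse
        }
        if (evictions == 0 && ratio > 0.9) {
            return "shrink-or-keep"; // cache is comfortably sized, maybe oversized
        }
        return "watch";              // keep monitoring, as suggested above
    }

    public static void main(String[] args) {
        System.out.println(advise(10_000, 3_000, 2_500)); // grow
        System.out.println(advise(10_000, 9_500, 0));     // shrink-or-keep
    }
}
```

Tying an alert to the "grow" case is the automated version of the advice above: you get notified of high evictions and low hit rate without watching the stats page all day.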



Re: Multi-core and replicated Solr cloud testing. Data-directory mis-configures

2013-03-25 Thread Trevor Campbell
That example does not work if you have more than 1 collection (core) per node;
all end up sharing the same index and overwriting one another.


On Mon, Mar 25, 2013 at 6:27 PM, Gopal Patwa gopalpa...@gmail.com wrote:

 If you use the default directory then it will use the solr.home directory. I have
 tested the Solr Cloud example on a local machine with 5-6 nodes, and the data
 directory was created under the core name, like

 example2/solr/collection1/data. You can see the example startup script in the
 source code: solr/cloud-dev/solrcloud-multi-start.sh

 Example solrconfig.xml:

   <dataDir>${solr.data.dir:}</dataDir>

 On Sun, Mar 24, 2013 at 10:44 PM, Trevor Campbell
 tcampb...@atlassian.comwrote:

  I have three indexes which I have set up as three separate cores, using
  this solr.xml config.

    <cores adminPath="/admin/cores" host="${host:}"
      hostPort="${jetty.port:}">
      <core name="jira-issue" instanceDir="jira-issue">
        <property name="dataDir" value="jira-issue/data/" />
      </core>
      <core name="jira-comment" instanceDir="jira-comment">
        <property name="dataDir" value="jira-comment/data/" />
      </core>
      <core name="jira-change-history" instanceDir="jira-change-history">
        <property name="dataDir" value="jira-change-history/data/" />
      </core>
    </cores>

  This works just fine in standalone Solr.

  I duplicated this setup on the same machine under a completely separate
  Solr installation (solr-nodeb) and modified all the data directories to
  point to the directories in nodeb.  This all worked fine.

  I then connected the 2 instances together with ZooKeeper using the settings
  -Dbootstrap_conf=true -Dcollection.configName=jiraCluster -DzkRun
  -DnumShards=1 for the first instance and -DzkHost=localhost:9080 for
  the second. (I'm using Tomcat and ports 8080 and 8081 for the 2 Solr
  instances.)

  Now the data directories of the second node point to the data directories
  in the first node.

  I have tried many settings in the solrconfig.xml for each core but am now
  using absolute paths, e.g.
  <dataDir>/home//solr-4.2.0-nodeb/example/multicore/jira-comment/data</dataDir>

  previously I used
  ${solr.jira-comment.data.dir:/home/tcampbell/solr-4.2.0-nodeb/example/multicore/jira-comment/data}
  but that had the same result.

  It seems ZooKeeper is forcing the data directory config from the uploaded
  configuration onto all the nodes in the cluster?

  How can I do testing on a single machine? Do I really need identical
  directory layouts on all machines?
 
 
 



Re: Multi-core and replicated Solr cloud testing. Data-directory mis-configures

2013-03-25 Thread Trevor Campbell
Solved.

I was able to solve this by removing any reference to dataDir from the
solrconfig.xml.  So in solr.xml for each node I have:

  <cores adminPath="/admin/cores" host="${host:}" hostPort="${jetty.port:}">
    <core name="jira-issue" instanceDir="jira-issue">
      <property name="dataDir" value="jira-issue/data/" />
    </core>
    <core name="jira-comment" instanceDir="jira-comment">
      <property name="dataDir" value="jira-comment/data/" />
    </core>
    <core name="jira-change-history" instanceDir="jira-change-history">
      <property name="dataDir" value="jira-change-history/data/" />
    </core>
  </cores>

and in solrconfig.xml in each core I have removed the reference to dataDir
completely:

<!-- <dataDir>${solr.core0.data.dir:}</dataDir> -->



On Tue, Mar 26, 2013 at 8:41 AM, Trevor Campbell tcampb...@atlassian.comwrote:

 That example does not work if you have more than 1 collection (core) per node;
 all end up sharing the same index and overwriting one another.


 On Mon, Mar 25, 2013 at 6:27 PM, Gopal Patwa gopalpa...@gmail.com wrote:

 If you use the default directory then it will use the solr.home directory. I have
 tested the Solr Cloud example on a local machine with 5-6 nodes, and the data
 directory was created under the core name, like

 example2/solr/collection1/data. You can see the example startup script
 in the
 source code: solr/cloud-dev/solrcloud-multi-start.sh

 Example solrconfig.xml:

   <dataDir>${solr.data.dir:}</dataDir>

 On Sun, Mar 24, 2013 at 10:44 PM, Trevor Campbell
 tcampb...@atlassian.comwrote:

  I have three indexes which I have set up as three separate cores, using
  this solr.xml config.

    <cores adminPath="/admin/cores" host="${host:}"
      hostPort="${jetty.port:}">
      <core name="jira-issue" instanceDir="jira-issue">
        <property name="dataDir" value="jira-issue/data/" />
      </core>
      <core name="jira-comment" instanceDir="jira-comment">
        <property name="dataDir" value="jira-comment/data/" />
      </core>
      <core name="jira-change-history" instanceDir="jira-change-history">
        <property name="dataDir" value="jira-change-history/data/" />
      </core>
    </cores>

  This works just fine in standalone Solr.

  I duplicated this setup on the same machine under a completely separate
  Solr installation (solr-nodeb) and modified all the data directories to
  point to the directories in nodeb.  This all worked fine.

  I then connected the 2 instances together with ZooKeeper using the settings
  -Dbootstrap_conf=true -Dcollection.configName=jiraCluster -DzkRun
  -DnumShards=1 for the first instance and -DzkHost=localhost:9080 for
  the second. (I'm using Tomcat and ports 8080 and 8081 for the 2 Solr
  instances.)

  Now the data directories of the second node point to the data directories
  in the first node.

  I have tried many settings in the solrconfig.xml for each core but am now
  using absolute paths, e.g.
  <dataDir>/home//solr-4.2.0-nodeb/example/multicore/jira-comment/data</dataDir>

  previously I used
  ${solr.jira-comment.data.dir:/home/tcampbell/solr-4.2.0-nodeb/example/multicore/jira-comment/data}
  but that had the same result.

  It seems ZooKeeper is forcing the data directory config from the uploaded
  configuration onto all the nodes in the cluster?

  How can I do testing on a single machine? Do I really need identical
  directory layouts on all machines?
 
 
 





Multi-core and replicated Solr cloud testing. Data-directory mis-configures

2013-03-24 Thread Trevor Campbell

I have three indexes which I have set up as three separate cores, using this 
solr.xml config.

  <cores adminPath="/admin/cores" host="${host:}" hostPort="${jetty.port:}">
    <core name="jira-issue" instanceDir="jira-issue">
      <property name="dataDir" value="jira-issue/data/" />
    </core>
    <core name="jira-comment" instanceDir="jira-comment">
      <property name="dataDir" value="jira-comment/data/" />
    </core>
    <core name="jira-change-history" instanceDir="jira-change-history">
      <property name="dataDir" value="jira-change-history/data/" />
    </core>
  </cores>

This works just fine in standalone Solr.

I duplicated this setup on the same machine under a completely separate Solr
installation (solr-nodeb) and modified all the data directories to point to the
directories in nodeb.  This all worked fine.


I then connected the 2 instances together with ZooKeeper using the settings
-Dbootstrap_conf=true -Dcollection.configName=jiraCluster -DzkRun -DnumShards=1
for the first instance and -DzkHost=localhost:9080 for the second. (I'm using
Tomcat and ports 8080 and 8081 for the 2 Solr instances.)


Now the data directories of the second node point to the data directories in
the first node.

I have tried many settings in the solrconfig.xml for each core but am now using
absolute paths, e.g.
<dataDir>/home//solr-4.2.0-nodeb/example/multicore/jira-comment/data</dataDir>

previously I used
${solr.jira-comment.data.dir:/home/tcampbell/solr-4.2.0-nodeb/example/multicore/jira-comment/data}
but that had the same result.

It seems ZooKeeper is forcing the data directory config from the uploaded
configuration onto all the nodes in the cluster?

How can I do testing on a single machine? Do I really need identical directory
layouts on all machines?




Re: What to expect when testing Japanese search index

2013-03-23 Thread Hayden Muhl
A search for a single character will only return hits if that character
makes up a whole word, and only if the tokenizer recognizes that character
as a word. It's just like in other languages, where a search for p won't
return documents with the word apple.

If I were you, I would go into the Solr admin UI and start playing around
with the analysis tool. You can paste a phrase in there and it will show
you what tokens that phrase will be broken into. I think that will give you
a better understanding of why you are getting these search results.

You also don't mention which version of Solr you are using. Can you also
include the definition of your text_ja field type?

- Hayden


On Thu, Mar 21, 2013 at 7:01 AM, Van Tassell, Kristian 
kristian.vantass...@siemens.com wrote:

 I’m trying to set up our search index to handle Japanese data, and while
 some searches yield results, others do not. This is especially true the
 smaller the search term.

 For example, searching for this term: 更

 Yields no results even though I know it appears in the text. I understand
 that this character alone may not be a full word without further context,
 and thus, perhaps it should not return a hit(?).

 What about putting a star after it? 更*

 Should that return hits? I had been using the text_ja boilerplate setup,
 but wonder if a bigram (text_cjk) may work better for my non-Japanese
 speaking testing phase. Thanks in advance for any insight!




What to expect when testing Japanese search index

2013-03-21 Thread Van Tassell, Kristian
I’m trying to set up our search index to handle Japanese data, and while some 
searches yield results, others do not. This is especially true the smaller the 
search term.

For example, searching for this term: 更

Yields no results even though I know it appears in the text. I understand that 
this character alone may not be a full word without further context, and thus, 
perhaps it should not return a hit(?).

What about putting a star after it? 更*

Should that return hits? I had been using the text_ja boilerplate setup, but 
wonder if a bigram (text_cjk) may work better for my non-Japanese speaking 
testing phase. Thanks in advance for any insight!
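For reference, a bigram field type along the lines of the text_cjk the poster mentions would look roughly like this. This is sketched from memory of the stock Solr 4.x example schema, so verify it against your own schema.xml before relying on it:

```xml
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- normalize half-width/full-width character forms -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index overlapping 2-character CJK tokens -->
    <filter class="solr.CJKBigramFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that with plain bigrams, a single-character query like 更 still will not match on its own; CJKBigramFilterFactory has an outputUnigrams option that also emits single-character tokens if that behavior is needed.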



Re: Blog Post: Integration Testing SOLR Index with Maven

2013-03-15 Thread Chantal Ackermann
Hi,

@Lance - thanks, it's a pleasure to give something back to the community. Even 
if it is comparatively small. :-)

@Paul - it's definitely not 15 min but rather 2 min. Actually, the testing part 
of this setup is very regular compared to other Maven projects. The copying of 
the WAR file and repackaging is not that time consuming. (This is still Maven - 
widely used and proven - it wouldn't be if it were not practical.)


Cheers,
Chantal

Blog Post: Integration Testing SOLR Index with Maven

2013-03-14 Thread Chantal Ackermann
Hi all,


this is not a question. I just wanted to announce that I've written a blog post 
on how to set up Maven for packaging and automatic testing of a SOLR index 
configuration.

http://blog.it-agenten.com/2013/03/integration-testing-your-solr-index-with-maven/

Feedback or comments appreciated!
And again, thanks for that great piece of software.

Chantal
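The usual Maven mechanics behind a setup like this bind integration tests to the verify phase via the failsafe plugin. A minimal hedged sketch of that binding (the plugin coordinates and goals are standard Maven; the blog post's Solr-specific packaging steps are not reproduced here):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <executions>
    <execution>
      <goals>
        <!-- runs *IT.java test classes after pre-integration-test setup -->
        <goal>integration-test</goal>
        <!-- fails the build afterwards if any integration test failed -->
        <goal>verify</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

Splitting integration-test from verify is what lets a container started in pre-integration-test be shut down cleanly in post-integration-test before the build is failed.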



Re: Blog Post: Integration Testing SOLR Index with Maven

2013-03-14 Thread David Philip
Informative. Useful.Thanks


On Thu, Mar 14, 2013 at 1:59 PM, Chantal Ackermann 
c.ackerm...@it-agenten.com wrote:

 Hi all,


 this is not a question. I just wanted to announce that I've written a blog
 post on how to set up Maven for packaging and automatic testing of a SOLR
 index configuration.


 http://blog.it-agenten.com/2013/03/integration-testing-your-solr-index-with-maven/

 Feedback or comments appreciated!
 And again, thanks for that great piece of software.

 Chantal




Re: Blog Post: Integration Testing SOLR Index with Maven

2013-03-14 Thread Paul Libbrecht
Nice,

Chantal can you indicate there or here what kind of speed for integration tests 
you've reached with this, from a bare source to a successfully tested 
application?
(e.g. with 100 documents)

thanks in advance

Paul


On 14 mars 2013, at 09:29, Chantal Ackermann wrote:

 Hi all,
 
 
 this is not a question. I just wanted to announce that I've written a blog 
 post on how to set up Maven for packaging and automatic testing of a SOLR 
 index configuration.
 
 http://blog.it-agenten.com/2013/03/integration-testing-your-solr-index-with-maven/
 
 Feedback or comments appreciated!
 And again, thanks for that great piece of software.
 
 Chantal
 



Re: Blog Post: Integration Testing SOLR Index with Maven

2013-03-14 Thread Chantal Ackermann
Hi Paul,

I'm sorry I cannot provide you with any numbers. I also doubt it would be wise 
to post any as I think the speed depends highly on what you are doing in your 
integration tests.

Say you have several request handlers that you want to test (on different 
cores), and some more complex use cases like using output from one request 
handler as input to others. You would also import test data that would be 
representative enough to test these request handlers and use cases.

The requests themselves, of course, only take as long as SolrJ takes to run and 
SOLR takes to answer them.
In addition, there is the overhead of Maven starting up, running all the 
plugins, importing the data, executing the tests. Well, Maven is certainly not 
the fastest tool to start up and get going…

If you are asking because you want to run rather a lot requests and test their 
output - JMeter might be preferrable?

Hope that was not too vague an answer,
Chantal


Am 14.03.2013 um 09:51 schrieb Paul Libbrecht:

 Nice,
 
 Chantal can you indicate there or here what kind of speed for integration 
 tests you've reached with this, from a bare source to a successfully tested 
 application?
 (e.g. with 100 documents)
 
 thanks in advance
 
 Paul
 
 
 On 14 mars 2013, at 09:29, Chantal Ackermann wrote:
 
 Hi all,
 
 
 this is not a question. I just wanted to announce that I've written a blog 
 post on how to set up Maven for packaging and automatic testing of a SOLR 
 index configuration.
 
 http://blog.it-agenten.com/2013/03/integration-testing-your-solr-index-with-maven/
 
 Feedback or comments appreciated!
 And again, thanks for that great piece of software.
 
 Chantal
 
 



Re: Blog Post: Integration Testing SOLR Index with Maven

2013-03-14 Thread Paul Libbrecht
Chantal,

the goal is different: get a general feeling how practical it is to integrate 
this in the routine.
If you are able, on your contemporary machine (which I assume is not a 
supercomputer of some special sort), to run this whole process in a way that is 
somewhat useful for you in about 2 minutes, then I'll be very interested.

If, like with quite many Maven setups where everything is started and integration 
is measured from all facets, it takes more than 15 minutes to run this process 
once it is useful, then I will be less motivated.

I'm not asking for performance measurement, and certainly not of Solr itself, 
which I largely trust and which depends a lot on good caching. Yes, for that, 
JMeter or others are useful.

Paul


On 14 mars 2013, at 12:20, Chantal Ackermann wrote:

 Hi Paul,
 
 I'm sorry I cannot provide you with any numbers. I also doubt it would be 
 wise to post any as I think the speed depends highly on what you are doing in 
 your integration tests.
 
 Say you have several request handlers that you want to test (on different 
 cores), and some more complex use cases like using output from one request 
 handler as input to others. You would also import test data that would be 
 representative enough to test these request handlers and use cases.
 
 The requests themselves, of course, only take as long as SolrJ takes to run 
 and SOLR takes to answer them.
 In addition, there is the overhead of Maven starting up, running all the 
 plugins, importing the data, executing the tests. Well, Maven is certainly 
 not the fastest tool to start up and get going…
 
 If you are asking because you want to run rather a lot requests and test 
 their output - JMeter might be preferrable?
 
 Hope that was not too vague an answer,
 Chantal
 
 
 Am 14.03.2013 um 09:51 schrieb Paul Libbrecht:
 
 Nice,
 
 Chantal can you indicate there or here what kind of speed for integration 
 tests you've reached with this, from a bare source to a successfully tested 
 application?
 (e.g. with 100 documents)
 
 thanks in advance
 
 Paul
 
 
 On 14 mars 2013, at 09:29, Chantal Ackermann wrote:
 
 Hi all,
 
 
 this is not a question. I just wanted to announce that I've written a blog 
 post on how to set up Maven for packaging and automatic testing of a SOLR 
 index configuration.
 
 http://blog.it-agenten.com/2013/03/integration-testing-your-solr-index-with-maven/
 
 Feedback or comments appreciated!
 And again, thanks for that great piece of software.
 
 Chantal
 
 
 



Re: Blog Post: Integration Testing SOLR Index with Maven

2013-03-14 Thread Lance Norskog
Wow! That's great. And it's a lot of work, especially getting it all 
keyboard-complete. Thank you.


On 03/14/2013 01:29 AM, Chantal Ackermann wrote:

Hi all,


this is not a question. I just wanted to announce that I've written a blog post 
on how to set up Maven for packaging and automatic testing of a SOLR index 
configuration.

http://blog.it-agenten.com/2013/03/integration-testing-your-solr-index-with-maven/

Feedback or comments appreciated!
And again, thanks for that great piece of software.

Chantal





Please ignore, testing my email

2013-02-27 Thread johnmunir
Hi,


Please ignore, I'm testing my email (I have not received any email from the Solr 
mailing list for over 12 hours now).


-- MJ


Re: what do you use for testing relevance?

2013-02-13 Thread Amit Nithian
Ultimately this is dependent on what your metrics for success are. For some
places it may be just raw CTR (did my click-through rate increase), but for
other places it may be a function of money (gross revenue, profits, # items
sold, etc.). I don't know if there is a generic answer to this question,
which is why people write their own frameworks: it's very specific to your
needs. A scoring change that leads to an increase in CTR may not necessarily
lead to an increase in the metric that makes your business go.


On Tue, Feb 12, 2013 at 10:31 PM, Steffen Elberg Godskesen 
steffen.godske...@gmail.com wrote:


 Hi Roman,

 If you're looking for regression testing then
 https://github.com/sul-dlss/rspec-solr might be worth looking at. If
 you're not a ruby shop, doing something similar in another language
 shouldn't be to hard.


 The basic idea is that you setup a set of tests like

 If the query is X, then the document with id Y should be in the first 10
 results
 If the query is S, then a document with title T should be the first
 result
 If the query is P, then a document with author Q should not be in the
 first 10 result

 and that you run these whenever you tune your scoring formula to ensure
 that you haven't introduced unintended effects. New ideas/requirements for
 your relevance ranking should always result in writing new tests - that
 will probably fail until you tune your scoring formula. This is certainly
 no magic bullet, but it will give you some confidence that you didn't make
 things worse. And - in my humble opinion - it also gives you the benefit of
 discouraging you from tuning your scoring just for fun. To put it bluntly:
 if you cannot write up a requirement in form of a test, you probably have
 no need to tune your scoring.


 Regards,

 --
 Steffen



 On Tuesday, February 12, 2013 at 23:03 , Roman Chyla wrote:

  Hi,
  I do realize this is a very broad question, but still I need to ask it.
  Suppose you make a change into the scoring formula. How do you
  test/know/see what impact it had? Any framework out there?
 
  It seems like people are writing their own tools to measure relevancy.
 
  Thanks for any pointers,
 
  roman





Re: what do you use for testing relevance?

2013-02-13 Thread Roman Chyla
All,

Thank you for your comments and links, I will explore them.

I think that many people are facing similar questions when they tune
their search engines, especially in the Solr/Lucene community. While the
requirements will be different, ultimately it is what they can do with
lucene/solr that guides such efforts. As an example, let me use this

https://github.com/romanchyla/r-ranking-fun/blob/master/plots/raw/test-plot-showing-factors.pdf?raw=true

The graph shows you the effect of different values of the qf parameter. This
use case is probably very common, so somebody has probably already done
something similar.

In the real world, I would like to: 1) change something, 2) collect
(click) data, 3) apply a statistical test (of my choice) to see if the changes
had an effect (be it worse or better) and whether that change is
statistically significant. But do we have to write these tools from scratch
again?
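
Step 3 need not be custom tooling either; a plain two-proportion z-test on the click-through counts already answers the significance question. A self-contained sketch (the counts below are invented illustration data, not from any real log):

```python
import math

def two_proportion_ztest(clicks_a, trials_a, clicks_b, trials_b):
    """Two-sided z-test for a difference between two click-through rates.

    Returns (z, p_value); a small p_value means the CTR difference is
    unlikely to be chance alone. Assumes both trial counts are nonzero.
    """
    p1 = clicks_a / trials_a
    p2 = clicks_b / trials_b
    pooled = (clicks_a + clicks_b) / (trials_a + trials_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical A/B data: old ranking (A) vs. tuned ranking (B).
z, p = two_proportion_ztest(420, 10_000, 480, 10_000)
print(z, p)
```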

All your comments are very valuable and useful. But I am still wondering if
there are more tools one could use to tune the search. More comments
welcome!

Thank you!

  roman

On Wed, Feb 13, 2013 at 1:04 PM, Amit Nithian anith...@gmail.com wrote:

 Ultimately this is dependent on what your metrics for success are. For some
 places it may be just raw CTR (did my click through rate increase) but for
 other places it may be a function of money (either it may be gross revenue,
 profits, # items sold etc). I don't know if there is a generic answer for
 this question which is leading those to write their own frameworks b/c it's
 very specific to your needs. A scoring change that leads to an increase in
 CTR may not necessarily lead to an increase in the metric that makes your
 business go.


 On Tue, Feb 12, 2013 at 10:31 PM, Steffen Elberg Godskesen 
 steffen.godske...@gmail.com wrote:

 
  Hi Roman,
 
  If you're looking for regression testing then
  https://github.com/sul-dlss/rspec-solr might be worth looking at. If
  you're not a ruby shop, doing something similar in another language
  shouldn't be to hard.
 
 
  The basic idea is that you setup a set of tests like
 
  If the query is X, then the document with id Y should be in the first 10
  results
  If the query is S, then a document with title T should be the first
  result
  If the query is P, then a document with author Q should not be in the
  first 10 result
 
  and that you run these whenever you tune your scoring formula to ensure
  that you haven't introduced unintended effects. New ideas/requirements
 for
  your relevance ranking should always result in writing new tests - that
  will probably fail until you tune your scoring formula. This is certainly
  no magic bullet, but it will give you some confidence that you didn't
 make
  things worse. And - in my humble opinion - it also gives you the benefit
 of
  discouraging you from tuning your scoring just for fun. To put it
 bluntly:
  if you cannot write up a requirement in form of a test, you probably have
  no need to tune your scoring.
 
 
  Regards,
 
  --
  Steffen
 
 
 
  On Tuesday, February 12, 2013 at 23:03 , Roman Chyla wrote:
 
   Hi,
   I do realize this is a very broad question, but still I need to ask it.
   Suppose you make a change into the scoring formula. How do you
   test/know/see what impact it had? Any framework out there?
  
   It seems like people are writing their own tools to measure relevancy.
  
   Thanks for any pointers,
  
   roman
 
 
 



RE: what do you use for testing relevance?

2013-02-12 Thread Markus Jelsma
Roman,

Logging clicks and their position in the result list is one useful method to 
measure relevance. Using the position you can calculate the mean reciprocal 
rank; a value near 1.0 is very good, so over time you can clearly see whether 
changes actually improve user experience/expectations. Keep in mind that there 
is some noise, because users tend to click one or more of the first few results 
anyway.

You may also be interested in A/B testing.

http://en.wikipedia.org/wiki/Mean_reciprocal_rank
http://en.wikipedia.org/wiki/A/B_testing
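
For what it's worth, computing MRR from logged 1-based click positions is only a few lines; a quick sketch (the positions below are invented sample data):

```python
def mean_reciprocal_rank(click_positions):
    """MRR over 1-based click positions; a query with no click (None/0)
    contributes 0 to the mean."""
    if not click_positions:
        return 0.0
    return sum(1.0 / pos if pos else 0.0 for pos in click_positions) / len(click_positions)

# One logged click position per query; None = the user clicked nothing.
positions = [1, 3, 1, None, 2]
print(mean_reciprocal_rank(positions))
```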

Cheers
Markus
 
 
-Original message-
 From:Roman Chyla roman.ch...@gmail.com
 Sent: Tue 12-Feb-2013 23:04
 To: solr-user@lucene.apache.org
 Subject: what do you use for testing relevance?
 
 Hi,
 I do realize this is a very broad question, but still I need to ask it.
 Suppose you make a change into the scoring formula. How do you
 test/know/see what impact it had? Any framework out there?
 
 It seems like people are writing their own tools to measure relevancy.
 
 Thanks for any pointers,
 
   roman
 


Re: what do you use for testing relevance?

2013-02-12 Thread Sebastian Saip
What do you want to achieve with these tests?

Is it meant as a regression test, to make sure that only the queries/boosts you
changed are affected?
Then you will have to implement tests that cover your specific
schema/boosts. I'm not aware of any frameworks that do this - we're using
Java-based tests that retrieve documents from Solr, map them to our domain
model (objects representing a document) and do assertions on debug values
(e.g. score).

Or is it more about what's more relevant for the user? Then you will need
some kind of user tracking, as Markus described already.

BR


On 12 February 2013 23:16, Markus Jelsma markus.jel...@openindex.io wrote:

 Roman,

 Logging clicks and their position in the result list is one useful method
 to measure the relevance. Using the position you can calculate the mean
 reciprocal rank, a value near 1.0 is very good so over time you can clearly
 see whether changes actually improve user experience/expectations. Keep in
 mind that there is some noise because users tend to click one or more of
 the first few results anyway.

 You may also be interested in A/B testing.

 http://en.wikipedia.org/wiki/Mean_reciprocal_rank
 http://en.wikipedia.org/wiki/A/B_testing

 Cheers
 Markus


 -Original message-
  From:Roman Chyla roman.ch...@gmail.com
  Sent: Tue 12-Feb-2013 23:04
  To: solr-user@lucene.apache.org
  Subject: what do you use for testing relevance?
 
  Hi,
  I do realize this is a very broad question, but still I need to ask it.
  Suppose you make a change into the scoring formula. How do you
  test/know/see what impact it had? Any framework out there?
 
  It seems like people are writing their own tools to measure relevancy.
 
  Thanks for any pointers,
 
roman
 



Re: what do you use for testing relevance?

2013-02-12 Thread Otis Gospodnetic
Hi Roman,

We use our own Search Analytics service. It's free and open to anyone - see
http://sematext.com/search-analytics/index.html

And this post talks exactly about the topic you are asking about:
http://blog.sematext.com/2012/01/06/relevance-tuning-and-competitive-advantage-via-search-analytics

It includes a screenshot with MRR (Mean Reciprocal Rank) that Markus
mentioned.

Otis
Solr  ElasticSearch Support
http://sematext.com/


On Feb 12, 2013 5:04 PM, Roman Chyla roman.ch...@gmail.com wrote:

 Hi,
 I do realize this is a very broad question, but still I need to ask it.
 Suppose you make a change into the scoring formula. How do you
 test/know/see what impact it had? Any framework out there?

 It seems like people are writing their own tools to measure relevancy.

 Thanks for any pointers,

   roman



Re: what do you use for testing relevance?

2013-02-12 Thread Steffen Elberg Godskesen

Hi Roman,

If you're looking for regression testing then 
https://github.com/sul-dlss/rspec-solr might be worth looking at. If you're not 
a ruby shop, doing something similar in another language shouldn't be too hard.
 

The basic idea is that you setup a set of tests like

If the query is X, then the document with id Y should be in the first 10 
results
If the query is S, then a document with title T should be the first result
If the query is P, then a document with author Q should not be in the first 10 
results

and that you run these whenever you tune your scoring formula to ensure that 
you haven't introduced unintended effects. New ideas/requirements for your 
relevance ranking should always result in writing new tests - that will 
probably fail until you tune your scoring formula. This is certainly no magic 
bullet, but it will give you some confidence that you didn't make things worse. 
And - in my humble opinion - it also gives you the benefit of discouraging you 
from tuning your scoring just for fun. To put it bluntly: if you cannot write 
up a requirement in form of a test, you probably have no need to tune your 
scoring.
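
For non-Ruby shops, the same idea is just a thin assertion layer over whatever returns the ranked ids; a minimal sketch in Python, where the search() stub is a made-up stand-in for a real query against the index under test:

```python
def assert_in_top(results, doc_id, n):
    """Fail if doc_id is not among the first n ranked ids."""
    assert doc_id in results[:n], f"{doc_id!r} not in top {n}: {results[:n]}"

def assert_first(results, doc_id):
    """Fail unless doc_id is the first result."""
    assert results and results[0] == doc_id, f"expected {doc_id!r} first, got {results[:1]}"

def assert_not_in_top(results, doc_id, n):
    """Fail if doc_id sneaks into the first n results."""
    assert doc_id not in results[:n], f"{doc_id!r} unexpectedly in top {n}"

# Hypothetical stub; a real suite would issue the query and collect ids.
def search(q):
    canned = {"X": ["y9", "d2", "d7"], "S": ["t1"], "P": ["d1", "d2"]}
    return canned[q]

assert_in_top(search("X"), "y9", 10)
assert_first(search("S"), "t1")
assert_not_in_top(search("P"), "q5", 10)
```

Rerunning this suite after every scoring tweak gives the same regression safety net as the rspec-solr approach.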


Regards,

-- 
Steffen



On Tuesday, February 12, 2013 at 23:03 , Roman Chyla wrote:

 Hi,
 I do realize this is a very broad question, but still I need to ask it.
 Suppose you make a change into the scoring formula. How do you
 test/know/see what impact it had? Any framework out there?
 
 It seems like people are writing their own tools to measure relevancy.
 
 Thanks for any pointers,
 
 roman 




Solr 4.1.0 shardHandlerFactory Null Pointer Exception when setting up embedded solrj solr server for unit testing

2013-01-24 Thread Ted Merchant
We recently updated from Solr 4.0.0 to Solr 4.1.0.  Because of the change we 
were forced to upgrade a custom query parser.  While the code change itself was 
minimal, we found that our unit tests stopped working because of a 
NullPointerException on line 181 of handler.component.SearchHandler:
ShardHandler shardHandler1 = shardHandlerFactory.getShardHandler();
We determined that the cause of this exception was that shardHandlerFactory was 
never initialized in the Solr container.  The reason for this seems to be that 
the shard handler is set up in core.CoreContainer::initShardHandler, which is 
called from core.CoreContainer::load.
When setting up the core container we were using the  public 
CoreContainer(SolrResourceLoader loader) constructor.  This constructor never 
calls the load method, so initShardHandler is never called and the shardHandler 
is never initialized.

In Solr 4.0.0 the shardHandler was initialized on the calling of 
getShardHandlerFactory.  This code was modified and moved by revision 1422728: 
SOLR-4204: Make SolrCloud tests more friendly to FreeBSD blackhole 2 
environments.

We fixed our issue by using the public CoreContainer(String dir, File 
configFile) constructor, which calls the load method.
I just wanted to make sure that people were aware of this issue, and to 
determine whether it really is an issue or whether having the shardHandler be 
null is expected behavior unless someone calls the load(String dir, File 
configFile) method.

Thank you,

Ted



Stack trace of error:
org.apache.solr.client.solrj.SolrServerException: 
org.apache.solr.client.solrj.SolrServerException: java.lang.NullPointerException
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:223)
at 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
at 
org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
at 
com.cision.search.solr.ProximityQParserTest.testInit(ProximityQParserTest.java:72)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at 
org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at 
org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at 
org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at 
org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at 
org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at 
org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at 
org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: org.apache.solr.client.solrj.SolrServerException: 
java.lang.NullPointerException
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:155)
... 27 more
Caused by: java.lang.NullPointerException
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:181)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
at 

Re: Solr 4.1.0 shardHandlerFactory Null Pointer Exception when setting up embedded solrj solr server for unit testing

2013-01-24 Thread Mark Miller
This is my fault - I discovered this myself a few days ago. I've been meaning 
to file a jira ticket and have not gotten around to it yet.

You can also work around it like this:

CoreContainer container = new CoreContainer(loader) {
  // workaround since we don't call container#load
  {initShardHandler(null);}
};

- Mark

On Jan 24, 2013, at 9:22 AM, Ted Merchant ted.merch...@cision.com wrote:

 We recently updated from Solr 4.0.0 to Solr 4.1.0.  Because of the change we 
 were forced to upgrade a custom query parser.  While the code change itself 
 was minimal, we found that our unit tests stopped working because of a 
 NullPointerException on line 181 of handler.component.SearchHandler:
 ShardHandler shardHandler1 = shardHandlerFactory.getShardHandler();
 We determined that the cause of this exception was that shardHandlerFactory 
 was never initialized in the solr container.  The reason for this seems to be 
 that the shard handler is setup in core.CoreContainer::initShardHandler which 
 is called from core.CoreContainer::load. 
 When setting up the core container we were using the  public 
 CoreContainer(SolrResourceLoader loader) constructor.  This constructor never 
 calls the load method, so initShardHandler is never called and the 
 shardHandler is never initialized. 
 In Solr 4.0.0 the shardHandler was initialized on the calling of 
 getShardHandlerFactory.  This code was modified and moved by revision 
 1422728: SOLR-4204: Make SolrCloud tests more friendly to FreeBSD blackhole 2 
 environments.
  
 We fixed our issue by using the public CoreContainer(String dir, File 
 configFile) constructor which calls the load method.
 I just wanted to make sure that people were aware of this issue and to 
 determine if it really is an issue or if having the shardHandler be null was 
 expected behavior unless someone called the load(String dir, File configFile 
 ) method.
  
 Thank you,
  
 Ted
  
  
  
 Stack trace of error:
 org.apache.solr.client.solrj.SolrServerException: 
 org.apache.solr.client.solrj.SolrServerException: 
 java.lang.NullPointerException
 at 
 org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:223)
 at 
 org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
 at 
 org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:301)
 at 
 com.cision.search.solr.ProximityQParserTest.testInit(ProximityQParserTest.java:72)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
 Source)
 at java.lang.reflect.Method.invoke(Unknown Source)
 at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
 at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
 at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
 at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
 at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
 at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
 at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
 at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
 at 
 org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
 at 
 org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
 at 
 org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
 at 
 org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
 at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
 at 
 org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
 at 
 org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
 Caused by: org.apache.solr.client.solrj.SolrServerException: 
 java.lang.NullPointerException
 at 
 

Re: stress testing Solr 4.x

2012-12-10 Thread Alain Rogister
Hi Mark,

Usually I was stopping them with ctrl-c but several times, one of the
servers was hung and had to be stopped with kill -9.

Thanks,

Alain

On Mon, Dec 10, 2012 at 5:09 AM, Mark Miller markrmil...@gmail.com wrote:

 Hmmm...EOF on the segments file is odd...

 How were you killing the nodes? Just stopping them or kill -9 or what?

 - Mark

 On Sun, Dec 9, 2012 at 1:37 PM, Alain Rogister alain.rogis...@gmail.com
 wrote:
  Hi,
 
  I have re-ran my tests today after I updated Solr 4.1 to apply the patch.
 
  First, the good news : it works i.e. if I stop all three Solr servers and
  then restart one, it will try to find the other two for a while (about 3
  minutes I think) then give up, become the leader and start processing
  requests.
 
  Now, the not-so-good : I encountered several exceptions that seem to
  indicate 2 other issues. Here are the relevant bits.
 
  1) The ZK session expiry problem : not sure what caused it but I did a
 few
  Solr or ZK node restarts while the system was under load.
 
  SEVERE: There was a problem finding the leader in
  zk:org.apache.solr.common.SolrException: Could not get leader props
  at
 org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:732)
  at
 org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:696)
  at
 
 org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1095)
  at
 
 org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:265)
  at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
  at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184)
  at
 
 org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
  at
 
 org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
  at
 
 org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
  at
 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
  Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
  KeeperErrorCode = Session expired for
 /collections/adressage/leaders/shard1
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
  at
 
 org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:244)
  at
 
 org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:241)
  at
 
 org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
  at
 org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:241)
  at
 org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:710)
  ... 10 more
  SEVERE: :org.apache.zookeeper.KeeperException$SessionExpiredException:
  KeeperErrorCode = Session expired for /overseer/queue/qn-
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
  at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
  at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
  at
 
 org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:210)
  at
 
 org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:207)
  at
 
 org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
  at
 org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:207)
  at
 org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:229)
  at org.apache.solr.cloud.ZkController.publish(ZkController.java:824)
  at org.apache.solr.cloud.ZkController.publish(ZkController.java:797)
  at
 
 org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:258)
  at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
  at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184)
  at
 
 org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
  at
 
 org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
  at
 
 org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
  at
 
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
 
  2) Data corruption of 1 core on 2 out of 3 Solr servers. This core failed
  to start due to the exceptions below and both servers went into a
 seemingly
  endless loop of exponential retries. The fix was to stop both faulty
  servers, remove the data directory of this core and restart : replication
  then took place correctly. As above, not sure what exactly caused this to
  happen; no updates were taking place, only searches.
 
  On server 1 :
 
  INFO: Closing
 
 

Re: stress testing Solr 4.x

2012-12-09 Thread Mark Miller
Hmmm...EOF on the segments file is odd...

How were you killing the nodes? Just stopping them or kill -9 or what?

- Mark

On Sun, Dec 9, 2012 at 1:37 PM, Alain Rogister alain.rogis...@gmail.com wrote:
 Hi,

 I have re-ran my tests today after I updated Solr 4.1 to apply the patch.

 First, the good news : it works i.e. if I stop all three Solr servers and
 then restart one, it will try to find the other two for a while (about 3
 minutes I think) then give up, become the leader and start processing
 requests.

 Now, the not-so-good : I encountered several exceptions that seem to
 indicate 2 other issues. Here are the relevant bits.

 1) The ZK session expiry problem : not sure what caused it but I did a few
 Solr or ZK node restarts while the system was under load.

 SEVERE: There was a problem finding the leader in
 zk:org.apache.solr.common.SolrException: Could not get leader props
 at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:732)
 at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:696)
 at
 org.apache.solr.cloud.ZkController.waitForLeaderToSeeDownState(ZkController.java:1095)
 at
 org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:265)
 at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
 at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184)
 at
 org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
 at
 org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
 at
 org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
 at
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
 Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException:
 KeeperErrorCode = Session expired for /collections/adressage/leaders/shard1
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
 at
 org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:244)
 at
 org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:241)
 at
 org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
 at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:241)
 at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:710)
 ... 10 more
 SEVERE: :org.apache.zookeeper.KeeperException$SessionExpiredException:
 KeeperErrorCode = Session expired for /overseer/queue/qn-
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
 at
 org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:210)
 at
 org.apache.solr.common.cloud.SolrZkClient$5.execute(SolrZkClient.java:207)
 at
 org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
 at org.apache.solr.common.cloud.SolrZkClient.create(SolrZkClient.java:207)
 at org.apache.solr.cloud.DistributedQueue.offer(DistributedQueue.java:229)
 at org.apache.solr.cloud.ZkController.publish(ZkController.java:824)
 at org.apache.solr.cloud.ZkController.publish(ZkController.java:797)
 at
 org.apache.solr.cloud.ZkController.registerAllCoresAsDown(ZkController.java:258)
 at org.apache.solr.cloud.ZkController.access$100(ZkController.java:84)
 at org.apache.solr.cloud.ZkController$1.command(ZkController.java:184)
 at
 org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:116)
 at
 org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
 at
 org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:90)
 at
 org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
 at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)

 2) Data corruption of 1 core on 2 out of 3 Solr servers. This core failed
 to start due to the exceptions below and both servers went into a seemingly
 endless loop of exponential retries. The fix was to stop both faulty
 servers, remove the data directory of this core and restart : replication
 then took place correctly. As above, not sure what exactly caused this to
 happen; no updates were taking place, only searches.

 On server 1 :

 INFO: Closing
 directory:/Users/arogister/Dev/apache-solr-4.1-branch/solr/forem/solr/formabanque/data/index.20121209152525785
 Dec 09, 2012 3:25:25 PM org.apache.solr.common.SolrException log
 SEVERE: SnapPull failed :org.apache.solr.common.SolrException: Index fetch
 failed :
 at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:400)
 at
 

Re: stress testing Solr 4.x

2012-12-08 Thread Mark Miller
Hmm…I've tried to replicate what looked like a bug from your report (3 Solr 
servers stop/start ), but on 5x it works no problem for me. It shouldn't be any 
different on 4x, but I'll try that next.

In terms of starting up Solr without a working ZooKeeper ensemble - it won't 
work currently. Cores won't be able to register with ZooKeeper and will fail 
loading. It would probably be nicer to come up in search only mode and keep 
trying to reconnect to zookeeper - file a JIRA issue if you are interested.

On the zk data dir, see 
http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html#Ongoing+Data+Directory+Cleanup
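
As a concrete sketch of that ongoing cleanup (ZooKeeper 3.4.x supports automatic purging via zoo.cfg; the retain count and interval below are illustrative choices, not values from this thread):

```
# zoo.cfg -- let ZooKeeper purge old snapshots and transaction logs itself (3.4+)
autopurge.snapRetainCount=3   # keep the 3 most recent snapshots (3 is the minimum)
autopurge.purgeInterval=24    # purge every 24 hours; 0 (the default) disables purging
```

Alternatively, the bundled bin/zkCleanup.sh script can be run periodically from cron to the same effect.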

- Mark

On Dec 7, 2012, at 10:22 PM, Mark Miller markrmil...@gmail.com wrote:

 Hey, I'll try and answer this tomorrow.
 
 There is a def an unreported bug in there that needs to be fixed for the 
 restarting the all nodes case.
 
 Also, a 404 one is generally when jetty is starting or stopping - there are 
 points where 404's can be returned. I'm not sure why else you'd see one. 
 Generally we do retries when that happens.
 
 - Mark
 
 On Dec 7, 2012, at 1:07 PM, Alain Rogister alain.rogis...@gmail.com wrote:
 
 I am reporting the results of my stress tests against Solr 4.x. As I was
 getting many error conditions with 4.0, I switched to the 4.1 trunk in the
 hope that some of the issues would be fixed already. Here is my setup :
 
 - Everything running on a single box (2 x 4-core CPUs, 8 GB RAM). I realize
 this is not representative of a production environment but it's a fine way
 to find out what happens under resource-constrained conditions.
 - 3 Solr servers, 3 cores (2 of which are very small, the third one has 410
 MB of data)
 - single shard
 - 3 Zookeeper instances
 - HAProxy load balancing requests across Solr servers
 - JMeter or ApacheBench running the tests : 5 thread pools of 20 threads
 each, sending search requests continuously (no updates)
 
 In nominal conditions, it all works fine, i.e. it can process a million
 requests, maxing out the CPUs at all times, without experiencing nasty
 failures. There are errors in the logs about replication failures though;
 they should be benign in this case as no updates are taking place, but it's
 hard to tell what is going on exactly. Example :
 
 Dec 07, 2012 7:50:37 PM org.apache.solr.update.PeerSync handleResponse
 WARNING: PeerSync: core=adressage url=http://192.168.0.101:8983/solr
 exception talking to
 http://192.168.0.101:8985/solr/adressage/, failed
 org.apache.solr.common.SolrException: Server at
 http://192.168.0.101:8985/solr/adressage returned non ok status:404,
 message:Not Found
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
 at
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
 at
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)
 
 Then I simulated various failure scenarios :
 
 - 1 Solr server stop/start
 - 2 Solr servers stop/start
 - 3 Solr servers stop/start : it seems that in this case, the Solr servers
 *cannot* be restarted : more exactly, the restarted server will consider
 that it is number 1 out of 4 and wait for the other 3 to come up. The only
 way out is to stop it again, then stop all Zookeeper instances *and* clean
 up their zkdata directory, start them, then start the Solr servers.
 
 I noticed that the zkdata directories had grown to 200 MB after a while.
 What exactly is in there besides the configuration data ? Does it stop
 growing ?
 
 Then I tried this :
 
 - kill 1 Zookeeper process
 - kill 2 Zookeeper processes
 - stop/start 1 Solr server
 
 When doing this, I experienced (many times) situations where the Solr
 servers could not reconnect and threw scary exceptions. The only way out
 was to restart the whole cluster.
 
 Q : when, if ever, is one supposed to clean up the zkdata directories ?
 
 Here are the errors I found in the logs. It seems that some of them have
 been reported in JIRA but 4.1-trunk seems to experience basically the same
 issues as 4.0 in my test scenarios.
 
 Dec 07, 2012 8:03:59 PM org.apache.solr.update.PeerSync handleResponse
 WARNING: PeerSync: core=cachede url=http://192.168.0.101:8983/solr
 couldn't connect to
 http://192.168.0.101:8984/solr/cachede/, counting as success
 Dec 07, 2012 8:03:59 PM org.apache.solr.common.SolrException log
 

Re: stress testing Solr 4.x

2012-12-08 Thread Mark Miller
After some more playing around on 5x I have duplicated the issue. I'll file a 
JIRA issue for you and fix it shortly.

- Mark

On Dec 8, 2012, at 8:43 AM, Mark Miller markrmil...@gmail.com wrote:

 Hmm…I've tried to replicate what looked like a bug from your report (3 Solr 
 servers stop/start ), but on 5x it works no problem for me. It shouldn't be 
 any different on 4x, but I'll try that next.
 
 In terms of starting up Solr without a working ZooKeeper ensemble - it won't 
 work currently. Cores won't be able to register with ZooKeeper and will fail 
 loading. It would probably be nicer to come up in search only mode and keep 
 trying to reconnect to zookeeper - file a JIRA issue if you are interested.
 
 On the zk data dir, see 
 http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html#Ongoing+Data+Directory+Cleanup
 
 - Mark
 
 On Dec 7, 2012, at 10:22 PM, Mark Miller markrmil...@gmail.com wrote:
 
 Hey, I'll try and answer this tomorrow.
 
 There is a def an unreported bug in there that needs to be fixed for the 
 restarting the all nodes case.
 
 Also, a 404 one is generally when jetty is starting or stopping - there are 
 points where 404's can be returned. I'm not sure why else you'd see one. 
 Generally we do retries when that happens.
 
 - Mark
 
 

Re: stress testing Solr 4.x

2012-12-08 Thread Alain Rogister
Great, thanks Mark ! I'll test the fix and post my results.

Alain

On Saturday, December 8, 2012, Mark Miller wrote:

 After some more playing around on 5x I have duplicated the issue. I'll
 file a JIRA issue for you and fix it shortly.

 - Mark

 On Dec 8, 2012, at 8:43 AM, Mark Miller markrmil...@gmail.com wrote:

  Hmm…I've tried to replicate what looked like a bug from your report (3
 Solr servers stop/start ), but on 5x it works no problem for me. It
 shouldn't be any different on 4x, but I'll try that next.
 
  In terms of starting up Solr without a working ZooKeeper ensemble - it
 won't work currently. Cores won't be able to register with ZooKeeper and
 will fail loading. It would probably be nicer to come up in search only
 mode and keep trying to reconnect to zookeeper - file a JIRA issue if you
 are interested.
 
  On the zk data dir, see
 http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html#Ongoing+Data+Directory+Cleanup
 
  - Mark
 
  On Dec 7, 2012, at 10:22 PM, Mark Miller markrmil...@gmail.com wrote:
 
  Hey, I'll try and answer this tomorrow.
 
  There is a def an unreported bug in there that needs to be fixed for
 the restarting the all nodes case.
 
  Also, a 404 one is generally when jetty is starting or stopping - there
 are points where 404's can be returned. I'm not sure why else you'd see
 one. Generally we do retries when that happens.
 
  - Mark
 


Re: stress testing Solr 4.x

2012-12-08 Thread Mark Miller
No problem!

Here is the JIRA issue: https://issues.apache.org/jira/browse/SOLR-4158

- Mark

On Sat, Dec 8, 2012 at 6:03 PM, Alain Rogister alain.rogis...@gmail.com wrote:
 Great, thanks Mark ! I'll test the fix and post my results.

 Alain

 On Saturday, December 8, 2012, Mark Miller wrote:

 After some more playing around on 5x I have duplicated the issue. I'll
 file a JIRA issue for you and fix it shortly.

 - Mark

 On Dec 8, 2012, at 8:43 AM, Mark Miller markrmil...@gmail.com wrote:

  Hmm…I've tried to replicate what looked like a bug from your report (3
 Solr servers stop/start ), but on 5x it works no problem for me. It
 shouldn't be any different on 4x, but I'll try that next.
 
  In terms of starting up Solr without a working ZooKeeper ensemble - it
 won't work currently. Cores won't be able to register with ZooKeeper and
 will fail loading. It would probably be nicer to come up in search only
 mode and keep trying to reconnect to zookeeper - file a JIRA issue if you
 are interested.
 
  On the zk data dir, see
 http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html#Ongoing+Data+Directory+Cleanup
 
  - Mark
 
  On Dec 7, 2012, at 10:22 PM, Mark Miller markrmil...@gmail.com wrote:
 
  Hey, I'll try and answer this tomorrow.
 
  There is a def an unreported bug in there that needs to be fixed for
 the restarting the all nodes case.
 
  Also, a 404 one is generally when jetty is starting or stopping - there
 are points where 404's can be returned. I'm not sure why else you'd see
 one. Generally we do retries when that happens.
 
  - Mark
 



-- 
- Mark


Re: stress testing Solr 4.x

2012-12-07 Thread Mark Miller
Hey, I'll try and answer this tomorrow.

There is a def an unreported bug in there that needs to be fixed for the 
restarting the all nodes case.

Also, a 404 one is generally when jetty is starting or stopping - there are 
points where 404's can be returned. I'm not sure why else you'd see one. 
Generally we do retries when that happens.

- Mark

On Dec 7, 2012, at 1:07 PM, Alain Rogister alain.rogis...@gmail.com wrote:

 I am reporting the results of my stress tests against Solr 4.x. As I was
 getting many error conditions with 4.0, I switched to the 4.1 trunk in the
 hope that some of the issues would be fixed already. Here is my setup :
 
 - Everything running on a single box (2 x 4-core CPUs, 8 GB RAM). I realize
 this is not representative of a production environment but it's a fine way
 to find out what happens under resource-constrained conditions.
 - 3 Solr servers, 3 cores (2 of which are very small, the third one has 410
 MB of data)
 - single shard
 - 3 Zookeeper instances
 - HAProxy load balancing requests across Solr servers
 - JMeter or ApacheBench running the tests : 5 thread pools of 20 threads
 each, sending search requests continuously (no updates)
 
 In nominal conditions, it all works fine, i.e. it can process a million
 requests, maxing out the CPUs at all times, without experiencing nasty
 failures. There are errors in the logs about replication failures though;
 they should be benign in this case as no updates are taking place, but it's
 hard to tell what is going on exactly. Example :
 
 Dec 07, 2012 7:50:37 PM org.apache.solr.update.PeerSync handleResponse
 WARNING: PeerSync: core=adressage url=http://192.168.0.101:8983/solr
 exception talking to
 http://192.168.0.101:8985/solr/adressage/, failed
 org.apache.solr.common.SolrException: Server at
 http://192.168.0.101:8985/solr/adressage returned non ok status:404,
 message:Not Found
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
 at
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
 at
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
 at java.util.concurrent.FutureTask.run(FutureTask.java:166)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)
 
 Then I simulated various failure scenarios :
 
 - 1 Solr server stop/start
 - 2 Solr servers stop/start
 - 3 Solr servers stop/start : it seems that in this case, the Solr servers
 *cannot* be restarted : more exactly, the restarted server will consider
 that it is number 1 out of 4 and wait for the other 3 to come up. The only
 way out is to stop it again, then stop all Zookeeper instances *and* clean
 up their zkdata directory, start them, then start the Solr servers.
 
 I noticed that the zkdata directories had grown to 200 MB after a while.
 What exactly is in there besides the configuration data ? Does it stop
 growing ?
 
 Then I tried this :
 
 - kill 1 Zookeeper process
 - kill 2 Zookeeper processes
 - stop/start 1 Solr server
 
 When doing this, I experienced (many times) situations where the Solr
 servers could not reconnect and threw scary exceptions. The only way out
 was to restart the whole cluster.
 
 Q : when, if ever, is one supposed to clean up the zkdata directories ?
 
 Here are the errors I found in the logs. It seems that some of them have
 been reported in JIRA but 4.1-trunk seems to experience basically the same
 issues as 4.0 in my test scenarios.
 
 Dec 07, 2012 8:03:59 PM org.apache.solr.update.PeerSync handleResponse
 WARNING: PeerSync: core=cachede url=http://192.168.0.101:8983/solr
 couldn't connect to
 http://192.168.0.101:8984/solr/cachede/, counting as success
 Dec 07, 2012 8:03:59 PM org.apache.solr.common.SolrException log
 SEVERE: Sync request error:
 org.apache.solr.client.solrj.SolrServerException: Server refused connection
 at: http://192.168.0.101:8984/solr/cachede
 Dec 07, 2012 8:03:59 PM org.apache.solr.common.SolrException log
 SEVERE: http://192.168.0.101:8983/solr/cachede/: Could not tell a replica
 to recover:org.apache.solr.client.solrj.SolrServerException: Server refused
 connection at: http://192.168.0.101:8984/solr
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:406)
 at
 org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
 at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:293)
 at
 

Re: Testing Solr Cloud with ZooKeeper

2012-11-13 Thread darul
https://issues.apache.org/jira/browse/SOLR-3993 has been resolved.

Just a few questions: is it in trunk? I mean in the main distribution
downloadable from the main Solr site.

Because I have downloaded it and still get the same behaviour while running
the first instance... or the second shard.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Testing-Solr-Cloud-with-ZooKeeper-tp4018900p4020118.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Testing Solr Cloud with ZooKeeper

2012-11-13 Thread darul
Looks like after the timeout has finished, the first Solr instance responds:



I was not waiting long enough. Is it possible to reduce this *timeout* value ?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Testing-Solr-Cloud-with-ZooKeeper-tp4018900p4020190.html
Sent from the Solr - User mailing list archive at Nabble.com.
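
The timeout darul is hitting here is most likely SolrCloud's leader-vote wait. A hedged sketch of lowering it, assuming the legacy Solr 4.x solr.xml format (leaderVoteWait is in milliseconds; 180000 is the 4.x default, and 10000 below is an illustrative shorter value, not a recommendation):

```
<solr persistent="true">
  <!-- leaderVoteWait: how long a starting node waits for the other known
       replicas of its shard to appear before taking over leadership -->
  <cores adminPath="/admin/cores" leaderVoteWait="10000">
    ...
  </cores>
</solr>
```

Setting it too low risks electing a leader with stale data after a full-cluster restart, so shortening it is mainly a convenience for test setups.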


Re: Testing Solr Cloud with ZooKeeper

2012-11-09 Thread Erick Erickson
You have to have at least one node per shard running for SolrCloud to
function. So when you bring down all nodes and start one, then you have
some shards with no live nodes and SolrCloud goes into a wait state.

Best
Erick


On Thu, Nov 8, 2012 at 6:17 PM, darul daru...@gmail.com wrote:

 Is it same issue as one detailed here

 http://lucene.472066.n3.nabble.com/SolrCloud-leader-election-on-single-node-td4015804.html



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Testing-Solr-Cloud-with-ZooKeeper-tp4018900p4019183.html
 Sent from the Solr - User mailing list archive at Nabble.com.
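
Erick's point — every shard needs at least one live node — can be checked mechanically from the cluster state. A hedged sketch, assuming a dict shaped roughly like Solr 4.x's clusterstate.json (the field names here mirror that layout but are illustrative, not a parse of a real file):

```python
# Minimal model of a clusterstate.json-style mapping:
# collection -> shards -> replicas, each replica carrying a node_name
# and a state.
def shards_without_live_replica(cluster_state, live_nodes):
    """Return (collection, shard) pairs with no active replica on a live
    node; SolrCloud waits (or fails) while this list is non-empty."""
    dead = []
    for coll, coll_state in cluster_state.items():
        for shard, shard_state in coll_state["shards"].items():
            replicas = shard_state["replicas"].values()
            if not any(r["node_name"] in live_nodes and r["state"] == "active"
                       for r in replicas):
                dead.append((coll, shard))
    return dead

state = {
    "collection1": {"shards": {
        "shard1": {"replicas": {
            "core_node1": {"node_name": "host:8983_solr", "state": "active"},
            "core_node3": {"node_name": "host:7501_solr", "state": "active"},
        }},
        "shard2": {"replicas": {
            "core_node2": {"node_name": "host:8984_solr", "state": "active"},
        }},
    }}
}

# With 8984 down, shard2 has no live replica, so the cloud is incomplete.
print(shards_without_live_replica(state,
                                  live_nodes={"host:8983_solr", "host:7501_solr"}))
```

This is exactly darul's situation below: stopping every node that hosts a given shard leaves that shard dead until one of its nodes returns.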



Re: Testing Solr Cloud with ZooKeeper

2012-11-09 Thread darul
- Shards : 2
- ZooKeeper Cluster : 3
- One collection.

Here is how I run it, and my test scenario:

In the first console, I get the first node (first shard) running on port 8983:





In the second console, I get the second node (second shard) running on port 8984:





Here I get just 2 nodes for my 2 shards running.

Then I decide to add 2 replicas for each shard node.


and


Now everything is fine: a robust collection with 2 shards and 2 replicas
running. 

The expected result is shown here:

http://lucene.472066.n3.nabble.com/file/n4019257/Solr_Admin_192.168.1.6_.png 

Then, I decide to stop the last 2 replicas, running on ports 7501/7502.

The expected result is shown here:
http://lucene.472066.n3.nabble.com/file/n4019257/2.png 

Then I stop the 2 main instances running on ports 8983/8984.

Restart the first one 8983:

I get a lot of this dump in console:


Why not; I start the second one, running on 8984, and get:



I do not understand why replicas are needed at this phase... when I started
the first time, there was no need for replicas. And now, I would like to
restart the 2 main instances, and maybe start the replicas later.

If I start both instances on 7501/7502, everything is fine, but that is not
what I expected.

Any ideas,

Thanks again,

Jul



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Testing-Solr-Cloud-with-ZooKeeper-tp4018900p4019257.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Testing Solr Cloud with ZooKeeper

2012-11-09 Thread ku3ia
Hi, I have nearly the same problems with cloud state;
see
http://lucene.472066.n3.nabble.com/Replicated-zookeeper-td4018984.html



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Testing-Solr-Cloud-with-ZooKeeper-tp4018900p4019264.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Testing Solr Cloud with ZooKeeper

2012-11-09 Thread darul
Yes ku3ia, I read your thread yesterday and it looks like we have the same
issue. I hope ApacheCon is nearly finished and an expert can resolve this 
Thanks again to the Solr community,
Jul



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Testing-Solr-Cloud-with-ZooKeeper-tp4018900p4019271.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Testing Solr Cloud with ZooKeeper

2012-11-08 Thread darul
Hello again,

With the following config :

- a 2-node ZooKeeper ensemble
- 2 shards
- 2 main Solr instances for the 2 shards
- I added 2 or 3 replicas for fun.

While running, if I stop one replica, I see the admin UI graph update
(replica disabled/inactive)... normal.

But if I stop all Solr instances and restart the first main instance on
:8983, it always waits for some replicas... is that useful? Why are
replicas needed for it to run? I cannot access the admin UI anymore. 

The solution is to erase the ZooKeeper data and start again; do you have any
solution to avoid this:



What if my replicas are really down in production and I restart everything?

Another question: do 2 shards mean a 2-node ZooKeeper ensemble, and 3 shards
a 3-node ZooKeeper ensemble?

Thanks,

Jul



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Testing-Solr-Cloud-with-ZooKeeper-tp4018900p4019028.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Testing Solr Cloud with ZooKeeper

2012-11-08 Thread darul
Thanks Otis, 

Indeed, the zoo doc (
http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_zkMulitServerSetup
) also advises choosing an odd number of zk nodes: To create a
deployment that can tolerate the failure of F machines, you should count on
deploying 2xF+1 machines...
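
The 2xF+1 rule is plain majority-quorum arithmetic and can be checked in a few lines (not tied to any Solr or ZooKeeper API):

```python
def zk_fault_tolerance(n_servers: int) -> int:
    """Number of server failures an ensemble of n_servers can tolerate
    while still holding a majority quorum."""
    return (n_servers - 1) // 2

def quorum_size(n_servers: int) -> int:
    """Smallest majority of the ensemble."""
    return n_servers // 2 + 1

# 2xF+1 machines tolerate F failures: 3 -> 1, 5 -> 2.
# An even ensemble buys nothing: 4 servers still tolerate only 1 failure,
# which is why an odd node count is advised.
for n in (2, 3, 4, 5):
    print(n, zk_fault_tolerance(n), quorum_size(n))
```

This also shows why a 2-node ensemble, as used earlier in this thread, tolerates zero failures.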

Well, I just do not yet understand why, after adding replicas, I am not able
to restart the Solr instances if the replicas are not running. (When I start
them, it is OK.)

Do I need to erase all the ZooKeeper config every time the Solr servers are
restarted... I mean, send the conf again with bootstrap? It looks like I am
not doing it the right way ;)





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Testing-Solr-Cloud-with-ZooKeeper-tp4018900p4019102.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Testing Solr Cloud with ZooKeeper

2012-11-08 Thread darul
To illustrate:

http://lucene.472066.n3.nabble.com/file/n4019103/SolrAdmin.png 

Taking this example, 8983 and 8984 are the shard leaders, and 7501/7502 are
just replicas.

If I stop all instances, then restart 8983 or 8984 first, they won't run and
ask for the replicas to be started...




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Testing-Solr-Cloud-with-ZooKeeper-tp4018900p4019103.html
Sent from the Solr - User mailing list archive at Nabble.com.


Testing Solr Cloud with ZooKeeper

2012-11-07 Thread darul
Hello everyone,

Having used *Hadoop* (not in charge of deployment, just the Java code part)
and *Solr 3.6* (deployment and coding) this year, today I went through the
Solr Cloud wiki.

Well, 

* I have deployed 2 ZooKeeper (not embedded) instances
* 2 Solr instances with 2 shards (pointing to the ZooKeeper nodes)
* 2 Solr replicas

 successfully ... thank you for the new administration UI, graph and co,
nice.

But I am still confused with all these new amazing features. (compared to
when I was using multicore and master/slave behaviour).

Here in cloud, I am lost (in translation too)

*Few questions:*
- both my ZooKeeper instances have their own data directory, as usual, but I
did not see much change inside after indexing the example docs. Is data stored
there, or is just the /configuration (conf files)/ stored in the ZooKeeper
ensemble? Can you confirm whether /index data/ is also stored in the ZooKeeper
cluster, or not?
- In my Solr instance directory tree, /solr/mycollection/, sometimes I have
an index or index.20121107185908378 directory and a tlog directory. What
are they used for? Could you explain why the index directory sometimes looks
like a snapshot? ZooKeeper should not store the index, sorry, I repeat myself,
or is it just a snapshot? And what is the tlog directory for?
- Then, playing a little bit, I tested the following command
http://localhost:8983/solr/admin/collections?action=CREATE&name=myname&numShards=2&replicationFactor=1
and saw it update the configuration in core.xml and create the data directory
as well, nice. But when I navigate to the admin UI and check the schema, for
instance, where does this configuration come from? I do not get any conf
directory for this core; does it take one by default?
I have so many questions to ask.

Thanks,

Julien






Re: Testing Solr Cloud with ZooKeeper

2012-11-07 Thread darul
I reply to myself :


darul wrote:
 *Few questions:*
 - my both zookeeper have their own data directory, as usual, but I did not
 see so much change inside after indexing examples docs. Are data stored
 there or just configuration (conf files) is stored in zookeeper ensemble?
 Can you confirm index data are also stored in zookeeper cluster? Or not?

I read again and see that Solr embeds and uses ZooKeeper as a repository for
cluster configuration and coordination, meaning just configuration, not an
index repository at all?





Re: Testing Solr Cloud with ZooKeeper

2012-11-07 Thread Erick Erickson
Right. Solr uses zookeeper only for configuration information. The index
resides on the machines running solr.

bq: In my solr instances directory tree,  /solr/mycollection/ sometimes I
have an index or index.20121107185908378

You can configure Solr to keep snapshots of indexes around, under the control
of an index deletion policy that you can configure. I think what you're
seeing is this policy in action; you can check how it's set up in your
particular situation. This is independent of SolrCloud; it's local to the
Solr node.
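For reference, the policy lives in solrconfig.xml. A minimal sketch, assuming
the stock solr.SolrDeletionPolicy (the values shown are illustrative, not a
recommendation; check your own config for what is actually set):

```xml
<!-- Sketch of an index deletion policy in solrconfig.xml (inside
     <indexConfig>). With maxCommitsToKeep greater than 1, older commit
     points are left on disk, which can look like index snapshots. -->
<deletionPolicy class="solr.SolrDeletionPolicy">
  <str name="maxCommitsToKeep">1</str>
  <str name="maxOptimizedCommitsToKeep">0</str>
</deletionPolicy>
```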

About CREATE: I'm not entirely sure where the config comes from, sorry I
can't help there... What does the solr.xml file show? Is there an instanceDir
attribute on the newly created core (or a schema or config)?

Best
Erick


On Wed, Nov 7, 2012 at 3:52 PM, darul daru...@gmail.com wrote:

 I reply to myself :


 darul wrote:
  *Few questions:*
  - my both zookeeper have their own data directory, as usual, but I did not
  see so much change inside after indexing examples docs. Are data stored
  there or just configuration (conf files) is stored in zookeeper ensemble?
  Can you confirm index data are also stored in zookeeper cluster? Or not?

 I read again and see Solr embeds and uses Zookeeper as a repository for
 cluster configuration and coordination, so meaning just configuration, not
 index repository at all ?






Re: Testing Solr Cloud with ZooKeeper

2012-11-07 Thread darul
Yes, the instanceDir attribute points to the newly created core (with no conf
dir), so it is strange...

but it looks like I have played too much:



when I start the main Solr shard. I will try everything again tomorrow and
give you feedback.







Re: Testing Solr Cloud with ZooKeeper

2012-11-07 Thread Otis Gospodnetic
You didn't ask about this, but you'll want an odd number of zookeeper
nodes. Think voting.

Otis
--
Performance Monitoring - http://sematext.com/spm
On Nov 7, 2012 4:43 PM, darul daru...@gmail.com wrote:

 Yes instanceDir attribute point to new created core (with no conf dir) so
 it
 is stranged...

 but looks like I have played to much:



 when I start main solr shard. I try everything again tomorrow and give you
 feedback.







