DIH

2014-02-14 Thread William Bell
On virtual cores the DIH handler is really slow. On a 12 core box it only
uses 1 core while indexing.

Does anyone know how to do Java threading from a SQL query into Solr?
Examples?

I can use SolrJ to do it, or I might be able to modify DIH to enable
threading.
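For the SolrJ route, the threading pattern can be sketched roughly like this: read the SQL result set on one thread, hand batches to a fixed pool of workers, and let each worker push its batch to Solr. The `indexBatch` stub below is a hypothetical stand-in for a real SolrJ call such as `server.add(batch)`; only the threading pattern is shown, under those assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelIndexer {
    static final AtomicInteger indexed = new AtomicInteger();

    // Hypothetical stand-in for a SolrJ call, e.g. server.add(batch).
    static void indexBatch(List<String> batch) {
        indexed.addAndGet(batch.size());
    }

    public static int indexAll(int totalRows, int threads, int batchSize) {
        indexed.set(0);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<String> batch = new ArrayList<>();
        for (int row = 0; row < totalRows; row++) {   // stands in for iterating a ResultSet
            batch.add("doc-" + row);
            if (batch.size() == batchSize) {
                final List<String> toSend = batch;     // hand off the full batch
                pool.submit(() -> indexBatch(toSend));
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) {
            final List<String> toSend = batch;         // flush the last partial batch
            pool.submit(() -> indexBatch(toSend));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return indexed.get();
    }

    public static void main(String[] args) {
        System.out.println(indexAll(10000, 12, 500)); // prints 10000
    }
}
```

With real SolrJ, `ConcurrentUpdateSolrServer` (4.x) already maintains an internal queue and worker threads, so the simplest fix may be to swap it in for `HttpSolrServer` on the indexing path.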

At some point in 3.x threading was enabled in DIH, but it was removed since
people were having issues with it (we never did).

?

-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: SolrCloud Zookeeper disconnection/reconnection

2014-02-14 Thread Ramkumar R. Aiyengar
Ludovic, recent Solr changes won't do much to prevent ZK session expiry,
you might want to enable GC logging on Solr and Zookeeper to check for
pauses and tune appropriately.
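For reference, a typical set of HotSpot flags for that era of Java to surface GC pauses (the log path is just an example):

```
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
-Xloggc:/var/log/solr_gc.log
```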

The patch below fixes a situation under which the cloud can get to a bad
state during the recovery after session expiry. The recovery after a
session expiry is unavoidable, but as you guessed, it would be quick if
there aren't too many updates.

4.6.1 also has SOLR-5577 which will prevent updates from unnecessarily
stalling when you are disconnected from ZK for a short while.

These changes (and probably others) will thus probably help the cloud
behave better on ZK expiry, and for that reason I would encourage you to
upgrade. But the ZK expiry problem itself has to be dealt with by ensuring
that ZK and Solr don't pause for too long, and by choosing an appropriate
session timeout (which, btw, will be defaulted up to 30s from 15s in Solr
4.7 onwards).
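For the record, in the 4.x legacy solr.xml the timeout sits on the <cores> element; a sketch (core name and values are illustrative):

```xml
<solr persistent="true">
  <!-- zkClientTimeout is in milliseconds; 30000 matches the new 4.7 default -->
  <cores adminPath="/admin/cores" defaultCoreName="collection1"
         host="${host:}" hostPort="${jetty.port:8983}"
         zkClientTimeout="${zkClientTimeout:30000}">
    <core name="collection1" instanceDir="collection1"/>
  </cores>
</solr>
```

Because the default references a system property, it can also be overridden at startup with -DzkClientTimeout=30000.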
On 13 Feb 2014 08:23, "lboutros"  wrote:

> Dear all,
>
> we are currently using Solr 4.3.1 in production (with SolrCloud).
>
> We are encountering much the same problem described in this older post:
>
>
> http://lucene.472066.n3.nabble.com/SolrCloud-CloudSolrServer-Zookeeper-disconnects-and-re-connects-with-heavy-memory-usage-consumption-td4026421.html
>
> Sometimes some nodes are disconnected from Zookeeper and then they try to
> reconnect. The process is quite long because we have a quite long warming
> process. And because of this long warming process, just after the recovery
> process, the node is disconnected again, and so on... sometimes until an OOM.
>
> We already increased the Zk timeout, but it is not enough.
>
> We are thinking of migrating to Solr 4.6.1 at least (perhaps 4.7 will be out
> before the end of the migration :) ).
>
> I know that a lot of SolrCloud bugs have been corrected since Solr 4.3.1.
>
> But can we be sure that this problem will be resolved? Or can this
> problem still occur with the latest Solr version? (I know this is not an easy
> question ;) )
>
> It seems that this fix:
>
> Deadlock while trying to recover after a ZK session expiry :
> https://issues.apache.org/jira/browse/SOLR-5615
>
> is a good step toward addressing our current problem.
>
> But do you think it will be enough?
>
> One last thing, I don't know if it is already addressed by a fix,
> but,
> if there are no updates between the disconnection and the reconnection, the
> recovery process should not do anything more than the reconnection, I mean:
> no replication, no tLog replay and no warming process. Is that the case?
>
> Ludovic.
>
>
>
> -
> Jouve
> France.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrCloud-Zookeeper-disconnection-reconnection-tp4117101.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


There is integration of spring-data-solr with tika?

2014-02-14 Thread osymad
Is there some configuration to use spring-data-solr with Tika? Otherwise, is
there some alternative to solrj's ContentStreamUpdateRequest + addFile for
spring-data-solr?

Currently I have another approach using solrj + Tika, like this:

SolrServer server = new HttpSolrServer(URL);
...
Tika tika = new Tika();
...
String fileType = tika.detect(path.toFile());
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(path.toFile(), fileType);
up.setParam("literal.id", idField);
...
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
NamedList<Object> request = server.request(up);

Following the ExtractingRequestHandler guide with success.

Using solr 4.3.0

Is it possible to get the same result using only spring-data-solr instead of
solrj directly?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/There-is-integration-of-spring-data-solr-with-tika-tp4117478.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Hot Cpu and high load

2014-02-14 Thread Tri Cao
1. Yes, that's the right way to go, well, in theory at least :)

2. Yes, queries are always fanned out to all shards and will be as slow as the slowest shard. When I looked into the Solr distributed querying implementation a few months back, the support for graceful degradation for things like network failures and slow shards was not there yet.

3. I doubt mmap settings would impact your read-only load, and it seems you can easily fit your index in RAM. You could try to warm the file cache to make sure, with "cat $solr_dir > /dev/null".

It's odd that only 2 nodes are at 100% in your setup. I would check a couple of things:
a. Are your docs distributed evenly across shards (number of docs and size of the shards)?
b. Is your test client querying all nodes, or do all the queries go to those 2 busy nodes?

Regards,
Tri

On Feb 14, 2014, at 10:52 AM, Nitin Sharma  wrote:
[quoted original message trimmed; see the "Solr Hot Cpu and high load" thread below]

Re: Solr Load Testing Issues

2014-02-14 Thread Shawn Heisey
On 2/14/2014 5:28 AM, Annette Newton wrote:
> Solr Version: 4.3.1
> Number Shards: 10
> Replicas: 1
> Heap size: 15GB
> Machine RAM: 30GB
> Zookeeper timeout: 45 seconds
> 
> We are continuing the fight to keep our solr setup functioning.  As a
> result of this we have made significant changes to our schema to reduce the
> amount of data we write.  I setup a new cluster to reindex our data,
> initially I ran the import with no replicas, and achieved quite impressive
> results.  Our peak was 60,000 new documents per minute, no shard losses, no
> outages due to garbage collection (which is an issue we see in production),
> at the end of the load the index stood at 97,000,000 documents and 20GB per
> shard.  During the highest insertion rate I would say that querying
> suffered, but that is not of concern right now.

Solr 4.3.1 has a number of problems when it comes to large clouds.
Upgrading to 4.6.1 would be strongly advisable, but that's only
something to try after looking into the rest of what I have to say.

If I read what you've written correctly, you are running all this on one
machine.  To put it bluntly, this isn't going to work well unless you
put a LOT more memory into that machine.

For good performance, Solr relies on the OS disk cache, because reading
from the disk is VERY expensive in terms of time.  The OS will
automatically use RAM that's not being used for other purposes for the
disk cache, so that it can avoid reading off the disk as much as possible.

http://wiki.apache.org/solr/SolrPerformanceProblems

Below is a summary of what that Wiki page says, with your numbers as I
understand them.  If I am misunderstanding your numbers, then this
advice may need adjustment.  Note that when I see "one replica" I take
that to mean replicationFactor=1, so there is only one copy of the
index.  If you actually mean that you have *two* copies, then you have
twice as much data as I've indicated below, and your requirements will
be even larger:

With ten shards that are each 20GB in size, your total index size is
200GB.  With 15 GB of heap, your ideal memory size for that server would
be 215GB -- the 15GB heap plus enough extra to fit the entire 200GB
index into RAM.

In reality you probably don't need that much, but it's likely that you
would need at least half the index to fit into RAM at any one moment,
which adds up to 115GB.  If you're prepared to deal with
moderate-to-severe performance problems, you **MIGHT** be able to get
away with only 25% of the index fitting into RAM, which still requires
65GB of RAM, but with SolrCloud, such performance problems usually mean
that the cloud won't be stable, so it's not advisable to even try it.
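The arithmetic above, spelled out with this thread's numbers (10 shards of 20GB each, 15GB heap):

```java
public class RamSizing {
    // RAM needed = JVM heap + the fraction of the index you want cached
    // by the OS disk cache.
    static int ramGb(int shards, int shardGb, int heapGb, double cachedFraction) {
        return heapGb + (int) (shards * shardGb * cachedFraction);
    }

    public static void main(String[] args) {
        System.out.println(ramGb(10, 20, 15, 1.0));  // ideal: 215 GB
        System.out.println(ramGb(10, 20, 15, 0.5));  // workable: 115 GB
        System.out.println(ramGb(10, 20, 15, 0.25)); // risky floor: 65 GB
    }
}
```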

One of the bits of advice on the wiki page is to split your index into
shards and put it on more machines, which drops the memory requirements
for each machine.  You're already using a multi-shard SolrCloud, so you
probably just need more hardware.  If you had one 20GB shard on a
machine with 30GB of RAM, you could probably use a heap size of 4-8GB
per machine and have plenty of RAM left over to cache the index very
well.  You could most likely add another 50% to the index size and still
be OK.

Thanks,
Shawn



Re: SolrJ Socket Leak

2014-02-14 Thread Shawn Heisey
On 2/14/2014 2:45 AM, Jared Rodriguez wrote:
> Thanks for the info, I will look into the open file count and try to
> provide more info on how this is occurring.
> 
> Just to make sure that our scenarios were the same, in your tests did you
> simulate many concurrent inbound connections to your web app, with each
> connection sharing the same instance of HttpSolrServer for queries?

I've bumped the max open file limit (in /etc/security/limits.conf on
CentOS) to a soft/hard limit of 49151/65535.  I've also bumped the
process limits to 4096/6144.  These are specific to the user that runs
Solr and other related programs.

My SolrJ program is not actually a web application.  It is my indexing
application, a standalone java program.  We do use SolrJ in our web
application, but that's handled by someone else.  I do know that it uses
a single HttpSolrServer instance across the entire webapp.

When this specific copy of the indexing application (for my dev server)
starts up, it creates 15 HttpSolrServer instances that are used for the
life of the application.  The application will run for weeks or months
at a time and has never had a problem with leaks.

One of these instances points at the /solr URL, which I use for
CoreAdminRequest queries.  Each of the other 14 point at one of the Solr
cores.  My production copy, which has a config file to update two copies
of the index on four servers, creates 32 instances -- four of them for
CoreAdmin requests and 28 of them for cores.
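The "create once at startup, reuse for the life of the application" pattern can be sketched like this. The holder class is hypothetical, and the Object values stand in for HttpSolrServer instances (which are thread-safe and intended to be shared):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SolrClientHolder {
    // One client per core URL, created on first use and kept for the
    // life of the application, mirroring the per-core instances above.
    private static final Map<String, Object> CLIENTS = new ConcurrentHashMap<>();

    static Object clientFor(String coreUrl) {
        // computeIfAbsent guarantees exactly one instance per URL, even
        // under concurrent first access. In real SolrJ code the factory
        // would be url -> new HttpSolrServer(url).
        return CLIENTS.computeIfAbsent(coreUrl, url -> new Object());
    }

    public static void main(String[] args) {
        Object a = clientFor("http://localhost:8983/solr/core1");
        Object b = clientFor("http://localhost:8983/solr/core1");
        System.out.println(a == b); // prints true: the instance is reused
    }
}
```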

Updates are run once a minute.  One cycle will typically involve several
Solr requests.  Sometimes they are queries, but most of the time they
are update requests.

The application uses database connection pooling (Apache Commons code)
to talk to a MySQL server, pulls in data for indexing, and then sends
requests to Solr.  Most of the time, it only goes to one HttpSolrServer
instance, the core where all new data lives.  Occasionally it will talk
to up to seven of the 14 HttpSolrServer instances -- the ones pointing
at the "live" cores.

When a full rebuild is underway, it starts the dataimport handler on the
seven build cores.  As part of the once-a-minute update cycle, it also
gathers status information on those dataimports.  When the rebuild
finishes, it runs an update on those cores and then does CoreAdmin SWAP
requests to switch to the new index.

I did run a rebuild, and I let the normal indexing run for a really long
time, so I could be sure that it was using all HttpSolrServer instances.
 It never had more than a few dozen connections listed in the netstat
output.

Thanks,
Shawn



Solr Hot Cpu and high load

2014-02-14 Thread Nitin Sharma
Hello folks

  We are currently using solrcloud 4.3.1. We have an 8 node solrcloud cluster
with 32 cores, 60GB of RAM and SSDs. We are using zk to manage the
solrconfig used by our collections.

We have many collections and some of them are relatively very large
compared to the others. The sizes of the shards of these big collections are
on the order of gigabytes. We decided to split the bigger collection evenly
across all nodes (8 shards and 2 replicas) with maxNumShards > 1.

We did a test with a read load only on one big collection and we still see
only 2 nodes running 100% CPU and the rest are blazing through the queries
way faster (under 30% cpu). [Despite all of them being sharded across all
nodes]

I checked the JVM usage and found that none of the pools have high
 utilization (except Survivor space which is 100%). The GC cycles are in
the order of ms and mostly doing scavenge. Mark and sweep occurs once every
30 minutes

Few questions:

   1. Sharding all collections (small and large) across all nodes evenly
   distributes the load and makes the system characteristics of all machines
   similar. Is this a recommended way to do it?
   2. Solr Cloud does a distributed query by default. So if a node is at
   100% CPU does it slow down the response time for the other nodes waiting
   for this query? (or does it have a timeout if it cannot get a response from
   a node within x seconds?)
   3. Our collections use Mmap directory but I specifically haven't enabled
   anything related to mmaps (locked pages under ulimit). Does that adversely
   affect performance? Or can it lock pages even without this?
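The even-distribution assumption behind question 1 can be sanity-checked offline. SolrCloud's compositeId router hashes the uniqueKey (MurmurHash3 in real Solr) and assigns each shard a contiguous slice of the 32-bit hash space; the sketch below reuses that range logic but with String.hashCode() as a stand-in hash, so the counts are only illustrative:

```java
public class ShardSpread {
    // Map an id to a shard by splitting the unsigned 32-bit hash space
    // into numShards equal ranges (the compositeId idea, not Solr's
    // actual MurmurHash3).
    static int shardFor(String id, int numShards) {
        long unsigned = id.hashCode() & 0xffffffffL;   // 0 .. 2^32-1
        long rangeSize = (1L << 32) / numShards;
        return (int) Math.min(numShards - 1, unsigned / rangeSize);
    }

    public static void main(String[] args) {
        int numShards = 8;
        int[] counts = new int[numShards];
        for (int i = 0; i < 80000; i++)
            counts[shardFor("doc-" + i, numShards)]++;
        for (int c : counts)
            System.out.println(c); // roughly even if the hash spreads well
    }
}
```

The real check against a live cluster is simpler: compare numDocs and index size per shard in the admin UI or via the core status API.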

Thanks a lot in advance.
Nitin


Re: Could not connect or ping a core after import a big data into it...

2014-02-14 Thread Eric_Peng
Last 3 lines of the log (only these have anything special):

12:17:12
WARN
SolrResourceLoader
Solr loaded a deprecated plugin/analysis class
[solr.JsonUpdateRequestHandler]. Please consult documentation how to replace
it accordingly.
12:17:12
WARN
SolrResourceLoader
Solr loaded a deprecated plugin/analysis class
[solr.JsonUpdateRequestHandler]. Please consult documentation how to replace
it accordingly.
12:17:12
WARN
SolrResourceLoader
Solr loaded a deprecated plugin/analysis class [solr.CSVRequestHandler].
Please consult documentation how to replace it accordingly.

I didn't use the default debugger, I use "Run-Jetty-Run" for debugging in
Eclipse, maybe that is the reason?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Could-not-connect-or-ping-a-core-after-import-a-big-data-into-it-tp4117416p4117461.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multiple Column Condition with Relevance/Rank

2014-02-14 Thread Tri Cao
Taminidi,

Relevance ranking is tricky and very domain specific. There are usually multiple ways to do the same thing; each is better at some edge cases and worse at some others :)

It looks to me like you are trying to rank the products by: exact match on SKU, then exact match on ManufactureSKU, then text match on SKU, …, then finally text match on names and description.

One way of doing this is to index your products into fields like exact_sku, text_sku, etc. with the proper text analysis chains. You can then use an edismax query and assign the right weights to these fields, e.g.:

exact_sku^10 text_sku^9 …

Regards,
Tri

On Feb 14, 2014, at 10:11 AM, "EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)"  wrote:
[quoted messages trimmed; see the "RE: Multiple Column Condition with Relevance/Rank" thread below]
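Concretely, a hypothetical set of request parameters along those lines (defType, q, and qf are real Solr params; the field names and weights are illustrative, not from a real schema):

```
defType=edismax
q=101
qf=exact_sku^10 exact_mfr_sku^9 text_sku^5 text_mfr_sku^4 name^2 description^1
```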

RE: Multiple Column Condition with Relevance/Rank

2014-02-14 Thread EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Thanks for the Info, I am getting all the results, but I need to get the exact 
match as the first row and after that the near by match. Something like 
relevance order.


-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Thursday, February 13, 2014 9:17 AM
To: solr-user@lucene.apache.org
Subject: Re: Multiple Column Condition with Relevance/Rank

Use the OR operator between the specific clauses.

-- Jack Krupansky

-Original Message- 
From: EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions)
Sent: Thursday, February 13, 2014 9:09 AM
To: solr-user@lucene.apache.org
Subject: Multiple Column Condition with Relevance/Rank


Hello, Someone can help me on implementing the below query in Solr, I will 
using a rank in MS SQL and a return distinct Productid

Select productid from products where SKU = "101"
Select Productid from products where ManufactureSKU = "101"
Select Productid from product where SKU Like "101%"
Select Productid from Product where ManufactureSKU like "101%"
Select Productid from product where Name Like "101%"
Select Productid from Product where Description like '%101%"

Is there any way in Solr can search the exact match,starts with and 
anywhere.. in single solr query 



Re: Could not connect or ping a core after import a big data into it...

2014-02-14 Thread Erick Erickson
I was thinking more of log messages at the end of the log; there are a bunch
of messages that come out when you start up your server. Also look at enabling
more detailed logging. That will take editing the log properties file, since
if you don't
find anything useful at the INFO level, you need to change things before
you start.

An alternative. In the "how to contribute" page on the Solr Wiki there are
instructions on setting up a remote debugging session. Actually, I think the
instructions are linked from there on detail pages for using Eclipse or
IntelliJ.
If you start the Solr following those instructions, you can specify that it
wait
(suspend=y) and connect to it with a debugger to see what it is doing.

You can also use jstack to dump the stack when it's hung to see where all
the threads are.

But overall, you'll have to dig. The behavior you're seeing isn't expected,
and
we can't diagnose it without more info.

Best,
Erick


On Fri, Feb 14, 2014 at 9:18 AM, Eric_Peng wrote:

> One other weird thing: if I just import a small amount of data,
> everything goes very well.
>
> Still have the same warnings in log but everything works fine
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Could-not-connect-or-ping-a-core-after-import-a-big-data-into-it-tp4117416p4117435.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Could not connect or ping a core after import a big data into it...

2014-02-14 Thread Eric_Peng
One other weird thing: if I just import a small amount of data,
everything goes very well.

Still have the same warnings in log but everything works fine




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Could-not-connect-or-ping-a-core-after-import-a-big-data-into-it-tp4117416p4117435.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Could not connect or ping a core after import a big data into it...

2014-02-14 Thread Eric_Peng
I think I just used a hard commit:

// Make the docs we just added searchable using a "hard" commit
solr.commit(true, true);



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Could-not-connect-or-ping-a-core-after-import-a-big-data-into-it-tp4117416p4117432.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Could not connect or ping a core after import a big data into it...

2014-02-14 Thread Eric_Peng
Some warnings... always, but no errors:

12:00:23
WARN
SolrResourceLoader
Can't find (or read) directory to add to classloader:
../../../contrib/extraction/lib (resolved as:
D:\workspaceEE\TestSolr\solr\collection1\..\..\..\contrib\extraction\lib).
12:00:23
WARN
SolrResourceLoader
Can't find (or read) directory to add to classloader: ../../../dist/
(resolved as: D:\workspaceEE\TestSolr\solr\collection1\..\..\..\dist).
12:00:23
WARN
SolrResourceLoader
Can't find (or read) directory to add to classloader:
../../../contrib/clustering/lib/ (resolved as:
D:\workspaceEE\TestSolr\solr\collection1\..\..\..\contrib\clustering\lib).
12:00:23
WARN
SolrResourceLoader
Can't find (or read) directory to add to classloader: ../../../dist/
(resolved as: D:\workspaceEE\TestSolr\solr\collection1\..\..\..\dist).
12:00:23
WARN
SolrResourceLoader
Can't find (or read) directory to add to classloader:
../../../contrib/langid/lib/ (resolved as:
D:\workspaceEE\TestSolr\solr\collection1\..\..\..\contrib\langid\lib).
12:00:23
WARN
SolrResourceLoader
Can't find (or read) directory to add to classloader: ../../../dist/
(resolved as: D:\workspaceEE\TestSolr\solr\collection1\..\..\..\dist).
12:00:23
WARN
SolrResourceLoader
Can't find (or read) directory to add to classloader:
../../../contrib/velocity/lib (resolved as:
D:\workspaceEE\TestSolr\solr\collection1\..\..\..\contrib\velocity\lib).
12:00:23
WARN
SolrResourceLoader
Can't find (or read) directory to add to classloader: ../../../dist/
(resolved as: D:\workspaceEE\TestSolr\solr\collection1\..\..\..\dist).
12:00:23
WARN
SolrResourceLoader
Can't find (or read) directory to add to classloader:
../../../contrib/dataimporthandler/lib (resolved as:
D:\workspaceEE\TestSolr\solr\solrpedia\..\..\..\contrib\dataimporthandler\lib).
12:00:23
WARN
SolrResourceLoader
Can't find (or read) directory to add to classloader: ../../../dist/
(resolved as: D:\workspaceEE\TestSolr\solr\solrpedia\..\..\..\dist).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Could-not-connect-or-ping-a-core-after-import-a-big-data-into-it-tp4117416p4117429.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Could not connect or ping a core after import a big data into it...

2014-02-14 Thread Greg Walters
You should check your server logs for error messages during startup related to 
loading that core. Feel free to post them here if you can't parse them.

Thanks,
Greg

On Feb 14, 2014, at 10:14 AM, Eric_Peng  wrote:

> Need help, Thx in advance.
> 
> After importing a big XML document (using DIH or SolrJ) into a Solr core (for
> example "SolrPedia"), it works fine when I query.
> 
> But after I stop the server, and then restart my Jetty server.
> 
> I could not ping the core "SolrPedia"
> 
>  
> 
> I have to remove the core and re-import data again.
> 
> I would very much appreciate your reply.
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Could-not-connect-or-ping-a-core-after-import-a-big-data-into-it-tp4117416.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: Could not connect or ping a core after import a big data into it...

2014-02-14 Thread Erick Erickson
When you restart the server, what do the logs show?

Take a look at the size of your transaction log. There's
a possibility that if you do not commit correctly, all the
data you put in the core is replayed on startup, see:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
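A hard autoCommit in solrconfig.xml caps how much of the transaction log has to be replayed at startup; a sketch (values are illustrative):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <autoCommit>
    <maxTime>60000</maxTime>           <!-- hard commit at most every 60s -->
    <openSearcher>false</openSearcher> <!-- skip warming a new searcher on each commit -->
  </autoCommit>
</updateHandler>
```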

Otherwise, there's not enough data to say much here;
the behavior you're seeing definitely shouldn't be
happening.

Best,
Erick


On Fri, Feb 14, 2014 at 8:14 AM, Eric_Peng wrote:

> Need help, Thx in advance.
>
> After importing a big XML document (using DIH or SolrJ) into a Solr core (for
> example "SolrPedia"), it works fine when I query.
>
> But after I stop the server, and then restart my Jetty server.
>
> I could not ping the core "SolrPedia"
>
> 
>
> I have to remove the core and re-import data again.
>
> I would very much appreciate your reply.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Could-not-connect-or-ping-a-core-after-import-a-big-data-into-it-tp4117416.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Could not connect or ping a core after import a big data into it...

2014-02-14 Thread Eric_Peng
Need help, Thx in advance.

After importing a big XML document (using DIH or SolrJ) into a Solr core (for
example "SolrPedia"), it works fine when I query.

But after I stop the server, and then restart my Jetty server.

I could not ping the core "SolrPedia"

 

I have to remove the core and re-import data again.

I would very much appreciate your reply.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Could-not-connect-or-ping-a-core-after-import-a-big-data-into-it-tp4117416.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr is making Jboss server slower..

2014-02-14 Thread Erick Erickson
Nothing in that stack trace looks like it has anything
to do with Solr/Lucene. What makes you think Solr
is the culprit?

jstack can be used to see what _all_ the threads on
a machine are doing, you might try using that.

You also need to describe what the server "coming down" means.
Gets slow? Crashes? Stops accepting requests from anyone?

Best,
Erick


On Fri, Feb 14, 2014 at 3:11 AM, Ramesh  wrote:

> Hi,
> We are using Solr search, and sometimes because of it our entire system
> becomes slow. We don't have any clue why, so we tried load and stress
> testing through JMeter. We have tried many times with different testing
> approaches. At random times in JMeter we get the following message, and at
> random times the server hangs and comes down. I have checked our code as
> well; all connections are closed. Can you please check what could be the
> reason for the failure?
>
> java.lang.Thread.start0(Native Method)
> java.lang.Thread.start(Thread.java:679)
>
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
>
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
>
> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:120)
>
> com.ning.http.client.providers.jdk.JDKAsyncHttpProvider.execute(JDKAsyncHttpProvider.java:158)
>
> com.ning.http.client.providers.jdk.JDKAsyncHttpProvider.execute(JDKAsyncHttpProvider.java:120)
>
> com.ning.http.client.AsyncHttpClient.executeRequest(AsyncHttpClient.java:512)
>
> com.ning.http.client.AsyncHttpClient$BoundRequestBuilder.execute(AsyncHttpClient.java:234)
> com.digite.utils.GeneralUtils.getTagData(GeneralUtils.java:171)
>
> com.digite.app.kanban.search.web.action.SearchAction.getTagResults(SearchAction.java:337)
>
> com.digite.app.kanban.search.web.action.SearchAction.searchItem(SearchAction.java:73)
> sun.reflect.GeneratedMethodAccessor707.invoke(Unknown Source)
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> java.lang.reflect.Method.invoke(Method.java:622)
>
> com.opensymphony.xwork2.DefaultActionInvocation.invokeAction(DefaultActionInvocation.java:453)
>
>
>
> Thanks in Advance... Waiting for your reply...
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-is-making-Jboss-server-slower-tp4117343.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: update in SolrCloud through C++ client

2014-02-14 Thread Erick Erickson
OK, I'll go way out on a limb here, since I know so
little about the guts of the ZK/Solr interactions, on the
theory that if I'm wrong someone will jump in and I'll
remember the details due to being embarrassed.

ZK doesn't really know much about Solr. It knows there
are a bunch of nodes out there running this thing called
"Solr". It knows the state/roles of those nodes mostly
because the nodes _tell_ ZK about themselves. It knows
nodes are up/down because it hasn't heard from them in
a while.

What it doesn't know is what those nodes _do_. For ZK to
"forward it to the appropriate replica/leader", it would have to be
able to pull apart a (potentially) many-document packet of
SolrInputDocuments, grok that there's this  field
(which would mean it would have to understand the schema),
understand the routing mechanism to be used to
identify the right leader, and forward each doc appropriately.
IOW it would have to implement much of CloudSolrServer.

And even if it did all that (which IMO would be architecturally
_very_ bad), it would be bad for throughput. Now my 3
ZK nodes have to handle _all_ the routing traffic for my 100
node cluster, introducing a potential bottleneck.

I know you're using C++ so the Java version may not apply,
but the CloudSolrServer class hides all of this and _does_ send
the docs to the right leader all without burdening the ZK nodes.
I know there has been a C++ port of SolrJ, but don't
know whether it has been kept up to date with the more recent
SolrJ improvements.

Whew! Occasionally I write these in order to make myself think
about things, what did I mess up? (Mark, Shalin and Noble may
jump all over this, won't be the first time)...

Erick


On Fri, Feb 14, 2014 at 2:57 AM, neerajp  wrote:

> Hello All,
> I am using Solr for indexing my data. My client is in C++, so I make curl
> requests to the Solr server for indexing.
> Now, I want to use indexing in SolrCloud mode using ZooKeeper for HA.  I
> read the wiki link of SolrCloud (http://wiki.apache.org/solr/SolrCloud).
>
> What I understand from the wiki is that we should always check Solr instance
> status (up & running) in SolrCloud before making an update request. Can I
> not send the update request to ZooKeeper and let ZooKeeper forward it to the
> appropriate replica/leader? In the latter case I need not worry about which
> servers are up and running before making an indexing request.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/update-in-SolrCloud-through-C-client-tp4117340.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Exact word match

2014-02-14 Thread Jack Krupansky
Set the default query operator (q.op parameter) to AND, or enclose the full
phrase in quotes.
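A tiny sketch of what those two options look like on the wire, using plain JDK URL encoding to build the query string (q and q.op are real Solr parameters; the helper class is hypothetical):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class ExactMatchQuery {
    // Build the query-string fragment for a q value, optionally with q.op.
    static String param(String q, String qOp) {
        try {
            String s = "q=" + URLEncoder.encode(q, "UTF-8");
            if (qOp != null) s += "&q.op=" + qOp;
            return s;
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Option 1: quote the phrase so it matches as a unit.
        System.out.println(param("\"New York\"", null)); // q=%22New+York%22
        // Option 2: require both terms via the default operator.
        System.out.println(param("New York", "AND"));    // q=New+York&q.op=AND
    }
}
```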


-- Jack Krupansky

-Original Message- 
From: Sohan Kalsariya

Sent: Friday, February 14, 2014 4:29 AM
To: solr-user@lucene.apache.org
Subject: Exact word match

Hello,
I want exact results for my query.
E.g. when I search "New york", it doesn't give me exact results
for New York.
It also gives me results like:
New York
New Delhi
and other cities starting with "New".
So how can I get only results for "New York"?

--
Regards,
*Sohan Kalsariya* 



Solr Load Testing Issues

2014-02-14 Thread Annette Newton
Solr Version: 4.3.1
Number Shards: 10
Replicas: 1
Heap size: 15GB
Machine RAM: 30GB
Zookeeper timeout: 45 seconds

We are continuing the fight to keep our Solr setup functioning.  As a
result we have made significant changes to our schema to reduce the
amount of data we write.  I set up a new cluster to reindex our data.
Initially I ran the import with no replicas and achieved quite impressive
results: our peak was 60,000 new documents per minute, with no shard
losses and no outages due to garbage collection (an issue we see in
production). At the end of the load the index stood at 97,000,000
documents and 20GB per shard.  During the highest insertion rate I would
say that querying suffered, but that is not of concern right now.

I have now added 1 replica for each shard; indexing time has doubled -
not surprising, and as it was so good to start with, not a problem.  I
continue to write only to the leaders, and the issue is that the replicas
are continually going into recovery.

The leaders show:


ERROR - 2014-02-14 11:47:45.757; org.apache.solr.common.SolrException;
shard update error StdNode:
http://10.35.133.176:8983/solr/sessionfilterset/:org.apache.solr.client.solrj.SolrServerException:
IOException occured when talking to server at:
http://10.35.133.176:8983/solr/sessionfilterset
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:413)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
at
org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:375)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.http.NoHttpResponseException: The target server
failed to respond
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:95)
at
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:62)
at
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:254)
at
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:289)
at
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:252)
at
org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:191)
at
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:300)
at
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:127)
at
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:717)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:522)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
... 11 more

The replica is not busy garbage collecting, as it doesn't coincide with a
full gc and the collection times are low.  The replica appears to be
accepting adds milliseconds before this appears in the log:

INFO  - 2014-02-14 11:59:54.366;
org.apache.solr.handler.admin.CoreAdminHandler; It has been requested that
we recover

I have reduced the load to 5,000 documents per minute, and the replicas
appear to stay up for only a couple of minutes. I would like to be
confident that we could handle more than this during our peak times.

Initially I was getting connection reset errors on the leaders, but I
changed the Jetty connector to the NIO one, and now the above message is
what I receive.  I have also increased the request and response header
sizes.

Any ideas - other than not using replicas as proposed by a colleague?

Thanks very much in advance.


-- 

Annette Newton

Database Administrator

ServiceTick Ltd



T:+44(0)1603 618326



Seebohm House, 2-4 Queen Street, Norwich, England NR2 4SQ

www.servicetick.com

*www.sessioncam.com *

-- 
*This message is confidential and is intended to be read solely by

Solr is making Jboss server slower..

2014-02-14 Thread Ramesh
Hi,
We are using Solr search, and sometimes it makes our entire system slow.
We don't have any clue why, so we ran load and stress tests through
JMeter, trying many times with different test setups. At random times
JMeter reports the message below, and at other random times the server
hangs and goes down. I have checked our code and all connections are
closed. Could you please look into what might be causing the failure?

java.lang.Thread.start0(Native Method)
java.lang.Thread.start(Thread.java:679)
java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:120)
com.ning.http.client.providers.jdk.JDKAsyncHttpProvider.execute(JDKAsyncHttpProvider.java:158)
com.ning.http.client.providers.jdk.JDKAsyncHttpProvider.execute(JDKAsyncHttpProvider.java:120)
com.ning.http.client.AsyncHttpClient.executeRequest(AsyncHttpClient.java:512)
com.ning.http.client.AsyncHttpClient$BoundRequestBuilder.execute(AsyncHttpClient.java:234)
com.digite.utils.GeneralUtils.getTagData(GeneralUtils.java:171)
com.digite.app.kanban.search.web.action.SearchAction.getTagResults(SearchAction.java:337)
com.digite.app.kanban.search.web.action.SearchAction.searchItem(SearchAction.java:73)
sun.reflect.GeneratedMethodAccessor707.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:622)
com.opensymphony.xwork2.DefaultActionInvocation.invokeAction(DefaultActionInvocation.java:453)



Thanks in Advance... Waiting for your reply...






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-is-making-Jboss-server-slower-tp4117343.html
Sent from the Solr - User mailing list archive at Nabble.com.


update in SolrCloud through C++ client

2014-02-14 Thread neerajp
Hello All,
I am using Solr for indexing my data. My client is in C++, so I make curl
requests to the Solr server for indexing.
Now I want to do indexing in SolrCloud mode, using ZooKeeper for HA. I
have read the SolrCloud wiki page (http://wiki.apache.org/solr/SolrCloud).

What I understand from the wiki is that we should always check Solr
instance status (up & running) in SolrCloud before making an update
request. Can I not send the update request to ZooKeeper and let ZooKeeper
forward it to the appropriate replica/leader? In the latter case I would
not need to worry about which servers are up and running before making an
indexing request.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/update-in-SolrCloud-through-C-client-tp4117340.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrJ Socket Leak

2014-02-14 Thread Jared Rodriguez
Thanks for the info; I will look into the open file count and try to
provide more info on how this is occurring.

Just to make sure that our scenarios were the same, in your tests did you
simulate many concurrent inbound connections to your web app, with each
connection sharing the same instance of HttpSolrServer for queries?
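One way to watch the open file count mentioned above from inside a process is to poll /proc. This is a Linux-only sketch, and the deliberately opened handle below is just a stand-in for a socket left unclosed; in a real leak hunt you would compare counts before and after a batch of Solr requests.

```python
import os

def open_fd_count():
    """Count file descriptors currently open in this process (Linux /proc)."""
    return len(os.listdir("/proc/self/fd"))

baseline = open_fd_count()
leaked = open("/dev/null")          # stands in for an unclosed socket
grew_by = open_fd_count() - baseline
leaked.close()
```

If the count keeps climbing while the sockets themselves sit in TIME_WAIT, the descriptors are genuinely released and the OS is just holding the ports; steady growth in this count is what would point at an actual leak.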


On Thu, Feb 13, 2014 at 6:58 PM, Shawn Heisey  wrote:

> On 2/13/2014 3:17 PM, Jared Rodriguez wrote:
>
>> I just reverted to SolrJ 4.6.1 with HttpClient 4.2.6 and am trying to
>> reproduce the problem.  Using YourKit to profile, and even just manually
>> simulating a few users at once, I see the same problem of open sockets: 6
>> sockets opened to the Solr server, and 2 of them still open after all is
>> done and there is no server activity.  Although these could be sockets
>> kept in a connection pool.
>>
>
> I did two separate upgrade steps, SolrJ 4.5.1 to 4.6.1, and HttpClient
> 4.3.1 to 4.3.2, and I'm not seeing any evidence of connection leaks.
>
>
> On your connections, if they are in TIME_WAIT, I'm pretty sure that means
> that the program is done with them because it's closed the connection and
> it's the operating system that is in charge.  See the answer with the green
> checkmark here:
>
> http://superuser.com/questions/173535/what-are-close-wait-and-time-wait-states
>
> I think the default timeout for WAIT states on a modern Linux system is 60
> seconds, not four minutes as described on that answer.
>
> With your connection rate and the default 60 second timeout for WAIT
> states, another resource that might be in short supply is file descriptors.
>
> Thanks,
> Shawn
>
>


-- 
Jared Rodriguez


Exact word match

2014-02-14 Thread Sohan Kalsariya
Hello,
I want to get exact results for my query.
For example, when I search "New york", it doesn't give me exact results
for New York.
Instead it also gives me results like:
New York
New Delhi
and other cities starting with "New".
So how can I get results only for "New York"?

-- 
Regards,
*Sohan Kalsariya*