Re: master/slave failure scenario

2009-05-21 Thread Bryan Talbot
Indexing is usually much more expensive than replication, so it won't
scale well as you add more servers.  Also, what would a client do if
it was able to send the update to only some of the servers because
others were down (for maintenance, etc.)?




-Bryan




On May 21, 2009, at May 21, 6:04 AM, nk 11 wrote:

Just curious. What would be the disadvantages of a no-replication /
multi-master (no slave) setup?
The client code would have to send the updates to every master of course,
but if one machine failed I could immediately continue the indexing
process, and I could also query the index on any machine for a valid result.
I might be missing something...
On Thu, May 14, 2009 at 4:19 PM, nk 11 nick.cass...@gmail.com wrote:

wow! that was just a couple of days old!
thanks a lot!

2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

yeah there is a hack
https://issues.apache.org/jira/browse/SOLR-1154?focusedCommentId=12708316&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12708316


On Thu, May 14, 2009 at 6:07 PM, nk 11 nick.cass...@gmail.com wrote:

sorry for the mail. I wanted to hit reply :(

On Thu, May 14, 2009 at 3:37 PM, nk 11 nick.cass...@gmail.com wrote:

oh, so the configuration must be manually changed?
Can't something be passed at (re)start time?

2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

On Thu, May 14, 2009 at 4:07 PM, nk 11 nick.cass...@gmail.com wrote:

Ok, so the VIP will point to the new master. But what makes a slave
promoted to a master? Only the fact that it will receive add/update
requests? And I suppose that this hot promotion is possible only if the
slave is configured as master also...

right.. By default you can set up all slaves to be masters also. It does
not cost anything if it is not serving any requests.

So, if you have such a setting, you will have to disable that slave's
slave configuration, restart it, and make the VIP point to this new
slave as master.

So hot promotion is still not possible.

2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

ideally, we don't do that.
you can just keep the master host behind a VIP, so if you wish to
change the master, make the VIP point to the new host.

On Wed, May 13, 2009 at 10:52 PM, nk 11 nick.cass.1...@gmail.com wrote:

This is more interesting. Such a procedure would involve taking down
and reconfiguring the slave?

On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot btal...@aeriagames.com wrote:

Or ...

1. Promote existing slave to new master
2. Add new slave to cluster


-Bryan


On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote:

- Migrate configuration files from old master (or backup) to new master.
- Replicate from a slave to the new master.
- Resume indexing to new master.

-Jay

On Wed, May 13, 2009 at 4:26 AM, nk 11 nick.cass...@gmail.com wrote:

Nice.
What if the master fails permanently (like a disk crash...) and the
new master is a clean machine?

2009/5/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com


On Wed, May 13, 2009 at 12:10 PM, nk 11 nick.cass...@gmail.com wrote:

Hello
I'm kind of new to Solr and I've read about replication, and the fact
that a node can act as both master and slave.
If a replica fails and then comes back on line I suppose that it will
resync with the master.

right

But what happens if the master fails? A slave that is configured as
master will kick in? What if that slave is not yet fully sync'ed with
the failed master and has old data?

if the master fails you can't index the data. but the slaves will
continue serving the requests with the last index. You can bring the
master back up and resume indexing.

What happens when the original master comes back on line? He will
remain a slave because there is another node with the master role?

Thank you!





--
-
Noble Paul | Principal Engineer| AOL | http://aol.com

Re: Howto? Applying a filter across schema fileds using state information

2009-05-18 Thread Bryan Talbot
I needed to do something like this recently as well.  I needed to copy  
a date field (with full precision to the millisecond) to a string  
field of just MMDD.  I didn't see a way to do it in solr core.  I  
ended up doing it in the Data Import Handler during import.  I'd  
rather have code like that in the core someplace in case documents are  
added via some other mechanism.
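
For reference, a minimal sketch of the kind of DIH transformer that can do this.  The class, column, and field names (and the yyyyMMdd format) are assumptions for illustration, not the code actually used:

package com.example.dih;

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

// Copies a full-precision date column into a date-only string field during import.
public class DateToStringTransformer extends Transformer {
  @Override
  public Object transformRow(Map<String, Object> row, Context context) {
    Object value = row.get("created_at");            // hypothetical source column
    if (value instanceof Date) {
      String formatted = new SimpleDateFormat("yyyyMMdd").format((Date) value);
      row.put("created_day", formatted);             // hypothetical target field
    }
    return row;
  }
}

The transformer would then be referenced from the entity, e.g. transformer="com.example.dih.DateToStringTransformer".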




-Bryan




On May 18, 2009, at May 18, 1:44 AM, Yatir wrote:



Hi,
I need to write a filter that extracts information from the content
of one field (say the Body field)
and then applies some transformation, based on this content, to a
*different* field (say: the Title field)

is this possible ?

Example: I will find certain keywords in the body and then locate
them and transform them in the title



--
View this message in context: 
http://www.nabble.com/Howto--Applying-a-filter-across-schema-fileds-using-state-information-tp23593424p23593424.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: replication of lucene-write.lock file

2009-05-15 Thread Bryan Talbot


https://issues.apache.org/jira/browse/SOLR-1170


-Bryan




On May 15, 2009, at May 15, 12:24 AM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



the replication relies on the Lucene API to know which files are
associated with an index version. If it returns the lock file, that
is replicated too.

I guess we must ignore the .lock file if it is returned in the list
of files.

you can raise an issue and we can fix it.
--Noble
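
A minimal sketch of how such filtering could look when building the list of files to ship to slaves (illustrative names only; this is not the actual SOLR-1170 patch):

import java.util.ArrayList;
import java.util.List;

// Drop the Lucene write lock from the list of index files to replicate.
public class ReplicableFiles {
  public static List<String> withoutLockFile(List<String> indexFileNames) {
    List<String> replicable = new ArrayList<String>();
    for (String fileName : indexFileNames) {
      if (fileName.endsWith(".lock")) {
        continue;                     // never ship lucene-write.lock to slaves
      }
      replicable.add(fileName);
    }
    return replicable;
  }
}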

On Fri, May 15, 2009 at 12:38 AM, Bryan Talbot  
btal...@aeriagames.com wrote:


When using solr 1.4 replication, I see that the lucene-write.lock  
file is
being replicated to slaves.  I'm importing data from a db every 5  
minutes

using cron to trigger a DIH delta-import.  Replication polls every 60
seconds and the master is configured to take a snapshot  
(replicateAfter)

commit.

Why should the lock file be replicated to slaves?

The lock file isn't stale on the master and is absent unless the
delta-import is in process.  I've not tried it yet, but with the  
lock file
replicated, it seems like promotion of a slave to a master in a  
failure

recovery scenario requires the manual removal of the lock file.



-Bryan









--
-
Noble Paul | Principal Engineer| AOL | http://aol.com




Re: Replication master+slave

2009-05-14 Thread Bryan Talbot

https://issues.apache.org/jira/browse/SOLR-1167



-Bryan




On May 13, 2009, at May 13, 7:20 PM, Otis Gospodnetic wrote:



Bryan, maybe it's time to stick this in JIRA?
http://wiki.apache.org/solr/HowToContribute

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Bryan Talbot btal...@aeriagames.com
To: solr-user@lucene.apache.org
Sent: Wednesday, May 13, 2009 10:11:21 PM
Subject: Re: Replication master+slave

I think the patch I included earlier covers solr core, but it looks  
like at
least some other extensions (DIH) create and use their own XML  
parser.  So, if
this functionality is to extend to all XML files, those will need  
similar

patches.

Here's one for DIH:

--- src/main/java/org/apache/solr/handler/dataimport/DataImporter.java  (revision 774137)
+++ src/main/java/org/apache/solr/handler/dataimport/DataImporter.java  (working copy)
@@ -148,8 +148,10 @@
   void loadDataConfig(String configFile) {

     try {
-      DocumentBuilder builder = DocumentBuilderFactory.newInstance()
-          .newDocumentBuilder();
+      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
+      dbf.setNamespaceAware(true);
+      dbf.setXIncludeAware(true);
+      DocumentBuilder builder = dbf.newDocumentBuilder();
       Document document = builder.parse(new InputSource(new StringReader(
           configFile)));



The only downside I can see to this is that it doesn't offer very
expressive conditional inclusion: the file is included if it's present,
otherwise fallback inclusions can be used.  It's also specific to XML
files and obviously won't work for other types of configuration files.
However, it is simple and effective.


-Bryan




On May 13, 2009, at May 13, 6:36 PM, Otis Gospodnetic wrote:



Coincidentally, from
http://www.cloudera.com/blog/2009/05/07/what%E2%80%99s-new-in-hadoop-core-020/ 
 :


Hadoop configuration files now support XInclude elements for  
including
portions of another configuration file (HADOOP-4944). This  
mechanism allows you

to make configuration files more modular and reusable.


So others are doing it, too.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Bryan Talbot
To: solr-user@lucene.apache.org
Sent: Wednesday, May 13, 2009 11:26:41 AM
Subject: Re: Replication master+slave

I see that Nobel's final comment in SOLR-1154 is that config  
files need to be
able to include snippets from external files.  In my limited  
testing, a

simple

patch to enable XInclude support seems to work.



--- src/java/org/apache/solr/core/Config.java   (revision 774137)
+++ src/java/org/apache/solr/core/Config.java   (working copy)
@@ -100,8 +100,10 @@
       if (lis == null) {
         lis = loader.openConfig(name);
       }
-      javax.xml.parsers.DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
-      doc = builder.parse(lis);
+      javax.xml.parsers.DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
+      dbf.setNamespaceAware(true);
+      dbf.setXIncludeAware(true);
+      doc = dbf.newDocumentBuilder().parse(lis);

       DOMUtil.substituteProperties(doc, loader.getCoreProperties());
     } catch (ParserConfigurationException e)  {



This allows a clause like this to include the contents of replication.xml if
it exists.  If it's not found an exception will be thrown.

<xi:include href="http://localhost:8983/solr/corename/admin/file/?file=replication.xml"
    xmlns:xi="http://www.w3.org/2001/XInclude">
</xi:include>

If the file is optional and no exception should be thrown if the file is
missing, simply include a fallback action: in this case the fallback is empty
and does nothing.

<xi:include href="http://localhost:8983/solr/forum_en/admin/file/?file=replication.xml"
    xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:fallback/>
</xi:include>




-Bryan




On May 12, 2009, at May 12, 8:05 PM, Jian Han Guo wrote:

I was looking at the same problem, and had a discussion with  
Noble. You can

use a hack to achieve what you want, see

https://issues.apache.org/jira/browse/SOLR-1154

Thanks,

Jianhan


On Tue, May 12, 2009 at 5:13 PM, Bryan Talbot wrote:

So how are people managing solrconfig.xml files which are  
largely the same

other than differences for replication?

I don't think it's a good thing to maintain two copies of the  
same file
and I'd like to avoid that.  Maybe enabling the XInclude  
feature in
DocumentBuilders would make it possible to modularize  
configuration files

to

make this possible?






http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setXIncludeAware(boolean)



-Bryan





On May 12, 2009, at May 12, 11:43 AM, Shalin Shekhar Mangar  
wrote:


On Tue, May 12, 2009 at 10:42 PM, Bryan Talbot

wrote:


For replication in 1.4, the wiki at
http://wiki.apache.org/solr/SolrReplication says that a node  
can be both

the master and a slave:

A node can act as both master and slave. In that case

replication of lucene-write.lock file

2009-05-14 Thread Bryan Talbot


When using solr 1.4 replication, I see that the lucene-write.lock file  
is being replicated to slaves.  I'm importing data from a db every 5  
minutes using cron to trigger a DIH delta-import.  Replication polls  
every 60 seconds and the master is configured to take a snapshot  
(replicateAfter) commit.


Why should the lock file be replicated to slaves?

The lock file isn't stale on the master and is absent unless the
delta-import is in process.  I've not tried it yet, but with the lock file
replicated, it seems like promotion of a slave to a master in a
failure recovery scenario requires the manual removal of the lock file.




-Bryan






Re: Replication master+slave

2009-05-13 Thread Bryan Talbot
I see that Nobel's final comment in SOLR-1154 is that config files  
need to be able to include snippets from external files.  In my  
limited testing, a simple patch to enable XInclude support seems to  
work.




--- src/java/org/apache/solr/core/Config.java   (revision 774137)
+++ src/java/org/apache/solr/core/Config.java   (working copy)
@@ -100,8 +100,10 @@
       if (lis == null) {
         lis = loader.openConfig(name);
       }
-      javax.xml.parsers.DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
-      doc = builder.parse(lis);
+      javax.xml.parsers.DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
+      dbf.setNamespaceAware(true);
+      dbf.setXIncludeAware(true);
+      doc = dbf.newDocumentBuilder().parse(lis);

       DOMUtil.substituteProperties(doc, loader.getCoreProperties());
     } catch (ParserConfigurationException e)  {



This allows a clause like this to include the contents of  
replication.xml if it exists.  If it's not found an exception will be  
thrown.


<!-- include external file to define replication configuration -->
<xi:include href="http://localhost:8983/solr/corename/admin/file/?file=replication.xml"
    xmlns:xi="http://www.w3.org/2001/XInclude">
</xi:include>


If the file is optional and no exception should be thrown if the file  
is missing, simply include a fallback action: in this case the  
fallback is empty and does nothing.


<!-- include external file to define replication configuration -->
<xi:include href="http://localhost:8983/solr/forum_en/admin/file/?file=replication.xml"
    xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:fallback/>
</xi:include>


-Bryan




On May 12, 2009, at May 12, 8:05 PM, Jian Han Guo wrote:

I was looking at the same problem, and had a discussion with Noble.  
You can

use a hack to achieve what you want, see

https://issues.apache.org/jira/browse/SOLR-1154

Thanks,

Jianhan


On Tue, May 12, 2009 at 5:13 PM, Bryan Talbot btal...@aeriagames.com wrote:


So how are people managing solrconfig.xml files which are largely  
the same

other than differences for replication?

I don't think it's a good thing to maintain two copies of the  
same file

and I'd like to avoid that.  Maybe enabling the XInclude feature in
DocumentBuilders would make it possible to modularize configuration  
files to

make this possible?


http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setXIncludeAware(boolean)




-Bryan





On May 12, 2009, at May 12, 11:43 AM, Shalin Shekhar Mangar wrote:

On Tue, May 12, 2009 at 10:42 PM, Bryan Talbot  
btal...@aeriagames.com

wrote:


For replication in 1.4, the wiki at
http://wiki.apache.org/solr/SolrReplication says that a node can  
be both

the master and a slave:

A node can act as both master and slave. In that case both the  
master and

slave configuration lists need to be present inside the
ReplicationHandler
requestHandler in the solrconfig.xml.

What does this mean?  Does the core then poll itself for updates?




No. This type of configuration is meant for repeaters. Suppose  
there are
slaves in multiple data-centers (say data center A and B). There  
is always

a
single master (say in A). One of the slaves in B is used as a  
master for

the
other slaves in B. Therefore, this one slave in B is both a master  
as well

as the slave.



I'd like to have a single set of configuration files that are  
shared by

masters and slaves and avoid duplicating configuration details in
multiple
files (one for master and one for slave) to ease management and  
failover.

Is this possible?


You wouldn't want the master to be a slave. So I guess you'd need  
to have

a
separate file. Also, it needs to be a separate file so that the  
slave does

not become a master when the solrconfig.xml is replicated.



When I attempt to set up a multi-server master-slave configuration and
include both master and slave replication configuration options, I run into
some problems.  I'm running a nightly build from May 7.


Not sure what happened. Is that the url for this solr (meaning  
same solr

url
is master and slave of itself)? If yes, that is not a valid  
configuration.


--
Regards,
Shalin Shekhar Mangar.








Re: master/slave failure scenario

2009-05-13 Thread Bryan Talbot

Or ...

1. Promote existing slave to new master
2. Add new slave to cluster




-Bryan




On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote:

- Migrate configuration files from old master (or backup) to new  
master.

- Replicate from a slave to the new master.
- Resume indexing to new master.

-Jay

On Wed, May 13, 2009 at 4:26 AM, nk 11 nick.cass...@gmail.com wrote:


Nice.
What if the master fails permanently (like a disk crash...) and the  
new

master is a clean machine?
2009/5/13 Noble Paul നോബിള്‍ नोब्ळ्  
noble.p...@corp.aol.com


On Wed, May 13, 2009 at 12:10 PM, nk 11 nick.cass...@gmail.com  
wrote:

Hello

I'm kind of new to Solr and I've read about replication, and the  
fact

that a

node can act as both master and slave.
If a replica fails and then comes back on line I suppose that it will
resync with the master.

right


But what happens if the master fails? A slave that is configured as
master will kick in? What if that slave is not yet fully sync'ed with the
failed master and has old data?

if the master fails you can't index the data. but the slaves will
continue serving the requests with the last index. You can bring the
master back up and resume indexing.



What happens when the original master comes back on line? He will

remain

a

slave because there is another node with the master role?

Thank you!





--
-
Noble Paul | Principal Engineer| AOL | http://aol.com







Re: Replication master+slave

2009-05-13 Thread Bryan Talbot
I think the patch I included earlier covers solr core, but it looks  
like at least some other extensions (DIH) create and use their own XML  
parser.  So, if this functionality is to extend to all XML files,  
those will need similar patches.


Here's one for DIH:

--- src/main/java/org/apache/solr/handler/dataimport/DataImporter.java  (revision 774137)
+++ src/main/java/org/apache/solr/handler/dataimport/DataImporter.java  (working copy)
@@ -148,8 +148,10 @@
   void loadDataConfig(String configFile) {

     try {
-      DocumentBuilder builder = DocumentBuilderFactory.newInstance()
-          .newDocumentBuilder();
+      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
+      dbf.setNamespaceAware(true);
+      dbf.setXIncludeAware(true);
+      DocumentBuilder builder = dbf.newDocumentBuilder();
       Document document = builder.parse(new InputSource(new StringReader(
           configFile)));



The only downside I can see to this is that it doesn't offer very
expressive conditional inclusion: the file is included if it's present,
otherwise fallback inclusions can be used.  It's also specific to XML
files and obviously won't work for other types of configuration
files.  However, it is simple and effective.



-Bryan




On May 13, 2009, at May 13, 6:36 PM, Otis Gospodnetic wrote:



Coincidentally, from http://www.cloudera.com/blog/2009/05/07/what%E2%80%99s-new-in-hadoop-core-020/ 
 :


Hadoop configuration files now support XInclude elements for  
including portions of another configuration file (HADOOP-4944). This  
mechanism allows you to make configuration files more modular and  
reusable.


So others are doing it, too.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Bryan Talbot btal...@aeriagames.com
To: solr-user@lucene.apache.org
Sent: Wednesday, May 13, 2009 11:26:41 AM
Subject: Re: Replication master+slave

I see that Nobel's final comment in SOLR-1154 is that config files  
need to be
able to include snippets from external files.  In my limited  
testing, a simple

patch to enable XInclude support seems to work.



--- src/java/org/apache/solr/core/Config.java   (revision 774137)
+++ src/java/org/apache/solr/core/Config.java   (working copy)
@@ -100,8 +100,10 @@
       if (lis == null) {
         lis = loader.openConfig(name);
       }
-      javax.xml.parsers.DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
-      doc = builder.parse(lis);
+      javax.xml.parsers.DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
+      dbf.setNamespaceAware(true);
+      dbf.setXIncludeAware(true);
+      doc = dbf.newDocumentBuilder().parse(lis);

       DOMUtil.substituteProperties(doc, loader.getCoreProperties());
     } catch (ParserConfigurationException e)  {



This allows a clause like this to include the contents of replication.xml if
it exists.  If it's not found an exception will be thrown.

<xi:include href="http://localhost:8983/solr/corename/admin/file/?file=replication.xml"
    xmlns:xi="http://www.w3.org/2001/XInclude">
</xi:include>

If the file is optional and no exception should be thrown if the file is
missing, simply include a fallback action: in this case the fallback is empty
and does nothing.

<xi:include href="http://localhost:8983/solr/forum_en/admin/file/?file=replication.xml"
    xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:fallback/>
</xi:include>




-Bryan




On May 12, 2009, at May 12, 8:05 PM, Jian Han Guo wrote:

I was looking at the same problem, and had a discussion with  
Noble. You can

use a hack to achieve what you want, see

https://issues.apache.org/jira/browse/SOLR-1154

Thanks,

Jianhan


On Tue, May 12, 2009 at 5:13 PM, Bryan Talbot wrote:

So how are people managing solrconfig.xml files which are largely  
the same

other than differences for replication?

I don't think it's a good thing to maintain two copies of the  
same file

and I'd like to avoid that.  Maybe enabling the XInclude feature in
DocumentBuilders would make it possible to modularize  
configuration files to

make this possible?




http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setXIncludeAware(boolean)



-Bryan





On May 12, 2009, at May 12, 11:43 AM, Shalin Shekhar Mangar wrote:

On Tue, May 12, 2009 at 10:42 PM, Bryan Talbot

wrote:


For replication in 1.4, the wiki at
http://wiki.apache.org/solr/SolrReplication says that a node  
can be both

the master and a slave:

A node can act as both master and slave. In that case both the  
master and

slave configuration lists need to be present inside the
ReplicationHandler
requestHandler in the solrconfig.xml.

What does this mean?  Does the core then poll itself for updates?




No. This type of configuration is meant for repeaters. Suppose  
there are
slaves in multiple data-centers (say data center A and B). There  
is always

a
single master (say in A). One of the slaves in B is used as a  
master for

the
other slaves

Replication master+slave

2009-05-12 Thread Bryan Talbot
For replication in 1.4, the wiki at http://wiki.apache.org/solr/SolrReplication
says that a node can be both the master and a slave:


A node can act as both master and slave. In that case both the master  
and slave configuration lists need to be present inside the  
ReplicationHandler requestHandler in the solrconfig.xml.


What does this mean?  Does the core then poll itself for updates?

I'd like to have a single set of configuration files that are shared  
by masters and slaves and avoid duplicating configuration details in  
multiple files (one for master and one for slave) to ease management  
and failover.  Is this possible?


When I attempt to set up a multi-server master-slave configuration and
include both master and slave replication configuration options, I run
into some problems.  I'm running a nightly build from May 7.



  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
    </lst>
    <lst name="slave">
      <str name="masterUrl">http://master_core01:8983/solr/core01/replication</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>


When the replication admin page
(http://master_core01:8983/solr/core01/admin/replication/index.jsp) is
visited, the severe error shown below appears in the solr log.  The server
is otherwise idle so there is no reason all threads should be busy unless
the replication code is getting itself into a loop.


What's the right way to do this?



May 11, 2009 8:01:22 PM org.apache.tomcat.util.threads.ThreadPool logFull
SEVERE: All threads (150) are currently busy, waiting. Increase maxThreads (150) or check the servlet status
May 11, 2009 8:01:41 PM org.apache.solr.handler.ReplicationHandler getReplicationDetails

WARNING: Exception while invoking a 'details' method on master
java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
        at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
        at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
        at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
        at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
        at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
        at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
        at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
        at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
        at org.apache.solr.handler.SnapPuller.getNamedListResponse(SnapPuller.java:183)
        at org.apache.solr.handler.SnapPuller.getCommandResponse(SnapPuller.java:178)
        at org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:555)
        at org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:147)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1330)
        at org.apache.jsp.admin.replication.index_jsp.executeCommand(index_jsp.java:34)
        at org.apache.jsp.admin.replication.index_jsp._jspService(index_jsp.java:208)
        at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:98)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
        at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:331)
        at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:329)
        at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:265)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:269)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
        at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:679)
        at org.apache.catalina

Re: Garbage Collectors

2009-04-16 Thread Bryan Talbot
If you're using Java 5 or 6, jmap is a useful tool for tracking down
memory leaks.


http://java.sun.com/javase/6/docs/technotes/tools/share/jmap.html

jmap -histo:live pid

will print a histogram of all live objects in the heap.  Start at the  
top and work your way down until you find something suspicious -- the  
trick is in knowing what is suspicious of course.



-Bryan




On Apr 16, 2009, at Apr 16, 3:40 PM, David Baker wrote:


Otis Gospodnetic wrote:

Personally, I'd start from scratch:
-Xmx -Xms...

-server is not even needed any more.

If you are not using Java 1.6, I suggest you do.

Next, I'd try to investigate why objects are not being cleaned up -  
this should not be happening in the first place.  Is Solr the only  
webapp running?



Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 


From: David Baker dav...@mate1inc.com
To: solr-user@lucene.apache.org
Sent: Thursday, April 16, 2009 3:33:18 PM
Subject: Garbage Collectors

I have an issue with garbage collection on our solr servers.  We
have an issue where the old generation never gets cleaned up on
one of our servers.  This server has a little over 2 million
records which are updated every hour or so.  I have tried the
parallel GC and the concurrent GC.  The parallel seems more stable
for us, but both end up running out of memory.  I have increased
the memory allocated to the servers, but this just seems to delay
the problem.  My question is, what are the suggested options for
using the parallel GC?  Currently we are using something of this
nature:


-server -Xmx4096m -Xms512m -XX:+UseAdaptiveSizePolicy -XX:+UseParallelOldGC -XX:GCTimeRatio=19 -XX:NewSize=128m -XX:SurvivorRatio=2 -Dsolr.solr.home=/usr/local/solr-tomcat-fi/solr


I am new to solr and GC tuning, so any advice is appreciated.

Thanks for the reply, yes, solr is the only app running under this  
tomcat server. I will remove -server, and other options except the  
heap allocation options and see how it performs. Any suggestions on  
how to go about finding out why objects are not being cleaned up if  
these changes don't work?






Re: DataImporter : Java heap space

2009-04-15 Thread Bryan Talbot
I think there is a bug in the 1.4 daily builds of data import handler  
which is causing the batchSize parameter to be ignored.  This was  
probably introduced with more recent patches to resolve variables.


The affected code is in JdbcDataSource.java

String bsz = initProps.getProperty("batchSize");
if (bsz != null) {
  bsz = (String) context.getVariableResolver().resolve(bsz);
  try {
    batchSize = Integer.parseInt(bsz);
    if (batchSize == -1)
      batchSize = Integer.MIN_VALUE;
  } catch (NumberFormatException e) {
    LOG.warn("Invalid batch size: " + bsz);
  }
}


The call to context.getVariableResolver().resolve(bsz) is returning
null, leading to a NumberFormatException and the batchSize never being
set to Integer.MIN_VALUE.  MySQL won't use streaming result sets in
this case, which can lead to the OOM we're seeing.
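
A hedged sketch of a defensive workaround (this fragment is illustrative, not the official fix): fall back to the raw property value when the variable resolver returns null, so that batchSize=-1 still switches the MySQL driver into streaming mode.

// Fragment of the same init logic, modified only for illustration.
String bsz = initProps.getProperty("batchSize");
if (bsz != null) {
  Object resolved = context.getVariableResolver().resolve(bsz);
  if (resolved != null) {
    bsz = resolved.toString();         // use the resolved value when there is one
  }
  try {
    batchSize = Integer.parseInt(bsz);
    if (batchSize == -1)
      batchSize = Integer.MIN_VALUE;   // signals Connector/J to stream rows
  } catch (NumberFormatException e) {
    LOG.warn("Invalid batch size: " + bsz);
  }
}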



If your log file contains an entry like this, as mine does, you're
affected by this bug too.


Apr 15, 2009 1:21:58 PM  
org.apache.solr.handler.dataimport.JdbcDataSource init

WARNING: Invalid batch size: null



-Bryan




On Apr 13, 2009, at Apr 13, 11:48 PM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



DIH streams 1 row at a time.

DIH is just a component in Solr. Solr indexing also takes a lot of  
memory


On Tue, Apr 14, 2009 at 12:02 PM, Mani Kumar manikumarchau...@gmail.com 
 wrote:

Yes, it's throwing the same OOM error and from the same place...
yes, I will try increasing the size ... just curious: how does this
dataimport work?

Does it load the whole table into memory?

Is there any estimate of how much memory it needs to create an index
for 1GB of data?

thx
mani

On Tue, Apr 14, 2009 at 11:48 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:


On Tue, Apr 14, 2009 at 11:36 AM, Mani Kumar manikumarchau...@gmail.com

wrote:



Hi Shalin:
yes, I tried with the batchSize=-1 parameter as well

here is the config I tried with

<dataConfig>
    <dataSource type="JdbcDataSource" batchSize="-1" name="sp"
        driver="com.mysql.jdbc.Driver"
        url="jdbc:mysql://localhost/mydb_development"
        user="root" password="**" />

I hope I have used the batchSize parameter in the right place.



Yes that is correct. Did it still throw OOM from the same place?

I'd suggest you increase the heap and see what works for you. Also  
try

-server on the jvm.

--
Regards,
Shalin Shekhar Mangar.







--
--Noble Paul




Re: Changing to multicore

2009-01-28 Thread Bryan Talbot
I would think that using a servlet filter to rewrite the URL should be
pretty straightforward.  You could write your own (a sketch follows the
list below) or use a tool like http://tuckey.org/urlrewrite/ and just
configure that.


Using something like this, I think the upgrade procedure could be:
- install a rewrite filter that rewrites multi-core URLs to non-multi-core
URLs on all solr instances
- upgrade the app to use multi-core URLs
- upgrade solr instances to multi-core when convenient and remove the
rewrite filter
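
A minimal hand-rolled sketch of such a filter, mapped ahead of Solr's dispatch filter in web.xml.  The core name and paths are assumptions, and this is illustrative rather than tested:

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;

// Strips a core name from incoming multi-core URLs so a not-yet-converted
// single-core Solr instance can keep serving them during the migration.
public class StripCoreNameFilter implements Filter {
  private static final String CORE_PREFIX = "/core0";   // hypothetical core name

  public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
      throws IOException, ServletException {
    final HttpServletRequest http = (HttpServletRequest) req;
    String path = http.getServletPath();                // e.g. /core0/select
    if (path.startsWith(CORE_PREFIX + "/")) {
      final String stripped = path.substring(CORE_PREFIX.length());
      // hand the rest of the chain a request that looks like a single-core URL
      chain.doFilter(new HttpServletRequestWrapper(http) {
        public String getServletPath() { return stripped; }
        public String getRequestURI() { return http.getContextPath() + stripped; }
      }, res);
      return;
    }
    chain.doFilter(req, res);                            // already a plain URL
  }

  public void init(FilterConfig config) {}
  public void destroy() {}
}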




-Bryan




On Jan 28, 2009, at Jan 28, 7:17 AM, Jeff Newburn wrote:

We are moving from single core to multicore. We have a few servers  
that we

want to migrate one at a time to ensure that each one functions.  This
process is proving  difficult as there is no default core to allow the
application to talk to the solr servers uniformly (ie without a core  
name
during conversion).  Would it be possible to re-add the default core  
as a
configuration setting in solr.xml to allow for a smoother  
conversion?  Am I

missing a setting that would help with this process?

-Jeff




Re: Help with Solr 1.3 lockups?

2009-01-16 Thread Bryan Talbot
I think it's pretty easy to check if SOLR is alive.  Even from a shell
script, a simple command like


curl -iIs --url "http://solrhost/solr/select?start=0&rows=0" | grep -c "HTTP/1.1 200 OK"


will return 1 if the response is an HTTP 200.  If the return is not 1,
then there is a problem.  A load balancer or other tool can probably
internalize the check and not need to fork processes like a shell
script would, but the check can be the same.  This simply requests an
HTTP HEAD (doesn't return any content) for a fast-executing query.
In this case, the query with no q= specified seems to default to *:*
when using dismax, which is my default handler.
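
The same kind of probe could live inside a monitoring agent instead of a shell script; a hedged Java sketch (host, port, and timeouts are assumptions):

import java.net.HttpURLConnection;
import java.net.URL;

public class SolrPing {
  // Returns true only when the cheap query answers with HTTP 200.
  public static boolean isAlive(String coreUrl) {
    try {
      URL url = new URL(coreUrl + "/select?start=0&rows=0");
      HttpURLConnection conn = (HttpURLConnection) url.openConnection();
      conn.setRequestMethod("HEAD");        // no body needed, just the status line
      conn.setConnectTimeout(2000);         // fail fast so the check can run often
      conn.setReadTimeout(2000);
      return conn.getResponseCode() == 200;
    } catch (Exception e) {
      return false;                         // timeouts and connect errors count as down
    }
  }

  public static void main(String[] args) {
    System.out.println(isAlive("http://solrhost:8983/solr"));
  }
}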



-Bryan




On Jan 15, 2009, at Jan 15, 2:13 PM, Stephen Weiss wrote:

I've been wondering about this one myself - most of the services we  
have installed work this way, if they crash out for whatever reason  
they restart automatically (Apache, MySQL, even the OS itself).   
Failures are detected and corrected by the load balancers and also  
in some cases by the machine itself (like with kernel panics).   But  
not SOLR, and I'm not quite sure what to do to get it there.  We use  
Jetty but it's the same story.  It's not like it fails out all that  
often, but when it does it will still respond to HTTP requests  
(because Jetty itself is still working), which makes it a lot harder  
to detect a failure... I've tried writing something for nagios but  
the problem is that most responses solr would give to a request vary  
depending on index updates, so it's not like I can just take a  
checksum and compare it - and even then, it would only really alert  
us to the problem, we'd still have to go in and restart everything  
(personally I don't enjoy restarting servers from my blackberry  
nearly as much as I should).


I'd have to come up with something that can intelligently interpret  
the response and decide if the server's still working properly or  
not, and the processing time on that alone might make it too  
inefficient to run every few seconds, but at least with that we'd be  
able to tell the cluster don't send anything to this server for  
now.  Is there some really obvious way to track if a particular  
servlet is still running properly (in either Tomcat or Jetty,  
because if Tomcat has this I'd switch) and restart the container if  
it's not?


Thanks!!

--
Steve

On Jan 15, 2009, at 1:57 PM, Jerome L Quinn wrote:



An even bigger problem is the fact that once Solr is wedged, it  
stays that
way until a human notices and restarts things.  The tomcat stays  
running

and there's no automatic detection that will either restart Solr, or
restart the Tomcat container.

Any suggestions on either front?

Thanks,
Jerry Quinn







Re: Solr - DataImportHandler - Large Dataset results ?

2008-12-12 Thread Bryan Talbot
It only supports streaming if properly enabled, which is completely
lame: http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html


 By default, ResultSets are completely retrieved and stored in  
memory. In most cases this is the most efficient way to operate, and  
due to the design of the MySQL network protocol is easier to  
implement. If you are working with ResultSets that have a large number  
of rows or large values, and can not allocate heap space in your JVM  
for the memory required, you can tell the driver to stream the results  
back one row at a time.


To enable this functionality, you need to create a Statement instance  
in the following manner:


stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
  java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

The combination of a forward-only, read-only result set, with a fetch  
size of Integer.MIN_VALUE serves as a signal to the driver to stream  
result sets row-by-row. After this any result sets created with the  
statement will be retrieved row-by-row.
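
For illustration, a hedged end-to-end sketch of that pattern (connection URL, credentials, and query are assumptions): with this statement configuration, Connector/J hands rows back one at a time instead of buffering the whole result set in memory.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StreamingRead {
  public static void main(String[] args) throws Exception {
    Connection conn =
        DriverManager.getConnection("jdbc:mysql://localhost/mydb", "user", "pass");
    Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY,
                                          ResultSet.CONCUR_READ_ONLY);
    stmt.setFetchSize(Integer.MIN_VALUE);        // the streaming signal for Connector/J
    ResultSet rs = stmt.executeQuery("SELECT id, name FROM item");
    while (rs.next()) {                          // rows arrive one at a time
      System.out.println(rs.getLong("id") + " " + rs.getString("name"));
    }
    rs.close();
    stmt.close();
    conn.close();
  }
}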




-Bryan




On Dec 12, 2008, at Dec 12, 2:15 PM, Kay Kay wrote:


I am using MySQL. I believe MySQL (since version 5) supports streaming.

One more question about streaming - can we assume that, when the database
driver supports streaming, the result set iterator is a forward-only
iterator?


If, say, the streaming size is 10K records and we are trying to
retrieve a total of 100K records - what exactly happens when the
threshold is reached (say, the first 10K records were retrieved)?


Are the previous set of records thrown away and replaced in memory
by the new batch of records?




--- On Fri, 12/12/08, Shalin Shekhar Mangar shalinman...@gmail.com  
wrote:

From: Shalin Shekhar Mangar shalinman...@gmail.com
Subject: Re: Solr - DataImportHandler - Large Dataset results ?
To: solr-user@lucene.apache.org
Date: Friday, December 12, 2008, 9:41 PM

DataImportHandler is designed to stream rows one by one to create Solr
documents. As long as your database driver supports streaming, you  
should be

fine. Which database are you using?

On Sat, Dec 13, 2008 at 2:20 AM, Kay Kay kaykay.uni...@yahoo.com  
wrote:



As per the example in the wiki -
http://wiki.apache.org/solr/DataImportHandler - I am seeing the following
fragment.

<dataSource driver="org.hsqldb.jdbcDriver"
    url="jdbc:hsqldb:/temp/example/ex" user="sa">
  <document name="products">
    <entity name="item" query="select * from item">
      <field column="ID" name="id" />
      <field column="NAME" name="name" />
      ..
    </entity>
  </document>
</dataSource>

My scaled-down application looks very similar along these lines, but
my result set is so big that it cannot fit within main memory by any
chance.


So I was planning to split this single query into multiple subqueries,
with another conditional based on the id ( id > 0 and id < 100, say ).


I am curious if there is any way to specify another conditional clause,
(<splitData column="id" batch="1" />, where the column is supposed to
be an integer value) - and internally, the implementation could actually
generate the subqueries -

i) get the min, max of the numeric column, and send queries to the
database based on the batch size

ii) Add documents for each batch and close the resultset.

This might end up putting more load on the database (but at least the
dataset would fit in the main memory ).

Let me know if anyone else had run into similar issues and how this
was encountered.
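
A hedged sketch in plain JDBC of the batching idea described above (the table, column names, and batch size are placeholders, and the Solr document creation is elided):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class BatchedImport {
  // Page through the table in fixed-size id ranges so that no single
  // result set has to fit in memory at once.
  public static void importAll(Connection conn, long batchSize) throws SQLException {
    Statement s = conn.createStatement();
    ResultSet r = s.executeQuery("SELECT MIN(id), MAX(id) FROM item");
    r.next();
    long min = r.getLong(1);
    long max = r.getLong(2);
    r.close();
    s.close();

    PreparedStatement ps =
        conn.prepareStatement("SELECT * FROM item WHERE id >= ? AND id < ?");
    for (long lo = min; lo <= max; lo += batchSize) {
      ps.setLong(1, lo);
      ps.setLong(2, lo + batchSize);
      ResultSet rs = ps.executeQuery();
      while (rs.next()) {
        // build and add a Solr document from this row
      }
      rs.close();
    }
    ps.close();
  }
}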








--
Regards,
Shalin Shekhar Mangar.