solr 1.3 how to parse "rich" documents

2010-11-11 Thread Nikola Garafolic

Hi,

I use Solr 1.3 with the patch for parsing rich documents. When I upload, 
for example, a PDF file, the only thing I see in solr.log is the following:


INFO: [] webapp=/solr path=/update/rich 
params={id=250&stream.type=pdf&fieldnames=id,name&commit=true&stream.fieldname=body&name=iphone+user+guide+pdf+iphone_user_guide.pdf} 
status=0 QTime=12656
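
For reference, the logged request can be approximated from the command line. The sketch below rebuilds the URL from the parameters in the log line and adds a lookup by id, since status=0 only says the handler returned without error, not what ended up in the index. The host, port, file name, and the multipart `-F "file=@..."` upload are assumptions, not details from the log.

```shell
#!/bin/sh
# Assumed base URL; the thread does not state host or port.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Build the /update/rich URL from the parameters visible in the log line.
# $1 = document id, $2 = stream type (e.g. pdf), $3 = value for the "name" field
rich_update_url() {
    printf '%s/update/rich?id=%s&stream.type=%s&fieldnames=id,name&stream.fieldname=body&name=%s&commit=true\n' \
        "$SOLR_URL" "$1" "$2" "$3"
}

# Build a query URL that looks a document up by its id.
select_url() {
    printf '%s/select?q=id:%s\n' "$SOLR_URL" "$1"
}

# Only contact Solr when explicitly asked to, so the helpers can be sourced safely.
if [ "${RUN_UPLOAD:-0}" = "1" ]; then
    # Assumption: the file is sent as a multipart upload; the file name is hypothetical.
    curl "$(rich_update_url 250 pdf iphone_user_guide)" -F "file=@iphone_user_guide.pdf"
    # Check whether document 250 is actually searchable afterwards.
    curl "$(select_url 250)"
fi
```

If the lookup returns the document without the extracted text, the field named by stream.fieldname is probably not defined as stored in schema.xml.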


solrconfig.xml contains the line:

 class="solr.RichDocumentRequestHandler" startup="lazy" />


What else am I missing?

Since I am running Solr standalone, I do not need to build it with 
ant, do I?


Regards,
Nikola

--
Nikola Garafolic
SRCE, Sveucilisni racunski centar
tel: +385 1 6165 804
email: nikola.garafo...@srce.hr


Re: solr init.d script

2010-11-09 Thread Nikola Garafolic

On 11/09/2010 07:00 PM, Israel Ekpo wrote:

Yes.

I recommend running Solr via a servlet container.

It is much easier to manage compared to running it by itself.

On Tue, Nov 9, 2010 at 10:03 AM, Nikola Garafolic
wrote:


But in my case, that would make things more complex as I see it. I would 
have two JBoss servers, each running Solr as a servlet, and both would 
need the same data dir, right? I am currently running a single Solr 
instance as a cluster service, with the data dir on a shared LUN, so it 
can be started on either of the two hosts.


Can you explain what I would gain from running two Solr instances in a 
servlet container? More performance, perhaps?


Regards,
Nikola





Re: solr init.d script

2010-11-09 Thread Nikola Garafolic

I have two nodes, each running one JBoss server, and both use a single 
Solr instance. That is how I run it for now.


Do you recommend running Solr as a servlet inside JBoss? The two JBoss 
servers run load-balanced for high availability.


For now it seems to be ok.

On 11/09/2010 03:17 PM, Israel Ekpo wrote:

I think it would be a better idea to load solr via a servlet container like
Tomcat and then create the init.d script for tomcat instead.

http://wiki.apache.org/solr/SolrTomcat#Installing_Tomcat_6





Re: solr init.d script

2010-11-09 Thread Nikola Garafolic

Sorry, I forgot to mention: CentOS.
Thanks.

I have a script very similar to this CentOS one, and what I am missing is 
the status portion of the script.


On 11/09/2010 08:47 AM, Eric Martin wrote:

Er, what flavor?

RHEL / CentOS

#!/bin/sh

# Starts, stops, and restarts Apache Solr.
#
# chkconfig: 35 92 08
# description: Starts and stops Apache Solr

SOLR_DIR="/var/solr"
JAVA_OPTIONS="-Xmx1024m -DSTOP.PORT=8079 -DSTOP.KEY=mustard -jar start.jar"
LOG_FILE="/var/log/solr.log"
JAVA="/usr/bin/java"

case "$1" in
    start)
        echo "Starting Solr"
        cd "$SOLR_DIR" || exit 1
        # JAVA_OPTIONS is left unquoted on purpose so it splits into separate arguments.
        $JAVA $JAVA_OPTIONS 2> "$LOG_FILE" &
        ;;
    stop)
        echo "Stopping Solr"
        cd "$SOLR_DIR" || exit 1
        # Jetty's start.jar uses STOP.PORT and STOP.KEY to reach the running instance.
        $JAVA $JAVA_OPTIONS --stop
        ;;
    restart)
        "$0" stop
        sleep 1
        "$0" start
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}" >&2
        exit 1
        ;;
esac
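
The script above has no status branch. One possible way to add it is sketched below, assuming the start branch also records the background PID in a file (for example `echo $! > "$PID_FILE"` right after launching Java). The PID file path and the function name are hypothetical and not part of the original script; the return codes follow the LSB convention for init script status actions.

```shell
#!/bin/sh
# Hypothetical PID file location; the original script does not write one.
PID_FILE="${PID_FILE:-/var/run/solr.pid}"

# Print Solr's state and return an LSB-style status code:
#   0 = running, 1 = dead but PID file exists, 3 = not running.
solr_status() {
    if [ ! -f "$PID_FILE" ]; then
        echo "Solr is not running"
        return 3
    fi
    pid=$(cat "$PID_FILE")
    # kill -0 sends no signal; it only tests whether the process exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "Solr is running (pid $pid)"
        return 0
    fi
    echo "Solr is dead but $PID_FILE exists"
    return 1
}
```

In the case statement this would be wired in as a `status)` branch that calls `solr_status`, and the stop branch would remove the PID file after shutting Solr down.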




Debian

http://xdeb.org/node/1213

__

Ubuntu

STEPS
Type in the following command in TERMINAL to install nano text editor.
sudo apt-get install nano
Type in the following command in TERMINAL to add a new script.
sudo nano /etc/init.d/solr
TERMINAL will display a new page titled "GNU nano 2.0.x".
Paste the below script in this TERMINAL window.
#!/bin/sh -e

# Starts, stops, and restarts solr

SOLR_DIR="/apache-solr-1.4.0/example"
JAVA_OPTIONS="-Xmx1024m -DSTOP.PORT=8079 -DSTOP.KEY=stopkey -jar start.jar"
LOG_FILE="/var/log/solr.log"
JAVA="/usr/bin/java"

case "$1" in
    start)
        echo "Starting Solr"
        cd "$SOLR_DIR"
        # JAVA_OPTIONS is left unquoted on purpose so it splits into separate arguments.
        $JAVA $JAVA_OPTIONS 2> "$LOG_FILE" &
        ;;
    stop)
        echo "Stopping Solr"
        cd "$SOLR_DIR"
        # Jetty's start.jar uses STOP.PORT and STOP.KEY to reach the running instance.
        $JAVA $JAVA_OPTIONS --stop
        ;;
    restart)
        "$0" stop
        sleep 1
        "$0" start
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}" >&2
        exit 1
        ;;
esac
Note: In the above script you might have to replace /apache-solr-1.4.0/example
with the appropriate directory name.
Press the CTRL-X keys.
Type in Y.
When asked for "File Name to Write", press the ENTER key.
You are now back at the TERMINAL command line.

Type in the following command in TERMINAL to create all the links to the
script.
sudo update-rc.d solr defaults
Type in the following command in TERMINAL to make the script executable.
sudo chmod a+rx /etc/init.d/solr
To test, reboot your Ubuntu server.
Wait until the reboot is completed.
Wait 2 minutes for Apache Solr to start up.
Using your web browser, go to your website and try a Solr search.
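
Instead of waiting a fixed 2 minutes, the last steps can be replaced by polling until Solr answers. The following is a minimal sketch of a generic retry helper; the function name is made up for illustration, and the URL in the usage comment assumes the default Jetty port 8983 and the ping handler from the example configuration.

```shell
#!/bin/sh
# Retry a command until it succeeds or the attempts run out.
# $1 = maximum attempts, $2 = seconds between attempts, rest = command to run
wait_until() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        # Do not sleep after the final failed attempt.
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example (assumed URL): give Solr up to 2 minutes, checking every 5 seconds.
# wait_until 24 5 curl -sf "http://localhost:8983/solr/admin/ping" && echo "Solr is up"
```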



-Original Message-
From: Nikola Garafolic [mailto:nikola.garafo...@srce.hr]
Sent: Monday, November 08, 2010 11:42 PM
To: solr-user@lucene.apache.org
Subject: solr init.d script

Hi,

Does anyone have some kind of init.d script for solr, that can start,
stop and check solr status?






solr init.d script

2010-11-08 Thread Nikola Garafolic

Hi,

Does anyone have some kind of init.d script for solr, that can start, 
stop and check solr status?




Re: indexing rich documents

2010-07-13 Thread Nikola Garafolic

On 07/13/2010 02:11 PM, satya swaroop wrote:

Hi all,
  I am new to Solr. I followed the wiki and got the Solr admin running
successfully, and indexing XML files works well. But I am unable to index
rich documents. I followed the wiki for rich documents as well, but I
could not get it working. The error that comes up when I send a PDF or
HTML file is a "lazy" error. Can anyone give a detailed description of
how to make rich documents indexable?
  I use Tomcat on Ubuntu. The home directory for Solr is
/opt/solr/example and the Catalina home is /opt/tomcat6.


thanks & regards,
  swaroop



I have exactly the same problem, but my environment is different.
I use JBoss AS 5.1.0 GA with HornetQ 2.0.0 and Solr 1.3.0, patched to 
support indexing rich text documents.
I copied the example/solr directory to the conf directory on JBoss, and 
solr.war to the deploy directory on JBoss. Everything seems to work except 
indexing rich text documents. I am using the default schema.xml that is 
included in the example/solr/conf directory.

I use all of this for gss ( http://code.google.com/p/gss/ ).
Is there a generic schema.xml file that should work out of the box?
The gss developers sent me a different schema.xml file, but with it I get 
an "undefined field text" error in the log. With the default schema.xml 
(the one that ships with Solr) I get "undefined field 'body'".
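
Both errors suggest that the request refers to a field name the active schema.xml does not declare. A quick way to check is sketched below. The schema path is a guess based on the example layout mentioned above, the helper name is made up, and the check only looks at single-line `<field>` declarations, so a matching `<dynamicField>` pattern would not be detected.

```shell
#!/bin/sh
# Report whether a schema file declares a <field> with the given name.
# $1 = path to schema.xml, $2 = field name
has_field() {
    grep -Eq "<field[[:space:]][^>]*name=\"$2\"" "$1"
}

# Hypothetical default path, taken from the example layout.
SCHEMA="${SCHEMA:-example/solr/conf/schema.xml}"

if [ -f "$SCHEMA" ]; then
    # Field names seen in the upload parameters and in the error messages.
    for f in id name body text; do
        if has_field "$SCHEMA" "$f"; then
            echo "$f: defined"
        else
            echo "$f: MISSING"
        fi
    done
fi
```

Any name reported as MISSING would need a matching `<field>` entry in the schema, or the request parameters would need to point at a field that does exist.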


Attached is the file I got from the gss project developers; it does not 
work for me either.


Regards,
Nikola

--
Nikola Garafolic
SRCE, Sveucilisni racunski centar
tel: +385 1 6165 804
email: nikola.garafo...@srce.hr

[Attached schema.xml: the XML markup was lost in the archive; only the values "id" and "body" survive.]