RE: How can i make a distribute search on Solr?

2007-09-19 Thread Jarvis
Thanks for your reply.
I need federated search. You're saying this is not yet
supported out of the box. So my question is: in that
situation, what can Collection Distribution be used for?

Jarvis

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 19, 2007 1:47 PM
To: solr-user@lucene.apache.org
Subject: Re: How can i make a distribute search on Solr?

 
 So it means that distributed search is not a basic component in the Solr
project.
 

I think you just need load balancing.  Solr is not a load balancer; you 
need to find something that works for you and configure it elsewhere. 
Solr works fine without persistent connections, so even simple round-robin 
DNS can do the job.
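
For illustration, a minimal client-side round-robin sketch, assuming the
SolrJ API that appears elsewhere in this digest (the class name and replica
URLs are hypothetical, not from the thread):

  import java.util.concurrent.atomic.AtomicInteger;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  // Spreads read-only queries over identical replicas; every replica
  // holds the full index, so any one of them can answer any query.
  public class RoundRobinSolr {
      private final SolrServer[] replicas;
      private final AtomicInteger next = new AtomicInteger(0);

      public RoundRobinSolr(String... urls) throws Exception {
          replicas = new SolrServer[urls.length];
          for (int i = 0; i < urls.length; i++) {
              replicas[i] = new CommonsHttpSolrServer(urls[i]);
          }
      }

      public QueryResponse query(SolrQuery q) throws Exception {
          // Mask the sign bit so the index stays valid after int overflow.
          int i = (next.getAndIncrement() & 0x7fffffff) % replicas.length;
          return replicas[i].query(q);
      }
  }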

Depending on your usage/loads/requirements it may or may not make sense 
to have your master DB in the mix.

Stu is referring to Federated Search - where each index has some of the 
data and results are combined before they are returned.  This is not yet 
supported out of the box

ryan



Solr Index - no segments* file found in org.apache.lucene.store.FSDirectory

2007-09-19 Thread Venkatraman S
Hi,

Product: Solr (Embedded), Version: 1.2

Problem Description :
While trying to add and search over the index, we are stumbling on this
error again and again.
Do note that the SolrCore is committed and closed suitably in our Embedded
Solr.

Error (StackTrace) :
Sep 19, 2007 9:41:41 AM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet jsp threw exception
java.io.FileNotFoundException: no segments* file found in
org.apache.lucene.store.FSDirectory@/data/pub/index: files:
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:516)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:185)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:148)
        at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:87)
        at org.apache.solr.core.SolrCore.newSearcher(SolrCore.java:122)
        at com.serendio.diskoverer.core.entextor.CreateSolrIndex.<init>(CreateSolrIndex.java:70)
        at org.apache.jsp.AddToPubIndex_jsp._jspService(AddToPubIndex_jsp.java:57)
        at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
        at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393)
        at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320)
        at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:619)

Extra Information :
/data/pub is the Solr Home.
/data/pub/index contains the index.
CreateSolrIndex.java is our program that creates and searches over the index

Regards,
Venkat
--


multithread update client causes exceptions and dropped documents

2007-09-19 Thread Will Johnson


TestJettyLargeVolume.java
Description: Binary data
we were doing some performance testing for the updating aspects of solr and
ran into what seems to be a large problem.  we're creating small documents
with an id and one field of 1 term only, submitting them in batches of 200
with commits every 5000 docs.  when we run the client with 1 thread
everything is fine.  when we run it with >1 threads things go south (stack
trace is below).  i've attached the junit test which shows the problem.
this happens on both a mac and a pc, and when running solr in both jetty and
tomcat.  i'll create a jira issue if necessary but i thought i'd see if
anyone else had run into this problem first.

(output from junit test)
Started thread: 0
Started thread: 1
org.apache.solr.common.SolrException: Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__javalangIllegalStateException_Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__at_comsunorgapachexercesinternalimplXMLStreamReaderImplgetAttributeCountXMLStreamReaderImpljava598__at_orgapachesolrhandlerXmlUpdateRequestHandlerreadDocXmlUpdateRequestHandlerjava335__at_orgapachesolrhandlerXmlUpdateRequestHandlerprocessUpdateXmlUpdateRequestHandlerjava181__at_orgapachesolrhandlerXmlUpdateRequestHandlerhandleRequestBodyXmlUpdateRequestHandlerjava109__at_orgapachesolrhandlerRequestHandlerBasehandleRequestRequestHandlerBasejava78__at_orgapachesolrcoreSolrCoreexecuteSolrCorejava804__at_orgapachesolrservletSolrDispatchFilterexecuteSolrDispatchFilterjava193__at_orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava161__at_orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089__at_orgmortbayjettyservletServletHandlerhandleServletHandlerjava365__at_orgmortbayjettysecuritySecurityHandlerhandleSecurityHandlerjava216__at_orgmortbayjettyservletSessionHandlerhandleSessionHandlerjava181__at_orgmortbayjettyhandlerContextHandlerhandleContextHandlerjava712__at_orgmortbayjettywebappWebAppContexthandleWebAppContextjava405__at_orgmortbayjettyhandlerContextHandlerCollectionhandleContextHandlerCollectionjava211__at_orgmortbayjettyhandlerHandlerCollectionhandleHandlerCollectionjava114__at_orgmortbayjettyhandlerHandlerWrapperhandleHandlerWrapperjava139__at_orgmortbayjettyServerhandleServerjava285__at_orgmortbayjettyHttpConnectionhandleRequestHttpConnectionjava502__at_orgmortbayjettyHttpConnection$RequestHandlercontentHttpConnectionjava835__at_orgmortbayjettyHttpParserparseNextHttpParserjava641__at_orgmortbayjettyHttpParserparseAvailableHttpParserjava208__at_orgmortbayje

Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__javalangIllegalStateException_Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__at_comsunorgapachexercesinternalimplXMLStreamReaderImplgetAttributeCountXMLStreamReaderImpljava598__at_orgapachesolrhandlerXmlUpdateRequestHandlerreadDocXmlUpdateRequestHandlerjava335__at_orgapachesolrhandlerXmlUpdateRequestHandlerprocessUpdateXmlUpdateRequestHandlerjava181__at_orgapachesolrhandlerXmlUpdateRequestHandlerhandleRequestBodyXmlUpdateRequestHandlerjava109__at_orgapachesolrhandlerRequestHandlerBasehandleRequestRequestHandlerBasejava78__at_orgapachesolrcoreSolrCoreexecuteSolrCorejava804__at_orgapachesolrservletSolrDispatchFilterexecuteSolrDispatchFilterjava193__at_orgapachesolrservletSolrDispatchFilterdoFilterSolrDispatchFilterjava161__at_orgmortbayjettyservletServletHandler$CachedChaindoFilterServletHandlerjava1089__at_orgmortbayjettyservletServletHandlerhandleServletHandlerjava365__at_orgmortbayjettysecuritySecurityHandlerhandleSecurityHandlerjava216__at_orgmortbayjettyservletSessionHandlerhandleSessionHandlerjava181__at_orgmortbayjettyhandlerContextHandlerhandleContextHandlerjava712__at_orgmortbayjettywebappWebAppContexthandleWebAppContextjava405__at_orgmortbayjettyhandlerContextHandlerCollectionhandleContextHandlerCollectionjava211__at_orgmortbayjettyhandlerHandlerCollectionhandleHandlerCollectionjava114__at_orgmortbayjettyhandlerHandlerWrapperhandleHandlerWrapperjava139__at_orgmortbayjettyServerhandleServerjava285__at_orgmortbayjettyHttpConnectionhandleRequestHttpConnectionjava502__at_orgmortbayjettyHttpConnection$RequestHandlercontentHttpConnectionjava835__at_orgmortbayjettyHttpParserparseNextHttpParserjava641__at_orgmortbayjettyHttpParserparseAvailableHttpParserjava208__at_orgmortbayje

request: http://localhost:8983/solr/update?wt=xml&version=2.2
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:230)
        at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:199)
        at org.apache.solr.client.solrj.impl.BaseSolrServer.add(BaseSolrServer.java:46)
        at org.apache.solr.client.solrj.impl.BaseSolrServer.add(BaseSolrServer.java:61)
        at org.apache.solr.client.solrj.embedded.TestJettyLargeVolume$DocThread.run(TestJettyLargeVolume.java:69)
Exception in thread 
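
The attached test is not reproduced in the archive; the following is a
rough sketch of the access pattern described above (batches of 200, one
single-term field, several client threads), assuming the SolrJ classes
visible in the stack trace -- the field names here are made up:

  import java.util.ArrayList;
  import java.util.Collection;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class UpdateLoadSketch {
      public static void main(String[] args) throws Exception {
          int threads = 2;  // 1 thread works; >1 triggers the reported failure
          for (int t = 0; t < threads; t++) {
              final int threadId = t;
              new Thread() {
                  public void run() {
                      try {
                          SolrServer server =
                              new CommonsHttpSolrServer("http://localhost:8983/solr");
                          Collection<SolrInputDocument> batch =
                              new ArrayList<SolrInputDocument>();
                          for (int i = 0; i < 5000; i++) {
                              SolrInputDocument doc = new SolrInputDocument();
                              doc.addField("id", "t" + threadId + "-" + i);
                              doc.addField("name", "term");  // one single-term field
                              batch.add(doc);
                              if (batch.size() == 200) {     // submit in batches of 200
                                  server.add(batch);
                                  batch.clear();
                              }
                          }
                          if (!batch.isEmpty()) server.add(batch);
                          server.commit();                   // commit every 5000 docs
                      } catch (Exception e) {
                          e.printStackTrace();
                      }
                  }
              }.start();
              System.out.println("Started thread: " + threadId);
          }
      }
  }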

Index/Update Problems with Solrj/Tomcat and Larger Files

2007-09-19 Thread Daley, Kristopher M.
I am using Tomcat 6 and Solr 1.2 on a Windows 2003 server using the
following java code.   I am trying to index pdf files, and I'm
constantly getting errors on larger files (the same ones).  

 

  SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
  SolrInputDocument addDoc = new SolrInputDocument();
  addDoc.addField("url", url);
  addDoc.addField("site", site);
  addDoc.addField("author", author);
  addDoc.addField("title", title);
  addDoc.addField("subject", subject);
  addDoc.addField("keywords", keywords);
  addDoc.addField("text", docText);
  UpdateRequest ur = new UpdateRequest();
  ur.setAction(UpdateRequest.ACTION.COMMIT, false, false);  // auto-commits on update
  ur.add(addDoc);
  UpdateResponse rsp = ur.process(server);

 

The java error I received is: class
org.apache.solr.client.solrj.SolrServerException
(java.net.SocketException: Software caused connection abort: recv
failed)

Tomcat Log:

SEVERE: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:716)
        at org.apache.coyote.http11.InternalInputBuffer$InputStreamInputBuffer.doRead(InternalInputBuffer.java:746)
        at org.apache.coyote.http11.filters.IdentityInputFilter.doRead(IdentityInputFilter.java:116)
        at org.apache.coyote.http11.InternalInputBuffer.doRead(InternalInputBuffer.java:675)
        at org.apache.coyote.Request.doRead(Request.java:428)
        at org.apache.catalina.connector.InputBuffer.realReadBytes(InputBuffer.java:297)
        at org.apache.tomcat.util.buf.ByteChunk.substract(ByteChunk.java:405)
        at org.apache.catalina.connector.InputBuffer.read(InputBuffer.java:312)
        at org.apache.catalina.connector.CoyoteInputStream.read(CoyoteInputStream.java:193)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:264)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
        at java.io.InputStreamReader.read(InputStreamReader.java:167)
        at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2972)
        at org.xmlpull.mxp1.MXParser.more(MXParser.java:3026)
        at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1384)
        at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
        at org.xmlpull.mxp1.MXParser.nextText(MXParser.java:1058)
        at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:332)
        at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:162)
        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:619)

 

This happens when I try to index a field containing the contents of the
PDF file.  Its string length is 189002.  If I only do a substring on
the field of, say, length 15, it usually will work.  Does anyone have
any idea why this might be happening?

 

I have had this and other files index correctly 

Re: Solr Index - no segments* file found in org.apache.lucene.store.FSDirectory

2007-09-19 Thread Bill Au
What files are there in your /data/pub/index directory?

Bill

On 9/19/07, Venkatraman S [EMAIL PROTECTED] wrote:

 Hi,

 Product: Solr (Embedded), Version: 1.2

 Problem Description :
 While trying to add and search over the index, we are stumbling on this
 error again and again.
 Do note that the SolrCore is committed and closed suitably in our Embedded
 Solr.

 Error (StackTrace) :
 Sep 19, 2007 9:41:41 AM org.apache.catalina.core.StandardWrapperValve invoke
 SEVERE: Servlet.service() for servlet jsp threw exception
 java.io.FileNotFoundException: no segments* file found in
 org.apache.lucene.store.FSDirectory@/data/pub/index: files:

 Extra Information :
 /data/pub is the Solr Home.
 /data/pub/index contains the index.
 CreateSolrIndex.java is our program that creates and searches over the
 index

 Regards,
 Venkat
 --



Re: Solr Index - no segments* file found in org.apache.lucene.store.FSDirectory

2007-09-19 Thread Venkatraman S
Quite interesting actually (this is for 5 documents that were indexed):

_0.fdt  _0.prx  _1.fnm  _1.tis  _2.nrm  _3.fdx  _3.tii  _4.frq  segments.gen
_0.fdx  _0.tii  _1.frq  _2.fdt  _2.prx  _3.fnm  _3.tis  _4.nrm  segments_6
_0.fnm  _0.tis  _1.nrm  _2.fdx  _2.tii  _3.frq  _4.fdt  _4.prx
_0.frq  _1.fdt  _1.prx  _2.fnm  _2.tis  _3.nrm  _4.fdx  _4.tii
_0.nrm  _1.fdx  _1.tii  _2.frq  _3.fdt  _3.prx  _4.fnm  _4.tis


On 9/19/07, Bill Au [EMAIL PROTECTED] wrote:

 What files are there in your /data/pub/index directory?

 Bill

 On 9/19/07, Venkatraman S [EMAIL PROTECTED] wrote:
 
  Hi,

  Product: Solr (Embedded), Version: 1.2
 
  Problem Description :
  While trying to add and search over the index, we are stumbling on this
  error again and again.
  Do note that the SolrCore is committed and closed suitably in our
 Embedded
  Solr.
 
  Error (StackTrace) :
  Sep 19, 2007 9:41:41 AM org.apache.catalina.core.StandardWrapperValve invoke
  SEVERE: Servlet.service() for servlet jsp threw exception
  java.io.FileNotFoundException: no segments* file found in
  org.apache.lucene.store.FSDirectory@/data/pub/index: files:
 
  Extra Information :
  /data/pub is the Solr Home.
  /data/pub/index contains the index.
  CreateSolrIndex.java is our program that creates and searches over the
  index
 
  Regards,
  Venkat
  --
 




--


Re: multithread update client causes exceptions and dropped documents

2007-09-19 Thread Will Johnson
one other note.  the errors pop up when running against the 1.3 trunk
but do not appear to happen when run against 1.2.

- will

On 9/19/07, Will Johnson [EMAIL PROTECTED] wrote:





 we were doing some performance testing for the updating aspects of solr and
 ran into what seems to be a large problem.  we're creating small documents
 with an id and one field of 1 term only, submitting them in batches of 200
 with commits every 5000 docs.  when we run the client with 1 thread
 everything is fine.  when we run it with >1 threads things go south (stack
 trace is below).  i've attached the junit test which shows the problem.
 this happens on both a mac and a pc, and when running solr in both jetty and
 tomcat.  i'll create a jira issue if necessary but i thought i'd see if
 anyone else had run into this problem first.

 (output from junit test)
 Started thread: 0
 Started thread: 1
 org.apache.solr.common.SolrException: Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__...

 request: http://localhost:8983/solr/update?wt=xml&version=2.2
  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:230)
  at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:199)
  at org.apache.solr.client.solrj.impl.BaseSolrServer.add(BaseSolrServer.java:46)

Re: multithread update client causes exceptions and dropped documents

2007-09-19 Thread Ryan McKinley

Can you start a JIRA issue and attach the patch?

I have not seen this happen, but I bet it is caused by something from:
https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.ext.subversion:subversion-commits-tabpanel

Can we add that test to trunk?  By default it does not need to be a long 
running test, but it's nice to have in there so we can twiddle it for 
specific testing.


thanks
ryan


Will Johnson wrote:

one other note.  the errors pop up when running against the 1.3 trunk
but do not appear to happen when run against 1.2.

- will

On 9/19/07, Will Johnson [EMAIL PROTECTED] wrote:





we were doing some performance testing for the updating aspects of solr and
ran into what seems to be a large problem.  we're creating small documents
with an id and one field of 1 term only, submitting them in batches of 200
with commits every 5000 docs.  when we run the client with 1 thread
everything is fine.  when we run it with >1 threads things go south (stack
trace is below).  i've attached the junit test which shows the problem.
this happens on both a mac and a pc, and when running solr in both jetty and
tomcat.  i'll create a jira issue if necessary but i thought i'd see if
anyone else had run into this problem first.

(output from junit test)
Started thread: 0
Started thread: 1
org.apache.solr.common.SolrException: Current_state_is_not_among_the_states_START_ELEMENT__ATTRIBUTEvalid_for_getAttributeCount__...

Re: How can i make a distribute search on Solr?

2007-09-19 Thread Norberto Meijome
On Wed, 19 Sep 2007 01:46:53 -0400
Ryan McKinley [EMAIL PROTECTED] wrote:

 Stu is referring to Federated Search - where each index has some of the 
 data and results are combined before they are returned.  This is not yet 
 supported out of the box

Maybe this is related. How does this compare to the map-reduce functionality in 
Nutch/Hadoop ? 
cheers,
B

_
{Beto|Norberto|Numard} Meijome

With sufficient thrust, pigs fly just fine. However, this is not necessarily a 
good idea. 
It is hard to be sure where they are going to land, and it could be dangerous 
sitting under them as they fly overhead.
   [RFC1925 - section 2, subsection 3]

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: How can i make a distribute search on Solr?

2007-09-19 Thread Yonik Seeley
On 9/19/07, Norberto Meijome [EMAIL PROTECTED] wrote:
 On Wed, 19 Sep 2007 01:46:53 -0400
 Ryan McKinley [EMAIL PROTECTED] wrote:

  Stu is referring to Federated Search - where each index has some of the

It really should be Distributed Search I think (my mistake... I
started out calling it Federated).  I think Federated search is more
about combining search results from different data sources.

  data and results are combined before they are returned.  This is not yet
  supported out of the box

 Maybe this is related. How does this compare to the map-reduce functionality 
 in Nutch/Hadoop ?

map-reduce is more for batch jobs.  Nutch only uses map-reduce for
parallel indexing, not searching.

-Yonik


Re: Index/Update Problems with Solrj/Tomcat and Larger Files

2007-09-19 Thread Ryan McKinley


I have had this and other files index correctly using a different
combination of Tomcat/Solr versions without any problem (using similar
code; I re-wrote it because I thought it would be better to use Solrj).
I get the same error whether I use a simple StringBuilder to create the
add manually or if I use Solrj.  I have manually encoded each field
before passing it in to the add function as well, so I don't believe it
is a content problem.  I have tried to change every setting in Tomcat
and Solr that I can think of, but I'm newer to both of them.



So it works if you build an XML file with the same content and send it 
to the server using the example post.sh/post.jar tool?


Have you tried messing with the connection settings?
 SolrServer server = new CommonsHttpSolrServer( url );
  ((CommonsHttpSolrServer)server).setConnectionTimeout(5);
  ((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);
  ((CommonsHttpSolrServer)server).setMaxTotalConnections(100);

a timeout of 5ms is probably too short...


ryan


Re: How can i make a distribute search on So lr?

2007-09-19 Thread Stu Hood
Nutch implements federated search separately from their index generation.

My understanding is that MapReduce jobs generate the indexes (Nutch calls them 
segments) from raw data that has been downloaded, and then makes them available 
to be searched via remote procedure calls. Queries never pass through MapReduce 
in any shape or form, only the raw data and indexes.

If you take a look at the org.apache.nutch.searcher.DistributedSearch class, 
specifically the #Client.search method, you can see how they handle the actual 
federation of results.

Thanks,
Stu


-Original Message-
From: Norberto Meijome 
Sent: Wednesday, September 19, 2007 10:23am
To: solr-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Subject: Re: How can i make a distribute search on Solr?

On Wed, 19 Sep 2007 01:46:53 -0400
Ryan McKinley  wrote:

 Stu is referring to Federated Search - where each index has some of the 
 data and results are combined before they are returned.  This is not yet 
 supported out of the box

Maybe this is related. How does this compare to the map-reduce functionality in 
Nutch/Hadoop ? 
cheers,
B

_
{Beto|Norberto|Numard} Meijome

With sufficient thrust, pigs fly just fine. However, this is not necessarily a 
good idea. 
It is hard to be sure where they are going to land, and it could be dangerous 
sitting under them as they fly overhead.
   [RFC1925 - section 2, subsection 3]

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Index/Update Problems with Solrj/Tomcat and Larger Files

2007-09-19 Thread Ryan McKinley

Daley, Kristopher M. wrote:

I have tried changing those settings, for example, as:

SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
((CommonsHttpSolrServer)server).setConnectionTimeout(60);
((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);
((CommonsHttpSolrServer)server).setMaxTotalConnections(100);

However, still no luck.  



Have you tried anything larger than 60?  60ms is not long...

try 10000 (10s) and see if it works.



Re: Index/Update Problems with Solrj/Tomcat and Larger Files

2007-09-19 Thread Ryan McKinley
I'm stabbing in the dark here, but try fiddling with some of the other 
connection settings:


 getConnectionManager().getParams().setSendBufferSize( big );
 getConnectionManager().getParams().setReceiveBufferSize( big );

http://jakarta.apache.org/httpcomponents/httpclient-3.x/apidocs/org/apache/commons/httpclient/params/HttpConnectionManagerParams.html




Daley, Kristopher M. wrote:

I tried 1 and 6, same result.

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 19, 2007 11:18 AM

To: solr-user@lucene.apache.org
Subject: Re: Index/Update Problems with Solrj/Tomcat and Larger Files

Daley, Kristopher M. wrote:

I have tried changing those settings, for example, as:

SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
((CommonsHttpSolrServer)server).setConnectionTimeout(60);
((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);
((CommonsHttpSolrServer)server).setMaxTotalConnections(100);

However, still no luck.  



Have you tried anything larger than 60?  60ms is not long...

try 10000 (10s) and see if it works.






RE: Index/Update Problems with Solrj/Tomcat and Larger Files

2007-09-19 Thread Daley, Kristopher M.
Ok, I'll try to play with those.  Any suggestion on the size?

Something else that is very interesting is that I just tried to do an
aggregate add of a bunch of docs, including the one that always returned
the error.

I called a function to create a SolrInputDocument and return it.  I then
did the following:

Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
UpdateRequest ur = new UpdateRequest();
ur.setAction(UpdateRequest.ACTION.COMMIT, false, false);  // auto-commits on update
ur.add(docs);
UpdateResponse rsp = ur.process(server);

In doing this, the program simply hangs after the last command.  If I
let it sit there for an amount of time, it eventually returns with the
error: class org.apache.solr.client.solrj.SolrServerException
(java.net.SocketException: Connection reset by peer: socket write error)

However, if I go to the tomcat server and restart it after I have issued
the process command, the program returns and the documents are all
posted correctly!

Very strange behavior... am I somehow not closing the connection
properly?

 
-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 19, 2007 11:49 AM
To: solr-user@lucene.apache.org
Subject: Re: Index/Update Problems with Solrj/Tomcat and Larger Files

I'm stabbing in the dark here, but try fiddling with some of the other 
connection settings:

  getConnectionManager().getParams().setSendBufferSize( big );
  getConnectionManager().getParams().setReceiveBufferSize( big );

http://jakarta.apache.org/httpcomponents/httpclient-3.x/apidocs/org/apache/commons/httpclient/params/HttpConnectionManagerParams.html




Daley, Kristopher M. wrote:
 I tried 1 and 6, same result.
 
 -Original Message-
 From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
 Sent: Wednesday, September 19, 2007 11:18 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Index/Update Problems with Solrj/Tomcat and Larger Files
 
 Daley, Kristopher M. wrote:
 I have tried changing those settings, for example, as:

 SolrServer server = new CommonsHttpSolrServer(solrPostUrl);
 ((CommonsHttpSolrServer)server).setConnectionTimeout(60);
 ((CommonsHttpSolrServer)server).setDefaultMaxConnectionsPerHost(100);
 ((CommonsHttpSolrServer)server).setMaxTotalConnections(100);

 However, still no luck.  

 
 Have you tried anything larger than 60?  60ms is not long...
 
 try 10000 (10s) and see if it works.
 
 



Getting only size of getFacetCounts , to simulate count(group by( a field) ) using facets

2007-09-19 Thread Laurent Hoss

Hi

We want to (mis)use facet search to get the number of (unique) field 
values appearing in a document result set.
I thought facet search perfect for this, because it already gives me 
all the (unique) field values.
But to use it for this special problem, we don't want all the 
values listed in the response as there might be over 1, and we don't need 
the values at all, just the count of how many!


I looked at
http://wiki.apache.org/solr/SimpleFacetParameters
and hoped to find a parameter like
facet.sizeOnly = true
(or facet.showSize=true  , combined with facet.limit=1 or other small value)

Would you accept a patch with such a feature ?

It should probably be relatively easy, though I'm not sure if it fits into the 
concept of facets...


I looked at the code; maybe add an extra value to the NamedList returned by 
getFacetCounts() in SimpleFacets?!
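
In the meantime, a client-side approximation is possible -- request every
facet value and count them locally, which pays exactly the transfer cost
described above but makes the request shape concrete. A sketch, assuming
SolrJ's SolrQuery facet helpers and a hypothetical field name:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class FacetSizeSketch {
      public static void main(String[] args) throws Exception {
          SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
          SolrQuery q = new SolrQuery("*:*");
          q.setFacet(true);
          q.addFacetField("cat");   // hypothetical field
          q.setFacetLimit(-1);      // return every value (the expensive part)
          q.setFacetMinCount(1);    // only values that appear in the result set
          q.setRows(0);             // we only want the facet block, no documents
          QueryResponse rsp = server.query(q);
          int distinct = rsp.getFacetField("cat").getValues().size();
          System.out.println("distinct values: " + distinct);
      }
  }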


ps: Other user having same request AFAIU :
http://www.nabble.com/showing--range-facet-example-%3D-by-Range-%28-1-to-1000-%29-t3660704.html#a10229069

thanks,

Laurent Hoss   






Re: Select distinct in Solr

2007-09-19 Thread Ryan McKinley

Lance Norskog wrote:

I believe I saw in the Javadocs for Lucene that there is the ability to
return the unique values for one field for a search, rather than each
record. Is it possible to add this feature to Solr?  It is the equivalent of
'select distinct' in SQL.
 


Look into faceting:
http://wiki.apache.org/solr/SimpleFacetParameters

or maybe the Luke request handler:
http://wiki.apache.org/solr/LukeRequestHandler
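
For example, faceting on a field with no limit effectively lists its
distinct values in the result set (the field name cat is from the example
schema; a sketch, not tested):

  http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=cat&facet.limit=-1&facet.mincount=1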

ryan



useColdSearcher = false... not working in 1.2?

2007-09-19 Thread Adam Goldband
Anyone else using this, and finding it not working in Solr 1.2?  Since
we've got an automated release process, I really need to be able to have
the appserver not see itself as done warming up until the firstSearcher
is ready to go... but with 1.2 this no longer seems to be the case.

adam


Re: useColdSearcher = false... not working in 1.2?

2007-09-19 Thread Yonik Seeley
On 9/19/07, Adam Goldband [EMAIL PROTECTED] wrote:
 Anyone else using this, and finding it not working in Solr 1.2?  Since
 we've got an automated release process, I really need to be able to have
 the appserver not see itself as done warming up until the firstSearcher
 is ready to go... but with 1.2 this no longer seems to be the case.

I took a quick peek at the code, and it should still work (it's pretty simple).
false is also the default.

How are you determining that it isn't working?

-Yonik


Re: Getting only size of getFacetCounts , to simulate count(group by( a field) ) using facets

2007-09-19 Thread Yonik Seeley
On 9/19/07, Laurent Hoss [EMAIL PROTECTED] wrote:
 We want to (mis)use facet search to get the number of (unique) field
 values appearing in a document resultset.

We have paging of facets, so just like normal search results, it does
make sense to list the total number of facets matching.

The main problem with implementing this is trying to figure out where
to put the info in a backward compatible manner.  Here is how the info
is currently returned (JSON format):

 "facet_fields":{
   "cat":[
     "camera",1,
     "card",2,
     "connector",2,
     "copier",1,
     "drive",2
   ]
 },


Unfortunately, there's not a good place to put this extra info without
older clients choking on it.  Within "cat" there should have been
another element called "values" or something... then we could easily
add extra fields like "nvalues":

"cat":{
  "nvalues":5042,
  "values":[
    "camera",1,
    "card",2,
    "connector",2,
    "copier",1,
    "drive",2
  ]
}

-Yonik


Exact phrase highlighting

2007-09-19 Thread Marc Bechler

Hi out of there,

I just walked through the mailing list archive, but I did not find an 
appropriate answer for phrase highlighting.


I do not have any highlighting section (and no dismax handler 
definition) in solrconfig.xml. This way (AFAIK :-)), the standard lucene 
query syntax should be supported in its full functionality. But, in 
this case double quoting the search expressions does not have any effect 
on highlighting, i.e.:

Assume we have the following text (of field type text):
"It is hard work to do the hard complex work"

A query for
"hard work"
(with the double quotes) results in the highlighted section
It is <em>hard</em> <em>work</em> to do the <em>hard</em> complex 
<em>work</em>

Although I would guess that the correct answer should be
 It is <em>hard work</em> to do the hard complex work

Do any of the SOLR experts have a good answer for me? (I guess that 
I still did not understand the functional relationship between 
highlighting, query specification and index specification...)


Thanks for your help

 marc


Re: Solr Index - no segments* file found in org.apache.lucene.store.FSDirectory

2007-09-19 Thread Chris Hostetter

: Product: Solr (Embedded), Version: 1.2


: java.io.FileNotFoundException: no segments* file found in
: org.apache.lucene.store.FSDirectory@/data/pub/index: files:

According to that, the FSDirectory was empty when it was opened (a file 
list is supposed to come after that "files:" part).

You imply that you are building your index using embedded solr, but based 
on your stack trace it seems you are using Solr in a servlet container ... 
i assume to search the index you've already built?

Is the embedded core completely closed before your servlet container running 
Solr is started?  What does the directory listing look like in between the 
finish of A and the start of B?
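
A quick way to check that diagnosis from the embedded side is to probe the
directory right before the servlet container starts; a sketch against the
Lucene 2.x API of the era, using the path from the report:

  import java.io.File;
  import org.apache.lucene.index.IndexReader;

  public class CheckIndexDir {
      public static void main(String[] args) throws Exception {
          File indexDir = new File("/data/pub/index");
          // Lucene 2.x: true only if a segments* file is present.
          System.out.println("usable index: " + IndexReader.indexExists(indexDir));
          String[] files = indexDir.list();
          System.out.println("file count: " + (files == null ? 0 : files.length));
      }
  }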




-Hoss



Re: Exact phrase highlighting

2007-09-19 Thread Mike Klaas

On 19-Sep-07, at 1:12 PM, Marc Bechler wrote:


Hi out of there,

I just walked through the mailing list archive, but I did not find  
an appropriate answer for phrase highlighting.


I do not have any highlighting section (and no dismax handler  
definition) in solrconfig.xml. This way (AFAIK :-)), the standard  
lucene query syntax should be supported in its full functionality.  
But, in this case double quoting the search expressions does not  
have any effect on highlighting, i.e.:

Assume we have the following text (of field type text):
"It is hard work to do the hard complex work"

A query for
"hard work"
(with the double quotes) results in the highlighted section
It is <em>hard</em> <em>work</em> to do the <em>hard</em> complex  
<em>work</em>

Although I would guess that the correct answer should be
 It is <em>hard work</em> to do the hard complex work

Does anyone of the SOLR experts have a good answer for me? (I guess  
that I still did not understand the functional relationship between  
highlighting, query specification and index specification...)


It currently is not supported by Solr.  There is work in lucene that  
supports this (see
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526803),
but it is currently not integrated.


It would make a great project to get one's hands dirty contributing,  
though :)


-Mike


Re: DisMax queries referencing undefined fields

2007-09-19 Thread Chris Hostetter

: I noticed that the field list (fl) parameter ignores field names that it
: cannot locate, while the query fields (qf) parameter throws an exception
: when fields cannot be located.  Is there any way to override this behavior and
: have qf also ignore fields it cannot find?

Those parameters are radically different.  FL isn't evaluated until after 
a query is executed and it's time to return documents ... just because the 
current range of documents being returned doesn't have a value doesn't 
mean there is a problem with the FL -- other documents in the same DocSet 
might have those values.  It's not that Solr ignores fields in the fl 
that it can't locate, as it is that Solr tests each field a document to be 
returned has, and only returns it if the field is in the FL. 

In theory, field names in the FL should be tested to see if a matching 
field or dynamic field exists that would match and generate a 
warning/error if it's not -- i would consider that an FL bug.

The semantics of QF follow directly from the semantics of the standard 
query parser: if you tell it to query against a field which does not exist 
for any document, then something is wrong with the request.  Unlike the 
FL case (which is lazy for not checking that the field exists) dismax 
has to check each field because it needs to know how to analyze the input 
for every field in the QF -- if the field doesn't exist, it can't do that.

: This would be pretty helpful for us, as we're going to have a large number of
: dynamic, user-specific fields defined in our schema.  These fields will have
: canonical name formats (e.g. userid-comment), but they may not be defined for
: every document.  In fact, some fields may be defined for no documents, which I
: gather would be the ones that would throw exceptions.  It would be nice to
: provide solr a set of fields that could be searched and have it use the subset
: of those fields that exist.

i suppose it would be possible to make an option for dismax to ignore any 
field it can't find, but that would be fairly kludgy and would introduce 
some really confusing edge cases (ie: what happens if none of the fields in 
QF can be found?)

A better option would probably be to use something like this from the 
sample schema.xml...

   <!-- uncomment the following to ignore any fields that don't already
        match an existing field name or dynamic field, rather than
        reporting them as an error.
        alternately, change the type="ignored" to some other type e.g.
        "text" if you want unknown fields indexed and/or stored by default -->
   <!-- <dynamicField name="*" type="ignored" /> -->


...then any field name you want will work regardless of whether you are 
using dismax or the standard request handler.

Hmmm: it might be better though if the "ignored" field type was a 
TextField with an Analyzer that produced no tokens ... then it would drop 
out of the query completely ... anyone want to submit a NoOpAnalyzer? :)
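
A sketch of the NoOpAnalyzer being hinted at, against the Lucene 2.x
TokenStream API (untested):

  import java.io.Reader;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.Token;
  import org.apache.lucene.analysis.TokenStream;

  // Emits no tokens at all, so any query clause against a field that
  // uses this analyzer simply drops out of the parsed query.
  public class NoOpAnalyzer extends Analyzer {
      public TokenStream tokenStream(String fieldName, Reader reader) {
          return new TokenStream() {
              public Token next() {
                  return null;  // end of stream immediately (Lucene 2.x contract)
              }
          };
      }
  }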


-Hoss



Re: Exact phrase highlighting

2007-09-19 Thread Mike Klaas

On 19-Sep-07, at 2:39 PM, Marc Bechler wrote:


Hi Mike,

thanks for the quick response.

 It would make a great project to get one's hands dirty  
contributing, though :)


... sounds like giving a broad hint ;-) Sounds challenging...


I'm not sure about that--it is supposed to be a drop-in replacement  
for Highlighter.  I expect most of the work will consist of figuring  
out the right way of packaging it in a jar for solr inclusion.


-Mike


Re: Getting only size of getFacetCounts , to simulate count(group by( a field) ) using facets

2007-09-19 Thread Chris Hostetter

: The main problem with implementing this is trying to figure out where
: to put the info in a backward compatible manner.  Here is how the info

1) this seems like the kind of thing that would only be returned if 
requested -- so we probably don't have to be overly concerned about 
backwards compatibility. if people are going to request this information, 
they have to make their client code look for it, so they can also 
make their client code know how to distinguish it from the existing 
counts.

2) the counts themselves are first order faceting data, so i think it 
makes sense to leave them where they are ... metadata about the field 
could be included as a sub-list at the start of the field list 
(much like the missing count is included as an unnamed int at the end of 
the field list).  this sub-list could be unnamed to help distinguish it 
from field term values -- but frankly i don't think that's a huge deal -- 
the counts will stand out for being integers, while this metadata would 
be a nested NamedList (aka: map, aka hash, aka however it's represented in 
the format used)

structure could be something like...

...&facet.field=cat&facet.limit=3&facet.mincount=5&facet.missing=true

 <lst name="facet_fields">
  <lst name="cat">
    <lst name="METADATA"> <!-- or maybe STATS ? -->
       <int name="totalConstraintsAboveMinCount">42</int>
       <int name="totalConstraints">678</int>
    </lst>
    <int name="music">30</int>
    <int name="connector">20</int>
    <int name="electronics">10</int>
    <int>5</int> <!-- existing facet.missing count -->
  </lst>
 </lst>



-Hoss



RE: Triggering snapshooter through web admin interface

2007-09-19 Thread Chris Hostetter

lance: since the topic you are describing is not directly related to 
triggering a snapshot from the web interface, can you please start a new 
thread with a unique subject describing in more detail exactly what it 
was you were doing and the problem you encountered?

this will make it easier for your problem to get visibility (some people 
don't read every thread, and archive searching is frequently done by 
thread, so people looking for similar problems may not realize this new 
thread is buried inside an old one)

-Hoss

: Date: Wed, 19 Sep 2007 11:33:30 -0700
: From: Lance Norskog [EMAIL PROTECTED]
: Reply-To: solr-user@lucene.apache.org
: To: solr-user@lucene.apache.org
: Subject: RE: Triggering snapshooter through web admin interface
: 
: Is there a ticket for this yet? I have a bug report and request: I just did
: a snapshot while indexing 700 records/sec. and got an inconsistency. I was
: tarring off the snapshot and tar reported that a file changed while it was
: being copied. The error rolled off my screen, so I cannot report the file
: name or extension.
: 
: If a solr command to do a snapshot is implemented, please make sure that it
: is 100% consistent.
: 
: Thanks,
: 
: Lance Norskog 



rsync start and enable for multiple solr instances within one tomcat

2007-09-19 Thread Yu-Hui Jin
Hi, there,

So we are using the Tomcat's JNDI method to set up multiple solr instances
within a tomcat server. Each instance has a solr home directory.

Now we want to set up collection distribution for all these solr home
indexes. My understanding is:

1.  we only need to run rsync-start once, using the script under any of the
solr home dirs.
2.  we need to run each of the rsync-enable scripts under the solr homes'
bin dirs.
3.  the twiki page at
http://wiki.apache.org/solr/SolrCollectionDistributionScripts  keeps
referring to solr/xxx. Is this "solr" the example solr home dir?  If so,
would it be hard-coded in any of the scripts?  For example, I saw in
snappuller line 226 (solr 1.2):

${stats} rsync://${master_host}:${rsyncd_port}/solr/${name}/
${data_dir}/${name}-wip

Is the above "solr" a hard-coded solr home name? If so, it's not desirable
since we have multiple solr homes with different names.  If not, what is
this "solr"?


thanks,

-Hui


Re: Index/Update Problems with Solrj/Tomcat and Larger Files

2007-09-19 Thread Ryan McKinley


However, if I go to the tomcat server and restart it after I have issued
the process command, the program returns and the documents are all
posted correctly!

Very strange behavioram I somehow not closing the connection
properly?  



What version is the solr you are connecting to? 1.2 or 1.3-dev?  (I have 
not tested against 1.2)


Does this only happen with tomcat?  If you run with jetty do you get the 
same behavior?  (again, just stabs in the dark)


If you can make a small repeatable problem, post it in JIRA and I'll 
look into it.


ryan



setting absolute path for snapshooter in solrconfig.xml doesn't work

2007-09-19 Thread Yu-Hui Jin
Hi, there,

I used an absolute path for the dir param in the solrconfig.xml as below:

<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">snapshooter</str>
  <str name="dir">/var/SolrHome/solr/bin</str>
  <bool name="wait">true</bool>
  <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
  <arr name="env"> <str>MYVAR=val1</str> </arr>
</listener>

However, I got a "snapshooter: not found" exception thrown in catalina.out.
I don't see why this doesn't work. Anything I'm missing?


Many thanks,

-Hui

catalina.out logs:
=
..
Sep 19, 2007 6:17:20 PM org.apache.solr.handler.XmlUpdateRequestHandler update
INFO: added id={SOLR1000} in 67ms
Sep 19, 2007 6:17:20 PM org.apache.solr.core.SolrCore execute
INFO: /update  0 86
Sep 19, 2007 6:17:21 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
Sep 19, 2007 6:17:21 PM org.apache.solr.update.DirectUpdateHandler2 doDeletions
INFO: DirectUpdateHandler2 deleting and removing dups for 1 ids
Sep 19, 2007 6:17:21 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening [EMAIL PROTECTED] DirectUpdateHandler2
Sep 19, 2007 6:17:21 PM org.apache.solr.update.DirectUpdateHandler2 doDeletions
INFO: DirectUpdateHandler2 docs deleted=0
Sep 19, 2007 6:17:21 PM org.apache.solr.core.SolrException log
SEVERE: java.io.IOException: java.io.IOException: snapshooter: not found
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
        at java.lang.ProcessImpl.start(ProcessImpl.java:65)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:451)
        at java.lang.Runtime.exec(Runtime.java:591)
        at org.apache.solr.core.RunExecutableListener.exec(RunExecutableListener.java:70)
        at org.apache.solr.core.RunExecutableListener.postCommit(RunExecutableListener.java:97)
        at org.apache.solr.update.UpdateHandler.callPostCommitCallbacks(UpdateHandler.java:99)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:514)
        at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:214)
        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:526)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:856)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(Http11Protocol.java:744)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
        at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
        at java.lang.Thread.run(Thread.java:595)


Sep 19, 2007 6:17:21 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening [EMAIL PROTECTED] main
Sep 19, 2007 6:17:21 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Sep 19, 2007 6:17:21 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Sep 19, 2007 6:17:21 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Sep 19, 2007 6:17:21 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main

RE: How can i make a distribute search on Solr?

2007-09-19 Thread Jarvis
Nutch has two ways to make a distributed query: through HDFS (the Hadoop
file system) or through the RPC call in the
org.apache.nutch.searcher.DistributedSearch class.

But I think neither of these is good enough.

If we use HDFS to serve the user's query, stability is a problem. We must do
all of the crawling, indexing, and querying on HDFS and use MapReduce. Can we
trust Hadoop all the time? :)

If we use the RPC call in Nutch, manually separating the index is required.
We will receive duplicate results if there are duplicate index documents on
different servers. Data updates and single-server failures are also hard to
deal with.

Thanks,
Jarvis


-Original Message-
From: Stu Hood [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 19, 2007 10:37 PM
To: solr-user@lucene.apache.org
Subject: Re: How can i make a distribute search on Solr?

Nutch implements federated search separately from their index generation.

My understanding is that MapReduce jobs generate the indexes (Nutch calls
them segments) from raw data that has been downloaded, and then make them
available to be searched via remote procedure calls. Queries never pass
through MapReduce in any shape or form, only the raw data and indexes.

If you take a look at the org.apache.nutch.searcher.DistributedSearch
class, specifically the #Client.search method, you can see how they handle
the actual federation of results.

Thanks,
Stu


-Original Message-
From: Norberto Meijome 
Sent: Wednesday, September 19, 2007 10:23am
To: solr-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Subject: Re: How can i make a distribute search on Solr?

On Wed, 19 Sep 2007 01:46:53 -0400
Ryan McKinley  wrote:

 Stu is referring to Federated Search - where each index has some of the 
 data and results are combined before they are returned.  This is not yet 
 supported out of the box

Maybe this is related. How does this compare to the map-reduce functionality
in Nutch/Hadoop ? 
cheers,
B

_
{Beto|Norberto|Numard} Meijome

With sufficient thrust, pigs fly just fine. However, this is not
necessarily a good idea. 
It is hard to be sure where they are going to land, and it could be
dangerous sitting under them as they fly overhead.
   [RFC1925 - section 2, subsection 3]

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.



Filter by Group

2007-09-19 Thread mark angelillo

Hey all,

Let's say I have an index of one hundred documents, and these  
documents are grouped into 4 groups A, B, C, and D. The groups do in  
fact overlap. What would people recommend as the best way to apply a  
search query and return only the documents that are in group A? Also,  
how about if we run the same search query but return only those  
documents in groups A, C and D?


I imagine that I could do this by indexing a text field populated
with the group names and adding something like "groups:A" to the
query, but I'm wondering if there's a better solution.


Thanks in advance,
Mark

mark angelillo
snooth inc.
o: 646.723.4328
c: 484.437.9915
[EMAIL PROTECTED]
snooth -- 1.7 million ratings and counting...




Re: How can i make a distribute search on Solr?

2007-09-19 Thread Norberto Meijome
On Wed, 19 Sep 2007 10:29:54 -0400
Yonik Seeley [EMAIL PROTECTED] wrote:

  Maybe this is related. How does this compare to the map-reduce 
  functionality in Nutch/Hadoop ?  
 
 map-reduce is more for batch jobs.  Nutch only uses map-reduce for
 parallel indexing, not searching.

I see... so in Nutch all nodes have all the data indexed?

Thanks,
_
{Beto|Norberto|Numard} Meijome...heading to read about nutch/hadoop

Imagination is more important than knowledge.
  Albert Einstein, On Science

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Term extraction

2007-09-19 Thread Pieter Berkel
I'm currently looking at methods of term extraction and automatic keyword
generation from indexed documents.  I've been experimenting with
MoreLikeThis and values returned by the mlt.interestingTerms parameter and
so far this approach has worked well.  However, I'd like to be able to
analyze documents more intelligently to recognize phrase keywords such as
"open source", "Microsoft Office", or "Bill Gates" rather than splitting each
word into separate tokens (the field is never used in search queries so
matching is not an issue).  I've been looking at SynonymFilterFactory as a
possible solution to this problem but haven't been able to work out the
specifics of how to configure it for phrase mappings.
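
For illustration, a phrase mapping of that kind might look roughly like the
sketch below (an unverified sketch; the field type name and synonyms file
name are assumptions, not from this thread). The synonyms file collapses each
multi-word phrase into a single token at index time:

  # synonyms.txt - map phrases to single tokens
  open source => open_source
  microsoft office => microsoft_office
  bill gates => bill_gates

and a field type in schema.xml applies the filter after tokenization:

  <fieldType name="keywords" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="false"/>
    </analyzer>
  </fieldType>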

Has anybody else dealt with this problem before, or is able to offer any
insights into achieving the desired results?

Thanks in advance,
Pieter


RE: How can i make a distribute search on Solr?

2007-09-19 Thread Jarvis
I think the index data stored in HDFS and generated by the map-reduce jobs
is used for searching in Nutch 0.9.

You can see the code in the org.apache.nutch.searcher.NutchBean class. :)

Jarvis

-Original Message-
From: Norberto Meijome [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 20, 2007 9:52 AM
To: solr-user@lucene.apache.org
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: How can i make a distribute search on Solr?

On Wed, 19 Sep 2007 10:29:54 -0400
Yonik Seeley [EMAIL PROTECTED] wrote:

  Maybe this is related. How does this compare to the map-reduce
functionality in Nutch/Hadoop ?  
 
 map-reduce is more for batch jobs.  Nutch only uses map-reduce for
 parallel indexing, not searching.

I see... so in Nutch all nodes have all the data indexed?

Thanks,
_
{Beto|Norberto|Numard} Meijome...heading to read about nutch/hadoop

Imagination is more important than knowledge.
  Albert Einstein, On Science

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.



Re: How can i make a distribute search on Solr?

2007-09-19 Thread Norberto Meijome
On Thu, 20 Sep 2007 09:37:51 +0800
Jarvis [EMAIL PROTECTED] wrote:

 If we use the RPC call in nutch .
Hi,
I wasn't suggesting using Nutch in Solr... I'm only a young grasshopper in
this league to be suggesting architecture stuff :) but I imagine there's
nothing wrong with using what they've built if it addresses Solr's needs.

  Manually separating the index is required.

hmm i imagine this really depends on the application. In my case, this
separation of which docs go where happens @ a completely different layer.

 We will receive duplicate results if there are duplicate index documents on
 different servers.

Maybe I got this wrong...but isn't this what mapreduce is meant to deal with?
eg, 

1) get the job (a query)
2) map it to workers ( servers that provide search results from their own
indexing)
3) wait for the results from all workers that reply within acceptable timeframe.
4) comb through the lot of  results from all workers, reduce them according to
your own biz rules (eg, remove dupes, sort them by quality / priority... here 
possibly relying on the original parameters of the query in 1)
5) return the reduced results to the frontend.
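
To make steps 1-5 concrete, a bare-bones scatter-gather loop could look like
the plain-Java sketch below (illustrative only: the Worker interface, the
per-worker timeout, and the dedup-by-max-score rule are my assumptions, not
anything Solr or Nutch ships with):

  import java.util.*;
  import java.util.concurrent.*;

  public class FederatedSearch {
      // One worker per index server: query string in, (docId -> score) hits out.
      interface Worker {
          Map<String, Float> search(String query) throws Exception;
      }

      static List<Map.Entry<String, Float>> federate(String query,
              List<Worker> workers, long timeoutMs) throws InterruptedException {
          ExecutorService pool = Executors.newFixedThreadPool(workers.size());
          List<Future<Map<String, Float>>> futures = new ArrayList<>();
          for (Worker w : workers)                        // 2) map the query to workers
              futures.add(pool.submit(() -> w.search(query)));
          Map<String, Float> best = new HashMap<>();
          for (Future<Map<String, Float>> f : futures) {
              try {                                       // 3) wait, bounded per worker
                  for (Map.Entry<String, Float> hit
                          : f.get(timeoutMs, TimeUnit.MILLISECONDS).entrySet())
                      best.merge(hit.getKey(), hit.getValue(), Math::max); // 4) dedup
              } catch (TimeoutException | ExecutionException e) {
                  // a slow or failed worker is simply skipped
              }
          }
          pool.shutdownNow();
          List<Map.Entry<String, Float>> merged = new ArrayList<>(best.entrySet());
          merged.sort((a, b) -> Float.compare(b.getValue(), a.getValue()));
          return merged;                                  // 5) reduced, sorted results
      }
  }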

 And also the data updating and single server's error is
 hard to deal with.

this really depends on your infrastructure + design. 

Having the indexing , searching and providing of results in different layers
should make for some interesting design options...

If each searcher (or wherever the index resides) is really a small cluster of
servers , the issue of data safety / server error is addressed @ that point.
You can also have repeated data across indexes (again, independent indexes) and
that's a more ... randomised :) way of keeping the docs safe... For example,
IIRC, googleFS keeps copies of each file in 3 servers or more...

cheers,
B
_
{Beto|Norberto|Numard} Meijome

He uses statistics as a drunken man uses lamp-posts ... for support rather
than illumination. Andrew Lang (1844-1912)

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Term extraction

2007-09-19 Thread Brian Whitman

On Sep 19, 2007, at 9:58 PM, Pieter Berkel wrote:

I'm currently looking at methods of term extraction and automatic  
keyword

generation from indexed documents.


We do it manually (not in solr, but we put the results in solr.) We
do it the usual way - chunk (into n-grams, named entities & noun
phrases) and count (tf & df). It works well enough. There is a bevy
of literature on the topic if you want to get "smart" -- but be
warned "smart" and "fast" are likely not very good friends.


A lot depends on the provenance of your data -- is it clean text that  
uses a lot of domain specific terms? Is it webtext?
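
As a rough illustration of the chunk-and-count idea (word n-grams and raw
term frequency only; named entities, noun phrases, and df weighting are left
out), a small plain-Java sketch:

  import java.util.*;

  public class NgramCounter {
      // "Chunk" a text into word n-grams and "count" their term frequencies.
      public static Map<String, Integer> countNgrams(String text, int n) {
          // strip leading punctuation so split() yields no empty first token
          String[] words = text.toLowerCase().replaceAll("^\\W+", "").split("\\W+");
          Map<String, Integer> counts = new HashMap<>();
          for (int i = 0; i + n <= words.length; i++) {
              StringBuilder gram = new StringBuilder(words[i]);
              for (int j = 1; j < n; j++)
                  gram.append(' ').append(words[i + j]);
              counts.merge(gram.toString(), 1, Integer::sum);
          }
          return counts;
      }
  }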




Re: Filter by Group

2007-09-19 Thread Pieter Berkel
Sounds like you're on the right track. If your groups overlap (i.e. a
document can be in both group A and B), then you should ensure your "groups"
field is multivalued.

If you are searching for "foo" in documents contained in group A, then it
might be more efficient to use a filter query (fq) like:

q=foo&fq=groups:A

See the wiki page on common query parameters for more info:
http://wiki.apache.org/solr/CommonQueryParameters#head-6522ef80f22d0e50d2f12ec487758577506d6002
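
Concretely, that could look something like this (the field definition and
host name are illustrative, not from this thread):

  <!-- schema.xml: multivalued, so a document can sit in several groups -->
  <field name="groups" type="string" indexed="true" stored="true" multiValued="true"/>

  # restrict results to group A
  http://localhost:8983/solr/select?q=foo&fq=groups:A
  # restrict results to groups A, C and D
  http://localhost:8983/solr/select?q=foo&fq=groups:(A OR C OR D)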

cheers,
Piete



On 20/09/2007, mark angelillo [EMAIL PROTECTED] wrote:

 Hey all,

 Let's say I have an index of one hundred documents, and these
 documents are grouped into 4 groups A, B, C, and D. The groups do in
 fact overlap. What would people recommend as the best way to apply a
 search query and return only the documents that are in group A? Also,
 how about if we run the same search query but return only those
 documents in groups A, C and D?

 I imagine that I could do this by indexing a text field populated
 with the group names and adding something like "groups:A" to the
 query, but I'm wondering if there's a better solution.

 Thanks in advance,
 Mark

 mark angelillo
 snooth inc.
 o: 646.723.4328
 c: 484.437.9915
 [EMAIL PROTECTED]
 snooth -- 1.7 million ratings and counting...





RE: How can i make a distribute search on Solr?

2007-09-19 Thread Jarvis
Hi,
What you describe is done by Hadoop, which supports hardware failure, data
replication, and more.
If we want to implement such a system ourselves with Solr instead of HDFS,
I think it would be very complex work. :)
I just want to know whether there is an existing component that can do
distributed search based on Solr.

Thanks 
Jarvis.

-Original Message-
From: Norberto Meijome [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 20, 2007 10:06 AM
To: solr-user@lucene.apache.org
Cc: [EMAIL PROTECTED]
Subject: Re: How can i make a distribute search on Solr?

On Thu, 20 Sep 2007 09:37:51 +0800
Jarvis [EMAIL PROTECTED] wrote:

 If we use the RPC call in nutch .
Hi,
I wasn't suggesting using Nutch in Solr... I'm only a young grasshopper in
this league to be suggesting architecture stuff :) but I imagine there's
nothing wrong with using what they've built if it addresses Solr's needs.

  Manually separating the index is required.

hmm i imagine this really depends on the application. In my case, this
separation of which docs go where happens @ a completely different layer.

 We will receive duplicate results if there are duplicate index documents
 on different servers.

Maybe I got this wrong...but isn't this what mapreduce is meant to deal
with?
eg, 

1) get the job (a query)
2) map it to workers ( servers that provide search results from their own
indexing)
3) wait for the results from all workers that reply within acceptable
timeframe.
4) comb through the lot of  results from all workers, reduce them according
to
your own biz rules (eg, remove dupes, sort them by quality / priority...
here possibly relying on the original parameters of the query in 1)
5) return the reduced results to the frontend.

 And also the data updating and single server's error is
 hard to deal with.

this really depends on your infrastructure + design. 

Having the indexing , searching and providing of results in different layers
should make for some interesting design options...

If each searcher (or wherever the index resides) is really a small cluster
of
servers , the issue of data safety / server error is addressed @ that point.
You can also have repeated data across indexes (again, independent indexes)
and
that's a more ... randomised :) way of keeping the docs safe... For example,
IIRC, googleFS keeps copies of each file in 3 servers or more...

cheers,
B
_
{Beto|Norberto|Numard} Meijome

He uses statistics as a drunken man uses lamp-posts ... for support rather
than illumination. Andrew Lang (1844-1912)

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.



Re: Term extraction

2007-09-19 Thread Pieter Berkel
Thanks Brian, I think the "smart" approaches you refer to might be outside
the scope of my current project.  The documents I am indexing already have
manually-generated keyword data; moving forward, I'd like to have these
keywords automatically generated, selected from a pre-defined list of
keywords (i.e. the simple approach).

The data is fairly clean and domain-specific so I don't expect there will be
more than several hundred of these phrase terms to deal with, which is why I
was exploring the SynonymFilterFactory option.

Pieter



On 20/09/2007, Brian Whitman [EMAIL PROTECTED] wrote:

 On Sep 19, 2007, at 9:58 PM, Pieter Berkel wrote:

  I'm currently looking at methods of term extraction and automatic
  keyword
  generation from indexed documents.

 We do it manually (not in solr, but we put the results in solr.) We
 do it the usual way - chunk (into n-grams, named entities & noun
 phrases) and count (tf & df). It works well enough. There is a bevy
 of literature on the topic if you want to get "smart" -- but be
 warned "smart" and "fast" are likely not very good friends.

 A lot depends on the provenance of your data -- is it clean text that
 uses a lot of domain specific terms? Is it webtext?




Re: How can i make a distribute search on Solr?

2007-09-19 Thread Mike Klaas

On 19-Sep-07, at 7:21 PM, Jarvis wrote:


Hi,
What you describe is done by Hadoop, which supports hardware failure, data
replication, and more.
If we want to implement such a system ourselves with Solr instead of HDFS,
I think it would be very complex work. :)
I just want to know whether there is an existing component that can do
distributed search based on Solr.


https://issues.apache.org/jira/browse/SOLR-303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel


regards,
-Mike

Re: setting absolute path for snapshooter in solrconfig.xml doesn't work

2007-09-19 Thread Yu-Hui Jin
Hi, Pieter,

Thanks!  Now the exception is gone. However, there's no snapshot file
created in the data directory. Strangely, the snapshooter.log seems to
show the run completing successfully.  Any idea what else I'm missing?

$ cat var/SolrHome/solr/logs/snapshooter.log
2007/09/19 20:16:17 started by solruser
2007/09/19 20:16:17 command: /var/SolrHome/solr/bin/snapshooter arg1 arg2
2007/09/19 20:16:17 taking snapshot
var/SolrHome/solr/data/snapshot.20070919201617
2007/09/19 20:16:17 ended (elapsed time: 0 sec)

Thanks,

-Hui




On 9/19/07, Pieter Berkel [EMAIL PROTECTED] wrote:

 See this recent thread for some helpful info:

 http://www.nabble.com/solr-doesn%27t-find-exe-in-postCommit-event-tf4264879.html#a12167792

 You'll probably want to configure your exe with an absolute path rather
 than the dir:

   <str name="exe">/var/SolrHome/solr/bin/snapshooter</str>
   <str name="dir">.</str>

 in order to get the snapshooter working correctly.

 cheers,
 Piete



 On 20/09/2007, Yu-Hui Jin [EMAIL PROTECTED] wrote:
 
  Hi, there,
 
  I used an absolute path for the dir param in the solrconfig.xml as
  below:
 
  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">snapshooter</str>
    <str name="dir">/var/SolrHome/solr/bin</str>
    <bool name="wait">true</bool>
    <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
    <arr name="env"> <str>MYVAR=val1</str> </arr>
  </listener>
 
   However, I got a "snapshooter: not found" exception thrown in
  catalina.out.
   I don't see why this doesn't work. Anything I'm missing?
 
 
  Many thanks,
 
  -Hui
 




-- 
Regards,

-Hui


Re: setting absolute path for snapshooter in solrconfig.xml doesn't work

2007-09-19 Thread Pieter Berkel
If you don't need to pass any command line arguments to snapshooter, remove
(or comment out) this line from solrconfig.xml:

<arr name="args"> <str>arg1</str> <str>arg2</str> </arr>

By the same token, if you're not setting environment variables either,
remove the following line as well:

<arr name="env"> <str>MYVAR=val1</str> </arr>

Once you alter / remove those two lines, snapshooter should function as
expected.
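
Putting the two fixes from this thread together, the listener block would end
up looking roughly like this (paths as in the original post; an unverified
sketch):

  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">/var/SolrHome/solr/bin/snapshooter</str>
    <str name="dir">.</str>
    <bool name="wait">true</bool>
  </listener>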

cheers,
Piete



On 20/09/2007, Yu-Hui Jin [EMAIL PROTECTED] wrote:

 Hi, Pieter,

 Thanks!  Now the exception is gone. However, there's no snapshot file
 created in the data directory. Strangely, the snapshooter.log seems to
 show the run completing successfully.  Any idea what else I'm missing?

 $ cat var/SolrHome/solr/logs/snapshooter.log
 2007/09/19 20:16:17 started by solruser
 2007/09/19 20:16:17 command: /var/SolrHome/solr/bin/snapshooter arg1 arg2
 2007/09/19 20:16:17 taking snapshot
 var/SolrHome/solr/data/snapshot.20070919201617
 2007/09/19 20:16:17 ended (elapsed time: 0 sec)

 Thanks,

 -Hui




 On 9/19/07, Pieter Berkel [EMAIL PROTECTED] wrote:
 
  See this recent thread for some helpful info:
 
 
 http://www.nabble.com/solr-doesn%27t-find-exe-in-postCommit-event-tf4264879.html#a12167792
 
  You'll probably want to configure your exe with an absolute path rather
  than the dir:

    <str name="exe">/var/SolrHome/solr/bin/snapshooter</str>
    <str name="dir">.</str>

  in order to get the snapshooter working correctly.
 
  cheers,
  Piete
 
 
 
  On 20/09/2007, Yu-Hui Jin [EMAIL PROTECTED] wrote:
  
   Hi, there,
  
   I used an absolute path for the dir param in the solrconfig.xml as
   below:
  
   <listener event="postCommit" class="solr.RunExecutableListener">
     <str name="exe">snapshooter</str>
     <str name="dir">/var/SolrHome/solr/bin</str>
     <bool name="wait">true</bool>
     <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
     <arr name="env"> <str>MYVAR=val1</str> </arr>
   </listener>
  
   However, I got a "snapshooter: not found" exception thrown in
  catalina.out.
   I don't see why this doesn't work. Anything I'm missing?
  
  
   Many thanks,
  
   -Hui
  
 



 --
 Regards,

 -Hui



Re: How can i make a distribute search on Solr?

2007-09-19 Thread Norberto Meijome
On Thu, 20 Sep 2007 10:02:08 +0800
Jarvis [EMAIL PROTECTED] wrote:

 You can see the code in org.apache.nutch.searcher.NutchBean class . :)

thx for the pointer.

_
{Beto|Norberto|Numard} Meijome

In order to avoid being called a flirt, she always yielded easily.
  Charles, Count Talleyrand

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: How can i make a distribute search on Solr?

2007-09-19 Thread Norberto Meijome
On Thu, 20 Sep 2007 10:21:39 +0800
Jarvis [EMAIL PROTECTED] wrote:

   What you describe is done by Hadoop, which supports hardware failure,
 data replication, and more.
   If we want to implement such a system ourselves with Solr instead of
 HDFS, I think it would be very complex work. :)
   I just want to know whether there is an existing component that can do
 distributed search based on Solr.

Thanks for the info.

Risking starting a flame war (which is not my intention :) ), what
design reasons / features are there in Solr but not in Hadoop/Nutch that
would make it compelling to use Solr instead of H/N?

I know each case is different; the feeling I got from a shortish read into
H/N was that it is geared towards webpage indexing, crawling, etc. But
possibly I'm missing something...

Solr is, from my point of view, far more flexible. In which case, maybe
porting HDFS into Solr would add all these clustering / map-reduce options...

thanks for your time and insights :)
B
_
{Beto|Norberto|Numard} Meijome

Windows caters to everyone as though they are idiots. UNIX makes no such
assumption. It assumes you know what you are doing, and presents the challenge
of figuring it out for yourself if you don't.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: How can i make a distribute search on Solr?

2007-09-19 Thread Venkatraman S
Along similar lines :

assuming that I have 2 indexes on the same box, say at
/home/abc/data/index1 and /home/abc/data/index2,
and I want the results from both indexes when I do a search - how should
this be 'optimally' designed? Basically these are different Solr homes, and
I want the results to be clearly demarcated as coming from 2 different
sources.

-Venkat

On 9/20/07, Norberto Meijome [EMAIL PROTECTED] wrote:

 On Thu, 20 Sep 2007 10:21:39 +0800
 Jarvis [EMAIL PROTECTED] wrote:

    What you describe is done by Hadoop, which supports hardware failure,
  data replication, and more.
    If we want to implement such a system ourselves with Solr instead of
  HDFS, I think it would be very complex work. :)
    I just want to know whether there is an existing component that can do
  distributed search based on Solr.

 Thanks for the info.

 Risking starting a flame war (which is not my intention :) ), what
 design reasons / features are there in Solr but not in Hadoop/Nutch that
 would make it compelling to use Solr instead of H/N?

 I know each case is different; the feeling I got from a shortish read into
 H/N was that it is geared towards webpage indexing, crawling, etc. But
 possibly I'm missing something...

 Solr is, from my point of view, far more flexible. In which case, maybe
 porting HDFS into Solr would add all these clustering / map-reduce options...

 thanks for your time and insights :)
 B
 _
 {Beto|Norberto|Numard} Meijome

 Windows caters to everyone as though they are idiots. UNIX makes no such
 assumption. It assumes you know what you are doing, and presents the
 challenge
 of figuring it out for yourself if you don't.

 I speak for myself, not my employer. Contents may be hot. Slippery when
 wet.
 Reading disclaimers makes you go blind. Writing them is worse. You have
 been
 Warned.






Re: Solr Index - no segments* file found in org.apache.lucene.store.FSDirectory

2007-09-19 Thread Venkatraman S
On 9/20/07, Chris Hostetter [EMAIL PROTECTED] wrote:


 you imply that you are building your index using embedded solr, but based
 on your stack trace it seems you are using Solr in a servlet container ...
 i assume to search the index you've already built?


I have a JSP that routes the info from a Drupal module to my embedded
Solr app.

Does this case arise when I do a search when there is no index? If yes,
then I guess the exception could be made more meaningful.

Is the embedded core completely closed before your servlet container running
 Solr is started?  What does the directory listing look like in between the
 finish of A and the start of B?


Yes, it is closed; but I guess this problem arises when I do a search before
any index has been created - can you confirm this?

-Venkat
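
For what it's worth, the "no segments* file found" message is what Lucene
throws when a reader is opened on a directory that has never been committed
to, which matches the empty-index theory above. A guard along these lines
would confirm it (a sketch; the index path is illustrative):

  import java.io.File;
  import org.apache.lucene.index.IndexReader;

  public class IndexGuard {
      public static void main(String[] args) throws Exception {
          File indexDir = new File("/path/to/solr/data/index"); // illustrative path
          // indexExists() looks for a segments file without throwing
          if (IndexReader.indexExists(indexDir)) {
              IndexReader reader = IndexReader.open(indexDir);
              System.out.println("docs in index: " + reader.numDocs());
              reader.close();
          } else {
              System.out.println("no index yet - skip searching");
          }
      }
  }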