Re: solrj search

2009-04-09 Thread RajuMaddy



Tevfik  Kiziloren wrote:
> 
> Hi. I'm a newbie. I need to develop a JSF-based search application
> using Solr. I found nothing about the solrj implementation except the
> simple example on the Solr wiki. When I tried a console program similar
> to that example, I got the exception below. Where can I find more
> extensive documentation about solrj?
> 
> Thanks in advance.
> Tevfik Kızılören.
> 
> try {
>     String url = "http://localhost:8080/solr";
>     SolrServer server = new CommonsHttpSolrServer(url);
>
>     SolrQuery query = new SolrQuery();
>     query.setQuery("solr");
>     System.out.println(query.toString());
>     QueryResponse rsp = server.query(query);
>     System.out.println(rsp.getResults().toString());
>
> } catch (IOException ex) {
>     Logger.getLogger(SolrclientView.class.getName()).log(Level.SEVERE, null, ex);
> } catch (SolrServerException ex) {
>     Logger.getLogger(SolrclientView.class.getName()).log(Level.SEVERE, null, ex);
> }
> 
> 
> ---
> solrclient.SolrclientView jButton1ActionPerformed
> SEVERE: null
> org.apache.solr.client.solrj.SolrServerException: Error executing query
> at
> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:90)
> at
> org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:96)
> at
> solrclient.SolrclientView.jButton1ActionPerformed(SolrclientView.java:229)
> at solrclient.SolrclientView.access$800(SolrclientView.java:32)
> at
> solrclient.SolrclientView$4.actionPerformed(SolrclientView.java:135)
> at
> javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:1995)
> at
> javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2318)
> at
> javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:387)
> at
> javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:242)
> at
> javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:236)
> at java.awt.Component.processMouseEvent(Component.java:6038)
> at javax.swing.JComponent.processMouseEvent(JComponent.java:3265)
> at java.awt.Component.processEvent(Component.java:5803)
> at java.awt.Container.processEvent(Container.java:2058)
> at java.awt.Component.dispatchEventImpl(Component.java:4410)
> at java.awt.Container.dispatchEventImpl(Container.java:2116)
> at java.awt.Component.dispatchEvent(Component.java:4240)
> at
> java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4322)
> at
> java.awt.LightweightDispatcher.processMouseEvent(Container.java:3986)
> at
> java.awt.LightweightDispatcher.dispatchEvent(Container.java:3916)
> at java.awt.Container.dispatchEventImpl(Container.java:2102)
> at java.awt.Window.dispatchEventImpl(Window.java:2429)
> at java.awt.Component.dispatchEvent(Component.java:4240)
> at java.awt.EventQueue.dispatchEvent(EventQueue.java:599)
> at
> java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:273)
> at
> java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:183)
> at
> java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:173)
> at
> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:168)
> at
> java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:160)
> at java.awt.EventDispatchThread.run(EventDispatchThread.java:121)
> Caused by: org.apache.solr.common.SolrException: parsing error
> at
> org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:138)
> at
> org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:99)
> at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:317)
> at
> org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:84)
> ... 29 more
> Caused by: java.lang.RuntimeException: this must be known type! not: int
> at
> org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:217)
> at
> org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:235)
> at
> org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:123)
> 

Hi

   Maybe your query string contains illegal values, or the problem may
be on the server side... make sure that Solr is actually running at localhost:8080.
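
For reference, here is a minimal standalone console sketch (assuming the
SolrJ and commons-httpclient jars from the Solr distribution are on the
classpath, and that the SolrJ version matches the server - a "this must be
known type" parse error is often a symptom of a client/server version mismatch):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class SolrjQueryDemo {
        public static void main(String[] args) throws Exception {
            // Adjust the URL to wherever your Solr webapp is deployed.
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");

            SolrQuery query = new SolrQuery("solr");   // q=solr
            query.setRows(10);                         // ask for the first 10 hits

            QueryResponse rsp = server.query(query);
            System.out.println("hits: " + rsp.getResults().getNumFound());
            System.out.println(rsp.getResults());
        }
    }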

-- 
View this message in context: 
http://www.nabble.com/sorlj-search-tp15305698p22983898.h

multiple tokenizers needed

2009-04-09 Thread Ashish P

I want to analyze text by first splitting on the pattern ";", then separating on
whitespace, and since it is Japanese text I need the CJKAnalyzer + tokenizer as well.
In short I want to do:

[analyzer XML stripped by the archive: it declared three tokenizers in one chain]

Can anyone please tell me how to achieve this? The above syntax is
not possible at all.
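
An analyzer chain allows only one tokenizer, so as a rough workaround you can
do the pattern split in your own code before the text reaches Solr, and let a
CJK-analyzed field handle each piece. A sketch of the pre-split step (plain
Java; names are illustrative):

    import java.util.ArrayList;
    import java.util.List;

    public class PreSplitter {
        /** Split a raw value on ";" first, then on whitespace, before indexing. */
        public static List<String> preSplit(String raw) {
            List<String> pieces = new ArrayList<String>();
            for (String segment : raw.split(";")) {
                for (String token : segment.trim().split("\\s+")) {
                    if (token.length() > 0) {
                        // each piece is then indexed into a CJK-analyzed field
                        pieces.add(token);
                    }
                }
            }
            return pieces;
        }
    }
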
-- 
View this message in context: 
http://www.nabble.com/multiple-tokenizers-needed-tp22982382p22982382.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Question on Solr Distributed Search

2009-04-09 Thread vivek sar
Just an update. I changed the schema to store the unique id field, but
I still get the connection reset exception. I did notice that if there
is no data in the core then it returns 0 results (no exception),
but if there is data and I search using the "shards" parameter I get the
connection reset exception. Can anyone provide some tips on where I
can look for this problem?


Apr 10, 2009 3:16:04 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException:
java.net.SocketException: Connection reset
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:637)
Caused by: org.apache.solr.client.solrj.SolrServerException:
java.net.SocketException: Connection reset
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:473)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
at 
org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:422)
at 
org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:395)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
... 1 more
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:168)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at 
org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
at 
org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
at 
org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
at 
org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
at 
org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
at 
org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)


On Thu, Apr 9, 2009 at 6:51 PM, vivek sar  wrote:
> I think I've found the reason behind the "connection reset". Looking at the
> code, it points to QueryComponent.mergeIds()
>
> resultIds.put(shardDoc.id.toString(), shardDoc);
>
> it looks like the doc unique id is coming back null. I'm not sure how that is
> possible as it's a required field. Right now my unique id is not stored
> (only indexed) - does it have to be stored for distributed search?
>
> HTTP Status 500 - null java.lang.NullPointerException at
> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432)
> at 
> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276)
> at 
> org.apache.s

Re: Question on Solr Distributed Search

2009-04-09 Thread vivek sar
I think I've found the reason behind the "connection reset". Looking at the
code, it points to QueryComponent.mergeIds()

resultIds.put(shardDoc.id.toString(), shardDoc);

it looks like the doc unique id is coming back null. I'm not sure how that is
possible as it's a required field. Right now my unique id is not stored
(only indexed) - does it have to be stored for distributed search?

HTTP Status 500 - null java.lang.NullPointerException at
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432)
at 
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:637)

On Thu, Apr 9, 2009 at 5:01 PM, vivek sar  wrote:
> Hi,
>
>  I've another thread on multi-core distributed search, but just
> wanted to put a simple question here on distributed search to get some
> response. I've a search query,
>
>   http://etsx19.co.com:8080/solr/20090409_9/select?q=usa     -
> returns 10 results
>
> now if I add "shards" parameter to it,
>
>  http://etsx19.co.com:8080/solr/20090409_9/select?shards=etsx19.co.com:8080/solr/20090409_9&q=usa
>  - this fails with
>
> org.apache.solr.client.solrj.SolrServerException:
> java.net.SocketException: Connection reset
> org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException:
> java.net.SocketException: Connection reset at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
> at
> ..
>        at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>        at 
> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>        at java.lang.Thread.run(Thread.java:637)
> Caused by: org.apache.solr.client.solrj.SolrServerException:
> java.net.SocketException: Connection reset
>        at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:473)
>        at 
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
>        at 
> org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:422)
> ..
> Caused by: java.net.SocketException: Connection reset
>        at java.net.SocketInputStream.read(SocketInputStream.java:168)
>        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>        at 
> org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
>        at 
> org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
>        at 
> org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
>        at 
> org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
>        at 
> org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
>        at 
> org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
>
> Attached is my solrconfig.xml. Do I need a special RequestHandler for
> sharding? I haven't been able to make any distributed search
> successfully. Any help is appreciated.
>
> Note: I'm indexing using Solrj - not sure if that makes any difference
> to the search part.
>
> Thanks,
> -vivek
>


Re: Additive filter queries

2009-04-09 Thread Chris Hostetter
: Right now a document looks like this (field tags stripped by the archive;
: the values were):
:
: 1598548
: 12545
: Adidas
: 1, 2, 3, 4, 5, 6, 7
: AA, A, B, W, W,
: Brown
:
: If we went down a level, it could look like..
:
: 1598548
: 12545
: 654641654684
: Adidas
: 1
: AA
: Brown

If you want results at the "product" level then you don't have to have one
*doc* per legal size+width pair ... you just need one *term* per
valid size+width pair (field tags stripped by the archive; the values were):

  sizes:  1, 2, 3, 4, 5, 6, 7
  widths: AA, A, B, W, W,
  opts:   1_W 2W 3_B 3_W 4_AA 4_A 4_B 4_W 4_WW 5_W 5_ 6_ 7_

a search for size 4 clogs would look like...

  q=clogs&fq=size:4&facet.field=opts&f.opts.facet.prefix=4_

...and the facet counts for "opts" would tell me what widths were
available (and how many).

for completeness you typically want to index the pairs in both directions
(1_W and W_1 ... typically in separate fields) so the user can filter by
either option first ... for something like size+color this makes sense,
but I'm guessing with shoes no one expects to narrow by "width" until
they've narrowed by size first.
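
As a sketch, the same request issued through SolrJ might look like this
(untested; the "size" and "opts" field names follow the example above):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.response.FacetField;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class OptsFacetDemo {
        public static QueryResponse widthsForSize(SolrServer server, int size) throws Exception {
            SolrQuery q = new SolrQuery("clogs");
            q.addFilterQuery("size:" + size);          // narrow to the chosen size
            q.setFacet(true);
            q.addFacetField("opts");                   // the combined size_width field
            q.set("f.opts.facet.prefix", size + "_");  // only pairs for this size
            QueryResponse rsp = server.query(q);
            for (FacetField.Count c : rsp.getFacetFields().get(0).getValues()) {
                System.out.println(c.getName() + " -> " + c.getCount()); // e.g. 4_AA -> 12
            }
            return rsp;
        }
    }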


-Hoss


Re: Querying for multi-word synonyms

2009-04-09 Thread Chris Hostetter

: Unfortunately, I have to use SynonymFilter at query time due to the nature
: of the data I'm indexing. At index time, all I have are keywords but at
: query time I will have some semantic markup which allows me to expand into
: synonyms. I am wondering if any progress has been made into making query
: time synonym searching work correctly. If not, does anyone have some ideas
: for alternatives to using SynonymFilter? The only thing I can think of is to
: simply create a custom BooleanQuery for the search and feed the synonyms in
: manually, but then I am missing out on all the functionality of the dismax
: query parser. Any ideas are appreciated, thanks very much.

Fundamentally the problem with multi-word query time synonyms is that the
Analyzer only has a limited mechanism for conveying "structure" back to the
caller (ie: the QueryParser) ... that mechanism being the "term position"
-- you can indicate that terms can occupy the same single position, but
not that sequences of terms can occupy the same position.

you could write a query parser that used nested SpanNearQueries to create
a directed acyclic graph of terms that you want to match in a sequence,
where some "branches" of the graph contain more nodes than others, but you
would need to do the synonym recognition while building up the query (and
working with the DAG) ... but the current SynonymFilter works as part of
the TokenStream.
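
A hand-built sketch of that idea for one synonym pair ("usa" vs. the
sequence "united states") using the Lucene span API; the field name is
illustrative:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanOrQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    public class SynonymSpans {
        public static SpanQuery usaOrUnitedStates() {
            SpanQuery usa = new SpanTermQuery(new Term("body", "usa"));
            SpanQuery united = new SpanTermQuery(new Term("body", "united"));
            SpanQuery states = new SpanTermQuery(new Term("body", "states"));
            // the two-word branch: "united" immediately followed by "states"
            SpanQuery phrase = new SpanNearQuery(new SpanQuery[]{united, states}, 0, true);
            // either branch may occupy the same slot in an enclosing SpanNearQuery
            return new SpanOrQuery(new SpanQuery[]{usa, phrase});
        }
    }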



-Hoss



Question on Solr Distributed Search

2009-04-09 Thread vivek sar
Hi,

  I've another thread on multi-core distributed search, but just
wanted to put a simple question here on distributed search to get some
response. I've a search query,

   http://etsx19.co.com:8080/solr/20090409_9/select?q=usa -
returns 10 results

now if I add "shards" parameter to it,

  
http://etsx19.co.com:8080/solr/20090409_9/select?shards=etsx19.co.com:8080/solr/20090409_9&q=usa
 - this fails with

org.apache.solr.client.solrj.SolrServerException:
java.net.SocketException: Connection reset
org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException:
java.net.SocketException: Connection reset at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
at
..
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:637)
Caused by: org.apache.solr.client.solrj.SolrServerException:
java.net.SocketException: Connection reset
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:473)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
at 
org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:422)
..
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:168)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at 
org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
at 
org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
at 
org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
at 
org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
at 
org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)

Attached is my solrconfig.xml. Do I need a special RequestHandler for
sharding? I haven't been able to make any distributed search
successfully. Any help is appreciated.

Note: I'm indexing using Solrj - not sure if that makes any difference
to the search part.
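
(For reference, the same shards request expressed through SolrJ is just the
parameter set programmatically - a sketch, untested:)

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ShardQueryDemo {
        public static void main(String[] args) throws Exception {
            SolrServer server = new CommonsHttpSolrServer("http://etsx19.co.com:8080/solr/20090409_9");
            SolrQuery q = new SolrQuery("usa");
            // same host:port/core list as in the URL above; no scheme prefix
            q.set("shards", "etsx19.co.com:8080/solr/20090409_9");
            QueryResponse rsp = server.query(q);
            System.out.println(rsp.getResults().getNumFound());
        }
    }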

Thanks,
-vivek





  
  

  
   
[attached solrconfig.xml omitted: the archive stripped all XML tags, leaving only
bare element values - index settings (e.g. 64, 2147483647, 1000, lockType "single"),
cache sizes (e.g. 1024), and a dismax request handler with fq "inStock:true",
qf "text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4",
and mm "2<-1 5<-2 6<90%"]




Re: How to get the solrhome location dynamically

2009-04-09 Thread Chris Hostetter

: Subject: How to get the solrhome location dynamically

Do you really want the Solr Home Dir, or do you want the instanceDir for a 
specific SolrCore?

If you're using a solr.xml file (ie: one or many cores), you can get the
instanceDir for each core from the CoreAdminHandler -- but it doesn't
expose the actual SolrHomeDir where the solr.xml file was found.

If you aren't using a solr.xml file (ie: you definitely only have one 
core) you can get the instance dir from the SystemInfoRequestHandler 
(/admin/system in the example configs) ... and since you aren't using a 
solr.xml file, the instance dir is the same as the Solr Home Dir.


(Hmmm... I suppose the CoreAdminHandler should probably expose metadata
about the CoreContainer ... anyone want to work up a patch?)
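
A SolrJ sketch of reading instanceDir through the CoreAdminHandler (assuming
the cores admin handler is enabled at /admin/cores; untested):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;
    import org.apache.solr.client.solrj.response.CoreAdminResponse;

    public class InstanceDirLookup {
        public static void main(String[] args) throws Exception {
            // point at the container URL, not an individual core
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
            CoreAdminResponse status = CoreAdminRequest.getStatus(null, server); // null = all cores
            System.out.println(status.getCoreStatus("core0").get("instanceDir"));
        }
    }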





-Hoss



Re: httpclient.ProtocolException using Solrj

2009-04-09 Thread vivek sar
Here is what I'm doing,

SolrServer server = new StreamingUpdateSolrServer(url, 1000, 5); // queue size 1000, 5 threads

server.addBeans(dataList);  // where dataList is a List of 10K beans

I run two threads, each using the same server object, and each calls
server.addBeans(...).

I'm able to get 50K/sec inserted that way, but the commit after that
(after 100k records) takes 70 sec - which ruins the average time.

There are two problems here:

1) Once in a while I get a "connection reset" error,

Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:168)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at 
org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
at 
org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)

Note: if I use CommonsHttpSolrServer I get the "Unbuffered entity" error instead.

2) The commit takes way too long for every 100k records (I may commit more
often if this cannot be improved).

I'm trying to fix the error problem, which happens only if I run two
threads both calling addBeans (10k at a time). One thread works fine.
I'm not sure how I can use the MultiThreadedConnectionManager to
create a StreamingUpdateSolrServer, and whether it would help.
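
For the CommonsHttpSolrServer case, a sketch of wiring in a
MultiThreadedHttpConnectionManager explicitly (commons-httpclient 3.x API;
the connection limits are illustrative):

    import java.net.URL;

    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class MultiThreadedServerFactory {
        public static CommonsHttpSolrServer create(String url) throws Exception {
            MultiThreadedHttpConnectionManager mgr = new MultiThreadedHttpConnectionManager();
            mgr.getParams().setDefaultMaxConnectionsPerHost(10); // one per writer thread, plus slack
            mgr.getParams().setMaxTotalConnections(50);
            HttpClient client = new HttpClient(mgr);
            return new CommonsHttpSolrServer(new URL(url), client);
        }
    }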

Thanks,
-vivek

2009/4/9 Noble Paul നോബിള്‍  नोब्ळ् :
> using a single request is the fastest
>
> http://wiki.apache.org/solr/Solrj#head-2046bbaba3759b6efd0e33e93f5502038c01ac65
>
> I could index at the rate of 10,000 docs/sec using this and 
> BinaryRequestWriter
>
> On Thu, Apr 9, 2009 at 10:36 PM, vivek sar  wrote:
>> I'm inserting 10K in a batch (using addBeans method). I read somewhere
>> in the wiki that it's better to use the same instance of SolrServer
>> for better performance. Would MultiThreadedConnectionManager help? How
>> do I use it?
>>
>> I also wanted to know how can use EmbeddedSolrServer - does my app
>> needs to be running in the same jvm with Solr webapp?
>>
>> Thanks,
>> -vivek
>>
>> 2009/4/9 Noble Paul നോബിള്‍  नोब्ळ् :
>>> how many documents are you inserting ?
>>> may be you can create multiple instances of CommonshttpSolrServer and
>>> upload in parallel
>>>
>>>
>>> On Thu, Apr 9, 2009 at 11:58 AM, vivek sar  wrote:
 Thanks Shalin and Paul.

 I'm not using MultipartRequest. I do share the same SolrServer between
 two threads. I'm not using MultiThreadedHttpConnectionManager. I'm
 simply using CommonsHttpSolrServer to create the SolrServer. I've also
 tried StreamingUpdateSolrServer, which works much faster, but does
 throws "connection reset" exception once in a while.

 Do I need to use MultiThreadedHttpConnectionManager? I couldn't find
 anything on it on Wiki.

 I was also thinking of using EmbeddedSolrServer - in what case would I
 be able to use it? Does my application and the Solr web app need to
 run into the same JVM for this to work? How would I use the
 EmbeddedSolrServer?

 Thanks,
 -vivek


 On Wed, Apr 8, 2009 at 10:46 PM, Shalin Shekhar Mangar
  wrote:
> Vivek, do you share the same SolrServer instance between your two threads?
> If so, are you using the MultiThreadedHttpConnectionManager when creating
> the HttpClient instance?
>
> On Wed, Apr 8, 2009 at 10:13 PM, vivek sar  wrote:
>
>> single thread everything works fine. Two threads are fine too for a
>> while and all the sudden problem starts happening.
>>
>> I tried indexing using REST services as well (instead of Solrj), but
>> with that too I get following error after a while,
>>
>> 2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer -
>> indexData()-> Failed to index
>> java.net.SocketException: Broken pipe
>>        at java.net.SocketOutputStream.socketWrite0(Native Method)
>>        at
>> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>>        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>>        at 
>> java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>>        at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
>>        at
>> org.apache.commons.httpclient.methods.StringRequestEntity.writeRequest(StringRequestEntity.java:145)
>>        at
>> org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499)
>>         at
>> org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
>>        at
>> org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
>>        at
>> org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
>>        at
>> org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
>>        at
>> org.apache.

logging

2009-04-09 Thread Kevin Osborn
We built our own webapp that uses the Solr JARs. We used Apache Commons/log4j
logging and just put log4j.properties in the Resin conf directory. The
commons-logging and log4j jars were put in the Resin lib directory. Everything
worked great and we got log files for our code only.

So, I upgraded to Solr 1.4 and I no longer get my log file. I assume it has
something to do with Solr 1.4 using SLF4J instead of JDK logging, but it seems
like my code should be independent of that. Any ideas?



  

Re: Using ExtractingRequestHandler to index a large PDF ~solved

2009-04-09 Thread Grant Ingersoll


On Apr 6, 2009, at 10:16 AM, Fergus McMenemie wrote:


Hmmm,

Not sure how this all hangs together, but editing my solrconfig.xml
as follows sorted the problem:

   <requestParsers ... multipartUploadLimitInKB="2048" />

to

   <requestParsers ... multipartUploadLimitInKB="20048" />




We should document this on the wiki or in the config, if it isn't  
already.


Also, my initial report of the issue was misled by the log messages.
The mention of "oceania.pdf" refers to a previous successful Tika extract.
There is no mention of the filename that was rejected in the logs, nor any
information that would help me identify it!


We should fix this so it at least spits out a meaningful message.  Can  
you open a JIRA?





Regards Fergus.

Sorry if this is a FAQ; I suspect it could be. But how do I work  
around the following:-


INFO: [] webapp=/apache-solr-1.4-dev path=/update/extract params={ext.def.fl=text&ext.literal.id=factbook/reference_maps/pdf/oceania.pdf} status=0 QTime=318

Apr 2, 2009 11:17:46 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.commons.fileupload.FileUploadBase$SizeLimitExceededException: the request was rejected because its size (4585774) exceeds the configured maximum (2097152)
	at org.apache.commons.fileupload.FileUploadBase$FileItemIteratorImpl.<init>(FileUploadBase.java:914)
	at org.apache.commons.fileupload.FileUploadBase.getItemIterator(FileUploadBase.java:331)
	at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:349)
	at org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
	at org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:343)
	at org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:396)
	at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:114)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)


Although the PDF is big, it contains very little text; it is a map.

 "java -jar solr/lib/tika-0.3.jar -g" appears to have no bother  
with it.


Fergus...
--

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


--

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Dictionary lookup possibilities

2009-04-09 Thread Jaco
Hello,

I'm struggling with some ideas, maybe somebody can help me with past
experiences or tips. I have loaded a dictionary into a Solr index, using
stemming and some stopwords in analysis part of the schema. Each record
holds a term from the dictionary, which can consist of multiple words. For
some data analysis work, I want to send pieces of text (sentences actually)
to Solr to retrieve all possible dictionary terms that could occur. Ideally,
I want to construct a query that only returns those Solr records for which
all individual words in that record are matched.

For instance, my dictionary holds the following terms:
1 - a b c d
2 - c d e
3 - a b
4 - a e f g h

If I put the sentence [a b c d f g h] in as a query, I want to receive
dictionary items 1 (matching all words a b c d) and 3 (matching words a b)
as matches.

I have been puzzling about how to do this. The only way I found so far was
to construct an OR query with all words of the sentence in it. In this case,
that would result in all dictionary items being returned. This would then
require some code to go over the search results and analyse each of them
(i.e. by using the highlight function) to kick out 'false' matches, but I am
looking for a more efficient way.

Is there a way to do this with Solr functionality, or do I need to start
looking into the Lucene API ..?

Any help would be much appreciated as usual!

Thanks, bye,

Jaco.


Re: Any tips for indexing large amounts of data?

2009-04-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Apr 9, 2009 at 8:51 PM, sunnyfr  wrote:
>
> Hi Otis,
> How did you manage that? I've an 8-core machine with 8GB of RAM and an 11GB index
> for 14M docs and 5 updates every 30mn, but my replication kills everything.
> My segments are merged too often, so the full index is replicated and the cache
> is lost, and I've no idea what I can do now?
> Some help would be brilliant,
> btw im using Solr 1.4.
>

sunnyfr, whether the replication is full or delta, the caches are
lost completely.

you can think of partitioning the index into separate Solrs,
updating one partition at a time, and performing distributed search.

> Thanks,
>
>
> Otis Gospodnetic wrote:
>>
>> Mike is right about the occasional slow-down, which appears as a pause and
>> is due to large Lucene index segment merging.  This should go away with
>> newer versions of Lucene where this is happening in the background.
>>
>> That said, we just indexed about 20MM documents on a single 8-core machine
>> with 8 GB of RAM, resulting in nearly 20 GB index.  The whole process took
>> a little less than 10 hours - that's over 550 docs/second.  The vanilla
>> approach before some of our changes apparently required several days to
>> index the same amount of data.
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>> - Original Message 
>> From: Mike Klaas 
>> To: solr-user@lucene.apache.org
>> Sent: Monday, November 19, 2007 5:50:19 PM
>> Subject: Re: Any tips for indexing large amounts of data?
>>
>> There should be some slowdown in larger indices as occasionally large
>> segment merge operations must occur.  However, this shouldn't really
>> affect overall speed too much.
>>
>> You haven't really given us enough data to tell you anything useful.
>> I would recommend trying to do the indexing via a webapp to eliminate
>> all your code as a possible factor.  Then, look for signs to what is
>> happening when indexing slows.  For instance, is Solr high in cpu, is
>> the computer thrashing, etc?
>>
>> -Mike
>>
>> On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
>>
>>> Hi,
>>>
>>> Thanks for answering this question a while back. I have made some
>>> of the suggestions you mentioned. ie not committing until I've
>>> finished indexing. What I am seeing though, is as the index get
>>> larger (around 1Gb), indexing is taking a lot longer. In fact it
>>> slows down to a crawl. Have you got any pointers as to what I might
>>> be doing wrong?
>>>
>>> Also, I was looking at using MultiCore solr. Could this help in
>>> some way?
>>>
>>> Thank you
>>> Brendan
>>>
>>> On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:
>>>

 : I would think you would see better performance by allowing auto
 commit
 : to handle the commit size instead of reopening the connection
 all the
 : time.

 if your goal is "fast" indexing, don't use autoCommit at all ...
>>  just
 index everything, and don't commit until you are completely done.

 autoCommitting will slow your indexing down (the benefit being
 that more
 results will be visible to searchers as you proceed)




 -Hoss

>>>
>>
>>
>>
>>
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22973205.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
--Noble Paul


Re: httpclient.ProtocolException using Solrj

2009-04-09 Thread Shalin Shekhar Mangar
On Thu, Apr 9, 2009 at 10:36 PM, vivek sar  wrote:

> I'm inserting 10K in a batch (using addBeans method). I read somewhere
> in the wiki that it's better to use the same instance of SolrServer
> for better performance. Would MultiThreadedConnectionManager help? How
> do I use it?
>

If you are not passing your own HttpClient to the CommonsHttpSolrServer
constructor then you do not need to worry about this. The default is the
MultiThreadedConnectionManager.


>
> I also wanted to know how can use EmbeddedSolrServer - does my app
> needs to be running in the same jvm with Solr webapp?
>

Actually with EmbeddedSolrServer, there is no Solr webapp. You add it as
another jar in your own webapp.
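
A minimal sketch of standing one up (Solr 1.3/1.4-era API; the solr home
path is illustrative):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.core.CoreContainer;

    public class EmbeddedDemo {
        public static SolrServer create() throws Exception {
            System.setProperty("solr.solr.home", "/path/to/solr/home"); // illustrative path
            CoreContainer.Initializer initializer = new CoreContainer.Initializer();
            CoreContainer container = initializer.initialize();
            return new EmbeddedSolrServer(container, ""); // "" = default core
        }
    }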

-- 
Regards,
Shalin Shekhar Mangar.


Re: httpclient.ProtocolException using Solrj

2009-04-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
using a single request is the fastest

http://wiki.apache.org/solr/Solrj#head-2046bbaba3759b6efd0e33e93f5502038c01ac65

I could index at the rate of 10,000 docs/sec using this and BinaryRequestWriter

On Thu, Apr 9, 2009 at 10:36 PM, vivek sar  wrote:
> I'm inserting 10K in a batch (using addBeans method). I read somewhere
> in the wiki that it's better to use the same instance of SolrServer
> for better performance. Would MultiThreadedConnectionManager help? How
> do I use it?
>
> I also wanted to know how can use EmbeddedSolrServer - does my app
> needs to be running in the same jvm with Solr webapp?
>
> Thanks,
> -vivek
>
> 2009/4/9 Noble Paul നോബിള്‍  नोब्ळ् :
>> how many documents are you inserting ?
>> may be you can create multiple instances of CommonshttpSolrServer and
>> upload in parallel
>>
>>
>> On Thu, Apr 9, 2009 at 11:58 AM, vivek sar  wrote:
>>> Thanks Shalin and Paul.
>>>
>>> I'm not using MultipartRequest. I do share the same SolrServer between
>>> two threads. I'm not using MultiThreadedHttpConnectionManager. I'm
>>> simply using CommonsHttpSolrServer to create the SolrServer. I've also
>>> tried StreamingUpdateSolrServer, which works much faster, but does
>>> throws "connection reset" exception once in a while.
>>>
>>> Do I need to use MultiThreadedHttpConnectionManager? I couldn't find
>>> anything on it on Wiki.
>>>
>>> I was also thinking of using EmbeddedSolrServer - in what case would I
>>> be able to use it? Does my application and the Solr web app need to
>>> run into the same JVM for this to work? How would I use the
>>> EmbeddedSolrServer?
>>>
>>> Thanks,
>>> -vivek
>>>
>>>
>>> On Wed, Apr 8, 2009 at 10:46 PM, Shalin Shekhar Mangar
>>>  wrote:
 Vivek, do you share the same SolrServer instance between your two threads?
 If so, are you using the MultiThreadedHttpConnectionManager when creating
 the HttpClient instance?

 On Wed, Apr 8, 2009 at 10:13 PM, vivek sar  wrote:

> single thread everything works fine. Two threads are fine too for a
> while and all the sudden problem starts happening.
>
> I tried indexing using REST services as well (instead of Solrj), but
> with that too I get following error after a while,
>
> 2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer -
> indexData()-> Failed to index
> java.net.SocketException: Broken pipe
>        at java.net.SocketOutputStream.socketWrite0(Native Method)
>        at
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>        at 
> java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>        at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
>        at
> org.apache.commons.httpclient.methods.StringRequestEntity.writeRequest(StringRequestEntity.java:145)
>        at
> org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499)
>         at
> org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
>        at
> org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
>        at
> org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
>        at
> org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
>        at
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
>        at
> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
>
>
> Note, I'm using "simple" lock type. I'd tried "single" type before
> that once caused index corruption so I switched to "simple".
>
> Thanks,
> -vivek
>
> 2009/4/8 Noble Paul നോബിള്‍  नोब्ळ् :
> > do you see the same problem when you use a single thread?
> >
> > what is the version of SolrJ that you use?
> >
> >
> >
> > On Wed, Apr 8, 2009 at 1:19 PM, vivek sar  wrote:
> >> Hi,
> >>
> >>  Any ideas on this issue? I ran into this again - once it starts
> >> happening it keeps happening. One of the thread keeps failing. Here
> >> are my SolrServer settings,
> >>
> >>        int socketTO = 0;
> >>        int connectionTO = 100;
> >>        int maxConnectionPerHost = 10;
> >>        int maxTotalConnection = 50;
> >>        boolean followRedirects = false;
> >>        boolean allowCompression = true;
> >>        int maxRetries = 1;
> >>
> >> Note, I'm using two threads to simultaneously write to the same index.
> >>
> >> org.apache.solr.client.solrj.SolrServerException:
> >> org.apache.commons.httpclient.ProtocolException: Unbuffered entity
> >> enclosing request can not be repeated.
> >>        at
> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.re

Re: Custom DIH: FileDataSource with additional business logic?

2009-04-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
FileDataSource is of type Reader, meaning getData() returns
a java.io.Reader. That is not very suitable for you.

your best bet is to write a simple DataSource which returns an
Iterator<Map<String, Object>> after reading the serialized objects.
This is what JdbcDataSource does. Then you can use it with
SqlEntityProcessor
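
A rough skeleton of such a DataSource (untested; the folder property and the
convert step are application-specific placeholders):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.ObjectInputStream;
    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;

    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.DataSource;

    public class SerializedObjectDataSource extends DataSource<Iterator<Map<String, Object>>> {
        private String folder;

        public void init(Context context, Properties initProps) {
            folder = initProps.getProperty("folder"); // configured in data-config.xml
        }

        public Iterator<Map<String, Object>> getData(String query) {
            List<Map<String, Object>> rows = new ArrayList<Map<String, Object>>();
            for (File f : new File(folder).listFiles()) {
                try {
                    ObjectInputStream in = new ObjectInputStream(new FileInputStream(f));
                    // deserialize and convert each item to a field-name -> value map
                    rows.addAll(convert(in.readObject()));
                    in.close();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
            return rows.iterator();
        }

        private List<Map<String, Object>> convert(Object deserialized) {
            // application-specific: turn the deserialized items into row maps
            return new ArrayList<Map<String, Object>>();
        }

        public void close() {
        }
    }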

On Thu, Apr 9, 2009 at 9:42 PM, Giovanni De Stefano
 wrote:
> Hello,
>
> here I am with another question.
>
> I am using DIH to index a DB. Additionally I also have to index some files
> containing Java serialized objects (and I cannot change this... :-( ).
>
> I currently have implemented a standalone Java app with the following
> features:
>
> 1) read all files from a given folder
> 2) deserialize the files into lists of items
> 3) convert the list of items into lists of SolrInputDocument(s)
> 4) post the lists of SolrInputDocument(s) to Solr
>
> All this is done using SolrJ. So far so good.
>
> I would like to use a DIH with a FileDataSource to do 1) and 4), and I would
> like to "squeeze" in my implementation for 2) and 3).
>
> Is this possible? Any hint?
>
> Thank you all in advance.
>
> Cheers,
> Giovanni
>



-- 
--Noble Paul


Re: Access HTTP headers from custom request handler

2009-04-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
well unfortunately, no.

Solr cannot assume that the request would always come over HTTP (think
of EmbeddedSolrServer), so it assumes that there are only parameters.
Your best bet is to modify SolrDispatchFilter to read the headers and
set them in the SolrRequest object,

or you can just write a Filter in front of SolrDispatchFilter and set the
current HttpServletRequest object into a ThreadLocal.
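
A sketch of that second approach - a plain servlet Filter mapped in web.xml
ahead of SolrDispatchFilter (class name illustrative):

    import java.io.IOException;

    import javax.servlet.Filter;
    import javax.servlet.FilterChain;
    import javax.servlet.FilterConfig;
    import javax.servlet.ServletException;
    import javax.servlet.ServletRequest;
    import javax.servlet.ServletResponse;
    import javax.servlet.http.HttpServletRequest;

    public class RequestHolderFilter implements Filter {
        private static final ThreadLocal<HttpServletRequest> CURRENT =
                new ThreadLocal<HttpServletRequest>();

        /** Called from the custom request handler to read headers of the current request. */
        public static HttpServletRequest current() {
            return CURRENT.get();
        }

        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            try {
                if (req instanceof HttpServletRequest) {
                    CURRENT.set((HttpServletRequest) req);
                }
                chain.doFilter(req, res);
            } finally {
                CURRENT.remove(); // never leak the request across pooled threads
            }
        }

        public void init(FilterConfig config) {
        }

        public void destroy() {
        }
    }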



On Thu, Apr 9, 2009 at 6:27 PM, Giovanni De Stefano
 wrote:
> Hello all,
>
> we are writing a custom request handler and we need to implement some
> business logic according to some HTTP headers.
>
> I see there is no easy way to access HTTP headers from the request handler.
>
> Moreover it seems to me that the HTTPServletness is lost way before the
> custom request handler comes in the game.
>
> Is there any way to access HTTP headers from within the request handler?
>
> Thanks,
> Giovanni
>



-- 
--Noble Paul


Re: httpclient.ProtocolException using Solrj

2009-04-09 Thread vivek sar
I'm inserting 10K in a batch (using the addBeans method). I read somewhere
in the wiki that it's better to use the same instance of SolrServer
for better performance. Would MultiThreadedConnectionManager help? How
do I use it?

I also wanted to know how I can use EmbeddedSolrServer - does my app
need to be running in the same JVM as the Solr webapp?

Thanks,
-vivek

2009/4/9 Noble Paul നോബിള്‍  नोब्ळ् :
> how many documents are you inserting ?
> may be you can create multiple instances of CommonshttpSolrServer and
> upload in parallel
>
>
> On Thu, Apr 9, 2009 at 11:58 AM, vivek sar  wrote:
>> Thanks Shalin and Paul.
>>
>> I'm not using MultipartRequest. I do share the same SolrServer between
>> two threads. I'm not using MultiThreadedHttpConnectionManager. I'm
>> simply using CommonsHttpSolrServer to create the SolrServer. I've also
>> tried StreamingUpdateSolrServer, which works much faster, but does
>> throws "connection reset" exception once in a while.
>>
>> Do I need to use MultiThreadedHttpConnectionManager? I couldn't find
>> anything on it on Wiki.
>>
>> I was also thinking of using EmbeddedSolrServer - in what case would I
>> be able to use it? Does my application and the Solr web app need to
>> run into the same JVM for this to work? How would I use the
>> EmbeddedSolrServer?
>>
>> Thanks,
>> -vivek
>>
>>
>> On Wed, Apr 8, 2009 at 10:46 PM, Shalin Shekhar Mangar
>>  wrote:
>>> Vivek, do you share the same SolrServer instance between your two threads?
>>> If so, are you using the MultiThreadedHttpConnectionManager when creating
>>> the HttpClient instance?
>>>
>>> On Wed, Apr 8, 2009 at 10:13 PM, vivek sar  wrote:
>>>
 single thread everything works fine. Two threads are fine too for a
 while and all the sudden problem starts happening.

 I tried indexing using REST services as well (instead of Solrj), but
 with that too I get following error after a while,

 2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer -
 indexData()-> Failed to index
 java.net.SocketException: Broken pipe
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at
 java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
        at
 org.apache.commons.httpclient.methods.StringRequestEntity.writeRequest(StringRequestEntity.java:145)
        at
 org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499)
         at
 org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
        at
 org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
        at
 org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
        at
 org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
        at
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
        at
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)


 Note, I'm using "simple" lock type. I'd tried "single" type before
 that once caused index corruption so I switched to "simple".

 Thanks,
 -vivek

 2009/4/8 Noble Paul നോബിള്‍  नोब्ळ् :
 > do you see the same problem when you use a single thread?
 >
 > what is the version of SolrJ that you use?
 >
 >
 >
 > On Wed, Apr 8, 2009 at 1:19 PM, vivek sar  wrote:
 >> Hi,
 >>
 >>  Any ideas on this issue? I ran into this again - once it starts
 >> happening it keeps happening. One of the thread keeps failing. Here
 >> are my SolrServer settings,
 >>
 >>        int socketTO = 0;
 >>        int connectionTO = 100;
 >>        int maxConnectionPerHost = 10;
 >>        int maxTotalConnection = 50;
 >>        boolean followRedirects = false;
 >>        boolean allowCompression = true;
 >>        int maxRetries = 1;
 >>
 >> Note, I'm using two threads to simultaneously write to the same index.
 >>
 >> org.apache.solr.client.solrj.SolrServerException:
 >> org.apache.commons.httpclient.ProtocolException: Unbuffered entity
 >> enclosing request can not be repeated.
 >>        at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470)
 >>        at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
 >>        at
 org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259)
 >>        at
 org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
 >>        at
 org.apac

Re: Searching on multi-core Solr

2009-04-09 Thread vivek sar
 Attached is the solr.xml - note that the schema and solrconfig are
located in core0, and all the other cores point to the same core0
instance for the schema.

Searches on individual cores work fine, so I assume the solr.xml is
correct - I also get their status correctly. From the
"NullPointerException" it seems to fail at,

 for (int i = resultSize - 1; i >= 0; i--) {
     ShardDoc shardDoc = (ShardDoc) queue.pop();
     shardDoc.positionInResponse = i;
     // Need the toString() for correlation with other lists that must
     // be strings (like keys in highlighting, explain, etc)
     resultIds.put(shardDoc.id.toString(), shardDoc);
 }

I've a unique field (required) in my documents, so I'm not sure whether
that can be null - could the doc itself be null - how? The same search on the
same cores individually works fine. Not sure if there is a way to
debug this.

I'm also not sure when I would get the "Connection reset" exception - could
it be that indexing is happening at the same time at a high rate - would
that cause problems?

Thanks,
-vivek


On Thu, Apr 9, 2009 at 4:07 AM, Fergus McMenemie  wrote:
>>Any help on this issue? Would distributed search on multi-core on the same
>>Solr instance even work? Does it have to be different Solr instances
>>altogether (separate shards)?
>
> As best I can tell this works fine for me. Multiple cores on the one
> machine. Very different schema and solrconfig.xml for each of the
> cores. Distributed searching using shards works fine. But I am using
> the trunk version.
>
> Perhaps you should post your solr.xml file.
>
>>I'm kind of stuck at this point right now. Keep getting one of the two
>>errors (when running distributed search - single searches work fine)
>>as mentioned in this thread earlier.
>>
>>Thanks,
>>-vivek
>>
>>On Wed, Apr 8, 2009 at 1:57 AM, vivek sar  wrote:
>>> Thanks Fergus. I'm still having problem with multicore search.
>>>
>>> I tried the following with two cores (they both share the same schema
>>> and solrconfig.xml) on the same box on same solr instance,
>>>
>>> 1) http://10.4.x.x:8080/solr/core0/admin/  - works fine, shows all the
>>> cores in admin interface
>>> 2) http://10.4.x.x:8080/solr/admin/cores  - works fine, see all the cores 
>>> in xml
>>> 3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine,
>>> gives me top 10 records
>>> 4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine,
>>> gives me top 10 records
>>> 5) 
>>> http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan
>>>  - this FAILS. I've seen two problems with this.
>>>
>>>    a) When index are being committed I see,
>>>
>>> SEVERE: org.apache.solr.common.SolrException:
>>> org.apache.solr.client.solrj.SolrServerException:
>>> java.net.SocketException: Connection reset
>>>        at 
>>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
>>>        at 
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
>>>        at 
>>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>>>        at 
>>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>>>        at 
>>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>>        at 
>>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>>        at 
>>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>>        at 
>>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>>        at 
>>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>>        at 
>>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>>        at 
>>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>>        at 
>>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>>>        at 
>>> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
>>>        at 
>>> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>>        at 
>>> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>>>        at java.lang.Thread.run(Thread.java:637)
>>>
>>>    b) Other times I see this,
>>>
>>> SEVERE: java.lang.NullPointerException
>>>        at 
>>> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432)
>>>        at 
>>> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276)
>>>        at 
>>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
>>>        at 
>>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.

Re: Searching on multi-core Solr

2009-04-09 Thread vivek sar
Erik,

  Here is what I'd posted in this thread earlier,

I tried the following with two cores (they both share the same schema
and solrconfig.xml) on the same box on same solr instance,

1) http://10.4.x.x:8080/solr/core0/admin/  - works fine, shows all the
cores in admin interface
2) http://10.4.x.x:8080/solr/admin/cores  - works fine, see all the cores in xml
3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine,
gives me top 10 records
4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine,
gives me top 10 records
5) 
http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan
 - this FAILS. I've seen two problems with this.



   a) This is the error most of the time,

SEVERE: java.lang.NullPointerException
   at 
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432)
   at 
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276)
   at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
   at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
   at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
   at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
   at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
   at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
   at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
   at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
   at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
   at java.lang.Thread.run(Thread.java:637)

b) When indexes are being committed I see this during search,

SEVERE: org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException:
java.net.SocketException: Connection reset
   at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
   at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
   at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
   at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
   at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
   at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
   at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
   at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
   at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
   at java.lang.Thread.run(Thread.java:637)

Any tips on how I can search across multiple cores on the same Solr instance?

Thanks,
-vivek

On Thu, Apr 9, 2009 at 2:56 AM, Erik Hatcher  wrote:
>
> On Apr 9, 2009, at 3:00 AM, vivek sar wrote:
>>
>>  Can someone please clear this up as I'm not
>> able to run distributed search on multi-cores.
>
> What error or problem are you encountering when trying this?  How are you
> trying it?
>
>        Erik
>
>


Custom DIH: FileDataSource with additional business logic?

2009-04-09 Thread Giovanni De Stefano
Hello,

here I am with another question.

I am using DIH to index a DB. Additionally, I have to index some files
containing Java serialized objects (and I cannot change this... :-( ).

I currently have a standalone Java app with the following
features:

1) read all files from a given folder
2) deserialize the files into lists of items
3) convert the list of items into lists of SolrInputDocument(s)
4) post the lists of SolrInputDocument(s) to Solr

All this is done using SolrJ. So far so good.

I would like to use a DIH with a FileDataSource to do 1) and 4), and I would
like to "squeeze" in my implementation for 2) and 3).

Is this possible? Any hints?

Thank you all in advance.

Cheers,
Giovanni
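
No answer appears in this digest, but DIH does allow plugging in a custom
EntityProcessor, which is one way to slot steps 2) and 3) into the import.
Below is only a minimal sketch, assuming the Solr 1.3/1.4 DIH API; the Item
class, the "baseDir" attribute, and the field names are hypothetical stand-ins:

import java.io.File;
import java.io.FileInputStream;
import java.io.ObjectInputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.EntityProcessorBase;

public class SerializedObjectEntityProcessor extends EntityProcessorBase {

  // Stand-in for the application's own serialized item class.
  public static class Item implements Serializable {
    String id, title;
  }

  private Iterator<File> files;
  private Iterator<Map<String, Object>> rows; // rows pending from current file

  public void init(Context context) {
    super.init(context);
    if (files == null) {
      // "baseDir" would be an attribute on the <entity> in data-config.xml;
      // this assumes the folder exists and contains only serialized files.
      File dir = new File(context.getEntityAttribute("baseDir"));
      files = Arrays.asList(dir.listFiles()).iterator();
    }
  }

  public Map<String, Object> nextRow() {
    // Step 1): walk the files, refilling the row iterator as each is drained.
    while ((rows == null || !rows.hasNext()) && files.hasNext()) {
      rows = readItems(files.next()).iterator();
    }
    return (rows != null && rows.hasNext()) ? rows.next() : null; // null ends the entity
  }

  // Steps 2) and 3): deserialize one file and map each item to field/value pairs.
  private List<Map<String, Object>> readItems(File f) {
    List<Map<String, Object>> out = new ArrayList<Map<String, Object>>();
    try {
      ObjectInputStream in = new ObjectInputStream(new FileInputStream(f));
      for (Object o : (List<?>) in.readObject()) {
        Item item = (Item) o;
        Map<String, Object> row = new HashMap<String, Object>();
        row.put("id", item.id);       // column names to map in data-config.xml
        row.put("title", item.title);
        out.add(row);
      }
      in.close();
    } catch (Exception e) {
      throw new RuntimeException("Could not deserialize " + f, e);
    }
    return out;
  }
}

DIH itself would then take care of step 4), posting the rows as documents; the
class would presumably be wired in via the processor attribute of the <entity>.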


Re: Any tips for indexing large amounts of data?

2009-04-09 Thread Glen Newton
> - As per
> http://developers.sun.com/learning/javaoneonline/2008/pdf/TS-5515.pdf
Sorry, the presentation covers a lot of ground: see slide #20:
"Standard thread pools can have high contention for task queue and
other data structures when used with fine-grained tasks"
[I haven't yet implemented work stealing]

-glen

2009/4/9 Glen Newton :
> For Solr / Lucene:
> - use -XX:+AggressiveOpts
> - If available, huge pages can help. See
> http://zzzoot.blogspot.com/2009/02/java-mysql-increased-performance-with.html
>  I haven't yet followed-up with my Lucene performance numbers using
> huge pages: the gain is 10-15% for large indexing jobs.
>
> For Lucene:
> - multi-thread using java.util.concurrent.ThreadPoolExecutor
> (http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html
>  6.4 million full-text articles + metadata indexed, resulting in an 83GB
> index; these are old numbers: things are down to ~10 hours now)
> - while multithreading on multicore is particularly good, it also
> improves performance on single core, for small (<6 YMMV) numbers of
> threads & good I/O (test for your particular configuration)
> - Use multiple indexes & merge at the end
> - As per http://developers.sun.com/learning/javaoneonline/2008/pdf/TS-5515.pdf
> use a separate ThreadPoolExecutor per index (see the previous point), reducing
> queue contention. This is giving me an additional ~10%. I will blog about
> this in the near future...
>
> -glen
>
> 2009/4/9 sunnyfr :
>>
>> Hi Otis,
>> How did you manage that? I have an 8-core machine with 8GB of RAM and an 11GB index
>> for 14M docs and 5 updates every 30 mn, but my replication kills everything.
>> My segments are merged too often, so the full index replicates and the cache is lost, and
>> I've no idea what I can do now?
>> Some help would be brilliant,
>> btw I'm using Solr 1.4.
>>
>> Thanks,
>>
>>
>> Otis Gospodnetic wrote:
>>>
>>> Mike is right about the occasional slow-down, which appears as a pause and
>>> is due to large Lucene index segment merging.  This should go away with
>>> newer versions of Lucene where this is happening in the background.
>>>
>>> That said, we just indexed about 20MM documents on a single 8-core machine
>>> with 8 GB of RAM, resulting in nearly 20 GB index.  The whole process took
>>> a little less than 10 hours - that's over 550 docs/second.  The vanilla
>>> approach before some of our changes apparently required several days to
>>> index the same amount of data.
>>>
>>> Otis
>>> --
>>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>>
>>> - Original Message 
>>> From: Mike Klaas 
>>> To: solr-user@lucene.apache.org
>>> Sent: Monday, November 19, 2007 5:50:19 PM
>>> Subject: Re: Any tips for indexing large amounts of data?
>>>
>>> There should be some slowdown in larger indices as occasionally large
>>> segment merge operations must occur.  However, this shouldn't really
>>> affect overall speed too much.
>>>
>>> You haven't really given us enough data to tell you anything useful.
>>> I would recommend trying to do the indexing via a webapp to eliminate
>>> all your code as a possible factor.  Then, look for signs to what is
>>> happening when indexing slows.  For instance, is Solr high in cpu, is
>>> the computer thrashing, etc?
>>>
>>> -Mike
>>>
>>> On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
>>>
 Hi,

 Thanks for answering this question a while back. I have made some
 of the suggestions you mentioned. ie not committing until I've
 finished indexing. What I am seeing though, is as the index gets
 larger (around 1Gb), indexing is taking a lot longer. In fact it
 slows down to a crawl. Have you got any pointers as to what I might
 be doing wrong?

 Also, I was looking at using MultiCore solr. Could this help in
 some way?

 Thank you
 Brendan

 On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:

>
> : I would think you would see better performance by allowing auto
> commit
> : to handle the commit size instead of reopening the connection
> all the
> : time.
>
> if your goal is "fast" indexing, don't use autoCommit at all ... just
> index everything, and don't commit until you are completely done.
>
> autoCommitting will slow your indexing down (the benefit being
> that more
> results will be visible to searchers as you proceed)
>
>
>
>
> -Hoss
>

>>>
>>>
>>>
>>>
>>>
>>>
>>
>> --
>> View this message in context: 
>> http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22973205.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
>
>
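
A minimal sketch of the multi-threaded indexing pattern Glen describes,
assuming the Lucene 2.4-era API. It uses one shared IndexWriter (IndexWriter
is thread-safe) rather than his multiple-indexes-plus-merge refinement; the
index path, field names, and document count are made up:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class ThreadedIndexer {
  public static void main(String[] args) throws Exception {
    // One shared writer for all worker threads.
    final IndexWriter writer = new IndexWriter(
        FSDirectory.getDirectory("/tmp/index"),
        new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);

    // Bounded queue + CallerRunsPolicy so the producer blocks instead of
    // exhausting the heap when the workers fall behind.
    ThreadPoolExecutor pool = new ThreadPoolExecutor(4, 4, 60L, TimeUnit.SECONDS,
        new ArrayBlockingQueue<Runnable>(1000),
        new ThreadPoolExecutor.CallerRunsPolicy());

    for (int i = 0; i < 100000; i++) {
      final int id = i;
      pool.execute(new Runnable() {
        public void run() {
          try {
            // Coarse-grained task: build, analyze and index one whole document.
            Document doc = new Document();
            doc.add(new Field("id", String.valueOf(id),
                Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("body", "document body " + id,
                Field.Store.NO, Field.Index.ANALYZED));
            writer.addDocument(doc);
          } catch (Exception e) {
            throw new RuntimeException(e);
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
    writer.optimize(); // one big merge at the end, not during indexing
    writer.close();
  }
}

Keeping each task coarse-grained (one whole document per Runnable) is what
limits the task-queue contention mentioned in slide #20 of the presentation.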

Re: Snapinstaller vs Solr Restart

2009-04-09 Thread sunnyfr

Hi Otis,

OK about that, but still, when it merges segments it changes their names, and
I've no choice but to replicate all the segments, which is bad for the
replication and CPU?

Thanks


Otis Gospodnetic wrote:
> 
> Lower your mergeFactor and Lucene will merge segments (i.e. fewer index
> files) and purge deletes more often for you, at the expense of somewhat
> slower indexing.
> 
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> 
> 
> - Original Message 
>> From: wojtekpia 
>> To: solr-user@lucene.apache.org
>> Sent: Tuesday, January 6, 2009 5:18:26 PM
>> Subject: Re: Snapinstaller vs Solr Restart
>> 
>> 
>> I'm optimizing because I thought I should. I'll be updating my index
>> somewhere between every 15 minutes, and every 2 hours. That means between
>> 12
>> and 96 updates per day. That seems like a lot of index files (and it
>> scared
>> me a little), so that's my second reason for wanting to optimize nightly.
>> 
>> I haven't benchmarked the performance hit for not optimizing. That'll be
>> my
>> next step. If the hit isn't too bad, I'll look into optimizing less
>> frequently (weekly, ...).
>> 
>> Thanks Otis!
>> 
>> 
>> Otis Gospodnetic wrote:
>> > 
>> > OK, so that question/answer seems to have hit the nail on the head.  :)
>> > 
>> > When you optimize your index, all index files get rewritten.  This
>> means
>> > that everything that the OS cached up to that point goes out the window
>> > and the OS has to slowly re-cache the hot parts of the index.  If you
>> > don't optimize, this won't happen.  Do you really need to optimize?  Or
>> > maybe a more direct question: why are you optimizing?
>> > 
>> > 
>> > Regarding autowarming, with such high fq hit rate, I'd make good use of
>> fq
>> > autowarming.  The result cache rate is lower, but still decent.  I
>> > wouldn't turn off autowarming the way you have.
>> > 
>> > 
>> 
>> -- 
>> View this message in context: 
>> http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p21320334.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Snapinstaller-vs-Solr-Restart-tp21315273p22972780.html
Sent from the Solr - User mailing list archive at Nabble.com.
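
In Lucene terms, the knob Otis mentions is IndexWriter's mergeFactor (in Solr
it is set in solrconfig.xml). A toy sketch at the Lucene level, assuming the
2.4-era API and a made-up index path:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class MergeFactorDemo {
  public static void main(String[] args) throws Exception {
    IndexWriter w = new IndexWriter(FSDirectory.getDirectory("/tmp/index"),
        new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);
    // The default is 10; a lower value merges more aggressively, so fewer,
    // larger segments survive between commits (at some indexing-speed cost).
    w.setMergeFactor(4);
    w.close();
  }
}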



Re: Exception while solr commit

2009-04-09 Thread Michael McCandless
This is a spooky exception.

Committing after every update will give very poor performance, but
should be "fine" (ie, not cause exceptions like this).

What filesystem are you on?  Is there any possibility that two writers
are open against the same index?  Is this easily reproduced?

Mike

On Wed, Apr 8, 2009 at 2:13 PM, Narayanan, Karthikeyan
 wrote:
>
> Hello,
>         I am calling commit for every record (document) added/updated
> to the index.  Our number of records is < 50k.  I am getting the
> following exception during commit. Is it the correct approach
> to call commit for every insert/update?
>
> Apr 7, 2009 4:41:23 PM org.apache.solr.handler.dataimport.SolrWriter
> commit
> SEVERE: Exception while solr commit.
> java.lang.RuntimeException: after flush: fdx size mismatch: 20096 docs vs 65536 length in bytes of _6.fdx
>        at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:94)
>        at org.apache.lucene.index.DocFieldConsumers.closeDocStore(DocFieldConsumers.java:83)
>        at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:47)
>        at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:367)
>        at org.apache.lucene.index.IndexWriter.flushDocStores(IndexWriter.java:1774)
>        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3600)
>        at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:4151)
>        at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:4031)
>        at org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:176)
>        at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2485)
>        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2332)
>        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2280)
>        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:355)
>        at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:77)
>        at org.apache.solr.handler.dataimport.SolrWriter.commit(SolrWriter.java:180)
>        at org.apache.solr.handler.dataimport.DocBuilder.commit(DocBuilder.java:168)
>        at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:152)
>        at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:334)
>        at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:386)
>        at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:377)
> Apr 7, 2009 4:41:23 PM org.apache.solr.handler.dataimport.DocBuilder execute
>
>
>
> Thanks.
>
> Karthik
>
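
For reference, the usual SolrJ pattern is to send documents in batches and
commit once at the end; a minimal sketch, with the server URL, batch size, and
field names made up:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchedIndexer {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    for (int i = 0; i < 50000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", i);
      doc.addField("name", "record " + i);
      batch.add(doc);
      if (batch.size() == 1000) { // send in chunks, but do not commit yet
        server.add(batch);
        batch.clear();
      }
    }
    if (!batch.isEmpty()) server.add(batch);
    server.commit(); // one commit at the very end
  }
}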


Access HTTP headers from custom request handler

2009-04-09 Thread Giovanni De Stefano
Hello all,

we are writing a custom request handler and we need to implement some
business logic according to some HTTP headers.

I see there is no easy way to access HTTP headers from the request handler.

Moreover, it seems to me that the HttpServlet-ness is lost well before the
custom request handler comes into play.

Is there any way to access HTTP headers from within the request handler?

Thanks,
Giovanni


Re: Dataimporthandler + MySQL = Datetime offset by 2 hours ?

2009-04-09 Thread Shalin Shekhar Mangar
On Thu, Apr 9, 2009 at 6:18 PM, gateway0  wrote:

>
> Hi,
>
> im fetching entries from my mysql database and index them with the
> Dataimporthandler:
>
> MySQL Table entry: (for example)
> pr_timedate : 2009-04-14 11:00:00
>
> entry in data-config.xml to index the mysql field:
> <field column="pr_timedate" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
>
> result in solr index:
> 2009-04-14T09:00:00Z
>
> it says 09:00:00 instead of 11:00:00 as it's supposed to.
>
> I've searched for hours already; why is that?
>

I think that may be because date/time in Solr is supposed to be in UTC. See
the note on DateField in the schema.xml.
-- 
Regards,
Shalin Shekhar Mangar.
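
That would explain the two-hour gap: 11:00 in a UTC+2 zone is 09:00 UTC. A
small demonstration, assuming the database server runs in a Central European
time zone:

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class UtcDemo {
  public static void main(String[] args) throws Exception {
    // Parse the MySQL value in the (assumed) local zone of the DB server.
    SimpleDateFormat local = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    local.setTimeZone(TimeZone.getTimeZone("Europe/Berlin"));
    Date d = local.parse("2009-04-14 11:00:00");

    // Render it the way Solr stores dates: always UTC.
    SimpleDateFormat utc = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
    utc.setTimeZone(TimeZone.getTimeZone("UTC"));
    System.out.println(utc.format(d)); // prints 2009-04-14T09:00:00Z (CEST is UTC+2)
  }
}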


Dataimporthandler + MySQL = Datetime offset by 2 hours ?

2009-04-09 Thread gateway0

Hi,

im fetching entries from my mysql database and index them with the
Dataimporthandler:

MySQL Table entry: (for example)
pr_timedate : 2009-04-14 11:00:00 

entry in data-config.xml to index the mysql field:
<field column="pr_timedate" dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />
result in solr index:
2009-04-14T09:00:00Z

it says 09:00:00 instead of 11:00:00 as it's supposed to.

I've searched for hours already; why is that?

best wishes, Sebastian
-- 
View this message in context: 
http://www.nabble.com/Dataimporthandler-%2B-MySQL-%3D-Datetime-offset-by-2-hours---tp22970250p22970250.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Using constants with DataImportHandler and MySQL ?

2009-04-09 Thread gateway0

Here's the solution:

[the data-config.xml snippet was stripped by the list archive]

just insert a dummy SQL field 'dataci_project' in your select statement.



Glen Newton wrote:
> 
> In MySql at least, you can do achieve what I think you want by
> manipulating the SQL, like this:
> 
> mysql> select "foo" as Constant1, id from Article limit 10;
> select "foo" as Constant1, id from Article limit 10;
> +-----------+----+
> | Constant1 | id |
> +-----------+----+
> | foo       |  1 |
> | foo       |  2 |
> | foo       |  3 |
> | foo       |  4 |
> | foo       |  5 |
> | foo       |  6 |
> | foo       |  7 |
> | foo       |  8 |
> | foo       |  9 |
> | foo       | 10 |
> +-----------+----+
> 10 rows in set (0.00 sec)
> 
> mysql> select 435 as Constant2, id from Article limit 10;
> select 435 as Constant2, id from Article limit 10;
> +-----------+----+
> | Constant2 | id |
> +-----------+----+
> |       435 |  1 |
> |       435 |  2 |
> |       435 |  3 |
> |       435 |  4 |
> |       435 |  5 |
> |       435 |  6 |
> |       435 |  7 |
> |       435 |  8 |
> |       435 |  9 |
> |       435 | 10 |
> +-----------+----+
> 10 rows in set (0.00 sec)
> 
> mysql>
> 
> 2009/4/8 Shalin Shekhar Mangar :
>> On Wed, Apr 8, 2009 at 10:23 PM, gateway0  wrote:
>>
>>>
>>> The problem as you see is the line:
>>> "Projects"
>>>
>>> I want to set a constant value for every row in the SQL table but it
>>> doesn't
>>> work that way, any ideas?
>>>
>>
>> That is not a valid syntax.
>>
>> There are two ways to do this:
>> 1. In your schema.xml provide the 'default' attribute
>> 2. Use TemplateTransformer - see
>> http://wiki.apache.org/solr/DataImportHandlerFaq
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
> 
> 
> 
> -- 
> 
> -
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Using-constants-with-DataImportHandler-and-MySQL---tp22954954p22969123.html
Sent from the Solr - User mailing list archive at Nabble.com.



Multi-language support

2009-04-09 Thread revas
Hi,

To reframe my earlier question:

Some languages have only analyzers but no stemmer from Snowball/Porter; in
that case, does the analyzer take care of stemming as well?

Some languages only have the stemmer from Snowball but no analyzer.

Some have both.

Can we say then that Solr supports all the above languages? Will search
behave the same across all the above cases?

thanks
revas
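
Background that may help frame the question: in Lucene/Solr an analyzer is a
tokenizer plus a chain of token filters, and a stemmer is just one possible
filter in that chain, so a language can be handled with or without stemming.
A minimal sketch using the Snowball contrib, assuming the Lucene 2.4-era API:

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;

public class StemDemo {
  public static void main(String[] args) throws Exception {
    // SnowballAnalyzer = StandardTokenizer + standard/lowercase filters
    // + the Snowball stemmer for the named language.
    Analyzer a = new SnowballAnalyzer("English");
    TokenStream ts = a.tokenStream("body",
        new StringReader("indexing indexes indexed"));
    Token t;
    while ((t = ts.next()) != null) {
      System.out.println(t.term()); // prints "index" three times
    }
  }
}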


Re: Searching on multi-core Solr

2009-04-09 Thread Fergus McMenemie
>Any help on this issue? Would distributed search on multi-core on same
>Solr instance even work? Does it have to be different Solr instances
>altogether (separate shards)?

As best I can tell this works fine for me. Multiple cores on the one
machine. Very different schema and solrconfig.xml for each of the 
cores. Distributed searching using shards works fine. But I am using
the trunk version.

Perhaps you should post your solr.xml file.
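
For reference, the same shards query can also be issued through SolrJ; a
minimal sketch reusing the host and core names from the quoted message below:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ShardQuery {
  public static void main(String[] args) throws Exception {
    // Query one core and let it fan the request out to both shards.
    SolrServer server =
        new CommonsHttpSolrServer("http://10.4.x.x:8080/solr/20090407_2");
    SolrQuery q = new SolrQuery("japan");
    q.set("shards",
        "10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3");
    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getResults().getNumFound() + " hits across both cores");
  }
}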

>I'm kind of stuck at this point right now. Keep getting one of the two
>errors (when running distributed search - single searches work fine)
>as mentioned in this thread earlier.
>
>Thanks,
>-vivek
>
>On Wed, Apr 8, 2009 at 1:57 AM, vivek sar  wrote:
>> Thanks Fergus. I'm still having problem with multicore search.
>>
>> I tried the following with two cores (they both share the same schema
>> and solrconfig.xml) on the same box on same solr instance,
>>
>> 1) http://10.4.x.x:8080/solr/core0/admin/  - works fine, shows all the
>> cores in admin interface
>> 2) http://10.4.x.x:8080/solr/admin/cores  - works fine, see all the cores in 
>> xml
>> 3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine,
>> gives me top 10 records
>> 4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine,
>> gives me top 10 records
>> 5) 
>> http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan
>>  - this FAILS. I've seen two problems with this.
>>
>>    a) When index are being committed I see,
>>
>> SEVERE: org.apache.solr.common.SolrException:
>> org.apache.solr.client.solrj.SolrServerException:
>> java.net.SocketException: Connection reset
>>        at 
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
>>        at 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
>>        at 
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>>        at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>>        at 
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>        at 
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>        at 
>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>        at 
>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>        at 
>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>        at 
>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>        at 
>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>        at 
>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>>        at 
>> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
>>        at 
>> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
>>        at 
>> org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
>>        at java.lang.Thread.run(Thread.java:637)
>>
>>    b) Other times I see this,
>>
>> SEVERE: java.lang.NullPointerException
>>        at 
>> org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432)
>>        at 
>> org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276)
>>        at 
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
>>        at 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
>>        at 
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>>        at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>>        at 
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>        at 
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>        at 
>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>        at 
>> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>>        at 
>> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>>        at 
>> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>>        at 
>> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>>        at 
>> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
>>        at 
>> org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
>>        at 
>> org.apac

Re: different scoring for different types of found documents

2009-04-09 Thread Shalin Shekhar Mangar
On Thu, Apr 9, 2009 at 2:17 PM, Andrey Klochkov
wrote:

>
> So we're searching through the product catalog. Product have types (i.e.
> "Electronics", "Apparel", "Furniture" etc). What we need is to customize
> scoring of the results so that top results should contain products of all
> different types which match the query. So after finding all the products
> matching the query we want to group results by product type.


This is similar to Field Collapsing. It is not committed to trunk,
but there are a few patches.

https://issues.apache.org/jira/browse/SOLR-236


> Then for every
> product type take corresponding sub-set of results and in every of the
> sub-sets assign scores with the following logic. Assign score 5 to the
> first
> 20% of results, then assign score 4 to the next 15% of results, and so on.
> Particular percent values are configured by the end user. How could we
> achive it using Solr? Is it possible at all? Maybe we should implement some
> custom ValueSource and use it in a function queries?
>

This kind of scoring is not possible out of the box. You need to assign
scores according to where the document lies in the final list of results
(after all filters are applied); therefore you may not be able to operate on
the DocList directly or in the value source. I *think* a good place to start
looking would be the QueryValueSource in trunk, as it has access to the
scorer. But I do not know much about these things.
-- 
Regards,
Shalin Shekhar Mangar.
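
The banding itself is simple to do client-side once the full, filtered result
list is in hand; a purely illustrative sketch, not tied to any Solr API:

import java.util.Arrays;

public class PercentileBander {
  // Assign banded scores by rank: the first widths[0] fraction of the n
  // results gets topScore, the next widths[1] fraction gets topScore-1,
  // and so on; whatever is left falls into the band after the last width.
  public static int[] band(int n, double[] widths, int topScore) {
    int[] scores = new int[n];
    int pos = 0;
    int score = topScore;
    for (double w : widths) {
      int count = (int) Math.round(n * w);
      for (int i = 0; i < count && pos < n; i++) {
        scores[pos++] = score;
      }
      score--;
    }
    while (pos < n) {
      scores[pos++] = score; // remainder gets the lowest band
    }
    return scores;
  }

  public static void main(String[] args) {
    // 10 results with bands of 20% and 15%, as in the example above:
    int[] s = band(10, new double[] {0.20, 0.15}, 5);
    System.out.println(Arrays.toString(s));
    // prints [5, 5, 4, 4, 3, 3, 3, 3, 3, 3]
  }
}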


Re: solr 1.4 facet boost field according to another field

2009-04-09 Thread Shalin Shekhar Mangar
I don't think conditional boosting is possible. You can boost the same field
on which the match was found, but you cannot boost a different field.

On Thu, Apr 9, 2009 at 2:05 PM, sunnyfr  wrote:

>
> Do you have an idea ?
>
>
>
> sunnyfr wrote:
> >
> > Hi,
> >
> > I've got title, description and tag fields ... According to where I find the
> > word searched, I would like to boost other fields, like nb_views
> > or rating, differently.
> >
> > if the word is found in title then nb_views^10 and rating^10
> > if the word is found in description then nb_views^2 and rating^2
> >
> > Thanks a lot for your help,
> >
>
> --
> View this message in context:
> http://www.nabble.com/solr-1.4-facet-boost-field-according-to-another-field-tp22913642p2294.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: Searching on multi-core Solr

2009-04-09 Thread Erik Hatcher


On Apr 9, 2009, at 3:00 AM, vivek sar wrote:

> Can someone please clear this up as I'm not
> able to run distributed search on multi-cores.

What error or problem are you encountering when trying this? How are
you trying it?


Erik



Re: Its urgent! plz help in schema.xml- appending one field to another

2009-04-09 Thread Erik Hatcher


On Apr 8, 2009, at 9:50 PM, Udaya wrote:



> Hi,
> Need your help,
> I would like to know how we could append or add one field value to another
> field in Scheme.xml
> My scheme is as follows (only the field part is given):
> Scheme.xml
>
> <field name="topics_id" ... stored="true" required="true"/>
> <field name="url" ...
> default="http://comp.com/portals/ForumWindow?action=1&v=t&p="topics_id"#"topics_id"" />
>
> Here for the field with name "topics_id" we get the id from a table. I want
> this topics_id value to be appended into the default value attribute of the
> field with name "url".
>
> For example:
> Suppose we get a topics_id value of 512 during a search; then the value of
> the url should be appended as
> http://comp.com/portals/JBossForumWindow?action=1&v=t&p=512#512
>
> Is this possible, please give me some suggestions.


If you're using DIH to index your table, you could aggregate using the
template transformer during indexing.

If you're indexing a different way, why not let the searching client (UI)
do the aggregation of an id into a URL?


Erik
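
For the second option, the client-side aggregation is a one-liner over the
SolrJ result; a minimal sketch, assuming topics_id comes back as a stored
string field:

import org.apache.solr.common.SolrDocument;

public class ForumUrlBuilder {
  // Builds the forum URL from the topics_id stored in a search result.
  static String forumUrl(SolrDocument doc) {
    String id = (String) doc.getFieldValue("topics_id");
    return "http://comp.com/portals/ForumWindow?action=1&v=t&p=" + id + "#" + id;
  }
}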



Analyzers and stemmer

2009-04-09 Thread revas
Hi ,

  With respect to language support in Solr, we have analyzers for some
languages and stemmers for certain languages. Do we say that Solr supports
a particular language only if we have both an analyzer and a stemmer for the
language, or also when we have an analyzer but no stemmer?

Regards
Sujatha


different scoring for different types of found documents

2009-04-09 Thread Andrey Klochkov
Hi,

We have a quite complex requirement concerning scoring logic customization,
but I guess it's quite useful and probably something like it was done
already.

So we're searching through the product catalog. Product have types (i.e.
"Electronics", "Apparel", "Furniture" etc). What we need is to customize
scoring of the results so that top results should contain products of all
different types which match the query. So after finding all the products
matching the query we want to group results by product type. Then for every
product type take corresponding sub-set of results and in every of the
sub-sets assign scores with the following logic. Assign score 5 to the first
20% of results, then assign score 4 to the next 15% of results, and so on.
Particular percent values are configured by the end user. How could we
achieve it using Solr? Is it possible at all? Maybe we should implement some
custom ValueSource and use it in a function queries?

-- 
Andrew Klochkov





Re: solr 1.4 memory jvm

2009-04-09 Thread sunnyfr

Hi Noble,

Yes, exactly that.
I would like to know how people manage during a replication.
Do they turn off servers and set a high autowarmCount, which takes the
slave offline for a while? In my case it is 10 mn to bring back the new
index, and then maybe 10 more minutes of autowarming.

Otherwise I tried a large mergeFactor, but I guess I have too many
updates every 30 mn (something like 2000 docs) and almost all segments are
modified.

What would you reckon? :(  :)

Thanks a lot Noble 


Noble Paul നോബിള്‍  नोब्ळ् wrote:
> 
> So what I decipher from the numbers is w/o queries Solr replication is
> not performing too badly. The queries are inherently slow and you wish
> to optimize the query performance itself.
> Am I correct?
> 
> On Tue, Apr 7, 2009 at 7:50 PM, sunnyfr  wrote:
>>
>> Hi,
>>
>> So I did two tests on two servers:
>>
>> First server : with just replication every 20mn like you can notice:
>> http://www.nabble.com/file/p22930179/cpu_without_request.png
>> cpu_without_request.png
>> http://www.nabble.com/file/p22930179/cpu2_without_request.jpg
>> cpu2_without_request.jpg
>>
>> Second server: with a first replication and a second one during the query
>> test, between 15:32 and 15:41.
>> During replication (checked on .../admin/replication/index.jsp) my query
>> response time at the end was around 5000 msec.
>> After the replication, I guess during the commit, I couldn't get an answer
>> to my query for a long time; I refreshed my page a few minutes later.
>> http://www.nabble.com/file/p22930179/cpu_with_request.png
>> cpu_with_request.png
>> http://www.nabble.com/file/p22930179/cpu2_with_request.jpg
>> cpu2_with_request.jpg
>>
>> Now, without replication, I kept querying the second server, and I can't
>> get better than
>> 1000 msec response time and 11 requests/second.
>> http://www.nabble.com/file/p22930179/cpu_.jpg cpu_.jpg
>>
>> This is my request :
>> select?fl=id&fq=status_published:1+AND+status_moderated:0+AND+status_personal:0+AND+status_private:0+AND+status_deleted:0+AND+status_error:0+AND+status_ready_web:1&json.nl=map&wt=json&start=0&version=1.2&bq=status_official:1^1.5+OR+status_creative:1^1+OR+language:en^0.5&bf=recip(rord(created),1,10,10)^3+pow(stat_views,0.1)^15+pow(stat_comments,0.1)^15&rows=100&qt=dismax&qf=title_en^0.8+title^0.2+description_en^0.3+description^0.2+tags^1+owner_login^0.5
>>
>> Do you have any advice?
>>
>> Thanks Noble
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/solr-1.4-memory-jvm-tp22913742p22930179.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> --Noble Paul
> 
> 

-- 
View this message in context: 
http://www.nabble.com/solr-1.4-memory-jvm-tp22913742p22966630.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: httpclient.ProtocolException using Solrj

2009-04-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
How many documents are you inserting?
Maybe you can create multiple instances of CommonsHttpSolrServer and
upload in parallel.
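
A minimal sketch of the shared-client setup, pairing one SolrServer with the
multi-threaded connection manager Shalin mentions below; the connection limits
are borrowed from vivek's settings quoted earlier in this digest, and the
SolrJ 1.3/1.4 constructor is assumed:

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class SharedSolrServerFactory {
  public static CommonsHttpSolrServer create(String url) throws Exception {
    MultiThreadedHttpConnectionManager mgr = new MultiThreadedHttpConnectionManager();
    mgr.getParams().setDefaultMaxConnectionsPerHost(10);
    mgr.getParams().setMaxTotalConnections(50);
    // A SolrServer backed by this client can be shared safely between threads.
    return new CommonsHttpSolrServer(url, new HttpClient(mgr));
  }
}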


On Thu, Apr 9, 2009 at 11:58 AM, vivek sar  wrote:
> Thanks Shalin and Paul.
>
> I'm not using MultipartRequest. I do share the same SolrServer between
> two threads. I'm not using MultiThreadedHttpConnectionManager. I'm
> simply using CommonsHttpSolrServer to create the SolrServer. I've also
> tried StreamingUpdateSolrServer, which works much faster, but does
> throw a "connection reset" exception once in a while.
>
> Do I need to use MultiThreadedHttpConnectionManager? I couldn't find
> anything on it on Wiki.
>
> I was also thinking of using EmbeddedSolrServer - in what case would I
> be able to use it? Does my application and the Solr web app need to
> run into the same JVM for this to work? How would I use the
> EmbeddedSolrServer?
>
> Thanks,
> -vivek
>
>
> On Wed, Apr 8, 2009 at 10:46 PM, Shalin Shekhar Mangar
>  wrote:
>> Vivek, do you share the same SolrServer instance between your two threads?
>> If so, are you using the MultiThreadedHttpConnectionManager when creating
>> the HttpClient instance?
>>
>> On Wed, Apr 8, 2009 at 10:13 PM, vivek sar  wrote:
>>
>>> single thread everything works fine. Two threads are fine too for a
>>> while and all the sudden problem starts happening.
>>>
>>> I tried indexing using REST services as well (instead of Solrj), but
>>> with that too I get following error after a while,
>>>
>>> 2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer -
>>> indexData()-> Failed to index
>>> java.net.SocketException: Broken pipe
>>>        at java.net.SocketOutputStream.socketWrite0(Native Method)
>>>        at
>>> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>>>        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>>>        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
>>>        at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
>>>        at
>>> org.apache.commons.httpclient.methods.StringRequestEntity.writeRequest(StringRequestEntity.java:145)
>>>        at
>>> org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499)
>>>         at
>>> org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
>>>        at
>>> org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
>>>        at
>>> org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
>>>        at
>>> org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
>>>        at
>>> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
>>>        at
>>> org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
>>>
>>>
>>> Note, I'm using the "simple" lock type. I'd tried the "single" type before;
>>> that once caused index corruption, so I switched to "simple".
>>>
>>> Thanks,
>>> -vivek
>>>
>>> 2009/4/8 Noble Paul നോബിള്‍  नोब्ळ् :
>>> > do you see the same problem when you use a single thread?
>>> >
>>> > what is the version of SolrJ that you use?
>>> >
>>> >
>>> >
>>> > On Wed, Apr 8, 2009 at 1:19 PM, vivek sar  wrote:
>>> >> Hi,
>>> >>
>>> >>  Any ideas on this issue? I ran into this again - once it starts
>>> >> happening it keeps happening. One of the thread keeps failing. Here
>>> >> are my SolrServer settings,
>>> >>
>>> >>        int socketTO = 0;
>>> >>        int connectionTO = 100;
>>> >>        int maxConnectionPerHost = 10;
>>> >>        int maxTotalConnection = 50;
>>> >>        boolean followRedirects = false;
>>> >>        boolean allowCompression = true;
>>> >>        int maxRetries = 1;
>>> >>
>>> >> Note, I'm using two threads to simultaneously write to the same index.
>>> >>
>>> >> org.apache.solr.client.solrj.SolrServerException:
>>> >> org.apache.commons.httpclient.ProtocolException: Unbuffered entity
>>> >> enclosing request can not be repeated.
>>> >>        at
>>> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470)
>>> >>        at
>>> org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
>>> >>        at
>>> org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259)
>>> >>        at
>>> org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
>>> >>        at
>>> org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:57)
>>> >>
>>> >> Thanks,
>>> >> -vivek
>>> >>
>>> >> On Sat, Apr 4, 2009 at 1:07 AM, vivek sar  wrote:
>>> >>> Hi,
>>> >>>
>>> >>>  I'm sending 15K records at once using Solrj (server.addBeans(...))
>>> >>> and have two threads writing to same index. One thread goes fine, but
>>> >>> the second thread always fails with,
>>> >>>
>>> >>>
>>> >>> org.apache.solr.client.solrj.SolrServerException:
>>> >>> org.apache.commons.httpclient.ProtocolException: Unb

Re: Searching on multi-core Solr

2009-04-09 Thread vivek sar
Hi,

   I've gone through the mailing archive and have read contradictory
remarks on this issue. Can someone please clear this up as I'm not
able to run distributed search on multi-cores. Is there any document
on how I can search across multiple cores which share the same schema? Here
are the various comments I've read on this mailing list,

1) http://www.nabble.com/multi-core-vs-multi-app-td15803781.html#a15803781
Don't think you can search against multiple cores "automatically" -
i.e. got to make multiple queries, one for each core and combine
results yourself. Yes, this will slow things down.   - Otis

2) 
http://www.nabble.com/Search-in-SOLR-multi-cores-in-a-single-request-td20356173.html#a20356173
The idea behind multicore is that you will use them if you have completely
different type of documents (basically multiple schemas). - Shalin

3) http://www.nabble.com/Distributed-search-td22036229.html#a22036229
That should work, yes, though it may not be a wise thing to do
performance-wise, if the number of CPU cores that solr server has is
lower than the number of Solr cores. - Otis

My only motivation behind using multi-core is to keep the index size
within limits. All my cores are using the same schema. My index grows to
over 30G within a day and I need to keep up to a year of data. I
couldn't find any other way of scaling using Solr. I've noticed once
the index grows above 10G the indexing process starts slowing down: the
commit takes much longer and optimize is hard to finish. So, I'm
trying to create a new core after every 10 million documents (equal
to 10G in my case). I don't want to start a new Solr instance every 10G
- that won't scale for a year's time. I'm going to use 3-4 servers to
hold all these cores.

Now if someone could please tell me whether this is a wrong scaling
architecture, I could re-think. I want fast indexing and, at the same time,
fast enough search. If I have to search each core separately and
merge the results myself, the search performance is going to be awful.

Is Solr the right tool for managing billions of records (I can get up
to 100 million records every day - with 1Kb per record, that's 100GB of
index a day)? Most of the field values are pretty distinct (like 10 million
email addresses), so the index size would be huge too.

I would think it's a common problem to scale a huge index while keeping
both indexing and search times acceptable. I'm not sure if this can be
managed on just 4 servers - we don't have 100s of boxes for this
project. Is there any other tool that might be more appropriate for this
kind of case - like Katta or Lucene on Hadoop - or should I simply use
Lucene with parallel search and partition the indexes by size?

Thanks,
-vivek

On Wed, Apr 8, 2009 at 11:07 AM, vivek sar  wrote:
> Any help on this issue? Would distributed search on multi-core on same
> Solr instance even work? Does it have to be different Solr instances
> altogether (separate shards)?
>
> I'm kind of stuck at this point right now. Keep getting one of the two
> errors (when running distributed search - single searches work fine)
> as mentioned in this thread earlier.
>
> Thanks,
> -vivek
>
> On Wed, Apr 8, 2009 at 1:57 AM, vivek sar  wrote:
>> Thanks Fergus. I'm still having problem with multicore search.
>>
>> I tried the following with two cores (they both share the same schema
>> and solrconfig.xml) on the same box on same solr instance,
>>
>> 1) http://10.4.x.x:8080/solr/core0/admin/  - works fine, shows all the
>> cores in admin interface
>> 2) http://10.4.x.x:8080/solr/admin/cores  - works fine, see all the cores in 
>> xml
>> 3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine,
>> gives me top 10 records
>> 4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine,
>> gives me top 10 records
>> 5) 
>> http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan
>>  - this FAILS. I've seen two problems with this.
>>
>>    a) When index are being committed I see,
>>
>> SEVERE: org.apache.solr.common.SolrException:
>> org.apache.solr.client.solrj.SolrServerException:
>> java.net.SocketException: Connection reset
>>        at 
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
>>        at 
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
>>        at 
>> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
>>        at 
>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
>>        at 
>> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>>        at 
>> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>>        at 
>> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>>        at 
>> org.apache.catalina.core.StandardContextVa