Solr4.4 or zookeeper 3.4.5 do not support too many collections? more than 600?

2013-09-07 Thread diyun2008
*I have installed SolrCloud with Solr 4.4 and ZooKeeper 3.4.5,
and I'm testing a requirement of 10k collections on one Solr server.
When I post collection-create requests to the Solr server
(admin/collections?action=CREATE&name=europetest${loopcnt}&numShards=2&replicationFactor=2&maxShardsPerNode=2)
with JMeter,
I found that every time the collection count reaches 600+, Solr and ZooKeeper
stop working correctly.
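A minimal sketch of assembling that Collections API CREATE request (the host, port, and collection name here are placeholders of mine, not from the thread):

```java
public class CreateCollectionsUrl {
    // Build the CREATE request described above; only the query string is
    // taken from the thread, the host and name are illustrative.
    static String createUrl(String host, String name) {
        return host + "/admin/collections?action=CREATE"
                + "&name=" + name
                + "&numShards=2&replicationFactor=2&maxShardsPerNode=2";
    }

    public static void main(String[] args) {
        System.out.println(createUrl("http://localhost:8080/solr", "europetest1"));
    }
}
```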

I checked the logs. Here are the Solr logs:*
07:56:01,149 ERROR SolrException: null:org.apache.solr.common.SolrException:
createcollection the collection time out:60s
	at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:175)
	at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:156)
	at org.apache.solr.handler.admin.CollectionsHandler.handleCreateAction(CollectionsHandler.java:290)
	at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:112)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:611)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:218)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:781)

07:57:23,523 ERROR SolrException: org.apache.solr.common.SolrException:
createcollection the collection error [Watcher fired on path:
/overseer/collection-queue-work/qnr-001590 state: SyncConnected type
NodeDeleted]
	at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:178)
	at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:156)
	at org.apache.solr.handler.admin.CollectionsHandler.handleCreateAction(CollectionsHandler.java:290)
	at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:112)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:611)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:218)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
	at

Re: charfilter doesn't do anything

2013-09-07 Thread Erick Erickson
Hmmm, have you looked at:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory

Not quite the body, perhaps, but might it help?


On Fri, Sep 6, 2013 at 11:33 AM, Andreas Owen a...@conx.ch wrote:

 ok, i have html pages like <html>...<!--body-->content i
 want<!--/body-->...</html>. i want to extract (index, store) only
 what is between the body-comments. i thought RegexTransformer would be the
 best because xpath doesn't work in tika and i can't nest an
 XPathEntityProcessor to use xpath. what i have also found out is that the
 html parser from tika cuts my body-comments out and tries to make
 well-formed html, which i would like to switch off.

 On 6. Sep 2013, at 5:04 PM, Shawn Heisey wrote:

  On 9/6/2013 7:09 AM, Andreas Owen wrote:
  i've managed to get it working if i use the RegexTransformer and the string
 is on the same line in my tika entity. but when the string is multiline it
 isn't working, even though i tried (?s) to set the DOTALL flag.
 
  <entity name="tika" processor="TikaEntityProcessor" url="${rec.url}"
   dataSource="dataUrl" onError="skip" htmlMapper="identity" format="html"
   transformer="RegexTransformer">
    <field column="text_html" regex="&lt;body&gt;(.+)&lt;/body&gt;"
     replaceWith="QQQ" sourceColName="text" />
  </entity>
 
  then i tried it like this and i get a stackoverflow
 
  <field column="text_html" regex="&lt;body&gt;((.|\n|\r)+)&lt;/body&gt;"
   replaceWith="QQQ" sourceColName="text" />
 
  in javascript this works but maybe because i only used a small string.
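 The DOTALL approach can be sketched with plain java.util.regex (a
 stand-alone illustration with a made-up input string, not the DIH code
 path itself):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotallDemo {
    // (?s) = DOTALL: '.' also matches line terminators, so one pattern
    // spans multiple lines without the ((.|\n|\r)+) alternation that can
    // overflow the matcher's stack on large inputs.
    static String extractBody(String html) {
        Matcher m = Pattern.compile("(?s)<body>(.+)</body>").matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractBody("<html><body>line one\nline two</body></html>"));
    }
}
```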
 
  Sounds like we've got an XY problem here.
 
  http://people.apache.org/~hossman/#xyproblem
 
  How about you tell us *exactly* what you'd actually like to have happen
  and then we can find a solution for you?
 
  It sounds a little bit like you're interested in stripping all the HTML
  tags out.  Perhaps the HTMLStripCharFilter?
 
 
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory
 
  Something that I already said: By using the KeywordTokenizer, you won't
  be able to search for individual words on your HTML input.  The entire
  input string is treated as a single token, and therefore ONLY exact
  entire-field matches (or certain wildcard matches) will be possible.
 
 
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.KeywordTokenizerFactory
 
  Note that no matter what you do to your data with the analysis chain,
  Solr will always return the text that was originally indexed in search
  results.  If you need to affect what gets stored as well, perhaps you
  need an Update Processor.
 
  Thanks,
  Shawn




Re: collections api setting dataDir

2013-09-07 Thread Erick Erickson
Did you try just specifying dataDir=blah? I haven't tried this, but the
notes for
the collections API indicate they're sugar around core creation commands,
see: http://wiki.apache.org/solr/CoreAdmin#CREATE

FWIW,
Erick


On Fri, Sep 6, 2013 at 4:23 PM, mike st. john mstj...@gmail.com wrote:

 is there any way to change the dataDir while creating a collection via the
 collection api?



Re: charfilter doesn't do anything

2013-09-07 Thread Jack Krupansky
For the second question, there is no multiline mode - the ends of lines are 
just white space characters. IOW, it is implicitly multi-line.


-- Jack Krupansky

-Original Message- 
From: Andreas Owen

Sent: Thursday, September 05, 2013 12:03 PM
To: solr-user@lucene.apache.org
Subject: charfilter doesn't do anything

i would like to filter / replace a word during indexing, but it doesn't do
anything and i don't get an error.


in schema.xml i have the following:

<field name="text_html" type="text_cutHtml" indexed="true" stored="true"
 multiValued="true"/>


<fieldType name="text_cutHtml" class="solr.TextField">
  <analyzer>
    <!-- <tokenizer class="solr.StandardTokenizerFactory"/> -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
     pattern="Zahlungsverkehr" replacement="ASDFGHJK" />
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
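What that char filter is configured to do can be sketched with plain
java.util.regex (my illustration of the replacement semantics, not Solr's
actual code path):

```java
import java.util.regex.Pattern;

public class CharFilterSketch {
    // PatternReplaceCharFilterFactory rewrites the character stream before
    // tokenization; the replacement behaves like a regex replace-all.
    static String filter(String input) {
        return Pattern.compile("Zahlungsverkehr").matcher(input)
                .replaceAll("ASDFGHJK");
    }

    public static void main(String[] args) {
        System.out.println(filter("Infos zum Zahlungsverkehr und mehr"));
        // prints "Infos zum ASDFGHJK und mehr"
    }
}
```

Note that char filters only affect the indexed terms; the stored value
returned in search results is unchanged.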

my 2nd question is: where can i say that the expression is multiline? in
javascript i can use /m at the end of the pattern.



Re: Batch Solr Server

2013-09-07 Thread Erick Erickson
It's unclear to me why using server.add(Collection<SolrInputDocument>)
doesn't work for you.

bq:  which will create an UpdateRequest object for the entire
collection

Huh? Just call it with your batches, something like

List<SolrInputDocument> list = new ...
while (more docs) {
    list.add(doc);
    if ((list.size() % batch_size) == 0) {
        server.add(list);
        list.clear();
    }
}
if (list.size() > 0) server.add(list);
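The same batching loop, made runnable with a stub in place of the SolrJ
server (the names and counts here are my own; only the flush logic is the
point):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchingSketch {
    // Count how many update requests are issued when `docs` documents are
    // flushed in batches of `batchSize` (the stub for server.add(list)).
    static int requestsFor(int docs, int batchSize) {
        int requests = 0;
        List<String> buffer = new ArrayList<>();
        for (int i = 0; i < docs; i++) {           // "while (more docs)"
            buffer.add("doc-" + i);
            if (buffer.size() % batchSize == 0) {  // batch full: flush it
                requests++;                        // server.add(buffer)
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) requests++;         // flush the remainder
        return requests;
    }

    public static void main(String[] args) {
        System.out.println(requestsFor(10, 3)); // prints "4"
    }
}
```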

Best,
Erick


On Fri, Sep 6, 2013 at 7:53 PM, gaoagong thricether...@gmail.com wrote:

 Does anyone know if there is such a thing as a BatchSolrServer object in
 the
 solrj code? I am currently using the ConcurrentUpdateSolrServer, but it
 isn't doing quite what I expected. It will distribute the load of sending
 through the http client through different threads and manage the
 connections, but it does not package the documents in bundles. This can be
 done manually by calling solrServer.add(Collection<SolrInputDocument>
 documents), which will create an UpdateRequest object for the entire
 collection. When the ConcurrentUpdateSolrServer gets to this UpdateRequest
 it will send all of the documents together in a single http call.

 What I want to be able to do is call solrServer.add(SolrInputDocument
 document) and have the SolrServer grab the next batch (up to a specified
 size) and then create an UpdateRequest. This would reduce the number of
 individual Requests the SOLR servers have to handle as well as any per http
 call overhead incurred.

 Would this kind of functionality be worth while to anyone else? Should I
 create such a SolrServer object?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Batch-Solr-Server-tp4088657.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr4.4 or zookeeper 3.4.5 do not support too many collections? more than 600?

2013-09-07 Thread Erick Erickson
Right, I _think_ that a ZK znode is limited to 1M, and it looks like the
600th collection pushes the ZK state past that. 1024*1024 is
1,048,576, which is suspiciously close to
1,048,971.

At 600 collections you're pushing past this limit, it looks like.
Not quite sure where it can be changed. Here's a good discussion of this:
http://lucene.472066.n3.nabble.com/gt-1MB-file-to-Zookeeper-td3958614.html
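For reference, the limit being described is ZooKeeper's jute.maxbuffer
setting, which defaults to 0xfffff (1,048,575) bytes; a sketch of the
arithmetic (my illustration, using the packet size from this thread):

```java
public class ZkZnodeLimit {
    // ZooKeeper's default jute.maxbuffer: 0xfffff bytes, just under 1 MB.
    static final int DEFAULT_JUTE_MAXBUFFER = 0xfffff; // 1048575

    static boolean exceedsDefault(long znodeBytes) {
        return znodeBytes > DEFAULT_JUTE_MAXBUFFER;
    }

    public static void main(String[] args) {
        System.out.println(exceedsDefault(1048971L)); // prints "true"
        System.out.println(exceedsDefault(1048575L)); // prints "false"
    }
}
```

Raising the limit requires setting -Djute.maxbuffer on the ZooKeeper
servers and on every client JVM, as discussed in the thread linked above.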

Best,
Erick


On Sat, Sep 7, 2013 at 10:10 AM, diyun2008 diyun2...@gmail.com wrote:

 *I have installed solr cloud with solr4.4 and zookeeper 3.4.5.
 And I'm testing some requirements with 10k collections supporting in one
 solr server.
 When I post collection to solr

 server(admin/collections?action=CREATE&name=europetest${loopcnt}&numShards=2&replicationFactor=2&maxShardsPerNode=2)
 with jmeter,
 I found every time when collections number reached 600+, Solr and zookeeper
 will not work correctly.

 I checked logs. Here's Solr logs:*
 07:56:01,149 ERROR SolrException:
 null:org.apache.solr.common.SolrException:
 createcollection the collection time out:60s
 [...]

Re: collections api setting dataDir

2013-09-07 Thread mike st. john
Thanks erick,

Yes, the collections api ignored it. What i ended up doing was just
building out some fairness in regard to creating the cores and calling
coreadmin to create the cores; that seemed to work ok. The only issue i'm
having now, and i'm still investigating, is that subsequent queries are
returning different counts.


msj




On Sat, Sep 7, 2013 at 1:58 PM, Erick Erickson erickerick...@gmail.comwrote:

 Did you try just specifying dataDir=blah? I haven't tried this, but the
 notes for
 the collections API indicate they're sugar around core creation commands,
 see: http://wiki.apache.org/solr/CoreAdmin#CREATE

 FWIW,
 Erick


 On Fri, Sep 6, 2013 at 4:23 PM, mike st. john mstj...@gmail.com wrote:

  is there any way to change the dataDir while creating a collection via
 the
  collection api?
 



Re: Unknown attribute id in add:allowDups

2013-09-07 Thread Furkan KAMACI
I did not use the Pecl package, and the problem may be related to it. I want
to point out that when you define your schema you indicate:

*required=true*

However, the error says:

*allowDups*

for the id field. So it seems that id is not a unique field for that package.
You may need to configure something else in that package, or there may be a
bug unrelated to Solr.



2013/9/7 Brian Robinson br...@socialsurgemedia.com

 Hello,
 I'm working with the Pecl package, with Solr 4.3.1. I have a doc defined
 in my schema where id is the uniqueKey,

 <field name="id" type="int" indexed="true" stored="true" required="true"
  multiValued="false" />
 <uniqueKey>id</uniqueKey>

 I tried to add a doc to my index with the following code (simplified for
 the question):

 $client = new SolrClient($options);
 $doc = new SolrInputDocument();
 $doc->addField('id', 12345);
 $doc->addField('description', 'This is the content of the doc');
 $updateResponse = $client->addDocument($doc);

 When I do this, the doc is not added to the index, and I get the following
 error in the logs in admin

  Unknown attribute id in add:allowDups

 However, I noticed that if I change the field to type string:

 <field name="id" type="string" indexed="true" stored="true"
  required="true" multiValued="false" />
 ...
 $doc->addField('id', '12345');

 the doc is added to the index, but I still get the error in the log.

 So first, I was wondering, is there some other way I should be setting
 this up so that id can be an int instead of a string?

 And then I was also wondering what this error is referring to. Is there
 some further way I need to define id? Or maybe define the uniqueKey
 differently?

 Any help would be much appreciated.
 Thanks,
 Brian



Re: Connection Established but waiting for response for a long time.

2013-09-07 Thread Furkan KAMACI
Could you give us more information about your other Jetty configurations?


2013/9/6 qungg qzheng1...@gmail.com

 Hi,

 I'm running solr 4.0 but using a legacy distributed search setup. I set the
 shards parameter for search, but index into each solr shard directly.
 The problem I have been experiencing is building connections with the solr
 shards. If I run a query, by using wget, to get the number of records from
 each individual shard (50 of them) sequentially, the request will hang at
 some shards (seemingly at random). The wget log will say the connection is
 established but it is waiting for a response. At that point I thought the
 Solr shard might be under high load, but the strange thing is that when I
 send another request to the same shard (using wget again) from another
 thread, the response comes back, and it triggers something in Solr to send
 back the response for the first request I sent before.

 This also happens in my daily indexing. If I send a commit, it will
 sometimes hang. However, if I send another commit to the same shard, both
 commits come back fine.

 I'm running Solr on the stock jetty server, and some time back my boss told
 me to set the maxIdleTime to 5000 for indexing purposes. I'm not sure if
 this has anything to do with the strange behavior that I'm seeing right now.

 Please help me resolve this issue.

 Thanks,
 Qun



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Connection-Established-but-waiting-for-response-for-a-long-time-tp4088587.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Indexing pdf files - question.

2013-09-07 Thread Furkan KAMACI
Could you show us logs you get when you start your web container?


2013/9/4 Nutan Shinde nutanshinde1...@gmail.com

 My solrconfig.xml is:

 <requestHandler name="/update/extract"
  class="solr.extraction.ExtractingRequestHandler">
   <lst name="defaults">
     <str name="fmap.content">desc</str> <!-- to map this field of my table
      which is defined as shown below in schema.xml -->
     <str name="lowernames">true</str>
     <str name="uprefix">attr_</str>
     <str name="captureAttr">true</str>
   </lst>
 </requestHandler>

 <lib dir="../../extract" regex=".*\.jar" />


 Schema.xml:

 <fields>
   <field name="doc_id" type="integer" indexed="true" stored="true"
    multiValued="false"/>
   <field name="name" type="text" indexed="true" stored="true"
    multiValued="false"/>
   <field name="path" type="text" indexed="true" stored="true"
    multiValued="false"/>
   <field name="desc" type="text_split" indexed="true" stored="true"
    multiValued="false"/>
 </fields>

 <types>
   <fieldType name="string" class="solr.StrField" />
   <fieldType name="integer" class="solr.IntField" />
   <fieldType name="text" class="solr.TextField" />
   <fieldType name="text_split" class="solr.TextField" />
 </types>

 <dynamicField name="*_i" type="integer" indexed="true" stored="true"/>

 <uniqueKey>doc_id</uniqueKey>



 I have created extract directory and copied all required .jar and solr-cell
 jar files into this extract directory and given its path in lib tag in
 solrconfig.xml



 When I try out this:



 curl "http://localhost:8080/solr/update/extract?literal.doc_id=1&commit=true"
  -F "myfile=@solr-word.pdf"   in Windows 7.



 I get "/solr/update/extract is not available" and sometimes I get an
 "access denied" error.

 I tried resolving it through the net, but in vain, as all the solutions are
 related to linux; i'm working on Windows.

 Please help me and provide solutions related to Windows.

 I referred Apache_solr_4_Cookbook.

 Thanks a lot.




Re: Adding weight to location of the string found

2013-09-07 Thread Furkan KAMACI
Firstly, did you check here:
http://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/package-summary.html#package_description


2013/8/28 zseml zs...@hotmail.com

 In Solr syntax, is there a way to add weight to the result found based on
 the
 location of the string that it's found?

 For instance, if I'm searching these strings for Hello:

 Hello World
 World Hello

 ...I'd like the first result to be the first one in my search results.

 Additionally, is there a way to add weight based on the number of
 occurrences of a string that are found?  For instance, if I'm searching
 these strings for Hello:

 Hello World Hello
 Hello World

 ...again, I'd like the first result to be found.



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Adding-weight-to-location-of-the-string-found-tp4086932.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Regarding reducing qtime

2013-09-07 Thread Furkan KAMACI
What is your question here?


2013/9/6 prabu palanisamy pr...@serendio.com

 Hi

 I am currently using solr 3.5.0 with an index built from a wikipedia dump
 (50 gb) with java 1.6. I am searching tweets in solr. Currently it takes an
 average of 210 milliseconds per post, out of which 200 milliseconds are
 consumed by the solr server (QTime). I used the jconsole monitor tool; the
 reports are:
    heap usage of 10-50Mb,
    no of threads: 10-20,
    no of classes: around 3800,



Re: Can we used CloudSolrServer for searching data

2013-09-07 Thread Furkan KAMACI
Shalin is right. If you read the documentation for CloudSolrServer you will
see that:

*SolrJ client class to communicate with SolrCloud. Instances of this class
communicate with Zookeeper to discover Solr endpoints for SolrCloud
collections, and then use the LBHttpSolrServer to issue requests.*

It uses *LBHttpSolrServer* for communication, and that is what you are
looking for. Here is an example of constructing it:

SolrServer lbHttpSolrServer = new
LBHttpSolrServer("http://host1:8080/solr/", "http://host2:8080/solr", "http://host2:8080/solr");
// or if you wish to pass the HttpClient do as follows
HttpClient httpClient = new HttpClient();
SolrServer lbHttpSolrServer = new
LBHttpSolrServer(httpClient, "http://host1:8080/solr/", "http://host2:8080/solr", "http://host2:8080/solr");

Then you can use it as a *SolrServer*.



2013/9/3 Shalin Shekhar Mangar shalinman...@gmail.com

 CloudSolrServer can only be used if you are actually using SolrCloud
 (i.e. a ZooKeeper aware setup). If you only have a multi-core setup,
 then you can use LBHttpSolrServer.

 See http://wiki.apache.org/solr/LBHttpSolrServer

 On Tue, Aug 27, 2013 at 2:11 PM, Dharmendra Jaiswal
 dharmendra.jais...@gmail.com wrote:
  Hello,
 
  I am using multi-core mechnism with Solr4.4.0. And each core is
 dedicated to
  a
  particular client (each core is a collection)
 
  Like If we search data from SiteA, it will provide search result from
 CoreA
  And if we search data from SiteB, it will provide search result from
 CoreB
  and similar case with other client.
 
  Right now i am using HttpSolrServer (SolrJ API) for connecting with Solr
 for
  search.
  As per my understanding it will try to connect directly to a particular
 Solr
  instance for searching and if that node will be down searching will fail.
  please let me know if my assumption is wrong.
 
  My query is that is it possible to connect with Solr using
 CloudSolrServer
  instead of HTTPSolrServer for searching. so that in case one node will be
  down cloud solr server will pick data from other instance of Solr.
 
  Any pointer and link will be helpful. it will be better if some one
 shared
  me some example related to connection using ClouSolrServer.
 
  Note: I am Using Windows machine for deployment of Solr. And we are
 indexing
  data from database using DIH
 
  Thanks,
  Dharmendra jaiswal
 
 
 
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/Can-we-used-CloudSolrServer-for-searching-data-tp4086766.html
  Sent from the Solr - User mailing list archive at Nabble.com.



 --
 Regards,
 Shalin Shekhar Mangar.



Re: SolrCloud - shard containing an invalid host:port

2013-09-07 Thread Furkan KAMACI
If that line (192.168.1.10:8983/solr) is not green but gray, it is probably
because you started up a Solr instance without defining a port and it
registered itself in Zookeeper.


2013/9/3 Daniel Collins danwcoll...@gmail.com

 Was it a test instance that you created? 8983 is the default port, so
 possibly you started an instance before you had the ports set up properly,
 and it registered in zookeeper as a valid instance.  You can use the Core
 API to UNLOAD it (if it is still running); if it isn't running anymore, I
 have yet to find a way to remove something from ZK. We normally end up
 wiping zoo_data and bouncing everything at that point; instances should
 re-register themselves as they start up.  But that is the
 sledgehammer to crack a walnut approach. :)


 On 3 September 2013 13:55, Marc des Garets m...@ttux.net wrote:

  Hi,
 
  I have setup SolrCloud with tomcat. I use solr 4.1.
 
  I have zookeeper running on 192.168.1.10.
  A tomcat running solr_myidx on 192.168.1.10 on port 8080.
  A tomcat running solr_myidx on 192.168.1.11 on port 8080.
 
  My solr.xml is like this:
  <?xml version="1.0" encoding="UTF-8" ?>
  <solr persistent="true" collection.configName="myidx">
    <cores adminPath="/admin/cores" defaultCoreName="collection1"
     hostPort="8080" hostContext="solr_myidx" zkClientTimeout="2">
      <core name="collection1" instanceDir="./" />
    </cores>
  </solr>
 
  I have tomcat starting with: -Dbootstrap_conf=true -DzkHost=
  192.168.1.10:2181
 
  Both tomcat startup all good but when I go to the Cloud tab in the solr
  admin, I see the following:
 
  collection1 -- shard1 -- 192.168.1.10:8983/solr
                           192.168.1.11:8080/solr_ugc
                           192.168.1.10:8080/solr_ugc
 
  I don't know what is 192.168.1.10:8983/solr doing there. Do you know how
  I can remove it?
 
  It's causing the following error when I try to query the index:
  SEVERE: Error while trying to recover.
  core=collection1:org.apache.solr.client.solrj.SolrServerException:
  Server refused connection at: http://192.168.10.206:8983/solr
 
  Thanks,
  Marc
 



Re: Tweaking boosts for more search results variety

2013-09-07 Thread Furkan KAMACI
What do you mean by *these limitations*? Do you want to do multiple
groupings at the same time?


2013/9/6 Sai Gadde gadde@gmail.com

 Thank you Jack for the suggestion.

 We can try grouping by site. But considering that the number of sites is
 only about 1000 against an index size of 5 million, one can expect most of
 the hits to be hidden, and for certain specific keywords only a handful of
 actual results could be displayed if results are grouped by site.

 we already group on a signature field to identify duplicate content in
 these 5 million+ docs. But here the number of duplicates is only about
 3-5% maximum.

 Is there any workaround for these limitations with grouping?

 Thanks
 Shyam



 On Thu, Sep 5, 2013 at 9:16 PM, Jack Krupansky j...@basetechnology.com
 wrote:

  The grouping (field collapsing) feature somewhat addresses this - group
 by
  a site field and then if more than one or a few top pages are from the
  same site they get grouped or collapsed so that you can see more sites
 in a
  few results.
 
  See:
  http://wiki.apache.org/solr/FieldCollapsing
  https://cwiki.apache.org/confluence/display/solr/Result+Grouping
 
  -- Jack Krupansky
 
  -Original Message- From: Sai Gadde
  Sent: Thursday, September 05, 2013 2:27 AM
  To: solr-user@lucene.apache.org
  Subject: Tweaking boosts for more search results variety
 
 
  Our index is aggregated content from various sites on the web. We want
 good
  user experience by showing multiple sites in the search results. In our
  setup we are seeing most of the results from same site on the top.
 
  Here is some information regarding queries and schema
 site - String field. We have about 1000 sites in index
 sitetype - String field.  we have 3 site types
  omitNorms=true for both the fields
 
  Doc count varies largely based on site and sitetype by a factor of 10 -
  1000 times
  Total index size is about 5 million docs.
  Solr Version: 4.0
 
  In our queries we have a fixed and preferential boost for certain sites.
  sitetype has different and fixed boosts for 3 possible values. We turned
  off Inverse Document Frequency (IDF) for these boosts to work properly.
  Other text fields are boosted based on search keywords only.
 
  With this setup we often see a bunch of hits from a single site followed
 by
  next etc.,
  Is there any solution to see results from variety of sites and still keep
  the preferential boosts in place?