Re: httpclient.ProtocolException using Solrj

2009-04-09 Thread vivek sar
Thanks Shalin and Paul.

I'm not using MultipartRequest. I do share the same SolrServer between
two threads. I'm not using MultiThreadedHttpConnectionManager. I'm
simply using CommonsHttpSolrServer to create the SolrServer. I've also
tried StreamingUpdateSolrServer, which works much faster, but does
throw a connection reset exception once in a while.

Do I need to use MultiThreadedHttpConnectionManager? I couldn't find
anything about it on the wiki.
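
For reference, here is the kind of wiring I mean - a minimal sketch of one
CommonsHttpSolrServer shared by all threads, backed by a pooled connection
manager (the URL and limits are illustrative, not from this thread):

import java.net.URL;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class SolrServerFactory {
    public static CommonsHttpSolrServer create() throws Exception {
        // thread-safe, pooled connection manager shared by all threads
        MultiThreadedHttpConnectionManager mgr = new MultiThreadedHttpConnectionManager();
        mgr.getParams().setDefaultMaxConnectionsPerHost(10);
        mgr.getParams().setMaxTotalConnections(50);
        HttpClient httpClient = new HttpClient(mgr);
        return new CommonsHttpSolrServer(new URL("http://localhost:8080/solr"), httpClient);
    }
}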

I was also thinking of using EmbeddedSolrServer - in what case would I
be able to use it? Do my application and the Solr web app need to
run in the same JVM for this to work? How would I use the
EmbeddedSolrServer?
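
For reference, the SolrJ wiki pattern I've seen looks like this minimal
sketch (the solr home path and core name are illustrative); with
EmbeddedSolrServer the core runs inside the application's own JVM, so no
Solr webapp is involved at all:

import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

public class EmbeddedExample {
    public static void main(String[] args) throws Exception {
        // point solr home at a directory containing solr.xml / conf
        System.setProperty("solr.solr.home", "/path/to/solr/home");
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer coreContainer = initializer.initialize();
        EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "core0");
        System.out.println("ping status: " + server.ping().getStatus());
    }
}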

Thanks,
-vivek


On Wed, Apr 8, 2009 at 10:46 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 Vivek, do you share the same SolrServer instance between your two threads?
 If so, are you using the MultiThreadedHttpConnectionManager when creating
 the HttpClient instance?

 On Wed, Apr 8, 2009 at 10:13 PM, vivek sar vivex...@gmail.com wrote:

 With a single thread everything works fine. Two threads are fine too for a
 while, and then all of a sudden the problem starts happening.

 I tried indexing using REST services as well (instead of Solrj), but
 with that too I get the following error after a while:

 2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer -
 indexData()- Failed to index
 java.net.SocketException: Broken pipe
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at
 java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
        at
 org.apache.commons.httpclient.methods.StringRequestEntity.writeRequest(StringRequestEntity.java:145)
        at
 org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499)
         at
 org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
        at
 org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
        at
 org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
        at
 org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
        at
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
        at
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)


 Note, I'm using the simple lock type. I'd tried the single type before,
 but that once caused index corruption, so I switched to simple.

 Thanks,
 -vivek

 2009/4/8 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@gmail.com:
  do you see the same problem when you use a single thread?
 
  what is the version of SolrJ that you use?
 
 
 
  On Wed, Apr 8, 2009 at 1:19 PM, vivek sar vivex...@gmail.com wrote:
  Hi,
 
   Any ideas on this issue? I ran into this again - once it starts
  happening, it keeps happening. One of the threads keeps failing. Here
  are my SolrServer settings:
 
         int socketTO = 0;
         int connectionTO = 100;
         int maxConnectionPerHost = 10;
         int maxTotalConnection = 50;
         boolean followRedirects = false;
         boolean allowCompression = true;
         int maxRetries = 1;
 
  Note, I'm using two threads to simultaneously write to the same index.
 
  org.apache.solr.client.solrj.SolrServerException:
  org.apache.commons.httpclient.ProtocolException: Unbuffered entity
  enclosing request can not be repeated.
         at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470)
         at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
         at
 org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259)
         at
 org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
         at
 org.apache.solr.client.solrj.SolrServer.addBeans(SolrServer.java:57)
 
  Thanks,
  -vivek
 
  On Sat, Apr 4, 2009 at 1:07 AM, vivek sar vivex...@gmail.com wrote:
  Hi,
 
   I'm sending 15K records at once using Solrj (server.addBeans(...))
  and have two threads writing to the same index. One thread goes fine, but
  the second thread always fails with:
 
 
  org.apache.solr.client.solrj.SolrServerException:
  org.apache.commons.httpclient.ProtocolException: Unbuffered entity
  enclosing request can not be repeated.
         at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:470)
         at
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
         at
 org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:259)
         at
 org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:48)
         at
 

Re: Searching on multi-core Solr

2009-04-09 Thread vivek sar
Hi,

   I've gone through the mailing list archive and have read contradictory
remarks on this issue. Can someone please clear this up, as I'm not
able to run distributed search on multiple cores? Is there any document
on how I can search across multiple cores which share the same schema? Here
are the various comments I've read on this mailing list:

1) http://www.nabble.com/multi-core-vs-multi-app-td15803781.html#a15803781
Don't think you can search against multiple cores automatically -
i.e. got to make multiple queries, one for each core and combine
results yourself. Yes, this will slow things down.   - Otis

2) 
http://www.nabble.com/Search-in-SOLR-multi-cores-in-a-single-request-td20356173.html#a20356173
The idea behind multicore is that you will use them if you have completely
different type of documents (basically multiple schemas). - Shalin

3) http://www.nabble.com/Distributed-search-td22036229.html#a22036229
That should work, yes, though it may not be a wise thing to do
performance-wise, if the number of CPU cores that solr server has is
lower than the number of Solr cores. - Otis

My only motivation behind using multi-core is to keep the index size
within limits. All my cores are using the same schema. My index grows to
over 30G within a day, and I need to keep up to a year of data. I
couldn't find any other way of scaling with Solr. I've noticed that once
the index grows above 10G the indexing process starts slowing down:
commits take much longer and optimize is hard to finish. So, I'm
trying to create a new core after every 10 million documents (equal
to 10G in my case). I don't want to start a new Solr instance every 10G
- that won't scale over a year's time. I'm going to use 3-4 servers to
hold all these cores.
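
For reference, a minimal solr.xml sketch of the layout I mean, with every
core sharing one instanceDir (and thus one schema); the core names and
paths are illustrative:

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="20090407_2" instanceDir="core0" dataDir="/indexes/20090407_2/data"/>
    <core name="20090408_3" instanceDir="core0" dataDir="/indexes/20090408_3/data"/>
  </cores>
</solr>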

Now, if someone could tell me whether this is the wrong scaling
architecture, I could re-think it. I want fast indexing and, at the same
time, fast enough search. If I have to search each core separately and
merge the results myself, the search performance is going to be awful.

Is Solr the right tool for managing billions of records (I can get up
to 100 million records every day - with 1Kb per record, that's 100GB of index
a day)? Most of the field values are pretty distinct (like 10 million
email addresses), so the index size would be huge too.

I would think it's a common problem to scale a huge index while keeping
both indexing and search time acceptable. I'm not sure if this can be
managed on just 4 servers - we don't have 100s of boxes for this
project. Is there any other tool that might be more appropriate for this
kind of case - like Katta or Lucene on Hadoop, or simply using Lucene's
parallel search and partitioning the indexes by size?

Thanks,
-vivek

On Wed, Apr 8, 2009 at 11:07 AM, vivek sar vivex...@gmail.com wrote:
 Any help on this issue? Would distributed search across multiple cores on
 the same Solr instance even work? Does it have to be different Solr
 instances altogether (separate shards)?

 I'm kind of stuck at this point right now. Keep getting one of the two
 errors (when running distributed search - single searches work fine)
 as mentioned in this thread earlier.

 Thanks,
 -vivek

 On Wed, Apr 8, 2009 at 1:57 AM, vivek sar vivex...@gmail.com wrote:
 Thanks Fergus. I'm still having problems with multicore search.

 I tried the following with two cores (they both share the same schema
 and solrconfig.xml) on the same box, in the same Solr instance:

 1) http://10.4.x.x:8080/solr/core0/admin/  - works fine, shows all the
 cores in admin interface
 2) http://10.4.x.x:8080/solr/admin/cores  - works fine, see all the cores in 
 xml
 3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine,
 gives me top 10 records
 4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine,
 gives me top 10 records
 5) 
 http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan
  - this FAILS. I've seen two problems with this.

    a) When the index is being committed I see,

 SEVERE: org.apache.solr.common.SolrException:
 org.apache.solr.client.solrj.SolrServerException:
 java.net.SocketException: Connection reset
        at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
        at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
        at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at 
 

Re: httpclient.ProtocolException using Solrj

2009-04-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
How many documents are you inserting? Maybe you can create multiple
instances of CommonsHttpSolrServer and upload in parallel.
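
A rough sketch of that idea (the URL, pool size, and commit strategy below
are illustrative, not a tested recipe):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ParallelUploader {
    public static void upload(List<List<SolrInputDocument>> batches) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (final List<SolrInputDocument> batch : batches) {
            pool.execute(new Runnable() {
                public void run() {
                    try {
                        // each worker gets its own SolrServer instance
                        CommonsHttpSolrServer server =
                                new CommonsHttpSolrServer("http://localhost:8080/solr");
                        server.add(batch);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        // commit once, from one place, after all uploads finish
        new CommonsHttpSolrServer("http://localhost:8080/solr").commit();
    }
}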


On Thu, Apr 9, 2009 at 11:58 AM, vivek sar vivex...@gmail.com wrote:
 Thanks Shalin and Paul.

 I'm not using MultipartRequest. I do share the same SolrServer between
 two threads. I'm not using MultiThreadedHttpConnectionManager. I'm
 simply using CommonsHttpSolrServer to create the SolrServer. I've also
 tried StreamingUpdateSolrServer, which works much faster, but does
 throw a connection reset exception once in a while.

 Do I need to use MultiThreadedHttpConnectionManager? I couldn't find
 anything about it on the wiki.

 I was also thinking of using EmbeddedSolrServer - in what case would I
 be able to use it? Do my application and the Solr web app need to
 run in the same JVM for this to work? How would I use the
 EmbeddedSolrServer?

 Thanks,
 -vivek



Re: solr 1.4 memory jvm

2009-04-09 Thread sunnyfr

Hi Noble,

Yes, exactly that.
I would like to know how people manage during a replication. Do they
turn off servers and set a high autowarmCount, which takes the slave out of
service for a while? In my case it's about 10 min to bring over the new
index, and then autowarming maybe 10 minutes more.

Otherwise, I tried a large mergeFactor, but I guess I have too many
updates every 30 min - something like 2000 docs - and almost all segments
get modified.

What would you reckon? :(  :)

Thanks a lot Noble 


Noble Paul നോബിള്‍  नोब्ळ् wrote:
 
 So what I decipher from the numbers is w/o queries Solr replication is
 not performing too badly. The queries are inherently slow and you wish
 to optimize the query performance itself.
 Am I correct?
 
 On Tue, Apr 7, 2009 at 7:50 PM, sunnyfr johanna...@gmail.com wrote:

 Hi,

 So I did two test on two servers;

 First server: with just replication every 20 min, as you can see:
 http://www.nabble.com/file/p22930179/cpu_without_request.png
 cpu_without_request.png
 http://www.nabble.com/file/p22930179/cpu2_without_request.jpg
 cpu2_without_request.jpg

 Second server: with a first replication, and a second one during the query
 test, between 15:32 and 15:41.
 During the replication (checked on .../admin/replication/index.jsp) my
 query response time at the end was around 5000 ms.
 After the replication, I guess during the commit, I couldn't get an answer
 to my query for a long time; I refreshed my page a few minutes later.
 http://www.nabble.com/file/p22930179/cpu_with_request.png
 cpu_with_request.png
 http://www.nabble.com/file/p22930179/cpu2_with_request.jpg
 cpu2_with_request.jpg

 Now, without replication, I kept querying the second server, and I can't
 get better than
 1000 ms response time and 11 requests/second.
 http://www.nabble.com/file/p22930179/cpu_.jpg cpu_.jpg

 This is my request:
 select?fl=id&fq=status_published:1+AND+status_moderated:0+AND+status_personal:0+AND+status_private:0+AND+status_deleted:0+AND+status_error:0+AND+status_ready_web:1&json.nl=map&wt=json&start=0&version=1.2&bq=status_official:1^1.5+OR+status_creative:1^1+OR+language:en^0.5&bf=recip(rord(created),1,10,10)^3+pow(stat_views,0.1)^15+pow(stat_comments,0.1)^15&rows=100&qt=dismax&qf=title_en^0.8+title^0.2+description_en^0.3+description^0.2+tags^1+owner_login^0.5

 Do you have any advice?

 Thanks Noble




 
 
 
 -- 
 --Noble Paul
 
 




Re: solr 1.4 facet boost field according to another field

2009-04-09 Thread sunnyfr

Do you have an idea?



sunnyfr wrote:
 
 Hi,
 
  I have title, description and tag fields ... Depending on where the
  searched word is found, I would like to boost other fields, like nb_views
  or rating, differently:
  
  if the word is found in title, then nb_views^10 and rating^10
  if the word is found in description, then nb_views^2 and rating^2
  
  Thanks a lot for your help,
 




different scoring for different types of found documents

2009-04-09 Thread Andrey Klochkov
Hi,

We have a quite complex requirement concerning scoring logic customization,
but I guess it's quite useful, and probably something like it was done
already.

So we're searching through a product catalog. Products have types (i.e.
Electronics, Apparel, Furniture, etc.). What we need is to customize the
scoring of the results so that the top results contain products of all the
different types which match the query. So, after finding all the products
matching the query, we want to group the results by product type. Then, for
every product type, take the corresponding sub-set of results, and in every
sub-set assign scores with the following logic: assign score 5 to the first
20% of results, then assign score 4 to the next 15% of results, and so on.
The particular percent values are configured by the end user. How could we
achieve this using Solr? Is it possible at all? Maybe we should implement a
custom ValueSource and use it in a function query?

-- 
Andrew Klochkov


Re: It's urgent! Please help with schema.xml - appending one field to another

2009-04-09 Thread Erik Hatcher


On Apr 8, 2009, at 9:50 PM, Udaya wrote:



Hi,
Need your help.
I would like to know how we could append or add one field's value to another
field in schema.xml.
My schema is as follows (only the fields part is given):

schema.xml
<fields>
  <field name="topics_id" type="integer" indexed="true" stored="true"
         required="true" />
  <field name="topics_subject" type="text" indexed="true" stored="true"
         required="true" />
  <field name="post_text" type="text" indexed="true" stored="true"
         multiValued="true" />
  <field name="url" type="string" stored="true"
         default="http://comp.com/portals/ForumWindow?action=1&amp;v=t&amp;p=topics_id#topics_id" />
  <field name="all_text" type="text" indexed="true" stored="true"
         multiValued="true" />

Here, for the field named topics_id, we get the id from a table. I want
this topics_id value to be appended into the default value attribute of the
field named url.

For example:
Suppose we get the topics_id value 512 during a search; then the value of
the url should become
http://comp.com/portals/JBossForumWindow?action=1&v=t&p=512#512

Is this possible? Please give me some suggestions.


If you're using DIH to index your table, you could aggregate using the  
template transformer during indexing.
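
For example, an untested sketch of that approach (the entity name and query
are made up; the URL pattern is the one from your mail):

<entity name="topics" transformer="TemplateTransformer"
        query="select topics_id, topics_subject from topics">
  <field column="url"
         template="http://comp.com/portals/ForumWindow?action=1&amp;v=t&amp;p=${topics.topics_id}#${topics.topics_id}"/>
</entity>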


If you're indexing a different way, why not let the searching client  
(UI) do the aggregation of an id into a URL?


Erik



Re: Searching on multi-core Solr

2009-04-09 Thread Erik Hatcher


On Apr 9, 2009, at 3:00 AM, vivek sar wrote:

 Can someone please clear this up, as I'm not
 able to run distributed search on multiple cores?


What error or problem are you encountering when trying this?  How are  
you trying it?


Erik



Re: solr 1.4 facet boost field according to another field

2009-04-09 Thread Shalin Shekhar Mangar
I don't think conditional boosting is possible. You can boost the same field
on which the match was found. But you cannot boost a different field.

On Thu, Apr 9, 2009 at 2:05 PM, sunnyfr johanna...@gmail.com wrote:


  Do you have an idea?



 sunnyfr wrote:
 
  Hi,
 
   I have title, description and tag fields ... Depending on where the
   searched word is found, I would like to boost other fields, like nb_views
   or rating, differently:
  
   if the word is found in title, then nb_views^10 and rating^10
   if the word is found in description, then nb_views^2 and rating^2
 
  Thanks a lot for your help,
 





-- 
Regards,
Shalin Shekhar Mangar.


Re: different scoring for different types of found documents

2009-04-09 Thread Shalin Shekhar Mangar
On Thu, Apr 9, 2009 at 2:17 PM, Andrey Klochkov
akloch...@griddynamics.com wrote:


 So we're searching through a product catalog. Products have types (i.e.
 Electronics, Apparel, Furniture, etc.). What we need is to customize the
 scoring of the results so that the top results contain products of all the
 different types which match the query. So, after finding all the products
 matching the query, we want to group the results by product type.


This is something similar to Field Collapsing. It is not committed to trunk
but there are a few patches.

https://issues.apache.org/jira/browse/SOLR-236


 Then, for every
 product type, take the corresponding sub-set of results, and in every
 sub-set assign scores with the following logic: assign score 5 to the first
 20% of results, then assign score 4 to the next 15% of results, and so on.
 The particular percent values are configured by the end user. How could we
 achieve this using Solr? Is it possible at all? Maybe we should implement a
 custom ValueSource and use it in a function query?


That kind of scoring is not possible out of the box. You need to assign
scores according to where the document lies in the final list of results
(after all filters are applied); therefore, you may not be able to operate
on the DocList directly or in the value source. I *think* a good place to
start looking would be the QueryValueSource in trunk, as it has access to
the scorer. But I do not know much about these things.
-- 
Regards,
Shalin Shekhar Mangar.


Re: Searching on multi-core Solr

2009-04-09 Thread Fergus McMenemie
Any help on this issue? Would distributed search across multiple cores on
the same Solr instance even work? Does it have to be different Solr
instances altogether (separate shards)?

As best I can tell this works fine for me. Multiple cores on the one
machine. Very different schema and solrconfig.xml for each of the 
cores. Distributed searching using shards works fine. But I am using
the trunk version.

Perhaps you should post your solr.xml file.

I'm kind of stuck at this point right now. Keep getting one of the two
errors (when running distributed search - single searches work fine)
as mentioned in this thread earlier.

Thanks,
-vivek

On Wed, Apr 8, 2009 at 1:57 AM, vivek sar vivex...@gmail.com wrote:
 Thanks Fergus. I'm still having problems with multicore search.

 I tried the following with two cores (they both share the same schema
 and solrconfig.xml) on the same box, in the same Solr instance:

 1) http://10.4.x.x:8080/solr/core0/admin/  - works fine, shows all the
 cores in admin interface
 2) http://10.4.x.x:8080/solr/admin/cores  - works fine, see all the cores in 
 xml
 3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine,
 gives me top 10 records
 4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine,
 gives me top 10 records
 5) 
 http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan
  - this FAILS. I've seen two problems with this.

    a) When the index is being committed I see,

 SEVERE: org.apache.solr.common.SolrException:
 org.apache.solr.client.solrj.SolrServerException:
 java.net.SocketException: Connection reset
        at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
        at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
        at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
        at 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
        at 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at 
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:637)

    b) Other times I see this,

 SEVERE: java.lang.NullPointerException
        at 
 org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432)
        at 
 org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276)
        at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
        at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
        at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
        at 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
        at 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
        at 
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
   

Multi-language support

2009-04-09 Thread revas
Hi,

To reframe my earlier question:

Some languages have just analyzers, but no stemmer from Snowball/Porter;
in that case, does the analyzer take care of stemming as well?

Some languages only have the stemmer from Snowball, but no analyzer.

Some have both.

Can we say then that Solr supports all the above languages? Will search
behave the same across all the above cases?
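
For context, a minimal schema.xml sketch of the "stemmer but no
language-specific analyzer class" case I mean (the type name and language
are illustrative):

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German"/>
  </analyzer>
</fieldType>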

thanks
revas


Re: Using constants with DataImportHandler and MySQL ?

2009-04-09 Thread gateway0

Here's the solution:

<entity name="ci_project" query="select pr_id, pr_name, pr_comment,
    'dataci_project' from ci_project WHERE pr_id = 1">
  <field column="dataci_project" name="definition" />
</entity>

Just insert a dummy SQL field ('dataci_project') in your select statement.



Glen Newton wrote:
 
 In MySQL at least, you can achieve what I think you want by
 manipulating the SQL, like this:
 
 mysql> select "foo" as Constant1, id from Article limit 10;
 select "foo" as Constant1, id from Article limit 10;
 +-----------+----+
 | Constant1 | id |
 +-----------+----+
 | foo       |  1 |
 | foo       |  2 |
 | foo       |  3 |
 | foo       |  4 |
 | foo       |  5 |
 | foo       |  6 |
 | foo       |  7 |
 | foo       |  8 |
 | foo       |  9 |
 | foo       | 10 |
 +-----------+----+
 10 rows in set (0.00 sec)
 
 mysql> select 435 as Constant2, id from Article limit 10;
 select 435 as Constant2, id from Article limit 10;
 +-----------+----+
 | Constant2 | id |
 +-----------+----+
 |       435 |  1 |
 |       435 |  2 |
 |       435 |  3 |
 |       435 |  4 |
 |       435 |  5 |
 |       435 |  6 |
 |       435 |  7 |
 |       435 |  8 |
 |       435 |  9 |
 |       435 | 10 |
 +-----------+----+
 10 rows in set (0.00 sec)
 
 mysql>
 
 2009/4/8 Shalin Shekhar Mangar shalinman...@gmail.com:
 On Wed, Apr 8, 2009 at 10:23 PM, gateway0 reiterwo...@yahoo.de wrote:


 The problem as you see is the line:
 <field name="definition">Projects</field>

 I want to set a constant value for every row in the SQL table, but it
 doesn't work that way. Any ideas?


 That is not a valid syntax.

 There are two ways to do this:
 1. In your schema.xml provide the 'default' attribute
 2. Use TemplateTransformer - see
 http://wiki.apache.org/solr/DataImportHandlerFaq

 --
 Regards,
 Shalin Shekhar Mangar.

 
 
 




Analyzers and stemmer

2009-04-09 Thread revas
Hi ,

  With respect to language support in Solr, we have analyzers for some
languages and stemmers for certain languages. Do we say that Solr supports
a particular language only if we have both an analyzer and a stemmer for
the language, or also when we have an analyzer but no stemmer?

Regards
Sujatha


Dataimporthandler + MySQL = Datetime offset by 2 hours ?

2009-04-09 Thread gateway0

Hi,

I'm fetching entries from my MySQL database and indexing them with the
DataImportHandler:

MySQL table entry (for example):
pr_timedate : 2009-04-14 11:00:00

entry in data-config.xml to index the MySQL field:
<field column="pr_timedate" name="completion"
       dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />

result in the Solr index:
<date>2009-04-14T09:00:00Z</date>

It says 09:00:00 instead of 11:00:00 as it's supposed to.

I've searched for hours already; why is that?

best wishes, Sebastian



Re: Dataimporthandler + MySQL = Datetime offset by 2 hours ?

2009-04-09 Thread Shalin Shekhar Mangar
On Thu, Apr 9, 2009 at 6:18 PM, gateway0 reiterwo...@yahoo.de wrote:


 Hi,

 I'm fetching entries from my MySQL database and indexing them with the
 DataImportHandler:

 MySQL table entry (for example):
 pr_timedate : 2009-04-14 11:00:00

 entry in data-config.xml to index the MySQL field:
 <field column="pr_timedate" name="completion"
        dateTimeFormat="yyyy-MM-dd'T'hh:mm:ss'Z'" />

 result in the Solr index:
 <date>2009-04-14T09:00:00Z</date>

 It says 09:00:00 instead of 11:00:00 as it's supposed to.

 I've searched for hours already; why is that?


I think that may be because date/time in Solr is supposed to be in UTC. See
the note on DateField in the schema.xml
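
For illustration, a minimal sketch of formatting a local java.util.Date as
the UTC string Solr's DateField expects; the two-hour shift you are seeing
matches a UTC+2 local timezone:

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class UtcDate {
    public static void main(String[] args) {
        SimpleDateFormat utc = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        utc.setTimeZone(TimeZone.getTimeZone("UTC"));
        // a local 11:00 in a UTC+2 zone is formatted as 09:00:00Z
        System.out.println(utc.format(new Date()));
    }
}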
-- 
Regards,
Shalin Shekhar Mangar.


Access HTTP headers from custom request handler

2009-04-09 Thread Giovanni De Stefano
Hello all,

we are writing a custom request handler and we need to implement some
business logic according to some HTTP headers.

I see there is no easy way to access HTTP headers from the request handler.

Moreover, it seems to me that the HttpServlet-ness is lost way before the
custom request handler comes into the game.

Is there any way to access HTTP headers from within the request handler?
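
The best workaround I can think of - a rough sketch, under the assumption
that we can register our own servlet filter in web.xml ahead of
SolrDispatchFilter (the class name, header name, and wiring below are all
made up) - is to capture the header in a filter and hand it to the handler
through a ThreadLocal, since the handler runs on the same request thread:

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;

public class HeaderCaptureFilter implements Filter {
    // the custom request handler can call HEADER.get() on the same thread
    public static final ThreadLocal<String> HEADER = new ThreadLocal<String>();

    public void init(FilterConfig config) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HEADER.set(((HttpServletRequest) req).getHeader("X-My-Header"));
        try {
            chain.doFilter(req, res);
        } finally {
            HEADER.remove();  // don't leak values across pooled threads
        }
    }
}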

Thanks,
Giovanni


Re: Snapinstaller vs Solr Restart

2009-04-09 Thread sunnyfr

Hi Otis,

OK about that, but still, when it merges segments it changes the file
names, and I have no choice but to replicate all the segments, which is bad
for the replication and the CPU. ??

Thanks


Otis Gospodnetic wrote:
 
 Lower your mergeFactor and Lucene will merge segments (i.e. fewer index
 files) and purge deletes more often for you, at the expense of somewhat
 slower indexing.
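 
 For reference, mergeFactor lives in solrconfig.xml; a minimal sketch
 (the value 5 is just an illustration):
 
 <indexDefaults>
   <mergeFactor>5</mergeFactor>
 </indexDefaults>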
 
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
 - Original Message 
 From: wojtekpia wojte...@hotmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, January 6, 2009 5:18:26 PM
 Subject: Re: Snapinstaller vs Solr Restart
 
 
 I'm optimizing because I thought I should. I'll be updating my index
 somewhere between every 15 minutes, and every 2 hours. That means between
 12
 and 96 updates per day. That seems like a lot of index files (and it
 scared
 me a little), so that's my second reason for wanting to optimize nightly.
 
 I haven't benchmarked the performance hit for not optimizing. That'll be
 my
 next step. If the hit isn't too bad, I'll look into optimizing less
 frequently (weekly, ...).
 
 Thanks Otis!
 
 
 Otis Gospodnetic wrote:
  
  OK, so that question/answer seems to have hit the nail on the head.  :)
  
  When you optimize your index, all index files get rewritten.  This
 means
  that everything that the OS cached up to that point goes out the window
  and the OS has to slowly re-cache the hot parts of the index.  If you
  don't optimize, this won't happen.  Do you really need to optimize?  Or
  maybe a more direct question: why are you optimizing?
  
  
  Regarding autowarming, with such high fq hit rate, I'd make good use of
 fq
  autowarming.  The result cache rate is lower, but still decent.  I
  wouldn't turn off autowarming the way you have.
  
  
 
 
 
 




Re: Any tips for indexing large amounts of data?

2009-04-09 Thread sunnyfr

Hi Otis,
How did you manage that? I have an 8-core machine with 8GB of RAM, an 11GB
index for 14M docs, and 5 updates every 30 min, but my replication kills
everything.
My segments are merged too often, so the full index is replicated and the
caches are lost, and I have no idea what I can do now.
Some help would be brilliant.
Btw, I'm using Solr 1.4.

Thanks,


Otis Gospodnetic wrote:
 
 Mike is right about the occasional slow-down, which appears as a pause and
 is due to large Lucene index segment merging.  This should go away with
 newer versions of Lucene where this is happening in the background.
 
 That said, we just indexed about 20MM documents on a single 8-core machine
 with 8 GB of RAM, resulting in nearly 20 GB index.  The whole process took
 a little less than 10 hours - that's over 550 docs/second.  The vanilla
 approach before some of our changes apparently required several days to
 index the same amount of data.
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 - Original Message 
 From: Mike Klaas mike.kl...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Monday, November 19, 2007 5:50:19 PM
 Subject: Re: Any tips for indexing large amounts of data?
 
 There should be some slowdown in larger indices as occasionally large  
 segment merge operations must occur.  However, this shouldn't really  
 affect overall speed too much.
 
 You haven't really given us enough data to tell you anything useful.   
 I would recommend trying to do the indexing via a webapp to eliminate  
 all your code as a possible factor.  Then, look for signs to what is  
 happening when indexing slows.  For instance, is Solr high in cpu, is  
 the computer thrashing, etc?
 
 -Mike
 
 On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:
 
 Hi,

 Thanks for answering this question a while back. I have made some  
 of the suggestions you mentioned. ie not committing until I've  
 finished indexing. What I am seeing though, is that as the index gets
 larger (around 1Gb), indexing is taking a lot longer. In fact it
 slows down to a crawl. Have you got any pointers as to what I might  
 be doing wrong?

 Also, I was looking at using MultiCore solr. Could this help in  
 some way?

 Thank you
 Brendan

 On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:


 : I would think you would see better performance by allowing auto  
 commit
 : to handle the commit size instead of reopening the connection  
 all the
 : time.

 if your goal is fast indexing, don't use autoCommit at all ...
  just
 index everything, and don't commit until you are completely done.

 autoCommitting will slow your indexing down (the benefit being  
 that more
 results will be visible to searchers as you proceed)




 -Hoss


 
 
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22973205.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Any tips for indexing large amounts of data?

2009-04-09 Thread Glen Newton
For Solr / Lucene:
- use -XX:+AggressiveOpts
- If available, huge pages can help. See
http://zzzoot.blogspot.com/2009/02/java-mysql-increased-performance-with.html
 I haven't yet followed-up with my Lucene performance numbers using
huge pages: it is 10-15% for large indexing jobs.

For Lucene:
- multi-thread using java.util.concurrent.ThreadPoolExecutor (see the
sketch after this list)
(http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html
- 6.4 million full-text articles + metadata indexed, resulting in an 83GB
index; these are old numbers: things are down to ~10 hours now)
- while multithreading on multicore is particularly good, it also
improves performance on a single core, for small (<6, YMMV) numbers of
threads & good I/O (test for your particular configuration)
- Use multiple indexes & merge at the end
- As per http://developers.sun.com/learning/javaoneonline/2008/pdf/TS-5515.pdf,
use a separate ThreadPoolExecutor per index as above, reducing queue
contention. This is giving me an additional ~10%. I will blog about
this in the near future...
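
A minimal sketch of the multi-threaded indexing item above, assuming
Lucene 2.x-era APIs (the path, pool size, document count, and field are
illustrative):

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class ParallelIndexer {
    public static void main(String[] args) throws Exception {
        // IndexWriter is thread-safe, so all workers share one instance
        final IndexWriter writer =
                new IndexWriter("/tmp/index", new StandardAnalyzer(), true);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
        for (int i = 0; i < 100000; i++) {
            final int id = i;
            pool.execute(new Runnable() {
                public void run() {
                    Document doc = new Document();
                    doc.add(new Field("id", String.valueOf(id),
                            Field.Store.YES, Field.Index.UN_TOKENIZED));
                    try {
                        writer.addDocument(doc);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        writer.close();
    }
}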

-glen

2009/4/9 sunnyfr johanna...@gmail.com:

 Hi Otis,
 How did you manage that? I have an 8-core machine with 8GB of RAM, an 11GB
 index for 14M docs, and 5 updates every 30 min, but my replication kills
 everything.
 My segments are merged too often, so the full index is replicated and the
 caches are lost, and I have no idea what I can do now.
 Some help would be brilliant.
 Btw, I'm using Solr 1.4.

 Thanks,









Re: Any tips for indexing large amounts of data?

2009-04-09 Thread Glen Newton
 - As per
 http://developers.sun.com/learning/javaoneonline/2008/pdf/TS-5515.pdf
Sorry, the presentation covers a lot of ground: see slide #20:
Standard thread pools can have high contention for task queue and
other data structures when used with fine-grained tasks
[I haven't yet implemented work stealing]

-glen






Custom DIH: FileDataSource with additional business logic?

2009-04-09 Thread Giovanni De Stefano
Hello,

here I am with another question.

I am using DIH to index a DB. Additionally I also have to index some files
containing Java serialized objects (and I cannot change this... :-( ).

I currently have implemented a standalone Java app with the following
features:

1) read all files from a given folder
2) deserialize the files into lists of items
3) convert the list of items into lists of SolrInputDocument(s)
4) post the lists of SolrInputDocument(s) to Solr

All this is done using SolrJ. So far so good.

I would like to use a DIH with a FileDataSource to do 1) and 4), and I would
like to squeeze in my implementation for 2) and 3).

Is this possible? Any hint?
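
One direction I'm considering - a rough sketch rather than a tested recipe:
a custom EntityProcessor that does the deserialization itself, so DIH
drives 1) and 4) while my code supplies 2) and 3). Everything below that is
not part of the DIH API (the "filePath" attribute, the Item stub and its
fields) is an assumption for illustration:

import java.io.FileInputStream;
import java.io.ObjectInputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.EntityProcessorBase;

public class SerializedObjectEntityProcessor extends EntityProcessorBase {
    // stand-in for the real serialized item type
    public static class Item implements Serializable {
        String id;
        String name;
    }

    private Iterator<Item> items;

    @Override
    @SuppressWarnings("unchecked")
    public void init(Context context) {
        super.init(context);
        try {
            // "filePath" is a made-up entity attribute from data-config.xml
            String path = context.getEntityAttribute("filePath");
            ObjectInputStream in = new ObjectInputStream(new FileInputStream(path));
            items = ((List<Item>) in.readObject()).iterator();  // step 2)
            in.close();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public Map<String, Object> nextRow() {
        if (items == null || !items.hasNext()) return null;  // ends the entity
        Item item = items.next();
        Map<String, Object> row = new HashMap<String, Object>();  // step 3)
        row.put("id", item.id);
        row.put("name", item.name);
        return row;
    }
}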

Thank you all in advance.

Cheers,
Giovanni


Re: Searching on multi-core Solr

2009-04-09 Thread vivek sar
Erik,

  Here is what I'd posted in this thread earlier,

I tried the following with two cores (they both share the same schema
and solrconfig.xml) on the same box, in the same Solr instance:

1) http://10.4.x.x:8080/solr/core0/admin/  - works fine, shows all the
cores in admin interface
2) http://10.4.x.x:8080/solr/admin/cores  - works fine, see all the cores in xml
3) http://10.4.x.x:8080/solr/20090407_2/select?q=japan - works fine,
gives me top 10 records
4) http://10.4.x.x:8080/solr/20090408_3/select?q=japan - works fine,
gives me top 10 records
5) 
http://10.4.x.x:8080/solr/20090407_2/select?shards=10.4.x.x:8080/solr/20090407_2,10.4.x.x:8080/solr/20090408_3&indent=true&q=japan
 - this FAILS. I've seen two problems with this.



   a) This is the error most of the time:

SEVERE: java.lang.NullPointerException
   at 
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432)
   at 
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276)
   at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
   at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
   at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
   at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
   at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
   at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
   at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
   at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
   at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
   at java.lang.Thread.run(Thread.java:637)

b) When the index is being committed, I see this during a search:

SEVERE: org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException:
java.net.SocketException: Connection reset
   at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
   at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
   at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
   at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
   at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
   at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
   at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
   at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
   at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
   at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
   at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
   at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
   at java.lang.Thread.run(Thread.java:637)

Any tips on how I can search across multiple cores on the same Solr instance?

Thanks,
-vivek

On Thu, Apr 9, 2009 at 2:56 AM, Erik Hatcher e...@ehatchersolutions.com wrote:

 On Apr 9, 2009, at 3:00 AM, vivek sar wrote:

  Can someone please clear this up, as I'm not
 able to run distributed search on multiple cores?

 What error or problem are you encountering when trying this?  How are you
 trying it?

        Erik




Re: Searching on multi-core Solr

2009-04-09 Thread vivek sar
 Attached is the solr.xml - note that the schema and solrconfig are
located in core0, and all other cores point to the same core0
instance for the schema.

Searches on individual cores work fine, so I'm assuming the solr.xml is
correct - I also get their status correctly. From the
NullPointerException it seems it fails at:

 for (int i = resultSize - 1; i >= 0; i--) {
   ShardDoc shardDoc = (ShardDoc) queue.pop();
   shardDoc.positionInResponse = i;
   // Need the toString() for correlation with other lists that must
   // be strings (like keys in highlighting, explain, etc)
   resultIds.put(shardDoc.id.toString(), shardDoc);
 }

I have a unique field (required) in my documents, so I'm not sure whether
that can be null - could the doc itself be null, and how? The same search
on the same cores individually works fine. I'm not sure if there is a way
to debug this.

I'm also not sure when I would get the Connection reset exception - would
it be when indexing is happening at the same time at a high rate? Would
that cause problems?

Thanks,
-vivek


On Thu, Apr 9, 2009 at 4:07 AM, Fergus McMenemie fer...@twig.me.uk wrote:
Any help on this issue? Would distributed search across multiple cores on
the same Solr instance even work? Does it have to be different Solr
instances altogether (separate shards)?

 As best I can tell this works fine for me. Multiple cores on the one
 machine. Very different schema and solrconfig.xml for each of the
 cores. Distributed searching using shards works fine. But I am using
 the trunk version.

 Perhaps you should post your solr.xml file.

I'm kind of stuck at this point right now. Keep getting one of the two
errors (when running distributed search - single searches work fine)
as mentioned in this thread earlier.

Thanks,
-vivek


Re: httpclient.ProtocolException using Solrj

2009-04-09 Thread vivek sar
I'm inserting 10K documents in a batch (using the addBeans method). I read
somewhere in the wiki that it's better to use the same instance of SolrServer
for better performance. Would MultiThreadedHttpConnectionManager help? How
do I use it?

I also wanted to know how I can use EmbeddedSolrServer - does my app
need to be running in the same JVM as the Solr webapp?

Thanks,
-vivek

2009/4/9 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@gmail.com:
 how many documents are you inserting?
 maybe you can create multiple instances of CommonsHttpSolrServer and
 upload in parallel


 On Thu, Apr 9, 2009 at 11:58 AM, vivek sar vivex...@gmail.com wrote:
 Thanks Shalin and Paul.

 I'm not using MultipartRequest. I do share the same SolrServer between
 two threads. I'm not using MultiThreadedHttpConnectionManager. I'm
 simply using CommonsHttpSolrServer to create the SolrServer. I've also
 tried StreamingUpdateSolrServer, which works much faster, but does
 throws connection reset exception once in a while.

 Do I need to use MultiThreadedHttpConnectionManager? I couldn't find
 anything on it on Wiki.

 I was also thinking of using EmbeddedSolrServer - in what case would I
 be able to use it? Does my application and the Solr web app need to
 run into the same JVM for this to work? How would I use the
 EmbeddedSolrServer?

 Thanks,
 -vivek



Re: Custom DIH: FileDataSource with additional business logic?

2009-04-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
FileDataSource is of type Reader, meaning getData() returns a
java.io.Reader. That is not very suitable for you.

Your best bet is to write a simple DataSource which returns an
Iterator<Map<String,Object>> after reading the serialized objects.
This is what JdbcDataSource does. Then you can use it with
SqlEntityProcessor.
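
For illustration, a rough, untested sketch of such a DataSource (the class
name is made up, and it assumes for simplicity that each file was serialized
as a List<Map<String,Object>> - in practice you would deserialize your own
item type and convert each item into a map):

  import java.io.FileInputStream;
  import java.io.ObjectInputStream;
  import java.util.ArrayList;
  import java.util.Iterator;
  import java.util.List;
  import java.util.Map;
  import java.util.Properties;
  import org.apache.solr.handler.dataimport.Context;
  import org.apache.solr.handler.dataimport.DataSource;

  public class SerializedObjectDataSource extends DataSource<Iterator<Map<String, Object>>> {

      public void init(Context context, Properties initProps) {
          // nothing to set up in this sketch; a base folder could be read from initProps
      }

      public Iterator<Map<String, Object>> getData(String query) {
          // 'query' is assumed to be the path of one serialized file
          List<Map<String, Object>> rows = new ArrayList<Map<String, Object>>();
          ObjectInputStream in = null;
          try {
              in = new ObjectInputStream(new FileInputStream(query));
              @SuppressWarnings("unchecked")
              List<Map<String, Object>> items = (List<Map<String, Object>>) in.readObject();
              rows.addAll(items);  // one map per row; DIH turns each row into a document
          } catch (Exception e) {
              throw new RuntimeException("could not deserialize " + query, e);
          } finally {
              if (in != null) try { in.close(); } catch (Exception ignored) {}
          }
          return rows.iterator();
      }

      public void close() {
      }
  }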

On Thu, Apr 9, 2009 at 9:42 PM, Giovanni De Stefano
giovanni.destef...@gmail.com wrote:
 Hello,

 here I am with another question.

 I am using DIH to index a DB. Additionally I also have to index some files
 containing Java serialized objects (and I cannot change this... :-( ).

 I currently have implemented a standalone Java app with the following
 features:

 1) read all files from a given folder
 2) deserialize the files into lists of items
 3) convert the list of items into lists of SolrInputDocument(s)
 4) post the lists of SolrInputDocument(s) to Solr

 All this is done using SolrJ. So far so good.

 I would like to use a DIH with a FileDataSource to do 1) and 4), and I would
 like to squeeze in my implementation for 2) and 3).

 Is this possible? Any hint?

 Thank you all in advance.

 Cheers,
 Giovanni




-- 
--Noble Paul


Re: httpclient.ProtocolException using Solrj

2009-04-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
using a single request is the fastest

http://wiki.apache.org/solr/Solrj#head-2046bbaba3759b6efd0e33e93f5502038c01ac65

I could index at the rate of 10,000 docs/sec using this and BinaryRequestWriter
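
Roughly, that looks like this (untested sketch; the URL is a placeholder and
'beans' stands for the full list of objects):

  import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");
  server.setRequestWriter(new BinaryRequestWriter());  // send updates in the binary format
  server.addBeans(beans);  // one large request instead of many small ones
  server.commit();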

On Thu, Apr 9, 2009 at 10:36 PM, vivek sar vivex...@gmail.com wrote:
 I'm inserting 10K in a batch (using addBeans method). I read somewhere
 in the wiki that it's better to use the same instance of SolrServer
 for better performance. Would MultiThreadedConnectionManager help? How
 do I use it?

 I also wanted to know how can use EmbeddedSolrServer - does my app
 needs to be running in the same jvm with Solr webapp?

 Thanks,
 -vivek


Re: httpclient.ProtocolException using Solrj

2009-04-09 Thread Shalin Shekhar Mangar
On Thu, Apr 9, 2009 at 10:36 PM, vivek sar vivex...@gmail.com wrote:

 I'm inserting 10K in a batch (using addBeans method). I read somewhere
 in the wiki that it's better to use the same instance of SolrServer
 for better performance. Would MultiThreadedConnectionManager help? How
 do I use it?


If you are not passing your own HttpClient to the CommonsHttpSolrServer
constructor then you do not need to worry about this. The default is the
MultiThreadedHttpConnectionManager.
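
For reference, passing one in explicitly would look roughly like this
(untested sketch; the URL and tuning values are just examples):

  import java.net.URL;
  import org.apache.commons.httpclient.HttpClient;
  import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  MultiThreadedHttpConnectionManager mgr = new MultiThreadedHttpConnectionManager();
  mgr.getParams().setDefaultMaxConnectionsPerHost(10);  // connections allowed per Solr host
  mgr.getParams().setMaxTotalConnections(50);
  HttpClient client = new HttpClient(mgr);
  CommonsHttpSolrServer server =
      new CommonsHttpSolrServer(new URL("http://localhost:8080/solr"), client);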



 I also wanted to know how can use EmbeddedSolrServer - does my app
 needs to be running in the same jvm with Solr webapp?


Actually with EmbeddedSolrServer, there is no Solr webapp. You add it as
another jar in your own webapp.
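
The setup is roughly the following (a minimal sketch; the solr home path and
core name are placeholders):

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
  import org.apache.solr.core.CoreContainer;

  System.setProperty("solr.solr.home", "/path/to/solr/home");
  CoreContainer.Initializer initializer = new CoreContainer.Initializer();
  CoreContainer container = initializer.initialize();  // reads solr.xml under the solr home
  SolrServer server = new EmbeddedSolrServer(container, "core0");  // core name from solr.xml

Since there is no HTTP round trip, this tends to be the fastest option when
indexing from the same JVM.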

-- 
Regards,
Shalin Shekhar Mangar.


Re: Any tips for indexing large amounts of data?

2009-04-09 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Apr 9, 2009 at 8:51 PM, sunnyfr johanna...@gmail.com wrote:

 Hi Otis,
 How did you manage that? I have an 8-core machine with 8GB of RAM and an 11GB
 index for 14M docs, with 5 updates every 30mn, but my replication kills everything.
 My segments are merged too often, so the full index is replicated, the caches are
 lost, and I have no idea what I can do now.
 Some help would be brilliant,
 btw I'm using Solr 1.4.


sunnyfr, whether the replication is full or delta, the caches are
lost completely.

You can think of partitioning the index into separate Solr instances,
updating one partition at a time, and performing distributed search.

 Thanks,


 Otis Gospodnetic wrote:

 Mike is right about the occasional slow-down, which appears as a pause and
 is due to large Lucene index segment merging.  This should go away with
 newer versions of Lucene where this is happening in the background.

 That said, we just indexed about 20MM documents on a single 8-core machine
 with 8 GB of RAM, resulting in nearly 20 GB index.  The whole process took
 a little less than 10 hours - that's over 550 docs/second.  The vanilla
 approach before some of our changes apparently required several days to
 index the same amount of data.

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

 - Original Message 
 From: Mike Klaas mike.kl...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Monday, November 19, 2007 5:50:19 PM
 Subject: Re: Any tips for indexing large amounts of data?

 There should be some slowdown in larger indices as occasionally large
 segment merge operations must occur.  However, this shouldn't really
 affect overall speed too much.

 You haven't really given us enough data to tell you anything useful.
 I would recommend trying to do the indexing via a webapp to eliminate
 all your code as a possible factor.  Then, look for signs to what is
 happening when indexing slows.  For instance, is Solr high in cpu, is
 the computer thrashing, etc?

 -Mike

 On 19-Nov-07, at 2:44 PM, Brendan Grainger wrote:

 Hi,

 Thanks for answering this question a while back. I have made some
 of the suggestions you mentioned. ie not committing until I've
 finished indexing. What I am seeing though, is as the index get
 larger (around 1Gb), indexing is taking a lot longer. In fact it
 slows down to a crawl. Have you got any pointers as to what I might
 be doing wrong?

 Also, I was looking at using MultiCore solr. Could this help in
 some way?

 Thank you
 Brendan

 On Oct 31, 2007, at 10:09 PM, Chris Hostetter wrote:


 : I would think you would see better performance by allowing auto
 commit
 : to handle the commit size instead of reopening the connection
 all the
 : time.

 if your goal is fast indexing, don't use autoCommit at all ...
  just
 index everything, and don't commit until you are completely done.

 autoCommitting will slow your indexing down (the benefit being
 that more
 results will be visible to searchers as you proceed)




 -Hoss









 --
 View this message in context: 
 http://www.nabble.com/Any-tips-for-indexing-large-amounts-of-data--tp13510670p22973205.html
 Sent from the Solr - User mailing list archive at Nabble.com.





-- 
--Noble Paul


Dictionary lookup possibilities

2009-04-09 Thread Jaco
Hello,

I'm struggling with some ideas, maybe somebody can help me with past
experiences or tips. I have loaded a dictionary into a Solr index, using
stemming and some stopwords in analysis part of the schema. Each record
holds a term from the dictionary, which can consist of multiple words. For
some data analysis work, I want to send pieces of text (sentences actually)
to Solr to retrieve all possible dictionary terms that could occur. Ideally,
I want to construct a query that only returns those Solr records for which
all individual words in that record are matched.

For instance, my dictionary holds the following terms:
1 - a b c d
2 - c d e
3 - a b
4 - a e f g h

If I put the sentence [a b c d f g h] in as a query, I want to receive
dictionary items 1 (matching all words a b c d) and 3 (matching words a b)
as matches

I have been puzzling about how to do this. The only way I found so far was
to construct an OR query with all words of the sentence in it. In this case,
that would result in all dictionary items being returned. This would then
require some code to go over the search results and analyse each of them
(i.e. by using the highlight function) to kick out 'false' matches, but I am
looking for a more efficient way.
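
For what it's worth, a rough SolrJ sketch of that brute-force approach
(untested; it ignores stemming, assumes the default OR operator, and the
'term' field name plus the 'server' and 'sentence' variables are
placeholders):

  import java.util.Arrays;
  import java.util.HashSet;
  import java.util.Set;
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;

  Set<String> sentenceWords = new HashSet<String>(Arrays.asList(sentence.split("\\s+")));

  SolrQuery query = new SolrQuery(sentence);  // OR query over all words of the sentence
  query.setRows(1000);
  QueryResponse rsp = server.query(query);

  for (SolrDocument doc : rsp.getResults()) {
      String term = (String) doc.getFieldValue("term");  // the stored dictionary term
      boolean allMatched = true;
      for (String word : term.split("\\s+")) {
          if (!sentenceWords.contains(word)) { allMatched = false; break; }
      }
      if (allMatched) {
          System.out.println(term);  // every word of this entry occurs in the sentence
      }
  }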

Is there a way to do this with Solr functionality, or do I need to start
looking into the Lucene API ..?

Any help would be much appreciated as usual!

Thanks, bye,

Jaco.


logging

2009-04-09 Thread Kevin Osborn
We built our own webapp that used the Solr JARs. We used Apache Commons/log4j 
logging and just put log4j.properties in the Resin conf directory. The 
commons-logging and log4j jars were put in the Resin lib directory. Everything 
worked great and we got log files for our code only.

So, I upgraded to Solr 1.4 and I no longer get my log file. I assume it has 
something to do with Solr 1.4 using SLF4J instead of JDK logging, but it seems 
like my code would be independent of that. Any ideas?



  

Re: httpclient.ProtocolException using Solrj

2009-04-09 Thread vivek sar
Here is what I'm doing,

SolrServer server = new StreamingUpdateSolrServer(url, 1000, 5);  // queue size 1000, 5 background threads

server.addBeans(dataList);  // where dataList is a List<some_obj> with 10K elements

I run two threads, each using the same server object, and each calls
server.addBeans(...).

I'm able to get 50K/sec inserted using that, but the commit after that
(after 100K records) takes 70 sec - which messes up the avg time.

There are two problems here,

1) Once in a while I get connection reset error,

Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:168)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at 
org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
at 
org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)

Note: if I use CommonsHttpSolrServer I get the buffer error.

2) The commit takes way too long for every 100K records (I may commit more
often if this cannot be improved)

I'm trying to fix this error, which happens only if I run two
threads both calling addBeans (10K at a time). One thread works fine.
I'm not sure how I can use the MultiThreadedHttpConnectionManager to
create a StreamingUpdateSolrServer, or whether it would help.

Thanks,
-vivek

2009/4/9 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@gmail.com:
 using a single request is the fatest

 http://wiki.apache.org/solr/Solrj#head-2046bbaba3759b6efd0e33e93f5502038c01ac65

 I could index at the rate of 10,000 docs/sec using this and 
 BinaryRequestWriter


Re: How to get the solrhome location dynamically

2009-04-09 Thread Chris Hostetter

: Subject: How to get the solrhome location dynamically

Do you really want the Solr Home Dir, or do you want the instanceDir for a 
specific SolrCore?

If you're using a solr.xml file (ie: one or many cores), you can get the 
instanceDir for each core from the CoreAdminHandler -- but it doesn't 
expose the actual SolrHomeDir where the solr.xml file was found.

If you aren't using a solr.xml file (ie: you definitely only have one 
core) you can get the instance dir from the SystemInfoRequestHandler 
(/admin/system in the example configs) ... and since you aren't using a 
solr.xml file, the instance dir is the same as the Solr Home Dir.
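
For example, something like

  http://localhost:8983/solr/admin/cores?action=STATUS

(host and port are placeholders) returns the instanceDir for each core along 
with the rest of the core status.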


(Hmmm... I suppose the CoreAdminHandler should probably expose metadata 
about the CoreContainer ... anyone want to work up a patch?)





-Hoss



Question on Solr Distributed Search

2009-04-09 Thread vivek sar
Hi,

  I have another thread on multi-core distributed search, but I just
wanted to put a simple question here to get some response. I have a
search query,

   http://etsx19.co.com:8080/solr/20090409_9/select?q=usa -
returns 10 results

now if I add shards parameter to it,

  
http://etsx19.co.com:8080/solr/20090409_9/select?shards=etsx19.co.com:8080/solr/20090409_9&q=usa
 - this fails with

org.apache.solr.client.solrj.SolrServerException:
java.net.SocketException: Connection reset
org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException:
java.net.SocketException: Connection reset at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
at
..
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:637)
Caused by: org.apache.solr.client.solrj.SolrServerException:
java.net.SocketException: Connection reset
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:473)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
at 
org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:422)
..
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:168)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at 
org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
at 
org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
at 
org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
at 
org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
at 
org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)

Attached is my solrconfig.xml. Do I need a special RequestHandler for
sharding? I haven't been able to make any distributed search work
successfully. Any help is appreciated.

Note: I'm indexing using Solrj - not sure if that makes any difference
to the search part.

Thanks,
-vivek
<?xml version="1.0" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<config>

  <!-- Used to specify an alternate directory to hold all index data
   other than the default ./data under the Solr home.
   If replication is in use, this should match the replication configuration. -->
  <!--
  <dataDir>./solr/data</dataDir>
  -->

  <indexDefaults>
    <!-- Values here affect all index writers and act as a default unless overridden. -->
    <useCompoundFile>true</useCompoundFile>
    <mergeFactor>100</mergeFactor>
    <!-- <maxBufferedDocs>1</maxBufferedDocs> -->
    <ramBufferSizeMB>64</ramBufferSizeMB>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>1</commitLockTimeout>
    <lockType>single</lockType>
  </indexDefaults>

  <mainIndex>
    <!-- options specific to the main on-disk lucene index -->
    <useCompoundFile>true</useCompoundFile>
    <mergeFactor>100</mergeFactor>

    <!-- <maxBufferedDocs>1000</maxBufferedDocs> -->
    <!-- Tell Lucene when to flush documents to disk.
    Giving Lucene more memory for indexing means faster indexing at the cost of more RAM
    If both ramBufferSizeMB and maxBufferedDocs is set, then Lucene will flush based on whichever limit is hit first.
    -->
    <ramBufferSizeMB>64</ramBufferSizeMB>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>1</maxFieldLength>

    <!-- If true, unlock any held write or commit locks on startup.
     This defeats the locking mechanism that allows multiple
     processes to safely access a lucene index, and should be
     used with care. -->
    <unlockOnStartup>true</unlockOnStartup>
    <lockType>single</lockType>
  </mainIndex>

  <!-- the 

Re: Querying for multi-word synonyms

2009-04-09 Thread Chris Hostetter

: Unfortunately, I have to use SynonymFilter at query time due to the nature
: of the data I'm indexing. At index time, all I have are keywords but at
: query time I will have some semantic markup which allows me to expand into
: synonyms. I am wondering if any progress has been made into making query
: time synonym searching work correctly. If not, does anyone have some ideas
: for alternatives to using SynonymFilter? The only thing I can think of is to
: simply create a custom BooleanQuery for the search and feed the synonyms in
: manually, but then I am missing out on all the functionality of the dismax
: query parser. Any ideas are appreciated, thanks very much.

Fundamentally the problem with multi-word query time synonyms is that the 
Analyzer only has a limited mechanism of conveying structure back to the 
caller (ie: the QueryParser) ... that mechanism being the term position 
-- you can indicate that terms can occupy the same single position, but 
not that sequences of terms can occupy the same position.

you could write a query parser that used nested SpanNearQueries to create 
a directed acyclic graph of terms that you want to match in a sequence, 
where some branches of the graph contain more nodes than others, but you 
would need to do the synonym recognition while building up the query (and 
working with the DAG) ... but the current SynonymFilter works as part of 
the TokenStream.
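
As a rough illustration of that nested-span idea (untested; the field and 
terms are made up), a one-word term and a two-word synonym can occupy the 
same branch point like this:

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.spans.SpanNearQuery;
  import org.apache.lucene.search.spans.SpanOrQuery;
  import org.apache.lucene.search.spans.SpanQuery;
  import org.apache.lucene.search.spans.SpanTermQuery;

  // branch 1: the single token "tv"
  SpanQuery tv = new SpanTermQuery(new Term("text", "tv"));

  // branch 2: the two-token synonym "television set", in order, no gaps
  SpanQuery tvSet = new SpanNearQuery(new SpanQuery[] {
      new SpanTermQuery(new Term("text", "television")),
      new SpanTermQuery(new Term("text", "set")) }, 0, true);

  // either branch may occupy this position in the larger sequence
  SpanQuery either = new SpanOrQuery(new SpanQuery[] { tv, tvSet });

  // e.g. "cable" followed immediately by either alternative
  SpanQuery query = new SpanNearQuery(new SpanQuery[] {
      new SpanTermQuery(new Term("text", "cable")), either }, 0, true);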



-Hoss



Re: Question on Solr Distributed Search

2009-04-09 Thread vivek sar
I think I found the reason behind the connection reset. Looking at the
code, it points to QueryComponent.mergeIds():

resultIds.put(shardDoc.id.toString(), shardDoc);

It looks like the doc unique id is returning null. I'm not sure how that is
possible as it's a required field. Right now my unique id is not stored
(only indexed) - does it have to be stored for distributed search?
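
For reference, a stored unique key looks something like this in schema.xml
(the field name is just an example):

  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <uniqueKey>id</uniqueKey>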

HTTP Status 500 - null java.lang.NullPointerException at
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:432)
at 
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:276)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:637)

On Thu, Apr 9, 2009 at 5:01 PM, vivek sar vivex...@gmail.com wrote:
 Hi,

  I've another thread on multi-core distributed search, but just
 wanted to put a simple question here on distributed search to get some
 response. I've a search query,

   http://etsx19.co.com:8080/solr/20090409_9/select?q=usa     -
 returns with 10 result

 now if I add shards parameter to it,

  http://etsx19.co.com:8080/solr/20090409_9/select?shards=etsx19.co.com:8080/solr/20090409_9&q=usa
  - this fails with




Re: Question on Solr Distributed Search

2009-04-09 Thread vivek sar
Just an update. I changed the schema to store the unique id field, but
I still get the connection reset exception. I did notice that if there
is no data in the core then it returns 0 results (no exception),
but if there is data and you search using the shards parameter I get the
connection reset exception. Can anyone provide some tips on where I can
look for this problem?


Apr 10, 2009 3:16:04 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException:
java.net.SocketException: Connection reset
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:282)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:637)
Caused by: org.apache.solr.client.solrj.SolrServerException:
java.net.SocketException: Connection reset
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:473)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:242)
at 
org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:422)
at 
org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:395)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
... 1 more
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:168)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at 
org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
at 
org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
at 
org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
at 
org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
at 
org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
at 
org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)


On Thu, Apr 9, 2009 at 6:51 PM, vivek sar vivex...@gmail.com wrote:
 I think the reason behind the connection reset is. Looking at the
 code it points to QueryComponent.mergeIds()

 resultIds.put(shardDoc.id.toString(), shardDoc);

 looks like the doc unique id is returning null. I'm not sure how is it
 possible as its a required field. Right my unique id is not stored
 (only indexed) - does it has to be stored for distributed search?


multiple tokenizers needed

2009-04-09 Thread Ashish P

I want to analyze text by splitting on the pattern ';' and on whitespace, and
it is Japanese text so I want to use the CJKAnalyzer + tokenizer as well.
In short I want to do:
 <analyzer class="org.apache.lucene.analysis.cjk.CJKAnalyzer">
   <tokenizer class="solr.PatternTokenizerFactory" pattern=";" />
   <tokenizer class="solr.WhitespaceTokenizerFactory" />
   <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer" />
 </analyzer>
Can anyone please tell me how to achieve this?? Because the above syntax is
not at all possible.
-- 
View this message in context: 
http://www.nabble.com/multiple-tokenizers-needed-tp22982382p22982382.html
Sent from the Solr - User mailing list archive at Nabble.com.