mix cased search terms

2010-04-23 Thread Tuan Nguyen
Hello list, first time posting here. I am trying to find an answer to
a strange search behaviour we're seeing in our VuFind application. In
order to eliminate any VuFind-related variables, I have used the
vanilla Solr example schema to try our problematic search.


I posted this XML to the example schema, a slightly modified version of
the monitor.xml for testing:

<add>
<doc>
  <field name="id">1</field>
  <field name="name">In pursuit of the PhD</field>
  <field name="manu">Dell, Inc.</field>
  <field name="cat">electronics</field>
  <field name="cat">monitor</field>
  <field name="features">In pursuit of the PhD</field>
  <field name="features">In pursuit of the PhD</field>
  <field name="weight">401.6</field>
  <field name="price">2199</field>
  <field name="popularity">6</field>
  <field name="inStock">true</field>
</doc>
</add>

Then I ran a query in the admin interface with debug on and got no match:

features:PhD

The debug info shows:

<str name="rawquerystring">features:PhD</str>
<str name="querystring">features:PhD</str>
<str name="parsedquery">PhraseQuery(features:"ph d")</str>
<str name="parsedquery_toString">features:"ph d"</str>

But in the analysis tool, it shows a match for the split "ph d" given
the query term PhD.


If I set the splitOnCaseChange=0 option in the WordDelimiterFilter,  
then a match is found as expected.


I appreciate any insight on this problem. Thanks in advance.

Tuan
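For reference, the option Tuan mentions goes on the WordDelimiterFilterFactory inside the fieldType's analyzer in schema.xml. A sketch (the attribute values other than splitOnCaseChange are illustrative, not his actual configuration):

```xml
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1"
        splitOnCaseChange="0"/>
```

With splitOnCaseChange="0", "PhD" stays a single token at query time instead of being turned into the phrase "ph d".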


Boost function on *:*

2010-04-23 Thread Blargy

Is it possible to use boost function across the whole index/empty search
term? 

I'm guessing the next question that would be asked is "Why would you want to
do that?". Well, we have a bunch of custom business metrics included in each
document (a product). I would like to show only the best products (based on
our metrics and some boost functions) in the absence of a search term.

Is this possible?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Boost-function-on-tp747131p747131.html
Sent from the Solr - User mailing list archive at Nabble.com.
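One approach to what Blargy asks, assuming the dismax handler (the field names popularity and conversion_rate are made up): with dismax, an empty user query falls back to q.alt, and boost functions in bf still contribute to the score, so q.alt=*:* should rank the whole index by the business metrics. A sketch of the request:

```python
from urllib.parse import urlencode

def best_products_url(base="http://localhost:8983/solr/select"):
    # With dismax, an empty q falls back to q.alt; every document matches
    # and the bf function query supplies the ranking.
    params = {
        "defType": "dismax",
        "q.alt": "*:*",                               # match all docs
        "bf": "product(popularity,conversion_rate)",  # assumed metric fields
        "fl": "id,score",
        "rows": 10,
    }
    return base + "?" + urlencode(params)

print(best_products_url())
```

The product() combination is a placeholder; substitute whatever function of your own metrics defines "best".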


Re: Solr does not honor facet.mincount and field.facet.mincount

2010-04-23 Thread Koji Sekiguchi

Umesh_ wrote:
Hi All, 


I am trying to restrict facets in solr response, by setting facet.mincount =
1, which does not work as the request and response are shown below: 


REQUEST:
http://localhost:8983/solr/select/?q=*%3A*&version=2.2&rows=0&start=0&indent=on&facet=true&facet.field=Instrument&facet.field=Location&facet.mincount=9

RESPONSE:
<response>
<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">1</int>
 <lst name="params">
  <str name="facet">true</str>
  <str name="indent">on</str>
  <str name="start">0</str>
  <str name="q">*:*</str>
  <arr name="facet.field">
   <str>Instrument</str>
   <str>Location</str>
  </arr>
  <str name="facet.mincount">9</str>
  <str name="version">2.2</str>
  <str name="rows">0</str>
 </lst>
</lst>
<result name="response" numFound="…" start="0"/>
<lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
  <lst name="Instrument"/>
  <lst name="Location">
   <int name="…">118</int>
   <int name="Philadelphia, Pennsylvania [unconfirmed]">7</int>
  </lst>
 </lst>
</lst>
</response>


As we can see from the response, the Instrument facet, which has zero
distinct values, is included in the response. Also the facet "Philadelphia,
Pennsylvania [unconfirmed]", whose count is less than mincount (9), is
included in the response.

  

The emptiness of the Instrument field in the response shows that Solr
couldn't facet data (9 or more docs) on the field. Regarding the Location
field, the result is weird. Can you show us the data and the field type
of the field so we can reproduce the problem?


I also tried Instrument.facet.mincount=1 but still I see Instrument facet in
the response. 

  
A per-field parameter needs the "f." prefix. It should be
f.Instrument.facet.mincount=1.


Koji

--
http://www.rondhuit.com/en/
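To make Koji's correction concrete, a sketch of the request with the per-field form (host and field names as in the thread; a bare "Instrument.facet.mincount" is not a parameter Solr recognizes):

```python
from urllib.parse import urlencode

# Per-field facet parameters take the "f.<field>." prefix.
params = [
    ("q", "*:*"), ("rows", 0), ("facet", "true"),
    ("facet.field", "Instrument"), ("facet.field", "Location"),
    ("f.Instrument.facet.mincount", 1),   # correct per-field form
]
print("http://localhost:8983/solr/select/?" + urlencode(params))
```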



Re: Solr full-import not working as expected

2010-04-23 Thread MitchK

Unfortunately you haven't answered my question, saratv.
The important question is why your DIH configuration did not import those
rows.

Without providing any schema information or configuration details of your
DIH, no one will be able to help you.
Just for the future: if something doesn't work, present detailed
information to get fast, good-quality help.

Regards
- Mitch 


saratv wrote:
> 
> is there a way, from a Java program, that I can find missing rows in the
> database and import those rows into Solr as docs, or at least is there a
> way to find which rows are missing?
> 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-full-import-not-working-as-expected-tp744937p746927.html
Sent from the Solr - User mailing list archive at Nabble.com.
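A sketch of the reconciliation saratv asks about, in Python for brevity (the same logic is a Set difference from Java): fetch the primary keys from the database and the ids already in Solr (for example by paging through q=*:*&fl=id), then re-import the difference.

```python
def missing_ids(db_ids, solr_ids):
    """Return database ids that are absent from the Solr index."""
    # db_ids would come from e.g. SELECT id FROM ... via JDBC;
    # solr_ids from paging q=*:*&fl=id with start/rows.
    return sorted(set(db_ids) - set(solr_ids))

print(missing_ids(["1", "2", "3", "4"], ["1", "3"]))  # ['2', '4']
```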


RE: Comparing two queries

2010-04-23 Thread Villemos, Gert
Yes, your solution is much simpler, providing the result through a single
query. I didn't understand it the first time I read it.

I guess you would need to run it backwards as well to really evaluate the
relevance, i.e.

First
q=<query1>&facet=on&facet.query=<query2>

Then
q=<query2>&facet=on&facet.query=<query1>

Query 1 may return 100,000 hits with 500 overlapping with query 2. This
would indicate no relevance.
Query 2 may return 1,000 documents with 500 overlapping with query 1. This
would indicate relevance.

I will test it out over the next few days and let you know how it works
for us.

Regards,
Gert.
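One way to turn the two directional counts Gert describes into a single symmetric number is a Jaccard-style ratio over the counts the facet.query trick returns (the hit counts below are the ones from his example; what threshold counts as "similar" is a judgment call):

```python
def query_similarity(hits_a, hits_b, overlap):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two result sets."""
    union = hits_a + hits_b - overlap
    return overlap / union if union else 0.0

# 100,000 hits vs 1,000 hits with 500 in common: low similarity.
print(query_similarity(100000, 1000, 500))
```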
 
 
 



From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Fri 4/23/2010 11:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Comparing two queries



Gert,

In your second query example you used "qf=...".  Did you mean "fq=" ?  If 
so, the answer is no - filter queries don't affect the score.


I haven't tried your approach, but intuitively feel that looking at % overlap 
may work better.
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: "Villemos, Gert" 
> To: solr-user@lucene.apache.org; solr-user@lucene.apache.org
> Sent: Fri, April 23, 2010 5:08:04 PM
> Subject: RE: Comparing two queries

Re: Comparing two queries

2010-04-23 Thread Otis Gospodnetic
Gert,

In your second query example you used "qf=...".  Did you mean "fq=" ?  If 
so, the answer is no - filter queries don't affect the score.


I haven't tried your approach, but intuitively feel that looking at % overlap 
may work better. 
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: "Villemos, Gert" 
> To: solr-user@lucene.apache.org; solr-user@lucene.apache.org
> Sent: Fri, April 23, 2010 5:08:04 PM
> Subject: RE: Comparing two queries
> 

RE: Comparing two queries

2010-04-23 Thread Villemos, Gert
I was thinking along the lines
 
1. Retrieve the top result for one query.
2. Take the resulting document and evaluate the score that it would get in 
another query.
3. If the scores are similar, then the queries most likely overlap.
 
I guess that if I had two simple query strings "archive crash" and query 
"archiving failure" then I could:
 
1. Use the query ?q="archive crash"&rows=1 which will return me one result (if 
any).
2. Read the score of the returned document.
3. Read the unique identifier field value; let's say it has field name 'URI' and
value '50d1c07b-a635-4f9a-a6eb-f2e3ebcb6b55'.
4. Use the query ?q="archiving 
failure"&qf=URI:50d1c07b-a635-4f9a-a6eb-f2e3ebcb6b55&rows=1
5. Read the score of the returned document (the document will be the same as 
returned under 1, the score will be different, evaluated based on the second 
query).
6. Evaluate how similar the scores are.
 
My question about this approach is: is the score calculated in step 4 affected
by the subquery, whose role is solely to select a specific result?
 
I'm using dismax, by the way. Should I use the standard handler instead?
Would it make a difference?
 
Thanks,
Gert.
 
 



From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: Fri 4/23/2010 8:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Comparing two queries



Or, use facet.query to get the overlap.  Here's
?q=<query1>&facet=on&facet.query=<query2>

You'll get the hit count from query #1 in the results, and the 
overlapping count to query #2 in the facet query response.

Erik - http://www.lucidimagination.com 
 

On Apr 23, 2010, at 11:01 AM, Otis Gospodnetic wrote:

> Hello Gert,
>
> I think you'd have to apply custom heuristics that involves looking 
> at top N hits for each query and looking at the % overlap.
>
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
>> From: "Villemos, Gert" 
>> To: solr-user@lucene.apache.org
>> Sent: Fri, April 23, 2010 10:20:54 AM
>> Subject: Comparing two queries
>>
>> We want to support that a user can register for interest in
>> information, based on a query he has defined himself. For example that
>> he types in a query, presses a save button, provides his email and the
>> system will now email him with a daily digest.
>>
>> As part of this, it would be nice to be able to tell the user that the
>> same / a similar query is already being monitored by another user, as
>> the users will likely have the same interests. I would therefore like
>> to evaluate whether two queries will return (almost) the same set of
>> results.
>>
>> But how can I compare two queries to determine if they will return
>> (almost) the same set of results?
>>
>> Thanks,
>> Gert.
>>
>> Please help Logica to respect the environment by not printing this
>> email / Pour contribuer comme Logica au respect de l'environnement,
>> merci de ne pas imprimer ce mail / Bitte drucken Sie diese Nachricht
>> nicht aus und helfen Sie so Logica dabei, die Umwelt zu schützen. /
>> Por favor ajude a Logica a respeitar o ambiente nao imprimindo este
>> correio electronico.
>>
>> This e-mail and any attachment is for authorised use by the intended
>> recipient(s) only. It may contain proprietary material, confidential
>> information and/or be subject to legal privilege. It should not be
>> copied, disclosed to, retained or used by, any other party. If you are
>> not an intended recipient then please promptly delete this e-mail and
>> any attachment and all copies and inform the sender. Thank you.









Re: Tomcat vs. WebSphere

2010-04-23 Thread Deo, Shantanu
We have run SOLR in WebLogic without problems. The only change we see is
some spurious extra logging info which we don't see in the case of Tomcat.
Anyone have an idea of how to control that?

Thanks
Shantanu


On 4/23/10 12:53 PM, "Ken Lane (kenlane)"  wrote:

> Does anyone know of any advantages/disadvantages to running SOLR on
> WebSphere versus Tomcat?
> 
> 
> 
> Thanks,
> 
> Ken
> 
> 




Re: Collapse problem

2010-04-23 Thread Chris Hostetter
: basically, we are running a query with field collapsing (Solr 1.4 with
: patch 236). The response tells us that there are about 2700 documents
: matching our query. However, I cannot get past the 431st document.
: From this point on, the response will not contain any documents.

isn't that how collapse is supposed to work?  a total of 2700 match, but it
collapses away many of them according to some criteria, so you only
paginate through 431?


-Hoss



Re: Best way to prevent this search lockup (apparently caused during big segment merges)?

2010-04-23 Thread Otis Gospodnetic
Chris,

It looks like Mike already offered several solutions though I don't know 
what Solr does without looking at the code.

But I'm curious:
* how big is your index? and do you know how large the segments being merged 
are?
* do you batch docs or do you make use of Streaming SolrServer?

I'm curious, because I've never encountered this problem before...

Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Chris Harris 
> To: solr-user@lucene.apache.org
> Sent: Thu, April 22, 2010 6:28:29 PM
> Subject: Best way to prevent this search lockup (apparently caused during big 
>  segment merges)?
> 
> I'm running Solr 1.4+ under Tomcat 6, with indexing and searching
> requests simultaneously hitting the same Solr machine. Sometimes Solr,
> Tomcat, and my (C#) indexing process conspire to render search
> inoperable. So far I've only noticed this while big segment merges
> (i.e. merges that take multiple minutes) are taking place.
>
> Let me explain the situation as best as I understand it.
>
> My indexer has a main loop that looks roughly like this:
>
>   while true:
>     try:
>       submit a new add or delete request to Solr via HTTP
>     catch timeoutException:
>       sleep a few seconds
>
> When things are going wrong (i.e., when a large segment merge is
> happening), this loop is problematic:
>
> * When the indexer's request hits Solr, then the corresponding thread
> in Tomcat blocks. (It looks to me like the thread is destined to block
> until the entire merge is complete. I'll paste in what the Java stack
> traces look like at the end of the message if they can help diagnose
> things.)
> * Because the Solr thread stays blocked for so long, eventually the
> indexer hits a timeoutException. (That is, it gives up on Solr.)
> * Hitting the timeout exception doesn't cause the corresponding Tomcat
> thread to die or unblock. Therefore, each time through the loop,
> another Solr-handling thread inside Tomcat enters a blocked state.
> * Eventually so many threads (maxThreads, whose Tomcat default is 200)
> are blocked that Tomcat starts rejecting all new Solr HTTP requests --
> including those coming in from the web tier.
> * Users are unable to search. The problem might self-correct once the
> merge is complete, but that could be quite a while.
>
> What are my options for changing Solr settings or changing my indexing
> process to avoid this lockup scenario? Do you agree that the segment
> merge is helping cause the lockup? Do adds and deletes really need to
> block on segment merges?
>
> Partial thread dumps follow, showing example add and delete threads
> that are blocked. Also the active Lucene Merge Thread, and the thread
> that kicked off the merge.
>
> [doc deletion thread, waiting for DirectUpdateHandler2.iwCommit.lock()
> to return]
> "http-1800-200" daemon prio=6 tid=0x0a58cc00 nid=0x1028
> waiting on condition [0x0f9ae000..0x0f9afa90]
>    java.lang.Thread.State: WAITING (parking)
>     at sun.misc.Unsafe.park(Native Method)
>     - parking to wait for <0x00016d801ae0> (a
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>     at java.util.concurrent.locks.LockSupport.park(Unknown Source)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(Unknown Source)
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown Source)
>     at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(Unknown Source)
>     at org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:320)
>     at org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:71)
>     at org.apache.solr.handler.XMLLoader.processDelete(XMLLoader.java:234)
>     at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:180)
>     at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>     at org.apache.catalina.core.StandardCo
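None of this fixes the server side, but one client-side mitigation for the indexer loop Chris shows is exponential backoff instead of a fixed few-second sleep, so a multi-minute merge leaves far fewer timed-out (and therefore blocked) Tomcat threads behind. A sketch, where send() is an assumed helper that performs one add/delete HTTP request and raises TimeoutError on timeout:

```python
import time

def submit_with_backoff(send, max_tries=6, base_delay=2.0):
    """Retry send() on timeouts, doubling the wait each attempt."""
    for attempt in range(max_tries):
        try:
            return send()
        except TimeoutError:
            time.sleep(base_delay * (2 ** attempt))  # 2s, 4s, 8s, ...
    raise TimeoutError("gave up after %d attempts" % max_tries)
```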

Re: What hardware do I need ?

2010-04-23 Thread Otis Gospodnetic
Xavier,

100-700 QPS is still high.  I'm guessing your 1 box won't handle that without
sweating a lot (read: slow queries).

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Xavier Schepler 
> To: solr-user@lucene.apache.org
> Sent: Fri, April 23, 2010 11:53:23 AM
> Subject: Re: What hardware do I need ?
> 


Re: Tomcat vs. WebSphere

2010-04-23 Thread Otis Gospodnetic
I've never used WebSphere, but I always got the impression that people have 
more issues with it than with simpler solutions.
Personally, I would suggest Jetty.  I've used it dozens of times and never had
issues with it.  It's small, simple, and fast.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Ken Lane (kenlane) 
> To: solr-user@lucene.apache.org
> Sent: Fri, April 23, 2010 3:53:07 PM
> Subject: Tomcat vs. WebSphere
> 
> Does anyone know of any advantages/disadvantages to running SOLR on
WebSphere 
> versus Tomcat?



Thanks,

Ken


Tomcat vs. WebSphere

2010-04-23 Thread Ken Lane (kenlane)
Does anyone know of any advantages/disadvantages to running SOLR on
WebSphere versus Tomcat?

 

Thanks,

Ken



Solr does not honor facet.mincount and field.facet.mincount

2010-04-23 Thread Umesh_

Hi All, 

I am trying to restrict facets in solr response, by setting facet.mincount =
1, which does not work as the request and response are shown below: 

REQUEST:
http://localhost:8983/solr/select/?q=*%3A*&version=2.2&rows=0&start=0&indent=on&facet=true&facet.field=Instrument&facet.field=Location&facet.mincount=9

RESPONSE:
<response>
<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">1</int>
 <lst name="params">
  <str name="facet">true</str>
  <str name="indent">on</str>
  <str name="start">0</str>
  <str name="q">*:*</str>
  <arr name="facet.field">
   <str>Instrument</str>
   <str>Location</str>
  </arr>
  <str name="facet.mincount">9</str>
  <str name="version">2.2</str>
  <str name="rows">0</str>
 </lst>
</lst>
<result name="response" numFound="…" start="0"/>
<lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
  <lst name="Instrument"/>
  <lst name="Location">
   <int name="…">118</int>
   <int name="Philadelphia, Pennsylvania [unconfirmed]">7</int>
  </lst>
 </lst>
</lst>
</response>

As we can see from the response, the Instrument facet, which has zero
distinct values, is included in the response. Also the facet "Philadelphia,
Pennsylvania [unconfirmed]", whose count is less than mincount (9), is
included in the response.

I also tried Instrument.facet.mincount=1 but still I see Instrument facet in
the response. 

Please let me know if my understanding of mincount is different than what it
is intended to do, OR if I am doing something which is not correct. 

Regards, 
Umesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-does-not-honor-facet-mincount-and-field-facet-mincount-tp746499p746499.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Comparing two queries

2010-04-23 Thread Erik Hatcher
Or, use facet.query to get the overlap.  Here's
?q=<query1>&facet=on&facet.query=<query2>


You'll get the hit count from query #1 in the results, and the  
overlapping count to query #2 in the facet query response.


Erik - http://www.lucidimagination.com

On Apr 23, 2010, at 11:01 AM, Otis Gospodnetic wrote:


Hello Gert,

I think you'd have to apply custom heuristics that involves looking  
at top N hits for each query and looking at the % overlap.


Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 

From: "Villemos, Gert" 
To: solr-user@lucene.apache.org
Sent: Fri, April 23, 2010 10:20:54 AM
Subject: Comparing two queries

We want to support that a user can register for interest in information,
based on a query he has defined himself. For example that he types in a
query, presses a save button, provides his email, and the system will now
email him with a daily digest.

As part of this, it would be nice to be able to tell the user that the
same / a similar query is already being monitored by another user, as the
users will likely have the same interests. I would therefore like to
evaluate whether two queries will return (almost) the same set of results.

But how can I compare two queries to determine if they will return
(almost) the same set of results?

Thanks,
Gert.







SolrJ + BasicAuth

2010-04-23 Thread Jon Baer
Uggg I just got bit hard by this on a Tomcat project ... 

https://issues.apache.org/jira/browse/SOLR-1238

Is there any way to get access to that RequestEntity w/o patching?  Also, are
there security implications w/ using the repeatable payloads?

Thanks.

- Jon

Re: Problem with pdf, upgrading Cell

2010-04-23 Thread Paul Borgermans
On Fri, Apr 23, 2010 at 5:48 PM, Marc Ghorayeb  wrote:
>
> Yes, the only log i can actually get is the one in the command console from 
> windows and there are no errors there ...
> Here are the last lines when i upload a pdf to the update/extract url:



I am pretty sure it is Tika itself that does not manage to convert
your pdf. I'm not using Solr Cell but Tika from the command line, and it
is only with very recent Tika builds that pdf extraction works in most
cases.

So I suggest building Tika from svn yourself, and if the command-line
extraction works, integrating it back with Solr. See

http://wiki.apache.org/solr/ExtractingRequestHandler

for instructions (the committer section).

hth
Paul


Re: What hardware do I need ?

2010-04-23 Thread Xavier Schepler

On 23/04/2010 17:08, Otis Gospodnetic wrote:

Xavier,

0-1000 QPS is a pretty wide range.  Plus, it depends on how good your 
auto-complete is, which depends on types of queries it issues, among other 
things.
100K short docs is small, so that will all fit in RAM nicely, assuming those 
other processes leave enough RAM for the OS to cache the index.

  That said, you do need more than 1 box if you want your auto-complete more 
fault tolerant.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Xavier Schepler
> To: solr-user@lucene.apache.org
> Sent: Fri, April 23, 2010 11:01:24 AM
> Subject: What hardware do I need ?
> 
> Hi,
> 
> I'm working with Solr 1.4.
> My schema has about 50 fields.
> I'm using full text search in short strings (~ 30-100 terms) and facetted search.
> 
> My index will have 100 000 documents.
> 
> The number of requests per second will be low. Let's say between 0 and 1000 because of auto-complete.
> 
> Is a standard server (3ghz proc, 4gb ram) with the client application (apache + php5 + ZF + apc) and Tomcat + Solr enough ???
> 
> Do I need more hardware ?
> 
> Thanks in advance,
> 
> Xavier S.

Well my auto-complete is built on the facet prefix search component.
I think that 100-700 requests per second is probably a better approximation.
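For reference, a facet.prefix completion request of the kind described can be sketched like this (the field name, host, and lowercasing are assumptions about a typical setup, not Xavier's actual schema):

```python
from urllib.parse import urlencode

def autocomplete_url(base, prefix, field="title"):
    # rows=0: we only want the facet counts back, not documents.
    # The prefix is lowercased on the assumption that the facet field
    # is analyzed that way; adjust to match the real field type.
    params = {
        "q": "*:*",
        "rows": 0,
        "facet": "true",
        "facet.field": field,
        "facet.prefix": prefix.lower(),
        "facet.limit": 10,
    }
    return base + "/select?" + urlencode(params)

url = autocomplete_url("http://localhost:8983/solr", "Pho")
```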



RE: Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb

Yes, the only log i can actually get is the one in the command console from 
windows and there are no errors there ...
Here are the last lines when i upload a pdf to the update/extract url:
Apr 23, 2010 5:47:03 PM org.apache.solr.servlet.SolrServlet initINFO: 
SolrServlet.init() doneApr 23, 2010 5:47:03 PM org.apache.solr.core.SolrCore 
executeINFO: [] webapp=null path=null 
params={event=firstSearcher&q=static+firstSearcher+warming+query+from+solrconfig.xml}
 hits=0 status=0 QTime=0Apr 23, 2010 5:47:03 PM 
org.apache.solr.core.SolrResourceLoader locateSolrHomeINFO: JNDI not configured 
for solr (NoInitialContextEx)Apr 23, 2010 5:47:03 PM 
org.apache.solr.core.SolrResourceLoader locateSolrHomeINFO: solr home defaulted 
to 'solr/' (could not find system property or JNDI)Apr 23, 2010 5:47:03 PM 
org.apache.solr.servlet.SolrUpdateServlet initINFO: SolrUpdateServlet.init() 
doneApr 23, 2010 5:47:03 PM org.apache.solr.core.QuerySenderListener 
newSearcherINFO: QuerySenderListener done.Apr 23, 2010 5:47:03 PM 
org.apache.solr.handler.component.SpellCheckComponent$SpellCheckerListener 
newSearcherINFO: Loading spell index for spellchecker: default2010-04-23 
17:47:03.530:INFO::Opened 
E:\users\M1B\search\solr-new\example\logs\2010_04_23.request.log2010-04-23 
17:47:03.546:INFO::Started socketconnec...@0.0.0.0:8983Apr 23, 2010 5:47:03 PM 
org.apache.solr.core.SolrCore registerSearcherINFO: [] Registered new searcher 
searc...@259a8416 mainApr 23, 2010 5:47:11 PM 
org.apache.solr.update.processor.LogUpdateProcessor finishINFO: {} 0 297Apr 23, 
2010 5:47:11 PM org.apache.solr.core.SolrCore executeINFO: [] webapp=/solr 
path=/update/extract 
params={extractOnly=true&literal.url=http://www.3ds.com/lucidworks-solr-refguide-1.4.pdf&literal.id=C:\Documents+and+Settings\M1B\workspace\3DS_FileIndexer\test\lucidworks-solr-refguide-1.4.pdf&literal.type=document&literal.appKey=media&literal.title=lucidworks-solr-refguide-1.4.pdf&wt=javabin&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&version=1&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17}
 status=0 QTime=297
Apr 23, 2010 5:47:12 PM org.apache.solr.update.processor.LogUpdateProcessor 
finishINFO: {} 0 0Apr 23, 2010 5:47:12 PM org.apache.solr.core.SolrCore 
executeINFO: [] webapp=/solr path=/update/extract 
params={extractOnly=true&literal.url=http://www.3ds.com/mysql-proxy-en.pdf&literal.id=C:\Documents+and+Settings\M1B\workspace\3DS_FileIndexer\test\mysql-proxy-en.pdf&literal.type=document&literal.appKey=media&literal.title=mysql-proxy-en.pdf&wt=javabin&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&version=1&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17}
 status=0 QTime=0Apr 23, 2010 5:47:13 PM 
org.apache.solr.update.processor.LogUpdateProcessor finishINFO: {} 0 0Apr 23, 
2010 5:47:13 PM org.apache.solr.core.SolrCore executeINFO: [] webapp=/solr 
path=/update/extract 
params={extractOnly=true&literal.url=http://www.3ds.com/python-cheat-sheet-v1.pdf&literal.id=C:\Documents+and+Settings\M1B\workspace\3DS_FileIndexer\test\python-cheat-sheet-v1.pdf&literal.type=document&literal.appKey=media&literal.title=python-cheat-sheet-v1.pdf&wt=javabin&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&version=1&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17}
 status=0 QTime=0Apr 23, 2010 5:47:14 PM 
org.apache.solr.update.DirectUpdateHandler2 commitINFO: start 
commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDeletes=false)Apr 
23, 2010 5:47:14 PM org.apache.solr.search.SolrIndexSearcher INFO: 
Opening searc...@2efeecca mainApr 23, 2010 5:47:14 PM 
org.apache.solr.update.DirectUpdateHandler2 commitINFO: end_commit_flushApr 23, 
2010 5:47:14 PM org.apache.solr.search.SolrIndexSearcher warmINFO: autowarming 
searc...@2efeecca main from searc...@259a8416 main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}Apr
 23, 2010 5:47:14 PM org.apache.solr.search.SolrIndexSearcher warmINFO: 
autowarming result for searc...@2efeecca main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}Apr
 23, 2010 5:47:14 PM org.apache.solr.search.SolrIndexSearcher warmINFO: 
autowarming searc...@2efeecca main from searc...@259a8416 main
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}Apr
 23, 2010 5:47:14 PM org.apache.solr.search.SolrIndexSearcher warmINF

Re: What hardware do I need ?

2010-04-23 Thread Otis Gospodnetic
Xavier,

0-1000 QPS is a pretty wide range.  Plus, it depends on how good your 
auto-complete is, which depends on types of queries it issues, among other 
things.
100K short docs is small, so that will all fit in RAM nicely, assuming those 
other processes leave enough RAM for the OS to cache the index.

 That said, you do need more than 1 box if you want your auto-complete more 
fault tolerant.

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Xavier Schepler 
> To: solr-user@lucene.apache.org
> Sent: Fri, April 23, 2010 11:01:24 AM
> Subject: What hardware do I need ?
> 
> Hi,
> 
> I'm working with Solr 1.4.
> My schema has about 50 fields.
> I'm using full text search in short strings (~ 30-100 terms) and facetted search.
> 
> My index will have 100 000 documents.
> 
> The number of requests per second will be low. Let's say between 0 and 1000 because of auto-complete.
> 
> Is a standard server (3ghz proc, 4gb ram) with the client application (apache + php5 + ZF + apc) and Tomcat + Solr enough ???
> 
> Do I need more hardware ?
> 
> Thanks in advance,
> 
> Xavier S.



Re: Multiple query searches in one request

2010-04-23 Thread Otis Gospodnetic
Hi,

Yes, a custom SearchComponent will do this.  We'd done stuff like this before 
and actually have this sort of functionality in some of Sematext products - it 
works well if you don't mind writing and adding another SearchComponent to your 
chain.

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: phoey 
> To: solr-user@lucene.apache.org
> Sent: Fri, April 23, 2010 10:23:38 AM
> Subject: Multiple query searches in one request
> 
> Hi there,
> 
> Is it possible to do a search more than once, where only the filter query
> changes. The response is the three different search results.
> 
> We want a page which shows a "clustered" view of 5 of each of the three
> types (images, news articles, editorial articles), ordered by their score.
> 
> One possibility is doing three seperate solr search requests, but its not
> really a neat solution. 
> 
> One answer could be making a custom request handler, could that be possible
> to solve this issue? Could you give me some pointers on how to implement
> one?
> 
> thanks
> -- 
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Multiple-query-searches-in-one-request-tp745827p745827.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Merging Solr Cores Urgent

2010-04-23 Thread abhatna...@vantage.com

Hi,

I have a Question- Merging Solr Cores

The wiki documentation says that the "Merged" core must exist prior to calling
the merge command.

So I created the "Merged" core and pointed it to some "data dir".

However, even after merging the cores it still points to the old "data
dir".

Shouldn't the merge command create a new data/index, or at least contain the
contents of the merged index?


Ankit
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Merging-Solr-Cores-Urgent-tp745938p745938.html
Sent from the Solr - User mailing list archive at Nabble.com.
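For what it's worth, the CoreAdmin merge described on the MergingSolrIndexes wiki page merges the given index directories into the target core's existing data dir rather than creating a fresh one, and (as far as I can tell) the target core needs a commit before the merged documents become visible. A sketch of building the request URL (host and directory paths are made up for illustration):

```python
from urllib.parse import urlencode

def merge_indexes_url(base, target_core, index_dirs):
    # CoreAdmin mergeindexes call: merges each indexDir into the
    # target core's current index. Host and dirs are placeholders.
    params = [("action", "mergeindexes"), ("core", target_core)]
    params += [("indexDir", d) for d in index_dirs]
    return base + "/admin/cores?" + urlencode(params)

url = merge_indexes_url("http://localhost:8983/solr", "merged",
                        ["/data/core0/index", "/data/core1/index"])
```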


Re: Problem with pdf, upgrading Cell

2010-04-23 Thread Otis Gospodnetic
Marc,

These are your request logs.  You want to look at your Solr logs.

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Marc Ghorayeb 
> To: solr-user@lucene.apache.org
> Sent: Fri, April 23, 2010 9:12:39 AM
> Subject: RE: Problem with pdf, upgrading Cell
> 
> 
I'm launching it with the start.jar utility, and there doesn't seem to be 
> anything weird inside the console when i upload a pdf. Is there a way to 
> output 
> the console to a log file? The only log file that get's updated is a log file 
> in 
> the logs directory, and it seems to only show the input/ouput of the web 
> requests (get and posts...).
for example:127.0.0.1 -  -  
> [23/Apr/2010:13:06:47 +] "GET /solr/core0/admin/luke?show=schema&wt=json 
> HTTP/1.1" 200 21690 127.0.0.1 -  -  [23/Apr/2010:13:06:47 +] "GET 
> /solr/core0/admin/luke?wt=json HTTP/1.1" 200 780 127.0.0.1 -  -  
> [23/Apr/2010:13:06:57 +] "POST 
> /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Clucidworks-solr-refguide-1.4.pdf&literal.title=lucidworks-solr-refguide-1.4.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Flucidworks-solr-refguide-1.4.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1
>  
> HTTP/1.1" 200 41 127.0.0.1 -  -  [23/Apr/2010:13:06:58 +] "POST 
> /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Cmysql-proxy-en.pdf&literal.title=mysql-proxy-en.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Fmysql-proxy-en.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1
>  
> HTTP/1.1" 200 44 127.0.0.1 -  -  [23/Apr/2010:13:06:59 +] "POST 
> /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Cpython-cheat-sheet-v1.pdf&literal.title=python-cheat-sheet-v1.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Fpython-cheat-sheet-v1.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1
>  
> HTTP/1.1" 200 44 127.0.0.1 -  -  [23/Apr/2010:13:07:00 +] "POST 
> /solr/core0/update HTTP/1.1" 200 41 127.0.0.1 -  -  
> [23/Apr/2010:13:07:00 +] "POST /solr/core0/update HTTP/1.1" 200 41 
> 127.0.0.1 
> -  -  [23/Apr/2010:13:07:05 +] "GET /solr/core0/admin/schema.jsp 
> HTTP/1.1" 200 26395 127.0.0.1 -  -  [23/Apr/2010:13:07:05 +] "GET 
> /solr/core0/admin/jquery-1.2.3.min.js HTTP/1.1" 304 0 
I don't think that's 
> going to help much :)

What hardware do I need ?

2010-04-23 Thread Xavier Schepler

Hi,

I'm working with Solr 1.4.
My schema has about 50 fields.
I'm using full text search in short strings (~ 30-100 terms) and 
facetted search.

My index will have 100 000 documents.

The number of requests per second will be low. Let's say between 0 and 
1000 because of auto-complete.


Is a standard server (3ghz proc, 4gb ram) with the client application 
(apache + php5 + ZF + apc) and Tomcat + Solr enough ???

Do I need more hardware ?

Thanks in advance,

Xavier S.




Re: Comparing two queries

2010-04-23 Thread Otis Gospodnetic
Hello Gert,

I think you'd have to apply custom heuristics that involves looking at top N 
hits for each query and looking at the % overlap.

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: "Villemos, Gert" 
> To: solr-user@lucene.apache.org
> Sent: Fri, April 23, 2010 10:20:54 AM
> Subject: Comparing two queries
> 
> We want to support that a user can register for interest in information,
> based on a query he has defined himself. For example that he type in a
> query, press a save button, provides his email and the system will now
> email him with a daily digest.
> 
> As part of this, it would be nice to be able to tell the user that the
> same / a similar query are already being monitored by another user, as
> the users will likely have the same interests. I would therefore like to
> evaluate whether two queries will return (almost) the same set of
> results.
> 
> But how can I compare two queries to determine if they will return
> (almost) the same set of results?
> 
> Thanks,
> 
> Gert.



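The heuristic above can be sketched as: run both saved queries, keep the top N document ids from each, and score their overlap; N and any "similar enough" threshold are arbitrary choices to tune:

```python
def result_overlap(ids_a, ids_b):
    # Jaccard overlap of two top-N result id lists: 1.0 means identical
    # result sets, 0.0 means completely disjoint.
    a, b = set(ids_a), set(ids_b)
    if not (a or b):
        return 1.0  # two empty result sets count as identical
    return len(a & b) / len(a | b)

# e.g. flag the queries as "similar" above some tuned threshold:
similar = result_overlap([1, 2, 3, 4], [2, 3, 4, 5]) >= 0.5
```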


Multiple query searches in one request

2010-04-23 Thread phoey

Hi there,

Is it possible to do a search more than once, where only the filter query
changes? The response would be three different search results.

We want a page which shows a "clustered" view of 5 of each of the three
types (images, news articles, editorial articles), ordered by their score.

One possibility is doing three separate Solr search requests, but it's not
really a neat solution. 

One answer could be making a custom request handler; could that possibly
solve this issue? Could you give me some pointers on how to implement
one?

thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-query-searches-in-one-request-tp745827p745827.html
Sent from the Solr - User mailing list archive at Nabble.com.
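Lacking a custom SearchComponent, the "three separate requests" option boils down to one query repeated with a different fq each time; a sketch of building those requests (the `type` field and host are assumptions for illustration):

```python
from urllib.parse import urlencode

def clustered_requests(base, q, types, rows=5):
    # One request per type; everything except the fq filter is identical,
    # and Solr's filter cache makes the repeated fq values cheap.
    return [
        base + "/select?" + urlencode(
            {"q": q, "fq": "type:" + t, "rows": rows, "sort": "score desc"})
        for t in types
    ]

urls = clustered_requests("http://localhost:8983/solr", "football",
                          ["image", "news", "editorial"])
```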


Comparing two queries

2010-04-23 Thread Villemos, Gert
We want to support that a user can register interest in information,
based on a query he has defined himself. For example, he types in a
query, presses a save button, provides his email, and the system will
email him with a daily digest.

As part of this, it would be nice to be able to tell the user that the
same or a similar query is already being monitored by another user, as
the users will likely have the same interests. I would therefore like to
evaluate whether two queries will return (almost) the same set of
results.

But how can I compare two queries to determine if they will return
(almost) the same set of results?

 

Thanks,

Gert.



Please help Logica to respect the environment by not printing this email  / 
Pour contribuer comme Logica au respect de l'environnement, merci de ne pas 
imprimer ce mail /  Bitte drucken Sie diese Nachricht nicht aus und helfen Sie 
so Logica dabei, die Umwelt zu schützen. /  Por favor ajude a Logica a 
respeitar o ambiente nao imprimindo este correio electronico.



This e-mail and any attachment is for authorised use by the intended 
recipient(s) only. It may contain proprietary material, confidential 
information and/or be subject to legal privilege. It should not be copied, 
disclosed to, retained or used by, any other party. If you are not an intended 
recipient then please promptly delete this e-mail and any attachment and all 
copies and inform the sender. Thank you.



RE: Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb

Seems like I'm not the only one with this "no extraction" problem:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg33609.html
Apparently he tried the same thing, building from the trunk and indexing a
pdf, and no extraction occurred... Strange.
Marc G.

> From: dekay...@hotmail.com
> To: solr-user@lucene.apache.org
> Subject: RE: Problem with pdf, upgrading Cell
> Date: Fri, 23 Apr 2010 15:12:39 +0200
> 
> 
> I'm launching it with the start.jar utility, and there doesn't seem to be 
> anything weird inside the console when i upload a pdf. Is there a way to 
> output the console to a log file? The only log file that get's updated is a 
> log file in the logs directory, and it seems to only show the input/ouput of 
> the web requests (get and posts...).
> for example:127.0.0.1 -  -  [23/Apr/2010:13:06:47 +] "GET 
> /solr/core0/admin/luke?show=schema&wt=json HTTP/1.1" 200 21690 127.0.0.1 -  - 
>  [23/Apr/2010:13:06:47 +] "GET /solr/core0/admin/luke?wt=json HTTP/1.1" 
> 200 780 127.0.0.1 -  -  [23/Apr/2010:13:06:57 +] "POST 
> /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Clucidworks-solr-refguide-1.4.pdf&literal.title=lucidworks-solr-refguide-1.4.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Flucidworks-solr-refguide-1.4.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1
>  HTTP/1.1" 200 41 127.0.0.1 -  -  [23/Apr/2010:13:06:58 +] "POST 
> /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Cmysql-proxy-en.pdf&literal.title=mysql-proxy-en.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Fmysql-proxy-en.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1
>  HTTP/1.1" 200 44 127.0.0.1 -  -  [23/Apr/2010:13:06:59 +] "POST 
> /solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Cpython-cheat-sheet-v1.pdf&literal.title=python-cheat-sheet-v1.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Fpython-cheat-sheet-v1.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1
>  HTTP/1.1" 200 44 127.0.0.1 -  -  [23/Apr/2010:13:07:00 +] "POST 
> /solr/core0/update HTTP/1.1" 200 41 127.0.0.1 -  -  [23/Apr/2010:13:07:00 
> +] "POST /solr/core0/update HTTP/1.1" 200 41 127.0.0.1 -  -  
> [23/Apr/2010:13:07:05 +] "GET /solr/core0/admin/schema.jsp HTTP/1.1" 200 
> 26395 127.0.0.1 -  -  [23/Apr/2010:13:07:05 +] "GET 
> /solr/core0/admin/jquery-1.2.3.min.js HTTP/1.1" 304 0 
> I don't think that's going to help much :)
> > Date: Fri, 23 Apr 2010 06:04:34 -0700
> > From: otis_gospodne...@yahoo.com
> > Subject: Re: Problem with pdf, upgrading Cell
> > To: solr-user@lucene.apache.org
> > 
> > Marc, got anything in your logs?
> > 
> >  Otis
> > 
> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > Lucene ecosystem search :: http://search-lucene.com/
> > 
> > 
> > 
> > - Original Message 
> > > From: Marc Ghorayeb 
> > > To: solr-user@lucene.apache.org
> > > Sent: Fri, April 23, 2010 8:42:53 AM
> > > Subject: Problem with pdf, upgrading Cell
> > > 
> > > 
> > Hello,
> > I configured a Solr server to be able to extract data from various 
> > > documents, including pdfs. Unfortunately, the data extraction fails on 
> > > several 
> > > pdfs. I have read around here that this may be due to the old Tika 
> > > library being 
> > > used?I looked around and saw that the svn had a newer version so i 
> > > checked out 
> > > the trunk, and built it using ant dist, and ant example.I then set up my 
> > > schema 
> > > in the newly built server, and inserted the library from the newly built 
> > > cell 
> > > into the lib directory (in solr's home). However, now all i get is a 
> > > blank 
> > > response... The indexing works, but it doesn't extract anything, only the 
> > > literal values that i pass on are indexed.
> > Any help would be greatly 
> > > appreciated!! :)
> > Thank you.
> > Marc Ghorayeb 

RE: Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb

I'm launching it with the start.jar utility, and there doesn't seem to be 
anything weird inside the console when I upload a pdf. Is there a way to output 
the console to a log file? The only log file that gets updated is a log file 
in the logs directory, and it seems to only show the input/output of the web 
requests (GETs and POSTs...).
For example:
127.0.0.1 -  -  [23/Apr/2010:13:06:47 +] "GET 
/solr/core0/admin/luke?show=schema&wt=json HTTP/1.1" 200 21690 127.0.0.1 -  -  
[23/Apr/2010:13:06:47 +] "GET /solr/core0/admin/luke?wt=json HTTP/1.1" 200 
780 127.0.0.1 -  -  [23/Apr/2010:13:06:57 +] "POST 
/solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Clucidworks-solr-refguide-1.4.pdf&literal.title=lucidworks-solr-refguide-1.4.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Flucidworks-solr-refguide-1.4.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1
 HTTP/1.1" 200 41 127.0.0.1 -  -  [23/Apr/2010:13:06:58 +] "POST 
/solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Cmysql-proxy-en.pdf&literal.title=mysql-proxy-en.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Fmysql-proxy-en.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1
 HTTP/1.1" 200 44 127.0.0.1 -  -  [23/Apr/2010:13:06:59 +] "POST 
/solr/core0/update/extract?literal.id=C%3A%5CDocuments+and+Settings%5CM1B%5Cworkspace%5C3DS_FileIndexer%5Ctest%5Cpython-cheat-sheet-v1.pdf&literal.title=python-cheat-sheet-v1.pdf&literal.url=http%3A%2F%2Fwww.3ds.com%2Fpython-cheat-sheet-v1.pdf&literal.appKey=media&literal.type=document&literal.siteHash=53e446a6b81860dcfa1cc2fef4ef976b&literal.group=portal&literal.group=var&literal.group=0&literal.group=caa_gold&literal.group=caa_partner&literal.group=ag12&literal.group=ag17&wt=javabin&version=1
 HTTP/1.1" 200 44 127.0.0.1 -  -  [23/Apr/2010:13:07:00 +] "POST 
/solr/core0/update HTTP/1.1" 200 41 127.0.0.1 -  -  [23/Apr/2010:13:07:00 
+] "POST /solr/core0/update HTTP/1.1" 200 41 127.0.0.1 -  -  
[23/Apr/2010:13:07:05 +] "GET /solr/core0/admin/schema.jsp HTTP/1.1" 200 
26395 127.0.0.1 -  -  [23/Apr/2010:13:07:05 +] "GET 
/solr/core0/admin/jquery-1.2.3.min.js HTTP/1.1" 304 0 
I don't think that's going to help much :)
> Date: Fri, 23 Apr 2010 06:04:34 -0700
> From: otis_gospodne...@yahoo.com
> Subject: Re: Problem with pdf, upgrading Cell
> To: solr-user@lucene.apache.org
> 
> Marc, got anything in your logs?
> 
>  Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 
> 
> - Original Message 
> > From: Marc Ghorayeb 
> > To: solr-user@lucene.apache.org
> > Sent: Fri, April 23, 2010 8:42:53 AM
> > Subject: Problem with pdf, upgrading Cell
> > 
> > 
> Hello,
> I configured a Solr server to be able to extract data from various 
> > documents, including pdfs. Unfortunately, the data extraction fails on 
> > several 
> > pdfs. I have read around here that this may be due to the old Tika library 
> > being 
> > used?I looked around and saw that the svn had a newer version so i checked 
> > out 
> > the trunk, and built it using ant dist, and ant example.I then set up my 
> > schema 
> > in the newly built server, and inserted the library from the newly built 
> > cell 
> > into the lib directory (in solr's home). However, now all i get is a blank 
> > response... The indexing works, but it doesn't extract anything, only the 
> > literal values that i pass on are indexed.
> Any help would be greatly 
> > appreciated!! :)
> Thank you.
> Marc Ghorayeb 
> 
  
_
Consultez gratuitement vos emails Orange, Gmail, Free, ... directement dans 
HOTMAIL !
http://www.windowslive.fr/hotmail/agregation/

Re: Problem with pdf, upgrading Cell

2010-04-23 Thread Otis Gospodnetic
Marc, got anything in your logs?

 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: Marc Ghorayeb 
> To: solr-user@lucene.apache.org
> Sent: Fri, April 23, 2010 8:42:53 AM
> Subject: Problem with pdf, upgrading Cell
> 
> 
Hello,
I configured a Solr server to be able to extract data from various 
> documents, including pdfs. Unfortunately, the data extraction fails on 
> several 
> pdfs. I have read around here that this may be due to the old Tika library 
> being 
> used?I looked around and saw that the svn had a newer version so i checked 
> out 
> the trunk, and built it using ant dist, and ant example.I then set up my 
> schema 
> in the newly built server, and inserted the library from the newly built cell 
> into the lib directory (in solr's home). However, now all i get is a blank 
> response... The indexing works, but it doesn't extract anything, only the 
> literal values that i pass on are indexed.
Any help would be greatly 
> appreciated!! :)
Thank you.
Marc Ghorayeb 



Problem with pdf, upgrading Cell

2010-04-23 Thread Marc Ghorayeb

Hello,
I configured a Solr server to be able to extract data from various documents, 
including pdfs. Unfortunately, the data extraction fails on several pdfs. I 
have read around here that this may be due to the old Tika library being used?
I looked around and saw that the svn had a newer version, so I checked out the 
trunk and built it using ant dist and ant example. I then set up my schema in 
the newly built server, and inserted the library from the newly built Cell into 
the lib directory (in Solr's home). However, now all I get is a blank 
response... The indexing works, but it doesn't extract anything; only the 
literal values that I pass on are indexed.
Any help would be greatly appreciated!! :)
Thank you.
Marc Ghorayeb 
_
Hotmail arrive sur votre téléphone ! Compatible Iphone, Windows Phone, 
Blackberry, …
http://www.messengersurvotremobile.com/?d=Hotmail

Questions on autocommit and optimize operations

2010-04-23 Thread dipti khullar
Hi Solr Gurus

We are thinking about optimizing our production master slave solr setup,
just wanted to poll the group on following questions:

1. Currently we are using the autocommit feature with a setting of 50 docs and
5 mins. Now the requirement is to reduce this time, so we are analyzing a
setup that relies on the time-based autocommit alone, with the autocommit
interval set to *1 min*.
Can anyone think of any disadvantages this change could have on the index? Is
it possible for the autocommit process itself to take more than 1 min?
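For reference, a one-minute, time-based autocommit is configured in solrconfig.xml roughly like this (a sketch against the 1.4 config layout; maxTime is in milliseconds, and the maxDocs line can be dropped if the commit should be purely time-based):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>50</maxDocs>
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>
```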

2. We want to trace the average time it takes to perform a commit operation.
Right now on production we have Lucid Solr 1.4 on the master/slaves, but we are
still using the old script-based replication method. We will be moving to the
new Java-based replication soon, hence we want to focus more on autocommit and
the time it takes to commit the data. So, how do we trace back the logs of
autocommit? Does autocommit execute the commit script present under the bin folder?

3. What should be the optimum interval for optimizing the data? After going
through some posts like
http://www.mail-archive.com/solr-user@lucene.apache.org/msg10920.html, it
makes sense to optimize the data infrequently.
How do we configure this in 1.4? Currently we optimize using the optimize script
twice a day. Also, can there be a situation where an optimize conflicts with a
commit operation? If yes, how do we avoid that kind of situation?

Many Thanks & Regards
Dipti Khullar


Re: Best way to prevent this search lockup (apparently caused during big segment merges)?

2010-04-23 Thread Michael McCandless
I don't know much about how Solr does its locking, so I'm guessing below:

It looks like one thread is doing a commit, by closing the writer, and
is likely holding a lock that prevents other (add/delete) ops from
running? Probably this lock is held because the writer is in the
process of being closed, and on close, the writer waits for running
merges to complete, so it can take a very long time if a large merge
is running.

And then your while loop keeps using up another of the 200 threads,
blocking on the add/delete request.

I think Solr could, instead, call IndexWriter.finishMerges, without
holding the lock, and then perhaps IndexWriter.close(false), which
would be fast (ie, aborts any running merges, for the race condition
where another merge just started after finishMerges and before close).
 Alternatively, Solr could call IndexWriter.commit, not
IndexWriter.close, and not hold the lock that prevents add/deletes
(but maybe there are other reasons why the IW must be closed?).

Maybe Solr should also have a way to restrict the max # threads to be
used for pending add/delete ops, so that there are always threads free
in the app server's pool for searching?

Or... maybe you could drastically increase the timeout on your client
side HTTP connections?  Or, is there some way to check how many
threads are "tied up" in Solr and block your add/delete requests when
this gets too large...?
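One way to keep threads free for searching, along the lines suggested above, is to cap on the client side how many add/delete requests are in flight at once. A rough sketch using a plain java.util.concurrent.Semaphore; the cap of 4 and the timeout are arbitrary placeholder values, and the real HTTP call is stubbed out:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/**
 * Sketch: allow at most MAX_INDEXING concurrent add/delete requests.
 * If no permit frees up within the timeout, reject (or queue) the update
 * instead of tying up yet another server thread during a long merge.
 */
public class ThrottledIndexer {
    private static final int MAX_INDEXING = 4;          // arbitrary cap
    private final Semaphore permits = new Semaphore(MAX_INDEXING);

    /** Returns false if the update was rejected rather than sent. */
    public boolean tryUpdate(Runnable sendHttpRequest, long timeoutSec)
            throws InterruptedException {
        if (!permits.tryAcquire(timeoutSec, TimeUnit.SECONDS)) {
            return false;                               // back off; retry later
        }
        try {
            sendHttpRequest.run();                      // the real add/delete HTTP call
            return true;
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ThrottledIndexer indexer = new ThrottledIndexer();
        boolean sent = indexer.tryUpdate(() -> { /* pretend HTTP call */ }, 1);
        System.out.println("sent=" + sent);
    }
}
```

With this in the indexer's main loop, a stalled Solr shows up as tryUpdate returning false after the timeout, and the number of blocked server threads stays bounded by the cap instead of growing toward maxThreads.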

Mike

On Thu, Apr 22, 2010 at 6:28 PM, Chris Harris  wrote:
> I'm running Solr 1.4+ under Tomcat 6, with indexing and searching
> requests simultaneously hitting the same Solr machine. Sometimes Solr,
> Tomcat, and my (C#) indexing process conspire to render search
> inoperable. So far I've only noticed this while big segment merges
> (i.e. merges that take multiple minutes) are taking place.
>
> Let me explain the situation as best as I understand it.
>
> My indexer has a main loop that looks roughly like this:
>
>  while true:
>    try:
>      submit a new add or delete request to Solr via HTTP
>    catch timeoutException:
>      sleep a few seconds
>
> When things are going wrong (i.e., when a large segment merge is
> happening), this loop is problematic:
>
> * When the indexer's request hits Solr, then the corresponding thread
> in Tomcat blocks. (It looks to me like the thread is destined to block
> until the entire merge is complete. I'll paste in what the Java stack
> traces look like at the end of the message if they can help diagnose
> things.)
> * Because the Solr thread stays blocked for so long, eventually the
> indexer hits a timeoutException. (That is, it gives up on Solr.)
> * Hitting the timeout exception doesn't cause the corresponding Tomcat
> thread to die or unblock. Therefore, each time through the loop,
> another Solr-handling thread inside Tomcat enters a blocked state.
> * Eventually so many threads (maxThreads, whose Tomcat default is 200)
> are blocked that Tomcat starts rejecting all new Solr HTTP requests --
> including those coming in from the web tier.
> * Users are unable to search. The problem might self-correct once the
> merge is complete, but that could be quite a while.
>
> What are my options for changing Solr settings or changing my indexing
> process to avoid this lockup scenario? Do you agree that the segment
> merge is helping cause the lockup? Do adds and deletes really need to
> block on segment merges?
>
> Partial thread dumps follow, showing example add and delete threads
> that are blocked. Also the active Lucene Merge Thread, and the thread
> that kicked off the merge.
>
> [doc deletion thread, waiting for DirectUpdateHandler2.iwCommit.lock()
> to return]
> "http-1800-200" daemon prio=6 tid=0x0a58cc00 nid=0x1028
> waiting on condition [0x0f9ae000..0x0f9afa90]
>   java.lang.Thread.State: WAITING (parking)
>        at sun.misc.Unsafe.park(Native Method)
>        - parking to wait for  <0x00016d801ae0> (a
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
>        at java.util.concurrent.locks.LockSupport.park(Unknown Source)
>        at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown
> Source)
>        at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(Unknown
> Source)
>        at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(Unknown
> Source)
>        at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(Unknown
> Source)
>        at 
> org.apache.solr.update.DirectUpdateHandler2.deleteByQuery(DirectUpdateHandler2.java:320)
>        at 
> org.apache.solr.update.processor.RunUpdateProcessor.processDelete(RunUpdateProcessorFactory.java:71)
>        at org.apache.solr.handler.XMLLoader.processDelete(XMLLoader.java:234)
>        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:180)
>        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>        at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(C

Re: Solr full-import not working as expected

2010-04-23 Thread MitchK

Saratv,

is there any unique ID (defined in your schema.xml) that may be duplicated?

- Mitch


saratv wrote:
> 
> I am trying to use DIH (where the database has around 93k rows from different
> tables), and when I ran a full import a few times, only 91k documents were
> indexed (I am not sure why, or which documents were left out). Is there a way
> to find out what went wrong, as I am unable to see any errors in the log files?
> Also, is there a way to fix the problem and get all of those 93k docs indexed?
> (I also checked the database and saw there are no duplicates.) Please
> respond if anyone has seen similar behaviour. Appreciate your input.
> 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-full-import-not-working-as-expected-tp744937p745102.html
Sent from the Solr - User mailing list archive at Nabble.com.
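Worth noting here: Solr silently overwrites any document whose uniqueKey matches one already in the index, so 93k rows containing roughly 2k repeated IDs would come out as exactly 91k documents with no error in the logs. A small self-contained sketch of the duplicate check (the sample IDs below are invented; run the same idea over the real ID column exported from the database):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch: count how many rows share an ID with an earlier row.
 * The sample data is invented for illustration; with real IDs,
 * rows - duplicates = number of documents Solr will end up with.
 */
public class DuplicateIdCheck {
    public static void main(String[] args) {
        List<String> ids = Arrays.asList("a1", "a2", "a3", "a2", "a4", "a1");
        Set<String> seen = new HashSet<>();
        int duplicates = 0;
        for (String id : ids) {
            if (!seen.add(id)) {      // add() returns false if already present
                duplicates++;
            }
        }
        System.out.println("rows=" + ids.size() + " duplicates=" + duplicates
                + " docs=" + (ids.size() - duplicates));
    }
}
```

If the counts do not line up this way, the DIH status page (the Total Rows Fetched versus Total Documents Processed figures) is the next place to compare against the database row count.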