RE: Filter by relevance

2010-11-04 Thread Jason Brown
I have a dismax query where I check for values in 3 fields against documents in 
the index - a title, a list of keyword tags and then full-text of the document.

I usually get lots of results and I can see that the first results are OK - 
it's giving precedence to titles and tag matches, as my dismax boosts on title 
and keywords (normal boost and phrase boost).

After say 20/30 good results I start to get matches based upon just the 
full-text, so these are less relevant. 

I am also facet counting on my keyword tags (and presenting in the results as a 
way of filtering) and as you can imagine the counts are high because of the 
number of overall results. I want to somehow make the facet counts more 
associated with the higher relevancy results.

My options as I see it are - 

1) exclude full-text from the dismax altogether
2) configure the dismax normal boost on full-text to zero, but phrase boost to 
something higher (the aim here is to only really get a hit on the full-text if 
my search term is found as a phrase in the full-text)
3) limit my results by relevancy or number of results

If I do (3) above will the facet.counts respect the lower number of results - 
this is the overall aim really.
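
For reference, option (2) above would roughly translate into dismax parameters
along these lines (a sketch only -- the field names and boost values are made
up, and it's worth testing whether a literal ^0 boost behaves as intended;
here the full-text field gets a tiny term boost but a real phrase boost):

<str name="defType">dismax</str>
<str name="qf">title^10 keywords^5 fulltext^0.1</str>
<str name="pf">title^10 keywords^5 fulltext^2</str>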

Thank You

Jason.


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wed 03/11/2010 23:15
To: solr-user@lucene.apache.org
Subject: Re: Filter by relevance
 
Be aware, though, that relevance isn't absolute, it's only interesting
*within* a query. And it's
then normed between 0 and 1. So picking a certain value is rarely doing
what you think it will.
Limiting to the top N docs is usually more reasonable

But this may be an XY problem. What is it you're trying to accomplish?
Perhaps if you
state the problem, some other suggestions may be in the offing

Best
Erick

On Wed, Nov 3, 2010 at 4:48 PM, Jason Brown jason.br...@sjp.co.uk wrote:

 Is it possible to filter my search results by relevance? For example,
 anything below a certain value shouldn't be returned?

 I also retrieve facet counts in my search queries, so it would be useful if
 the facet counts also respected the filter on the relevance.

 Thank You.

 Jason.




If you wish to view the St. James's Place email disclaimer, please use the link 
below

http://www.sjp.co.uk/portal/internet/SJPemaildisclaimer


mergeFactor questions

2010-11-04 Thread Tommaso Teofili
Hi all,
Having read the SolrPerformanceFactors wiki page [1], I'd still need a
couple of clarifications about mergeFactor (I am using version 1.4.1) so if
anyone can help it would be nice.

   - Is mergeFactor a one time configuration setting that is considered only
   when creating the index for the first time or can it be adjusted later even
   with some docs inside the index? e.g. I have mF to 10 then I realize I want
   quicker searches and I set it to 2 so that at the next optimize/commit I
   will have no more than 2 segments. My understanding is that one can adjust
   mF over time, is it right?
   - In a replicated environment does it make sense to define different
   mergeFactors on master and slave? I'd say no since it influences the number
   of segments created, that being a concern of who actually index documents
   (the master) not of who receives (segments of) index, but please correct me
   if I am wrong.

Thanks for your help,
Regards,
Tommaso

[1] : http://wiki.apache.org/solr/SolrPerformanceFactors


Re: Negative or zero value for fieldNorm

2010-11-04 Thread Markus Jelsma
Hi,

I've worked around the issue by setting omitNorms=true on the title field. Now 
all fieldNorm values are 1.0f and therefore do not mess up my scores anymore. 
This, of course, is hardly a solution even though i currently do not use 
index-time boosts on any field.

The question remains, why does the title field return a fieldNorm=0 for many 
queries? And a subquestion, does the luke request handler return boost values 
for documents? I know i get boost values for fields but i haven't seen boost 
values for documents. 

Cheers,

On Wednesday 03 November 2010 20:44:48 Markus Jelsma wrote:
  Regarding Negative or zero value for fieldNorm, I don't see any
  negative fieldNorms here... just very small positive ones?
 
 Of course, you're right. The E-# got twisted in my mind and became
 negative. Silly me.
 
  Anyway the fieldNorm is the product of the lengthNorm and the
  index-time boost of the field (which is itself the product of the
  index time boost on the document and the index time boost of all
  instances of that field).  Index time boosts default to 1 though, so
  they have no effect unless something has explicitly set a boost.
 
 I've just checked docs 7 and 1462 (resp. first and second in debug output
 below) with Luke. The title and content fields have no index time boosts,
 thus defaulting to 1.0f which is fine.
 
 Then, why does doc 7 have a fieldNorm of 0.0 on title (and so setting a 0.0
 score on the doc in the result set) and does doc 1462 have a very very
 small fieldNorm?
 
 debugOutput for doc 7:
 0.0 = fieldNorm(field=title, doc=7)
 
 Luke on the title field of doc 7.
 <float name="boost">1.0</float>
 
 Thanks for your reply!
 
  -Yonik
  http://www.lucidimagination.com
  
  
  
  On Wed, Nov 3, 2010 at 2:30 PM, Markus Jelsma
  
  markus.jel...@openindex.io wrote:
   Hi all,
   
   I've got some puzzling issue here. During tests i noticed a document at
   the bottom of the results where it should not be. I query using DisMax
   on title and content field and have a boost on title using qf. Out of
   30 results, only two documents also have the term in the title.
   
   Using debugQuery and fl=*,score i quickly noticed large negative
   maxScore of the complete resultset and a portion of the resultset
   where scores sum up to zero because of a product with 0 (fieldNorm).
   
   See below for debug output for a result with score = 0:
   
   0.0 = (MATCH) sum of:
0.0 = (MATCH) max of:
  0.0 = (MATCH) weight(content:kunstgrasveld in 7), product of:
0.75658196 = queryWeight(content:kunstgrasveld), product of:
  6.6516633 = idf(docFreq=33, maxDocs=9682)
  0.113743275 = queryNorm

0.0 = (MATCH) fieldWeight(content:kunstgrasveld in 7), product of:
  2.236068 = tf(termFreq(content:kunstgrasveld)=5)
  6.6516633 = idf(docFreq=33, maxDocs=9682)
  0.0 = fieldNorm(field=content, doc=7)
  
  0.0 = (MATCH) fieldWeight(title:kunstgrasveld in 7), product of:
1.0 = tf(termFreq(title:kunstgrasveld)=1)
8.791729 = idf(docFreq=3, maxDocs=9682)
0.0 = fieldNorm(field=title, doc=7)
   
   And one with a negative score:
   
   3.0716116E-4 = (MATCH) sum of:
3.0716116E-4 = (MATCH) max of:
  3.0716116E-4 = (MATCH) weight(content:kunstgrasveld in 1462),
  product
   
   of: 0.75658196 = queryWeight(content:kunstgrasveld), product of:
   6.6516633 = idf(docFreq=33, maxDocs=9682)
   
  0.113743275 = queryNorm

4.059853E-4 = (MATCH) fieldWeight(content:kunstgrasveld in 1462),
   
   product of:
  1.0 = tf(termFreq(content:kunstgrasveld)=1)
  6.6516633 = idf(docFreq=33, maxDocs=9682)
  6.1035156E-5 = fieldNorm(field=content, doc=1462)
   
   There are no funky issues with term analysis for the text fieldType, in
   fact, the term passes through unchanged. I don't do omitNorms, i store
   termVectors etc.
   
   Because fieldNorm = fieldBoost / sqrt(numTermsForField) i suspect my
   input from Nutch is messed up. A fieldNorm can never be <= 0 for a
   normal positive boost and field boosts should not be zero or negative
   (correct me if i'm wrong). But, since i can't yet figure out what field
   boosts Nutch sends to me i thought i'd drop by on this mailing list
   first.
   
   There are quite a few query terms that return with zero or negative
   scores and many that behave as i expect. I find it also a bit hard to
   comprehend why the docs with negative score rank higher in the result
   set than documents with zero score. Sorting defaults to score DESC, 
   but this is perhaps another issue.
   
   Anyway, the test runs on a Solr 1.4.1 instance with Java 6 under the
   hood. Help or directions are appreciated =)
   
   Cheers,
   
   --
   Markus Jelsma - CTO - Openindex
   http://www.linkedin.com/in/markus17
   050-8536600 / 06-50258350

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


ContentStreamDataSource

2010-11-04 Thread Theodor Tolstoy
Hi!
I am trying to get the ContentStreamDataSource to work properly , but there are 
not many examples out there.

What I have done is that I have made a copy of my HttpDataSource config and 
replaced the <dataSource type="HttpDataSource"> with <dataSource 
type="ContentStreamDataSource">.
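
(A simplified sketch of what such a data-config can look like, assuming an
XPathEntityProcessor reads the posted XML; the entity name, forEach path and
field names here are only illustrative:)

<dataConfig>
  <dataSource type="ContentStreamDataSource" name="stream"/>
  <document>
    <entity name="suLIBRIS" dataSource="stream"
            processor="XPathEntityProcessor" forEach="/records/record">
      <field column="id" xpath="/records/record/id"/>
      <field column="title" xpath="/records/record/title"/>
    </entity>
  </document>
</dataConfig>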

If I understand everything correctly I should be able to use the same URL syntax 
as with HttpDataSource and supply the XML file as  post data.

I have tried to post data - both as binary, file and string to the URL, but 
nothing happens.


This is the log file:
2010-nov-04 12:32:17 org.apache.solr.handler.dataimport.DataImporter 
doFullImport
INFO: Starting Full Import
2010-nov-04 12:32:17 org.apache.solr.handler.dataimport.SolrWriter 
readIndexerProperties
VARNING: Unable to read: datapush.properties
2010-nov-04 12:32:17 org.apache.solr.handler.dataimport.DocBuilder execute
INFO: Time taken = 0:0:0.0
2010-nov-04 12:32:17 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/datapush 
params={clean=false&entity=suLIBRIS&command=full-import} status=0 QTime=0


What am I doing wrong?

Regards
Theodor Tolstoy
Developer Stockholm university library



querying multiple fields as one

2010-11-04 Thread Tommaso Teofili
Hi all,
having two fields named 'type' and 'cat' with identical type and options,
but different values recorded, would it be possible to query them as they
were one field?
For instance
 q=type:electronics cat:electronics
should return same results as
 q=common:electronics
I know I could make it defining a third field 'common' with copyFields from
'type' and 'cat' to 'common' but this wouldn't be feasible if you've already
lots of documents in your index and don't want to reindex everything, isn't
it?
Any suggestions?
Thanks in advance,
Tommaso


Re: querying multiple fields as one

2010-11-04 Thread Ken Stanley
On Thu, Nov 4, 2010 at 8:21 AM, Tommaso Teofili
tommaso.teof...@gmail.comwrote:

 Hi all,
 having two fields named 'type' and 'cat' with identical type and options,
 but different values recorded, would it be possible to query them as they
 were one field?
 For instance
  q=type:electronics cat:electronics
 should return same results as
  q=common:electronics
 I know I could make it defining a third field 'common' with copyFields from
 'type' and 'cat' to 'common' but this wouldn't be feasible if you've
 already
 lots of documents in your index and don't want to reindex everything, isn't
 it?
 Any suggestions?
 Thanks in advance,
 Tommaso


Tommaso,

If re-indexing is not feasible/preferred, you might try looking into
creating a dismax handler that should give you what you're looking for in
your query: http://wiki.apache.org/solr/DisMaxQParserPlugin. The same
solrconfig.xml that comes with SOLR has a dismax parser that you can modify
to your needs.
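
A minimal sketch of such a handler for the two fields from the question (the
handler name and weights are illustrative):

<requestHandler name="/commonsearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- treat 'type' and 'cat' as if they were one field -->
    <str name="qf">type cat</str>
  </lst>
</requestHandler>

A query like q=electronics against that handler then matches either field.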

- Ken Stanley


Re: querying multiple fields as one

2010-11-04 Thread Erick Erickson
Ken's suggestion to look at dismax is a good one, but I have
a question
q=type:electronics cat:electronics

should do what you want assuming your default operator
is OR.  Is it failing? Or is the real question how you can
do this automatically?

I'd expect the ranking to be a bit different, but I'm guessing
that's not a big issue

Best
Erick

On Thu, Nov 4, 2010 at 8:21 AM, Tommaso Teofili
tommaso.teof...@gmail.comwrote:

 Hi all,
 having two fields named 'type' and 'cat' with identical type and options,
 but different values recorded, would it be possible to query them as they
 were one field?
 For instance
  q=type:electronics cat:electronics
 should return same results as
  q=common:electronics
 I know I could make it defining a third field 'common' with copyFields from
 'type' and 'cat' to 'common' but this wouldn't be feasible if you've
 already
 lots of documents in your index and don't want to reindex everything, isn't
 it?
 Any suggestions?
 Thanks in advance,
 Tommaso



Re: mergeFactor questions

2010-11-04 Thread Shawn Heisey

On 11/4/2010 3:27 AM, Tommaso Teofili wrote:

- Is mergeFactor a one time configuration setting that is considered only
when creating the index for the first time or can it be adjusted later even
with some docs inside the index? e.g. I have mF to 10 then I realize I want
quicker searches and I set it to 2 so that at the next optimize/commit I
will have no more than 2 segments. My understanding is that one can adjust
mF over time, is it right?


The mergeFactor is applied anytime documents are added to the index, not 
just when it is built for the first time.  You can adjust it later, and 
reload the core or restart Solr.  It will apply to any additional 
indexing from that point forward.
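
For reference, in a 1.4.x solrconfig.xml the setting lives in the index
sections, along these lines (the value is just an example):

<indexDefaults>
  <mergeFactor>10</mergeFactor>
  ...
</indexDefaults>
<mainIndex>
  <mergeFactor>10</mergeFactor>
  ...
</mainIndex>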


With a mergeFactor of 10, having 21 segments (and more) temporarily on 
the disk at the same time is reasonably possible.  I know this applies 
if you are doing a continuous large insert, not sure if you are doing 
several small inserts separately. These segments are:


* The small segment that is being built right now.
* The previous 10 small segments.
* The merged segment being created from those above.
* The previous 9 merged segments.

If it takes a really long time to merge the last 10 small segments and 
then merge the 10 large segments into an even larger segment, you can 
end up with even more small segments from your continuous insert.  If it 
should take long enough that you actually get 10 more new small 
segments, the large merge will pause while it completes the small 
merge.  I saw this happen recently when I decided to see what happens if 
I built a single shard from our entire database.  It took a really long 
time, partly from that super-merge and the optimize that happened later, 
and took up 85GB of disk space.


I'm not really sure what happens if you have this continue beyond a 
single super-merge like I have mentioned.



- In a replicated environment does it make sense to define different
mergeFactors on master and slave? I'd say no since it influences the number
of segments created, that being a concern of who actually index documents
(the master) not of who receives (segments of) index, but please correct me
if I am wrong.


Because it only applies when indexes are being built, it has no meaning 
on a slave, which as you said, just copies the data from the master.


Shawn



Re: Negative or zero value for fieldNorm

2010-11-04 Thread Yonik Seeley
On Thu, Nov 4, 2010 at 8:04 AM, Markus Jelsma
markus.jel...@openindex.io wrote:
 The question remains, why does the title field return a fieldNorm=0 for many
 queries?

Because the index-time boost was set to 0 when the doc was indexed.  I
can't say how that happened... look to your indexing code.
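
(For reference, index-time boosts arrive with the update message itself; in
the XML update format a zero boost would look something like the following --
the values are purely illustrative. A field boost of 0 multiplied into the
length norm is exactly what shows up as fieldNorm=0 in the debug output.)

<add>
  <doc boost="1.0">
    <field name="title" boost="0.0">some title</field>
  </doc>
</add>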

 And a subquestion, does the luke request handler return boost values
 for documents? I know i get boost values for fields but i haven't seen boost
 values for documents.

The doc boost is just multiplied into each field boost and doesn't
have a separate representation in the index.

-Yonik
http://www.lucidimagination.com


Re: Problem escaping question marks

2010-11-04 Thread Jean-Sebastien Vachon
Have you tried encoding it with %3F?

firstname:*%3F*

On 2010-11-04, at 1:44 AM, Stephen Powis wrote:

 I'm having difficulty properly escaping ? in my search queries.  It seems as
 tho it matches any character.
 
 Some info, a simplified schema and query to explain the issue I'm having.
 I'm currently running solr1.4.1
 
 Schema:
 
 <field name="id" type="sint" indexed="true" stored="true" required="true" />
 <field name="first_name" type="string" indexed="true" stored="true"
 required="false" />
 
 I want to return any first name with a Question Mark in it
 Query: first_name: *\?*
 
 Returns all documents with any character in it.
 
 Can anyone lend a hand?
 Thanks!
 Stephen



Re: Problem escaping question marks

2010-11-04 Thread Robert Muir
On Thu, Nov 4, 2010 at 1:44 AM, Stephen Powis stephen.po...@pardot.com wrote:
 I want to return any first name with a Question Mark in it
 Query: first_name: *\?*


There is no way to escape the metacharacters * or ? for a wildcard
query (regardless of queryparser, even if you write your own).
See https://issues.apache.org/jira/browse/LUCENE-588

Its something we could fix, but in all honesty it seems one reason it
isn't fixed is because the bug is so old, yet there hasn't really been
any indication of demand for such a thing...


Re: Negative or zero value for fieldNorm

2010-11-04 Thread Markus Jelsma
I've done some testing with the example docs and it behaves similar when there 
is a zero doc boost. Luke, however, does not show me the index-time boosts. 
Both document and field boosts are not visible in Luke's output. I've changed 
doc boost and field boosts for the mp500.xml document but all i ever see 
returned is boost=1.0. Is this correct?

Anyway, i'm looking at Nutch now for reasons why it sends a zero boost on a 
document.

On Thursday 04 November 2010 14:16:22 Yonik Seeley wrote:
 On Thu, Nov 4, 2010 at 8:04 AM, Markus Jelsma
 
 markus.jel...@openindex.io wrote:
  The question remains, why does the title field return a fieldNorm=0 for
  many queries?
 
 Because the index-time boost was set to 0 when the doc was indexed.  I
 can't say how that happened... look to your indexing code.
 
  And a subquestion, does the luke request handler return boost values
  for documents? I know i get boost values for fields but i haven't seen
  boost values for documents.
 
 The doc boost is just multiplied into each field boost and doesn't
 have a separate representation in the index.
 
 -Yonik
 http://www.lucidimagination.com

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Re: Negative or zero value for fieldNorm

2010-11-04 Thread Yonik Seeley
On Thu, Nov 4, 2010 at 9:51 AM, Markus Jelsma
markus.jel...@openindex.io wrote:
 I've done some testing with the example docs and it behaves similar when there
 is a zero doc boost. Luke, however, does not show me the index-time boosts.

Remember that the norm is a product of the length norm and the index
time boost... it's recorded as a single number in the index.

 Both document and field boosts are not visible in Luke's output. I've changed
 doc boost and field boosts for the mp500.xml document but all i ever see
 returned is boost=1.0. Is this correct?

Perhaps you still have omitNorms=true for the field you are querying?

-Yonik
http://www.lucidimagination.com


Re: Negative or zero value for fieldNorm

2010-11-04 Thread Markus Jelsma
On Thursday 04 November 2010 15:12:23 Yonik Seeley wrote:
 On Thu, Nov 4, 2010 at 9:51 AM, Markus Jelsma
 
 markus.jel...@openindex.io wrote:
  I've done some testing with the example docs and it behaves similar when
  there is a zero doc boost. Luke, however, does not show me the
  index-time boosts.
 
 Remember that the norm is a product of the length norm and the index
 time boost... it's recorded as a single number in the index.

Yes.

  Both document and field boosts are not visible in Luke's output. I've
  changed doc boost and field boosts for the mp500.xml document but all i
  ever see returned is boost=1.0. Is this correct?
 
 Perhaps you still have omitNorms=true for the field you are querying?

The example schema does not have omitNorms=true on the name, cat or features 
field.

 
 -Yonik
 http://www.lucidimagination.com

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


Updating Solr index - DIH delta vs. task queues

2010-11-04 Thread Andy
Hi,
I have data stored in a database that is being updated constantly. I need to 
find a way to update the Solr index as data in the database is being updated.
There seem to be 2 main schools of thought on this:
1) DIH delta - query the database for all records that have a timestamp later 
than the last_index_time. Import those records for indexing to Solr
2) Task queue - every time a record is updated in the database, throw a task to 
a queue to index that record to Solr
Just want to know what are the pros and cons of each approach and what is your 
experience. For someone starting new, what'd be your recommendation?
Thanks
Andy
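
(For reference, approach (1) is usually wired up in the DIH data-config with a
deltaQuery keyed off last_index_time -- a rough sketch, with made-up table and
column names:)

<entity name="item" pk="id"
        query="SELECT * FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE updated_at > '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT * FROM item
                          WHERE id = '${dataimporter.delta.id}'"/>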


  

Re: Updating Solr index - DIH delta vs. task queues

2010-11-04 Thread Ezequiel Calderara
I'm in the same scenario, so this answer would be helpful too..
I'm adding...

3) Web Service - Request a webservice for all the new data that has been
updated (can this be done?)
On Thu, Nov 4, 2010 at 2:38 PM, Andy angelf...@yahoo.com wrote:

 Hi,
 I have data stored in a database that is being updated constantly. I need
 to find a way to update Solr index as data in the database is being updated.
 There seems to be 2 main schools of thoughts on this:
 1) DIH delta - query the database for all records that have a timestamp
 later than the last_index_time. Import those records for indexing to Solr
 2) Task queue - every time a record is updated in the database, throw a
 task to a queue to index that record to Solr
 Just want to know what are the pros and cons of each approach and what is
 your experience. For someone starting new, what'd be your recommendation?
 ThanksAndy







-- 
__
Ezequiel.

Http://www.ironicnet.com


Re: Optimize Index

2010-11-04 Thread Shawn Heisey

On 11/4/2010 7:22 AM, stockiii wrote:

how can i start an optimize by using DIH, but NOT after an delta- or
full-import ?


I'm not aware of a way to do this with DIH, though there might be 
something I'm not aware of.  You can do it with an HTTP POST.  Here's 
how to do it with curl:


/usr/bin/curl "http://HOST:PORT/solr/CORE/update" \
-H "Content-Type: text/xml" \
--data-binary '<optimize waitFlush="true" waitSearcher="true"/>'

Shawn



Using setStart in solrj

2010-11-04 Thread Olson, Ron
Hi all-

First, thanks to all the folks who have helped me so far getting the hang of 
Solr; I promise to give back when I think my contributions will be useful :)

I am at the point where I'm trying to return results back from a search in a 
war file, using Java with solrj. On the result page of the website I'd want to 
limit the actual results to probably around 20 or so, with the usual next/prev 
page paradigm. The issue I've been wrestling with is keeping the SolrQuery 
object around so that I don't need to transmit the entire thing back to the 
client, especially if they search for something like truck, which could 
return a lot of results.

I was thinking that one solution would be to do a query.setRows(20); for the 
query, then return the results back with some sort of an identifier so that on 
subsequent queries, I could also include query.setStart(someCounter + 1); to 
get the next set of 20. In theory, that would work at the cost of having to 
re-execute the query.

I've been looking for information about setStart() and haven't found much more 
than Javadoc that says "sets the starting row for the result set". My question 
is, how do I know what the starting row is? Maybe, based on the search 
parameters, it will always return the results in an implicit order in which 
case is it just like executing a fixed query in a database and then grabbing 
the next 20 rows from the result set? Because the user would be pressing the 
prev/next buttons, even though the query is being re-executed, the parameters 
would not be changing.

That's the theory, anyway. It seems excessive to keep executing the same query 
over and over again just because the user wants to see the next set of results, 
especially if the original SolrQuery object has them all, but maybe that's just 
what needs to be done, given the stateless nature of the web.

Any info on this method/strategy would be most appreciated.

Thanks,

Ron

DISCLAIMER: This electronic message, including any attachments, files or 
documents, is intended only for the addressee and may contain CONFIDENTIAL, 
PROPRIETARY or LEGALLY PRIVILEGED information.  If you are not the intended 
recipient, you are hereby notified that any use, disclosure, copying or 
distribution of this message or any of the information included in or with it 
is  unauthorized and strictly prohibited.  If you have received this message in 
error, please notify the sender immediately by reply e-mail and permanently 
delete and destroy this message and its attachments, along with any copies 
thereof. This message does not create any contractual obligation on behalf of 
the sender or Law Bulletin Publishing Company.
Thank you.


Re: Optimize Index

2010-11-04 Thread Rich Cariens
For what it's worth, the Solr class instructor at the Lucene Revolution
conference recommended *against* optimizing, and instead suggested to just
let the merge factor do its job.

On Thu, Nov 4, 2010 at 2:55 PM, Shawn Heisey s...@elyograg.org wrote:

 On 11/4/2010 7:22 AM, stockiii wrote:

 how can i start an optimize by using DIH, but NOT after an delta- or
 full-import ?


 I'm not aware of a way to do this with DIH, though there might be something
 I'm not aware of.  You can do it with an HTTP POST.  Here's how to do it
 with curl:

 /usr/bin/curl "http://HOST:PORT/solr/CORE/update" \
 -H "Content-Type: text/xml" \
 --data-binary '<optimize waitFlush="true" waitSearcher="true"/>'

 Shawn




Re: Optimize Index

2010-11-04 Thread Markus Jelsma
Huh? That's something new for me. Optimize removes documents that have been 
flagged for deletion. For relevancy it's important those are removed because 
document frequencies are not updated for deletes.

Did i miss something?

 For what it's worth, the Solr class instructor at the Lucene Revolution
 conference recommended *against* optimizing, and instead suggested to just
 let the merge factor do it's job.
 
 On Thu, Nov 4, 2010 at 2:55 PM, Shawn Heisey s...@elyograg.org wrote:
  On 11/4/2010 7:22 AM, stockiii wrote:
  how can i start an optimize by using DIH, but NOT after an delta- or
  full-import ?
  
  I'm not aware of a way to do this with DIH, though there might be
  something I'm not aware of.  You can do it with an HTTP POST.  Here's
  how to do it with curl:
  
  /usr/bin/curl "http://HOST:PORT/solr/CORE/update" \
  -H "Content-Type: text/xml" \
  --data-binary '<optimize waitFlush="true" waitSearcher="true"/>'
  
  Shawn


Does DataImportHandler support Digest authentication

2010-11-04 Thread jayant

I need to connect to a RETS api through a http url. But the REST service uses
digest authentication. Can I use DataImportHandler to pass the credentials
for digest authentication?
Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-DataImportHandler-support-Digest-authentication-tp1844497p1844497.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Does DataImportHandler support Digest authentication

2010-11-04 Thread jayant

I mean to say RESTful Apis.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-DataImportHandler-support-Digest-authentication-tp1844497p1844501.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Testing/packaging question

2010-11-04 Thread Bernhard Reiter
Hi, 

I'm now trying to 

export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml"

and restarting tomcat (v6 package from ubuntu maverick) via

sudo /etc/init.d/tomcat6 restart

but solr still doesn't seem to find that schema.xml, as it complains
about unknown fields when running the tests that require that schema.xml

Can someone please tell me what I'm doing wrong -- and what I should be
doing?

TIA again,
Bernhard

Am Montag, den 01.11.2010, 19:01 +0100 schrieb Bernhard Reiter:
 Hi, 
 
 I'm pretty much of a Solr newbie currently packaging solrpy for Debian;
 see
 http://svn.debian.org/viewsvn/python-modules/packages/python-solrpy/trunk/
 
 In order to run solrpy's supplied tests at build time, I'd need Solr to
 know about the schema.xml that comes with the tests.
 Can anyone tell me how do that properly? I'd basically need Solr to
 temporarily recognize that schema.xml without permanently installing it
 -- is there any way to do this, eg via environment variables?
 
 TIA
 Bernhard Reiter




Re: Problem escaping question marks

2010-11-04 Thread Stephen Powis
Looking at the JIRA issue, looks like there's been a new patch related to
this.  This is good news!  We've re-written a portion of our web app to use
Solr instead of mysql.  This part of our app allows clients to construct
rules to match data within their account, and automatically apply actions to
those matched data points.  So far our testing and then rollout has been
smooth, until we encountered the above rule/query.  I guess I assumed since
these metacharacters were escaped that they would be parsed correctly under
any type of query.

What is the likelihood of this being included in the next release/bug fix
version of Solr?  Are there docs available online with basic information
about rolling our own build of Solr that includes this patch?

I appreciate your help!
Thanks!
Stephen


On Thu, Nov 4, 2010 at 9:26 AM, Robert Muir rcm...@gmail.com wrote:

 On Thu, Nov 4, 2010 at 1:44 AM, Stephen Powis stephen.po...@pardot.com
 wrote:
  I want to return any first name with a Question Mark in it
  Query: first_name: *\?*
 

 There is no way to escape the metacharacters * or ? for a wildcard
 query (regardless of queryparser, even if you write your own).
 See https://issues.apache.org/jira/browse/LUCENE-588

 Its something we could fix, but in all honesty it seems one reason it
 isn't fixed is because the bug is so old, yet there hasn't really been
 any indication of demand for such a thing...



RE: Testing/packaging question

2010-11-04 Thread Olson, Ron
I believe it should point to the directory above, where conf and lib are 
located (though I have a multi-core setup).

Mine is set to:

/usr/local/jboss-5.1.0.GA/server/solr/solr_data/

And in solr_data the solr.xml defines the two cores, but in each core 
directory, is a conf, data, and lib directory, which contains the schema.xml.



-Original Message-
From: Bernhard Reiter [mailto:ock...@raz.or.at]
Sent: Thursday, November 04, 2010 3:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Testing/packaging question

Hi,

I'm now trying to

export JAVA_OPTS=$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml

and restarting tomcat (v6 package from ubuntu maverick) via

sudo /etc/init.d/tomcat6 restart

but solr still doesn't seem to find that schema.xml, as it complains
about unknown fields when running the tests that require that schema.xml

Can someone please tell me what I'm doing wrong -- and what I should be
doing?

TIA again,
Bernhard

Am Montag, den 01.11.2010, 19:01 +0100 schrieb Bernhard Reiter:
 Hi,

 I'm pretty much of a Solr newbie currently packaging solrpy for Debian;
 see
 http://svn.debian.org/viewsvn/python-modules/packages/python-solrpy/trunk/

 In order to run solrpy's supplied tests at build time, I'd need Solr to
 know about the schema.xml that comes with the tests.
 Can anyone tell me how do that properly? I'd basically need Solr to
 temporarily recognize that schema.xml without permanently installing it
 -- is there any way to do this, eg via environment variables?

 TIA
 Bernhard Reiter




DISCLAIMER: This electronic message, including any attachments, files or 
documents, is intended only for the addressee and may contain CONFIDENTIAL, 
PROPRIETARY or LEGALLY PRIVILEGED information.  If you are not the intended 
recipient, you are hereby notified that any use, disclosure, copying or 
distribution of this message or any of the information included in or with it 
is  unauthorized and strictly prohibited.  If you have received this message in 
error, please notify the sender immediately by reply e-mail and permanently 
delete and destroy this message and its attachments, along with any copies 
thereof. This message does not create any contractual obligation on behalf of 
the sender or Law Bulletin Publishing Company.
Thank you.


Re: Deletes writing bytes len 0, corrupting the index

2010-11-04 Thread Jason Rutherglen
I'm still seeing this error after downloading the latest 2.9 branch
version, compiling, copying to Solr 1.4 and deploying.  Basically as
mentioned, the .del files are of zero length... Hmm...

On Wed, Oct 13, 2010 at 1:33 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
 Thanks Robert, that Jira issue aptly describes what I'm seeing, I think.

 On Wed, Oct 13, 2010 at 10:22 AM, Robert Muir rcm...@gmail.com wrote:
 if you are going to fill up your disk space all the time with solr
 1.4.1, I suggest replacing the lucene jars with lucene jars from
 2.9-branch 
 (http://svn.apache.org/repos/asf/lucene/java/branches/lucene_2_9/).

 then you get the fix for https://issues.apache.org/jira/browse/LUCENE-2593 
 too.

 On Wed, Oct 13, 2010 at 11:37 AM, Jason Rutherglen
 jason.rutherg...@gmail.com wrote:
 We have unit tests for running out of disk space?  However we have
 Tomcat logs that fill up quickly and starve Solr 1.4.1 of space.  The
 main segments are probably not corrupted, however routinely now, there
 are deletes files of length 0.

 0 2010-10-12 18:35 _cc_8.del

 Which is fundamental index corruption, though less extreme.  Are we
 testing for this?





Re: Problem escaping question marks

2010-11-04 Thread Jonathan Rochkind
Wildcard queries, especially a wildcard query with a wildcard both 
_before_ and _after_, are going to be fairly slow for Solr to process, 
anyhow. (In fact, for some reason I thought wildcards weren't even 
supported both before and after, just one or the other).


Still, it's a bug in lucene, it ought not to do that, true.

But there may be a better design to handle your actual use case with 
much better performance anyhow. Based around doing something at indexing 
time to tokenize in a different field on individual letters (if perhaps 
you frequently want to search on arbitrary individual characters), or to 
simply index a 1 or 0 in a field depending on whether it includes a 
question mark if you specifically want to search all the time on 
question marks and don't care about other letters. Or some kind of more 
complex ngram'ing, if you want to be able to search on all sorts of 
sub-strings, efficiently. The trade-off will be disk space for 
performance... but if you start to have a lot of records, that 
wildcard-on-both-sides thing will have unacceptable performance, I predict.
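
(A sketch of the simplest variant -- a boolean flag field set by the indexing
client whenever the value contains a question mark; the field name is made up:)

<field name="has_question_mark" type="boolean" indexed="true" stored="false"/>

Queries then become a cheap filter, e.g. fq=has_question_mark:true, instead of
a double-wildcard scan.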


Jonathan

Stephen Powis wrote:

Looking at the JIRA issue, looks like there's been a new patch related to
this.  This is good news!  We've re-written a portion of our web app to use
Solr instead of mysql.  This part of our app allows clients to construct
rules to match data within their account, and automatically apply actions to
those matched data points.  So far our testing and then rollout has been
smooth, until we encountered the above rule/query.  I guess I assumed since
these metacharacters were escaped that they would be parsed correctly under
any type of query.

What is the likelihood of this being included in the next release/bug fix
version of Solr?  Are there docs available online with basic information
about rolling our own build of Solr that includes this patch?

I appreciate your help!
Thanks!
Stephen


On Thu, Nov 4, 2010 at 9:26 AM, Robert Muir rcm...@gmail.com wrote:

  

On Thu, Nov 4, 2010 at 1:44 AM, Stephen Powis stephen.po...@pardot.com
wrote:


I want to return any first name with a Question Mark in it
Query: first_name: *\?*

  

There is no way to escape the metacharacters * or ? for a wildcard
query (regardless of queryparser, even if you write your own).
See https://issues.apache.org/jira/browse/LUCENE-588

Its something we could fix, but in all honesty it seems one reason it
isn't fixed is because the bug is so old, yet there hasn't really been
any indication of demand for such a thing...




  


RE: Testing/packaging question

2010-11-04 Thread Turner, Robbin J
You need to either add that to catalina.sh or create a setenv.sh in the 
CATALINA_HOME/bin directory.  Then you can restart tomcat.  

So, setenv.sh would contain the following:

   export JAVA_HOME=/path/to/jre
   export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml"

If you were setting the export in your own environment and then issuing the 
restart, tomcat was not picking up your local environment because it's running 
as root.  You don't want to change root's environment.

You could also create a context.xml in your 
CATALINA_HOME/conf/Catalina/localhost.  You should be able to find those 
instruction on/through the Solr FAQ.

Hope this helps. 

From: Bernhard Reiter [ock...@raz.or.at]
Sent: Thursday, November 04, 2010 4:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Testing/packaging question

Hi,

I'm now trying to

export JAVA_OPTS=$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml

and restarting tomcat (v6 package from ubuntu maverick) via

sudo /etc/init.d/tomcat6 restart

but solr still doesn't seem to find that schema.xml, as it complains
about unknown fields when running the tests that require that schema.xml

Can someone please tell me what I'm doing wrong -- and what I should be
doing?

TIA again,
Bernhard

Am Montag, den 01.11.2010, 19:01 +0100 schrieb Bernhard Reiter:
 Hi,

 I'm pretty much of a Solr newbie currently packaging solrpy for Debian;
 see
 http://svn.debian.org/viewsvn/python-modules/packages/python-solrpy/trunk/

 In order to run solrpy's supplied tests at build time, I'd need Solr to
 know about the schema.xml that comes with the tests.
 Can anyone tell me how do that properly? I'd basically need Solr to
 temporarily recognize that schema.xml without permanently installing it
 -- is there any way to do this, eg via environment variables?

 TIA
 Bernhard Reiter

Re: Using setStart in solrj

2010-11-04 Thread Peter Karich

 Hi Ron,


 how do I know what the starting row


Always 0.


 especially if the original SolrQuery object has them all


that's the point. Solr will normally cache it for you. This is your friend:
<queryResultWindowSize>40</queryResultWindowSize>
<!-- Maximum number of documents to cache for any entry in the
   queryResultCache. -->

just try it first with http to get an impression what start is good for:
it just sets the starting doc for the current query.
E.g. you have a very complicated query ala
select?q=xy&param1=...&param2=...&paramN=...&rows=20&start=0

the next *page* would be
select?q=xy&param1=...&param2=...&paramN=...&rows=20&start=20

(newStart=oldStart+rows)

(To get the next page you'll need to keep the params either in the 
session or 'encoded' within the url.)


Just try and ask if you need more info :-)

Regards,
Peter.


Hi all-

First, thanks to all the folks to have helped me so far getting the hang of 
Solr; I promise to give back when I think my contributions will be useful :)

I am at the point where I'm trying to return results back from a search in a war file, using Java 
with solrj. On the result page of the website I'd want to limit the actual results to probably 
around 20 or so, with the usual next/prev page paradigm. The issue I've been wrestling 
with is keeping the SolrQuery object around so that I don't need to transmit the entire thing back 
to the client, especially if they search for something like truck, which could return a 
lot of results.

I was thinking that one solution would be to do a query.setRows(20); for the query, 
then return the results back with some sort of an identifier so that on subsequent queries, I could 
also include query.setStart(someCounter + 1); to get the next set of 20. In theory, 
that would work at the cost of having to re-execute the query.

I've been looking for information about setStart() and haven't found much more than 
Javadoc that says sets the starting row for the result set. My question is, 
how do I know what the starting row is? Maybe, based on the search parameters, it will 
always return the results in an implicit order in which case is it just like executing a 
fixed query in a database and then grabbing the next 20 rows from the result set? Because 
the user would be pressing the prev/next buttons, even though the query is being 
re-executed, the parameters would not be changing.

That's the theory, anyway. It seems excessive to keep executing the same query 
over and over again just because the user wants to see the next set of results, 
especially if the original SolrQuery object has them all, but maybe that's just 
what needs to be done, given the stateless nature of the web.

Any info on this method/strategy would be most appreciated.

Thanks,

Ron





--
http://jetwick.com twitter search prototype



Re: Optimize Index

2010-11-04 Thread Peter Karich

 what you can try is maxSegments=2 or more as a 'partial' optimize:

If the index is so large that optimizes are taking longer than desired 
or using more disk space during optimization than you can spare, 
consider adding the maxSegments parameter to the optimize command. In 
the XML message, this would be an attribute; the URL form and SolrJ have 
the corresponding option too. By default this parameter is 1 since an 
optimize results in a single Lucene segment. By setting it larger than 
1 but less than the mergeFactor, you permit partial optimization to no 
more than this many segments. Of course the index won't be fully 
optimized and therefore searches will be slower. 


from http://wiki.apache.org/solr/PacktBook2009 (I only found that link; 
there must be sth. on the real wiki for the maxSegments parameter ...)
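
(Concretely, a partial optimize is just the usual optimize message with the
extra attribute, posted to /update as in the earlier curl example:)

<optimize maxSegments="2" waitFlush="true" waitSearcher="true"/>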



Hello.

My Index have ~30 Million documents and a optimize=true is very heavy. it
takes long time ...

how can i start an optimize by using DIH, but NOT after an delta- or
full-import ?

i set my index to compound-index.

thx



--
http://jetwick.com twitter search prototype



RE: Testing/packaging question

2010-11-04 Thread Bernhard Reiter
The thing is, I only have a schema.xml -- no data, no lib directories.

See the tests subdirectory in the solrpy package:
http://pypi.python.org/packages/source/s/solrpy/solrpy-0.9.3.tar.gz

Bernhard

Am Donnerstag, den 04.11.2010, 15:59 -0500 schrieb Olson, Ron:
 I believe it should point to the directory above, where conf and lib are 
 located (though I have a multi-core setup).
 
 Mine is set to:
 
 /usr/local/jboss-5.1.0.GA/server/solr/solr_data/
 
 And in solr_data the solr.xml defines the two cores, but in each core 
 directory, is a conf, data, and lib directory, which contains the schema.xml.
 
 
 
 -Original Message-
 From: Bernhard Reiter [mailto:ock...@raz.or.at]
 Sent: Thursday, November 04, 2010 3:49 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Testing/packaging question
 
 Hi,
 
 I'm now trying to
 
 export JAVA_OPTS=$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml
 
 and restarting tomcat (v6 package from ubuntu maverick) via
 
 sudo /etc/init.d/tomcat6 restart
 
 but solr still doesn't seem to find that schema.xml, as it complains
 about unknown fields when running the tests that require that schema.xml
 
 Can someone please tell me what I'm doing wrong -- and what I should be
 doing?
 
 TIA again,
 Bernhard
 
 Am Montag, den 01.11.2010, 19:01 +0100 schrieb Bernhard Reiter:
  Hi,
 
  I'm pretty much of a Solr newbie currently packaging solrpy for Debian;
  see
  http://svn.debian.org/viewsvn/python-modules/packages/python-solrpy/trunk/
 
  In order to run solrpy's supplied tests at build time, I'd need Solr to
  know about the schema.xml that comes with the tests.
  Can anyone tell me how do that properly? I'd basically need Solr to
  temporarily recognize that schema.xml without permanently installing it
  -- is there any way to do this, eg via environment variables?
 
  TIA
  Bernhard Reiter
 
 
 
 




Re: Problem escaping question marks

2010-11-04 Thread Robert Muir
On Thu, Nov 4, 2010 at 4:58 PM, Stephen Powis stephen.po...@pardot.com wrote:
 What is the likelihood of this being included in the next release/bug fix
 version of Solr?

In this case, not likely. It will have to wait for Solr 4.0

 Are there docs available online with basic information
 about rolling our own build of Solr that includes this patch?

you can checkout trunk with 'svn checkout
http://svn.apache.org/repos/asf/lucene/dev/trunk' and apply the patch
with 'patch -p0 < foo.patch'


RE: Testing/packaging question

2010-11-04 Thread Bernhard Reiter
Thanks for your instructions. Unfortunately, I need to do all that as
part of my package's (python-solrpy) build procedure, so I can't change
any global configuration, such as in the catalina subdirectories.

I've already sensed that restarting tomcat is also just too
system-invasive and would include changing its (system-wide)
configuration. 

Are there any other ways to use solr for running the tests from
http://pypi.python.org/packages/source/s/solrpy/solrpy-0.9.3.tar.gz
without having to change any system configuration? Maybe via a user
Tomcat instance such as provided by the tomcat6-user debian package?

Thanks for your help!
Bernhard

Am Donnerstag, den 04.11.2010, 16:15 -0500 schrieb Turner, Robbin J:
 You need to either add that to catalina.sh or create a setenv.sh in the 
 CATALINA_HOME/bin directory.  Then you can restart tomcat.  
 
 So, setenv.sh would contain the following:
 
export JAVA_HOME=/path/to/jre
export JAVA_OPTS==$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml
 
 If you were setting the export in your own environment and then issuing the 
 restart, tomcat was not picking up your local environment because it's 
 running as root.  You don't want to change root's environment.
 
 You could also, create a context.xml in you 
 CATALINA_HOME/conf/CATALINA/localhost.  You should be able to find those 
 instruction on/through the Solr FAQ.
 
 Hope this helps. 
 
 From: Bernhard Reiter [ock...@raz.or.at]
 Sent: Thursday, November 04, 2010 4:49 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Testing/packaging question
 
 Hi,
 
 I'm now trying to
 
 export JAVA_OPTS=$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml
 
 and restarting tomcat (v6 package from ubuntu maverick) via
 
 sudo /etc/init.d/tomcat6 restart
 
 but solr still doesn't seem to find that schema.xml, as it complains
 about unknown fields when running the tests that require that schema.xml
 
 Can someone please tell me what I'm doing wrong -- and what I should be
 doing?
 
 TIA again,
 Bernhard
 
 Am Montag, den 01.11.2010, 19:01 +0100 schrieb Bernhard Reiter:
  Hi,
 
  I'm pretty much of a Solr newbie currently packaging solrpy for Debian;
  see
  http://svn.debian.org/viewsvn/python-modules/packages/python-solrpy/trunk/
 
  In order to run solrpy's supplied tests at build time, I'd need Solr to
  know about the schema.xml that comes with the tests.
  Can anyone tell me how do that properly? I'd basically need Solr to
  temporarily recognize that schema.xml without permanently installing it
  -- is there any way to do this, eg via environment variables?
 
  TIA
  Bernhard Reiter




Re: querying multiple fields as one

2010-11-04 Thread Tommaso Teofili
Hi Erick

2010/11/4 Erick Erickson erickerick...@gmail.com

 Ken's suggestion to look at dismax is a good one, but I have
 a question
 q=type:electronics cat:electronics

 should do what you want assuming your default operator
 is OR.


correct


  Is it failing? Or is the real question how you can
 do this automatically?


No failing, just looking for how to do such expansion of fields
automatically (with fields in OR but that's not an issue I think)



 I'd expect the ranking to be a bit different, but I'm guessing
 that's not a big issue


right, no problem if the scoring isn't exactly the same.
Thanks,
Tommaso




 Best
 Erick

 On Thu, Nov 4, 2010 at 8:21 AM, Tommaso Teofili
 tommaso.teof...@gmail.comwrote:

  Hi all,
  having two fields named 'type' and 'cat' with identical type and options,
  but different values recorded, would it be possible to query them as they
  were one field?
  For instance
   q=type:electronics cat:electronics
  should return same results as
   q=common:electronics
  I know I could make it defining a third field 'common' with copyFields
 from
  'type' and 'cat' to 'common' but this wouldn't be feasible if you've
  already
  lots of documents in your index and don't want to reindex everything,
 isn't
  it?
  Any suggestions?
  Thanks in advance,
  Tommaso
 



RE: Testing/packaging question

2010-11-04 Thread Turner, Robbin J
You can setup your own tomcat instance which would contain just configurations 
you need.  You won't even have to recreate all the tomcat configuration and 
binaries, just the ones that were not defaults.  So, if you look up multiple 
tomcat configuration instances (google it), you'll have a set of 
directories.  You'll need to have your own startup script that points to your 
configurations.  You can use the current startup script as a model, then in 
your build procedures (I've done all this with a script) have this added to the 
system so you can perform a restart.  You'd have to have a couple of other 
environment variables set:

export CATALINA_BASE=/path/to/your/tomcat/instance/conf/files
export CATALINA_HOME=/path/to/default/installation/bin/files
export SOLR_HOME=/path/to/solr/dataNconf

Good luck


From: Bernhard Reiter [ock...@raz.or.at]
Sent: Thursday, November 04, 2010 5:49 PM
To: solr-user@lucene.apache.org
Subject: RE: Testing/packaging question

Thanks for your instructions. Unfortunately, I need to do all that as
part of my package's (python-solrpy) build procedure, so I can't change
any global configuration, such as in the catalina subdirectories.

I've already sensed that restarting tomcat is also just too
system-invasive and would include changing its (system-wide)
configuration.

Are there any other ways to use solr for running the tests from
http://pypi.python.org/packages/source/s/solrpy/solrpy-0.9.3.tar.gz
without having to change any system configuration? Maybe via a user
Tomcat instance such as provided by the tomcat6-user debian package?

Thanks for your help!
Bernhard

Am Donnerstag, den 04.11.2010, 16:15 -0500 schrieb Turner, Robbin J:
 You need to either add that to catalina.sh or create a setenv.sh in the 
 CATALINA_HOME/bin directory.  Then you can restart tomcat.

 So, setenv.sh would contain the following:

export JAVA_HOME=/path/to/jre
export JAVA_OPTS==$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml

 If you were setting the export in your own environment and then issuing the 
 restart, tomcat was not picking up your local environment because it's 
 running as root.  You don't want to change root's environment.

 You could also, create a context.xml in you 
 CATALINA_HOME/conf/CATALINA/localhost.  You should be able to find those 
 instruction on/through the Solr FAQ.

 Hope this helps.
 
 From: Bernhard Reiter [ock...@raz.or.at]
 Sent: Thursday, November 04, 2010 4:49 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Testing/packaging question

 Hi,

 I'm now trying to

 export JAVA_OPTS=$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml

 and restarting tomcat (v6 package from ubuntu maverick) via

 sudo /etc/init.d/tomcat6 restart

 but solr still doesn't seem to find that schema.xml, as it complains
 about unknown fields when running the tests that require that schema.xml

 Can someone please tell me what I'm doing wrong -- and what I should be
 doing?

 TIA again,
 Bernhard

 Am Montag, den 01.11.2010, 19:01 +0100 schrieb Bernhard Reiter:
  Hi,
 
  I'm pretty much of a Solr newbie currently packaging solrpy for Debian;
  see
  http://svn.debian.org/viewsvn/python-modules/packages/python-solrpy/trunk/
 
  In order to run solrpy's supplied tests at build time, I'd need Solr to
  know about the schema.xml that comes with the tests.
  Can anyone tell me how do that properly? I'd basically need Solr to
  temporarily recognize that schema.xml without permanently installing it
  -- is there any way to do this, eg via environment variables?
 
  TIA
  Bernhard Reiter

Re: querying multiple fields as one

2010-11-04 Thread Jonathan Rochkind

Tommaso Teofili wrote:



No failing, just looking for how to do such expansion of fields
automatically (with fields in OR but that's not an issue I think)
  

the dismax query parser is that way.



Re: mergeFactor questions

2010-11-04 Thread Tommaso Teofili
Thanks so much Shawn, I am in a scenario with many inserts while searching,
each add consisting of ~500 documents. I will monitor the number of
segments, keeping your considerations in mind :-)
Regards,
Tommaso

2010/11/4 Shawn Heisey s...@elyograg.org

 On 11/4/2010 3:27 AM, Tommaso Teofili wrote:

- Is mergeFactor a one time configuration setting that is considered
 only

when creating the index for the first time or can it be adjusted later
 even
with some docs inside the index? e.g. I have mF to 10 then I realize I
 want
quicker searches and I set it to 2 so that at the next optimize/commit
 I
will have no more than 2 segments. My understanding is that one can
 adjust
mF over time, is it right?


 The mergeFactor is applied anytime documents are added to the index, not
 just when it is built for the first time.  You can adjust it later, and
 reload the core or restart Solr.  It will apply to any additional indexing
 from that point forward.

 With a mergeFactor of 10, having 21 segments (and more) temporarily on the
 disk at the same time is reasonably possible.  I know this applies if you
 are doing a continuous large insert, not sure if you are doing several small
 inserts separately. These segments are:

 * The small segment that is being built right now.
 * The previous 10 small segments.
 * The merged segment being created from those above.
 * The previous 9 merged segments.

 If it takes a really long time to merge the last 10 small segments and then
 merge the 10 large segments into an even larger segment, you can end up with
 even more small segments from your continuous insert.  If it should take
 long enough that you actually get 10 more new small segments, the large
 merge will pause while it completes the small merge.  I saw this happen
 recently when I decided to see what happens if I built a single shard from
 our entire database.  It took a really long time, partly from that
 super-merge and the optimize that happened later, and took up 85GB of disk
 space.

 I'm not really sure what happens if you have this continue beyond a single
 super-merge like I have mentioned.

 - In a replicated environment does it make sense to define different

mergeFactors on master and slave? I'd say no since it influences the
 number
of segments created, that being a concern of who actually index
 documents
(the master) not of who receives (segments of) index, but please
 correct me
if I am wrong.


 Because it only applies when indexes are being built, it has no meaning on
 a slave, which as you said, just copies the data from the master.

 Shawn




RE: Does Solr support Natural Language Search

2010-11-04 Thread Steven A Rowe
Hi Jayant,

I think you mean NL search as opposed to Boolean search: the ability to return 
ranked results from queries based on non-required term matches.  Right?

If that is what you meant, then the answer is: Yes!  If not, then you should 
rephrase your question.

Otherwise, the answer could eventually be: Maybe!!!  YMMV, TMR.

Steve

 -Original Message-
 From: jayant [mailto:jayan...@hotmail.com]
 Sent: Wednesday, November 03, 2010 11:49 PM
 To: solr-user@lucene.apache.org
 Subject: Does Solr support Natural Language Search
 
 
 Does Solr support Natural Language Search? I did not find any thing about
 this in the reference manual. Please let me know.
 Thanks.
 --
 View this message in context: http://lucene.472066.n3.nabble.com/Does-Solr-support-Natural-Language-Search-tp1839262p1839262.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Testing/packaging question

2010-11-04 Thread Peter Karich

 Hi,

I don't know if the python package provides one, but solrj offers a way to 
start Solr embedded (EmbeddedSolrServer), and setting up a different 
schema + config is possible. For this see:
https://karussell.wordpress.com/2010/06/10/how-to-test-apache-solrj/
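
Not solrpy-specific, but for comparison, a rough sketch of the embedded approach
in SolrJ (the solr home path below is a made-up example; it just needs to contain
conf/schema.xml and conf/solrconfig.xml, e.g. the one shipped with the tests):

import java.io.File;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

public class EmbeddedSolrSketch {
    public static void main(String[] args) throws Exception {
        // point Solr at a directory containing conf/schema.xml and conf/solrconfig.xml
        File solrHome = new File("src/test/resources/solr");   // hypothetical location
        System.setProperty("solr.solr.home", solrHome.getAbsolutePath());

        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer container = initializer.initialize();
        EmbeddedSolrServer server = new EmbeddedSolrServer(container, "");  // "" = default core

        System.out.println("docs: "
                + server.query(new SolrQuery("*:*")).getResults().getNumFound());

        container.shutdown();   // release the index lock for the next test run
    }
}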

If you need an 'external' Solr (via jetty and java -jar start.jar) while the 
tests are running, see this:

http://java.dzone.com/articles/getting-know-solr

Regards,
Peter.



Hi,

I'm pretty much of a Solr newbie currently packaging solrpy for Debian;
see
http://svn.debian.org/viewsvn/python-modules/packages/python-solrpy/trunk/

In order to run solrpy's supplied tests at build time, I'd need Solr to
know about the schema.xml that comes with the tests.
Can anyone tell me how do that properly? I'd basically need Solr to
temporarily recognize that schema.xml without permanently installing it
-- is there any way to do this, eg via environment variables?

TIA
Bernhard Reiter





Dataimporthandler crashed raidcontroller

2010-11-04 Thread Robert Gründler
Hi all,

we had a severe problem with the RAID controller on one of our servers today 
while importing a table with ~8 million rows into a Solr index. After importing 
about 4 million documents, the server shut down and failed to restart due to a 
corrupt RAID disk.

The Solr data import was the only heavy process running on that machine during
the crash.

Has anyone experienced HDD/RAID-related problems while indexing large SQL 
databases into Solr?


thanks!


-robert

 




Re: Optimize Index

2010-11-04 Thread Erick Erickson
No, you didn't miss anything. The comment at Lucene Revolution was more
along the lines that optimize didn't actually improve much #absent# deletes.

Plus, on a significant-size corpus, the doc frequencies won't change that
much by deleting documents, but that's a case-by-case thing

Best
Erick

On Thu, Nov 4, 2010 at 4:31 PM, Markus Jelsma markus.jel...@openindex.iowrote:

 Huh? That's something new for me. Optimize removes documents that have been
 flagged for deletion. For relevancy it's important those are removed
 because
 document frequencies are not updated for deletes.

 Did i miss something?

  For what it's worth, the Solr class instructor at the Lucene Revolution
  conference recommended *against* optimizing, and instead suggested to just
  let the merge factor do its job.
 
  On Thu, Nov 4, 2010 at 2:55 PM, Shawn Heisey s...@elyograg.org wrote:
   On 11/4/2010 7:22 AM, stockiii wrote:
   how can i start an optimize by using DIH, but NOT after an delta- or
   full-import ?
  
   I'm not aware of a way to do this with DIH, though there might be
   something I'm not aware of.  You can do it with an HTTP POST.  Here's
   how to do it with curl:
  
    /usr/bin/curl "http://HOST:PORT/solr/CORE/update" \
    -H "Content-Type: text/xml" \
    --data-binary '<optimize waitFlush="true" waitSearcher="true"/>'
  
   Shawn
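
If the indexing code already talks to Solr through SolrJ, a roughly equivalent
call to the curl command above is the following (a sketch; the URL is a
placeholder):

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class OptimizeSketch {
    public static void main(String[] args) throws Exception {
        // placeholder URL; use your own host, port and core
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://HOST:PORT/solr/CORE");
        // same effect as the XML message above: waitFlush=true, waitSearcher=true
        solr.optimize(true, true);
    }
}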



Re: Dataimporthandler crashed raidcontroller

2010-11-04 Thread Fuad Efendi
I experienced similar problems. In our case it was because we didn't perform 
load/stress tests properly before going to production. Nothing lasts forever: 
replace the controller, change hardware vendor, and keep the temperature inside 
the rack low. 
Thanks
--Original Message--
From: Robert Gründler
To: solr-user@lucene.apache.org
ReplyTo: solr-user@lucene.apache.org
Subject: Dataimporthandler crashed raidcontroller
Sent: Nov 4, 2010 7:21 PM

Hi all,

we had a severe problem with the RAID controller on one of our servers today 
while importing a table with ~8 million rows into a Solr index. After importing 
about 4 million documents, the server shut down and failed to restart due to a 
corrupt RAID disk.

The Solr data import was the only heavy process running on that machine during
the crash.

Has anyone experienced HDD/RAID-related problems while indexing large SQL 
databases into Solr?


thanks!


-robert

 




Sent on the TELUS Mobility network with BlackBerry

Re: Does Solr support Natural Language Search

2010-11-04 Thread Li Li
I don't think current Lucene will offer what you want.
There are two main tasks in a search process.
    One is understanding the user's intention. Because natural language
understanding is difficult, current information retrieval systems
force users to input some terms to express their needs. But terms have
ambiguities, e.g. "apple" may mean a fruit or an electronics brand, so users
are asked to input more terms to disambiguate, e.g. "apple fruit"
may suggest the user wants the fruit apple. There are many things that help
detect the user's intent: query expansion ("Searches related to" in Google),
suggestions as the user types, and so on. The ultimate goal is understanding
intention by analyzing the user's natural language.

    Another is understanding documents. Current models such as VSM
don't understand documents; they just regard documents as collections
of words. When a user inputs a word, the system returns documents that contain
this word (tf); of course idf is also taken into consideration.
But that is far from understanding. That's why keyword stuffing exists:
because the machine doesn't really understand the document, it can't
judge whether the document is good or bad, or how well it matches the query.
    So PageRank and some other external information is used to
mitigate this problem, but it can't fully solve it.
    Fully understanding documents needs more advanced NLP techniques, but I
don't think machines will reach human intelligence in the near future,
although I am an NLP researcher.
    Another road is humans helping machines understand; that is what is
called Web 2.0: social networks, the semantic web, and so on. But that is not
an easy task either.


2010/11/4 jayant jayan...@hotmail.com:

 Does Solr support Natural Language Search? I did not find any thing about
 this in the reference manual. Please let me know.
 Thanks.
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Does-Solr-support-Natural-Language-Search-tp1839262p1839262.html
 Sent from the Solr - User mailing list archive at Nabble.com.



How to Facet on a price range

2010-11-04 Thread jayant

I am able to facet on a particular field because that field is indexed.
But I am not sure how to facet on a price range when I have the exact price
in the 'price' field. Can anyone help here?
Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-Facet-on-a-price-range-tp1846392p1846392.html
Sent from the Solr - User mailing list archive at Nabble.com.
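
One approach that works on Solr 1.4 is to send one facet.query per price bucket;
the counts come back alongside the normal results. A sketch (the field name
"price", the bucket boundaries and the URL are assumptions, adjust them to your
schema):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PriceRangeFacetSketch {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        // one facet.query per bucket; each count is keyed by its query string
        q.addFacetQuery("price:[* TO 9.99]");
        q.addFacetQuery("price:[10 TO 49.99]");
        q.addFacetQuery("price:[50 TO *]");

        QueryResponse rsp = solr.query(q);
        System.out.println(rsp.getFacetQuery());   // Map of bucket -> count
    }
}

The equivalent plain-HTTP parameters are facet=true plus repeated
facet.query=price:[...] parameters on the select handler.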