Re: AW: Leading wildcards

2007-04-23 Thread Maarten . De . Vilder
hey,

we've stumbled on something weird while using wildcards 

we enabled leading wildcards in solr (see previous message from Christian 
Burkamp)

when we do a search on a nonexisting field, we get a SolrException:
undefined field (this was for the query nonfield:test)

but when we use wildcards in the query, we don't get the undefined field
exception: the query nonfield:*test runs fine and just returns zero results...

is this normal behaviour?




Burkamp, Christian [EMAIL PROTECTED] 
19/04/2007 12:37
Please respond to
solr-user@lucene.apache.org


To
solr-user@lucene.apache.org
cc

Subject
AW: Leading wildcards






Hi there,

Solr does not support leading wildcards, because it uses Lucene's standard 
QueryParser class without changing the defaults. You can easily change 
this by inserting the line

parser.setAllowLeadingWildcard(true);

in QueryParsing.java at line 92 (this is just after the QueryParser
instance is created in QueryParsing.parseQuery(...)). It obviously means
that you have to change Solr's source code. It would be nice to have an
option in the schema to switch leading wildcards on or off per field.
Leading wildcards make little sense on richly populated fields, because
queries then tend to throw Lucene's TooManyClauses exception most of the
time.

This works for leading wildcards. Unfortunately it does not enable 
searches with leading AND trailing wildcards. (E.g. searching for *lega* 
does not find results even if the term elegance is in the index. If you 
put a second asterisk at the end, the term elegance is found. (search 
for *lega** to get hits).
Can anybody explain this? It seems to be more of a Lucene QueryParser
issue.

-- Christian

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Thursday, 19 April 2007 08:35
To: solr-user@lucene.apache.org
Subject: Leading wildcards


hi,

we have been trying to get the leading wildcards to work.

we have been looking around the Solr website, the Lucene website, wiki's 
and the mailing lists etc ...
but we found a lot of contradictory information.

so we have a few questions:
- is the latest version of Lucene capable of handling leading wildcards?
- is the latest version of Solr capable of handling leading wildcards?
- do we need to make adjustments to the Solr source code?
- if we need to adjust the Solr source, what do we need to change?

thanks in advance !
Maarten




browse a facet without a query?

2007-04-23 Thread Jennifer Seaman
When there is no q parameter Solr complains. How can I browse a facet
without a keyword query? For example, I want to view all documents for a
given state:


?q=&fq=state:California

Thank you.
Jennifer Seaman  

Re: browse a facet without a query?

2007-04-23 Thread Yonik Seeley

On 4/23/07, Jennifer Seaman [EMAIL PROTECTED] wrote:

When there is no q parameter Solr complains. How can I browse a facet
without a keyword query? For example, I want to view all documents for a
given state:

?q=&fq=state:California


With a relatively recent nightly build, you can use q=*:*
Before that, use an open-ended range query like q=state:[* TO *]

-Yonik
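Either query form can be combined with the fq filter from the original question. As a quick illustration, a request URL for the match-all case could be built like this (the localhost host and /solr/select path are assumptions for a default local install, not from the thread):

```python
from urllib.parse import urlencode

# Match every document, then filter by the state facet value.
# Host and core path are assumptions for a default local Solr install.
params = {"q": "*:*", "fq": "state:California"}
query_string = urlencode(params)  # escapes the '*' and ':' characters
url = "http://localhost:8983/solr/select?" + query_string
```

urlencode takes care of the escaping, so the raw *:* characters never appear in the URL itself.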


Re: solr utf 16 ?

2007-04-23 Thread Ken Krugler

Are there any plans to make solr UTF-16 compliant in the future?
If so, is it in the short-term or long-term?


I'm curious what you mean by UTF-16 compliant. Do you mean being 
able to handle UTF-16 encoded XML?


Thanks,

-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
Find Code, Find Answers


Re: solr utf 16 ?

2007-04-23 Thread brian beard
Yes. I'm assuming if you have UTF-16 encoded data in a document that needs 
to be added to the index, that solr would not be able to handle this?


I'm curious what you mean by UTF-16 compliant. Do you mean being able to 
handle UTF-16 encoded XML?








Re: solr utf 16 ?

2007-04-23 Thread Ken Krugler
I'm curious what you mean by UTF-16 compliant. Do you mean being 
able to handle UTF-16 encoded XML?


Yes. I'm assuming if you have UTF-16 encoded data in a document that 
needs to be added to the index, that solr would not be able to 
handle this?


I've never tried sending anything but UTF-8 to Solr, so I can't 
comment on what issues you'll run into.


But based on my experience to date, I'd strongly suggest converting 
it to UTF-8 before you post it to Solr.


-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
Find Code, Find Answers


Re: solr utf 16 ?

2007-04-23 Thread Mike Klaas

On 4/23/07, brian beard [EMAIL PROTECTED] wrote:

Yes. I'm assuming if you have UTF-16 encoded data in a document that needs
to be added to the index, that solr would not be able to handle this?


I believe that handling arbitrary encodings is on the list of future
enhancements, but I couldn't give you a timeline.

For the time being, consider that
1. utf-8 is the lingua franca of xml document encoding
2. it is very easy to convert it yourself (it would be a 3-4 line
Python command-line filter, for instance).

-Mike
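A sketch of such a command-line filter in Python (the prolog rewrite is an extra nicety; the details here are assumptions, not Mike's actual script):

```python
import sys

def utf16_to_utf8(raw: bytes) -> bytes:
    """Convert UTF-16 (BOM-aware) XML bytes to UTF-8 bytes."""
    text = raw.decode("utf-16")
    # If the XML prolog names the old encoding, update it as well.
    return text.replace('encoding="UTF-16"', 'encoding="UTF-8"', 1).encode("utf-8")

if __name__ == "__main__" and not sys.stdin.isatty():
    # Use as a pipe: python utf16_to_utf8.py < doc-utf16.xml > doc-utf8.xml
    sys.stdout.buffer.write(utf16_to_utf8(sys.stdin.buffer.read()))
```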


Solr on Lucene/Solr site?

2007-04-23 Thread Matthew Runo

Hey there -

It just occurred to me that the search on lucene.apache.org is  
powered by google. Shouldn't it be Solr? heh


++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++




Re: does solr handle updates quickly?

2007-04-23 Thread Matthew Runo
This might also be a cool way to increase relevancy. Does Lucene/Solr
do, or can it do, any sort of increase on relevancy depending on  
which search result a user picks?


Would it be feasible for me to update an index_id with a click count  
each time a user clicks a result, and give this field a boost in the  
results?


++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Apr 22, 2007, at 7:17 PM, Tait Larson wrote:

Hi, I'm new to Solr.  I've just started playing around with it and
learning what it can do.

I'd like to include a vote field on all of my indexed documents.  Users
vote on the content they like.  A vote tally is displayed along with each
document returned in the results of a search.

Let's say I create a vote field of type SortableIntField.  Users vote
relatively frequently.  Assume I send update commands to Solr which
change only the vote field approximately 1 time for every 50 searches a
user performs.  What effects will this have on my index?  Will search
performance degrade?

Thanks,

Tait




Re: solr utf 16 ?

2007-04-23 Thread brian beard

Thanks for all the comments. The conversion seems like a good alternative.


From: Mike Klaas [EMAIL PROTECTED]
Reply-To: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
Subject: Re: solr utf 16 ?
Date: Mon, 23 Apr 2007 11:13:54 -0700

On 4/23/07, brian beard [EMAIL PROTECTED] wrote:

Yes. I'm assuming if you have UTF-16 encoded data in a document that needs
to be added to the index, that solr would not be able to handle this?


I believe that handling arbitrary encodings is on the list of future
enhancements, but I couldn't give you a timeline.

For the time being, consider that
1. utf-8 is the lingua franca of xml document encoding
2. it is very easy to convert it yourself (it would be a 3-4 line
Python command-line filter, for instance).

-Mike






Re: browse a facet without a query?

2007-04-23 Thread Tom Hill

Hi -

On 4/23/07, Yonik Seeley [EMAIL PROTECTED] wrote:


On 4/23/07, Jennifer Seaman [EMAIL PROTECTED] wrote:
 When there is no q parameter Solr complains. How can I browse a facet
 without a keyword query? For example, I want to view all documents for a
 given state:

 ?q=&fq=state:California

With a relatively recent nightly build, you can use q=*:*
Before that, use an open-ended range query like q=state:[* TO *]



I was doing the q=state:[* TO *] for a short time, and found it very slow. I
switched to doing a query on a single field that covered the part of the
index I was interested in, for example:

inStock:true

And got much faster performance. I was getting execution times in seconds
(for example, I just manually did this and got 2.2 seconds for the [* TO
*], and 50 milliseconds for the latter (inStock:true), uncached).

In my case the filter query hits about 80% of the docs, so it's doing a
similar amount of work. I don't know how well *:* performs, but if it is
similar to state:[* TO *], I would benchmark it before using.

For us, facet queries are a high percentage, so the time was critical. It
might even be worth adding a field, if you don't already have an appropriate
one.

Tom


Re: Leading wildcards

2007-04-23 Thread Walter Underwood
Here is a late response, apache.org was rejecting our e-mails...

Allowing leading wildcards opens up a denial of service attack. It becomes
trivial to overload the search engine and take it out of service, just
hammer it with leading wildcard queries. Please leave the default as
disabled. If we add a config option, there should be a  security warning
with it.

wunder

On 4/19/07 8:04 AM, Michael Kimsal [EMAIL PROTECTED] wrote:

 It still seems like it's only something that would be invoked by a user's
 query.
 
 If I queried for *foobar and leading wildcards were not on in the server,
 I'd get back nothing, which isn't really correct.  I'd think the
 application should tell the user that that syntax isn't supported.
 
 Perhaps I'm simplifying it a bit.  It would certainly help out our
 comfort level to have it either be on or configurable by default, rather
 than having to maintain a 'patched' version (yes, the patch is only one
 line, but it's the principle of the thing).  I suspect this would be the
 same for others.
 
 
 
 On 4/19/07, Erik Hatcher [EMAIL PROTECTED] wrote:
 
 
 On Apr 19, 2007, at 10:39 AM, Yonik Seeley wrote:
 On 4/19/07, Erik Hatcher [EMAIL PROTECTED] wrote:
 parser.setAllowLeadingWildcard(true);
 
 I have also run into this issue and have intended to fix up Solr to
 allow configuring that switch on QueryParser.
 
 Any reason that parser.setAllowLeadingWildcard(true) shouldn't be
 the default?
 
 That's fine by me.  But...
 
 Does it really need to be configurable?
 
 It all depends on how bad of a hit it'd take on Solr.  What's the
 breaking point where the performance of full-term scanning (and
 subsequently faceting, of course) keels over or dies?  FuzzyQueries
 die on my 3.7M index and not-super-beefy hardware and system setup.
 
 Erik
 
 
 



Re: browse a facet without a query?

2007-04-23 Thread Chris Hostetter

: I was doing the q=state:[* TO *] for a short time, and found it very slow. I
: switched to doing a query on a single field that covered the part of the
: index I was interested in, for example:
:
: inStock:true

if you have the filterCache enabled and you aren't opening new searchers
very often, the open ended range query should result in a cached bitset
just as good as something like inStock:true ... i think yonik just
suggested it because if you are faceting on state then you can be
confident that you are only interested in docs that have a state field.

: And got much faster performance. I was getting execution times in seconds
: (for example, I just manually did this and got 2.2 seconds for the [* TO
: *], and 50 milliseconds for the latter (inStock:true), uncached)

[* TO *] on the default field might be very slow (because it's iterating
over all the terms) but on a field with a small number of discrete values
(like state, or inStock) it should be very fast.

: similar amount of work. I don't know how well *:* performs, but if it is
: similar to state:[* TO *], I would benchmark it before using.

*:* is implemented extremely efficiently ... it doesn't look at any term
info, it just iterates over all the non-deleted docs.



-Hoss
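To make the point concrete, here is a toy model (plain Python, nothing like Lucene's real data structures) of why field:[* TO *] cost tracks the number of distinct terms in the field while *:* never touches term info:

```python
# Toy inverted index: field -> {term -> set of doc ids}.
index = {
    "state": {"CA": {1, 2}, "NY": {3}},
    "text": {word: {i} for i, word in enumerate(["many", "distinct", "terms"])},
}
all_docs = {0, 1, 2, 3}  # the non-deleted doc-id set

def open_range(field):
    """field:[* TO *] -- union the postings of EVERY distinct term,
    so cost grows with the number of terms in the field."""
    hits = set()
    for postings in index[field].values():
        hits |= postings
    return hits

def match_all():
    """*:* -- just iterate the non-deleted docs; no term lookups at all."""
    return set(all_docs)

state_hits = open_range("state")  # only docs that actually have a state
everything = match_all()
```

With few discrete values (state, inStock) the term walk is short; on a free-text field it visits every distinct term, which is where the slowness comes from.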



Re: browse a facet without a query?

2007-04-23 Thread Yonik Seeley

On 4/23/07, Tom Hill [EMAIL PROTECTED] wrote:

I was doing the q=state:[* TO *] for a short time, and found it very slow. I
switched to doing a query on a single field that covered the part of the
index I was interested in, for example:

inStock:true

And got much faster performance.


Good point... the fewer the terms, the faster the performance.


I don't know how well *:* performs, but if it is
similar to state:[* TO *], I would benchmark it before using.


*:* will be the fastest as it translates to a MatchAllDocsQuery, which
does no term lookups at all, but just skips over deleted documents.

-Yonik


Re: Snapshooting or replicating recently indexed data

2007-04-23 Thread Bill Au

Here's the Solr Wiki on collection distribution:

http://wiki.apache.org/solr/CollectionDistribution

It describes the incremental nature of the distribution:

A collection is a directory of many files. Collections are distributed
to the slaves as snapshots of these files. Each snapshot is made up of
hard links to the files so copying of the actual files is not
necessary when snapshots are created. Lucene only significantly
rewrites files following an optimization command. Generally, a file
once written, will change very little if at all. This makes the
underlying transport of rsync very useful. Files that have already
been transferred and have not changed do not need to be re-transferred
with the new edition of a collection.

Bill
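The hard-link behaviour the wiki describes is easy to check directly (a standalone sketch, not the actual snapshooter script; it assumes a filesystem with hard-link support):

```python
import os
import tempfile

# A snapshot entry is just a second directory entry for the same inode,
# so creating it copies no index data (analogous to cp -lr).
workdir = tempfile.mkdtemp()
index_file = os.path.join(workdir, "segments_1")
snapshot_file = os.path.join(workdir, "snapshot.segments_1")

with open(index_file, "w") as f:
    f.write("index data")

os.link(index_file, snapshot_file)  # the hard link itself
same_inode = os.stat(index_file).st_ino == os.stat(snapshot_file).st_ino
link_count = os.stat(index_file).st_nlink  # 2: original name + snapshot name
```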

On 4/21/07, Kevin Lewandowski [EMAIL PROTECTED] wrote:

snapshooter does create incremental builds of the index. It doesn't
appear so if you look at the contents because the existing files are
hard links. But it is incremental.

On 4/20/07, Doss [EMAIL PROTECTED] wrote:
 Hi Yonik,

 Thanks for your quick response, my question is this, can we take incremental
 backup/replication in SOLR?

 Regards,
 Doss.


 M. MOHANDOSS Software Engineer Ext: 507 (A BharatMatrimony Enterprise)
 - Original Message -
 From: Yonik Seeley [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Thursday, April 19, 2007 7:42 PM
 Subject: Re: Snapshooting or replicating recently indexed data


  On 4/19/07, Doss [EMAIL PROTECTED] wrote:
  It seems the snapshooter takes an exact copy of the indexed data, that
  is, all the contents inside the index directory. How can we take only
  the recently added ones?
  ...
  cp -lr ${data_dir}/index ${temp}
  mv ${temp} ${name} ...
 
 
  I don't quite understand your question, but since hard links are used,
  it's more like pointing to the index files instead of copying them.
  Rsync is used as a transport to only move the files that were changed
  from the master to slaves.
 
  -Yonik





Re: snapshooter on OS X

2007-04-23 Thread Bill Au

You can also run the script with the -V option.  It shows debugging
info but not as much as bash -x.

I tried snapshooter on OS X 10.4.9.  I did get the cp: illegal option
-- l error.
But that's the only error I got.

Bill

On 4/23/07, Bertrand Delacretaz [EMAIL PROTECTED] wrote:

On 4/23/07, Grant Ingersoll [EMAIL PROTECTED] wrote:
 ...The error says something about command not found at line 15, but in
 all the files I looked at, line 15 was a comment...

Running your script with

  bash -x myscript

should help, it will echo commands before executing them.

-Bertrand