Re: delete changed?

2007-06-21 Thread Chris Hostetter
:  curl http://192.168.7.6:8080/solr0/update --data-binary
: '<delete><query>nodeid:20</query></delete>'
:
: i remember it was ok when i used solr 1.1
...
: HTTP Status 400 - missing content stream


please note the "Upgrading from Solr 1.1" section of the 1.2 CHANGES.txt
file, which states...

The Solr "Request Handler" framework has been updated in two key ways:
First, if a Request Handler is registered in solrconfig.xml with a name
starting with "/" then it can be accessed using a path-based URL, instead of
using the legacy "/select?qt=name" URL structure.  Second, the Request
Handler framework has been extended making it possible to write Request
Handlers that process streams of data for doing updates, and there is a
new-style Request Handler for XML updates given the name of "/update" in
the example solrconfig.xml.  Existing installations without this "/update"
handler will continue to use the old update servlet and should see no
changes in behavior.  For new-style update handlers, errors are now
reflected in the HTTP status code, Content-type checking is more strict,
and the response format has changed and is controllable via the "wt"
parameter.
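In practice, a sketch of the 1.2-friendly request (host, port and core name are taken from the original post; the explicit Content-type header is the usual fix for the stricter checking mentioned above, though check your own install):

```shell
# The delete payload from the original post, reconstructed as XML:
PAYLOAD='<delete><query>nodeid:20</query></delete>'
echo "$PAYLOAD"

# With the new-style /update handler, declare the Content-type explicitly
# (host/port/core are the original poster's; adjust for your install):
# curl http://192.168.7.6:8080/solr0/update \
#   -H 'Content-type: text/xml; charset=utf-8' \
#   --data-binary "$PAYLOAD"
```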



-Hoss



Re: Multi-language indexing and searching

2007-06-21 Thread Daniel Alheiros
Hi Hoss.

I've tried that yesterday using the same approach you just said (I've
created the base fields for any language with basic analyzers) and it worked
alright.

Thanks again for your time.

Regards,
Daniel


On 20/6/07 21:00, Chris Hostetter [EMAIL PROTECTED] wrote:

 
 : So far it sounds good for my needs, now I'm going to try if my other
 : features still work (I'm worried about highlighting as I'm going to return a
 : different field)...
 
 i'm not really a highlighting guy so i'm not sure ... but if you're okay
 with *simple* highlighting you can probably just highlight your title
 field (using a whitespace analyzer or something) and get decent results
 without needing to worry about the fact that you are using different
 languages.
 
 
 
 -Hoss
 





Re: Multiple doc types in schema

2007-06-21 Thread Otis Gospodnetic
SOLR-215 supports multiple indices on a single Solr instance.  It does *not* 
support searching of multiple indices at once (e.g. parallel search) and 
merging of results.

This has nothing to do with NFS, though.

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

- Original Message 
From: James liu [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Thursday, June 21, 2007 3:45:06 AM
Subject: Re: Multiple doc types in schema

I see SOLR-215 from this mail.

Does it now really support multiple indexes, with search returning merged
data?

For example:

I want to search "aaa", and I have index1, index2, index3, index4... it should
return the results from index1, index2, index3, index4 and merge them by
score, datetime, or something else.

Does it support NFS, and how is its performance?



2007/6/21, Otis Gospodnetic [EMAIL PROTECTED]:

 This sounds like a potentially good use-case for SOLR-215!
 See https://issues.apache.org/jira/browse/SOLR-215

 Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

 - Original Message 
 From: Chris Hostetter [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org; Jack L [EMAIL PROTECTED]
 Sent: Wednesday, June 6, 2007 6:58:10 AM
 Subject: Re: Multiple doc types in schema


 : This is based on my understanding that solr/lucene does not
 : have the concept of document type. It only sees fields.
 :
 : Is my understanding correct?

 it is.

 : It seems a bit unclean to mix fields of all document types
 : in the same schema though. Or, is there a way to allow multiple
 : document types in the schema, and specify what type to use
 : when indexing and searching?

 it's really just an issue of semantics ... the schema.xml is where you
 list all of the fields you need in your index; any notion of doctype is
 entirely artificial ... you could group all of the
 fields relating to doctypeA in one section of the schema.xml, then have a
 big <!-- ##...## --> line and then list the fields in doctypeB, etc... but
 what if there are fields you use in both doctypes? .. how much you mix
 them is entirely up to you.
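The grouping Hoss describes might look something like this in schema.xml (field names and types here are invented for illustration, not from the thread):

```xml
<!-- ######## doctypeA fields ######## -->
<field name="title"   type="text"   indexed="true" stored="true"/>
<field name="author"  type="string" indexed="true" stored="true"/>

<!-- ######## doctypeB fields ######## -->
<field name="price"   type="sfloat" indexed="true" stored="true"/>

<!-- ######## fields shared by both doctypes ######## -->
<field name="doctype" type="string" indexed="true" stored="true"/>
```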



 -Hoss







-- 
regards
jl





Re: problems getting data into solr index

2007-06-21 Thread vanderkerkoff

Hi Mike, Brian

Thanks for helping with this, and for clearing up my misunderstanding.  Solr
the python module and Solr the package being two different things, I've got
you.

The issues I have are compounded by the fact that we're hovering between
using the Unicode branch of Django and the older branch that has newforms,
both of which have an impact on what I'm trying to do.

It's getting closer to being resolved, and it's down to your advice, so
thanks again.






-- 
View this message in context: 
http://www.nabble.com/problems-getting-data-into-solr-index-tf3915542.html#a11230922
Sent from the Solr - User mailing list archive at Nabble.com.



Re: All facet.fields for a given facet.query?

2007-06-21 Thread Thomas Traeger



: Faceting on manufacturers and categories first and then presenting the
: corresponding facets might be useful under some circumstances, but in my case
: the category structure is quite deep, detailed and complex. So when
: the user enters a query I'd like to say to him: "Look, here are the
: manufacturers and categories with matches for your query; choose one if you
: want, but maybe there is another one with products that better fit your
: needs, or products that you didn't even know about. So maybe you'd like to
: filter based on the following attributes." Something like this ;o)

categories was just an example i used because it tends to be a common use
case ... my point is the decision about which facet qualifies for the
"maybe there is another one with products that better fit your needs" part
of the response either requires computing counts for *every* facet
constraint and then looking at them to see which ones provide good
distribution, or by knowing something more about your metadata (ie: having
stats that show the majority of people who search on the word "canon" want
to facet on megapixels) .. this is where custom biz logic comes in,
because in a lot of situations computing counts for every possible facet
may not be practical (even if the syntax to request it was easier)

I get your point, but how to know where additional metadata is of value
if not by just trying? Currently I start with a generic approach to see
what really is in the product data, to get an overview of the quality of
the data and what happens if I use the data in the new search solution.
Then I can decide what to do to optimize the system, i.e. try to reduce
the count of attributes, get marketing to split somewhat generic
attributes into more detailed ones, find a way to display the most
relevant facets for the current query first and so on...

Tom


Re: Multiple doc types in schema

2007-06-21 Thread Walter Underwood
I used Solr with indexes on NFS and I do not recommend it.
It was either 100 or 1000 times slower than local disc
for indexing, I forget which. Unusable.

This is not a problem with Solr/Lucene, I have seen the
same NFS performance cost with other search engines.

wunder

On 6/21/07 3:22 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote:

 SOLR-215 supports multiple indices on a single Solr instance.  It does *not*
 support searching of multiple indices at once (e.g. parallel search) and
 merging of results.
 
 This has nothing to do with NFS, though.
 
 Otis
  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
 



Re: Multiple doc types in schema

2007-06-21 Thread Frédéric Glorieux


Otis,

Thanks for the link and the work!
I will probably need this patch around September, if it hasn't already 
been committed to the Solr sources by then.


I will also need multiple-index searches, but I understand that there is 
no simple, fast and generic solution in the Solr context. Maybe I would 
lose Solr's caching, but it does not seem impossible to design my own 
custom request handler to query different indexes, as Lucene allows.



SOLR-215 supports multiple indices on a single Solr instance.  It does *not* 
support searching of multiple indices at once (e.g. parallel search) and 
merging of results.





--
Frédéric Glorieux
École nationale des chartes
direction des nouvelles technologies et de l'informatique


Re: Multiple doc types in schema

2007-06-21 Thread Yonik Seeley

On 6/21/07, Frédéric Glorieux [EMAIL PROTECTED] wrote:

I will also need multiple-index searches,


Do you mean:

1) Multiple unrelated indexes with different schemas, that you will
search separately... but you just want them in the same JVM for some
reason.

2) Multiple indexes with different schemas, search will search across
all or some subset and combine the results (federated search)

3) Multiple indexes with the same schema, each index is a shard that
contains part of the total collection.  Search will merge results
across all shards to give appearance of a single large collection
(distributed search).

-Yonik


Re: Recent updates to Solrsharp

2007-06-21 Thread Jeff Rodenburg

great, thanks Yonik.

On 6/20/07, Yonik Seeley [EMAIL PROTECTED] wrote:


On 6/21/07, Jeff Rodenburg [EMAIL PROTECTED] wrote:
 As an aside, it would be nice to record these issues more granularly in
 JIRA.  Could we get a component created for our client library, similar
to
 java/php/ruby?

Done.

-Yonik



Re: DismaxRequestHandler reports sort by score as invalid

2007-06-21 Thread J.J. Larrea
Because "score desc" is the default Lucene & Solr behavior when no explicit 
sort is specified, QueryParsing.parseSort() returns a null sort so that the 
non-sort versions of the query execution routines get called.  However the 
caller SolrPluginUtils.parseSort issues that warning whenever it gets a null 
sort.  Perhaps that interaction should be altered, or perhaps it should be left 
in as a sort of "are you sure you want to tell me what I already know?", er, 
warning.  But as it stands you can simply ignore it, or else leave the sort off 
entirely when it is "score desc"; if the behavior were different in those two 
cases it would certainly be a bug, but as you noted that's not the case.

- J.J.

At 10:50 AM -0400 6/21/07, gerard sychay wrote:
Hello all,

This is a minor issue and does not affect Solr operation, but I could not find 
it in the issue tracking.

To reproduce:

- I set up a Solr server with the example docs indexed by following the Solr 
tutorial.

- I clicked on the following example search under the Sorting section:

http://localhost:8983/solr/select/?indent=on&q=video&sort=score+desc

- I added a qt parameter to try out the DisMax Request Handler:

http://localhost:8983/solr/select/?indent=on&q=video&sort=score+desc&qt=dismax

- In the Solr output, I get:

WARNING: Invalid sort "score desc" was specified, ignoring
Jun 21, 2007 10:33:37 AM org.apache.solr.core.SolrCore execute
INFO: /select/ sort=score+desc&indent=on&qt=dismax&q=video 0 131

The WARNING line is the issue. It does not seem that it should be there. But 
as I said, it does not appear to affect operation as the results are sorted by 
score descending anyway (because that is the default?).



Re: DismaxRequestHandler reports sort by score as invalid

2007-06-21 Thread Yonik Seeley

A little background:
I originally conceived of query operation chains (based on some of my
previous hacking in mechanical investing stock screens: select all
stocks; take top 10% lowest PE; then take the top 20 highest growth
rate; then sort descending by 13 week relative strength).

So, I thought that the next thing after a query *might* be a sort, so
getSort() shouldn't throw an exception if it wasn't one.  I think this
idea is now outdated (we know when we have a sort spec) and an
exception should just be thrown on a syntax error.

-Yonik





Re: Multiple doc types in schema

2007-06-21 Thread Frédéric Glorieux


Hi Yonik,


I will also need multiple-index searches,


Do you mean:



2) Multiple indexes with different schemas, search will search across
all or some subset and combine the results (federated search)


Exactly that. I'm coming from a quite old Lucene-based project, called SDX
http://www.nongnu.org/sdx/docs/html/doc-sdx2/en/presentation/bases.html. 
Sorry for the link, the project is mainly documented in French. The 
framework is Cocoon-based, maybe heavy now. It has allowed hosting multiple 
applications, each with multiple "bases" (a "base" is a kind of Solr 
schema), since 2000.


From this experience, I can say cross-searching between different schemas 
is possible, and users may find it important. Take for example a 
library. They have different collections, let's say: CSV records 
obtained from digitized photos (a light model, with no writes expected); and a 
complex librarian model updated every day. These collections share at 
least a title and author field, and should be searchable behind the same 
form for the public; but each one should also have its own application, 
according to its information model.


With the SDX framework above, I know of real-life applications with 30 
Lucene indexes. It's possible, because Lucene allows it (MultiReader) 
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/index/MultiReader.html.



--
Frédéric Glorieux
École nationale des chartes
direction des nouvelles technologies et de l'informatique







Re: Multiple doc types in schema

2007-06-21 Thread Yonik Seeley

On 6/21/07, Frédéric Glorieux [EMAIL PROTECTED] wrote:

 I will also need multiple-index searches,

 Do you mean:

 2) Multiple indexes with different schemas, search will search across
 all or some subset and combine the results (federated search)



This doesn't sound like true federated search, since you have a number
of fields that are the same in each index that you search across, and
you treat them all the same.  This is functionally equivalent to
having a single schema and a single index.  You can still have
multiple applications that query the single collection differently.

Depending on update patterns and index sizes, you can probably get
better efficiency with multiple indexes, but not really more
functionality (in your case), right?

-Yonik


commit script with solr 1.2 response format

2007-06-21 Thread Ryan McKinley

I just started running the scripts and

The commit script seems to run fine, but it says there was an error.  I 
looked into it, and the scripts expect 1.1 style response:


  <result status="0"></result>

1.2 /update returns:

  <?xml version="1.0" encoding="UTF-8"?>
  <response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">44</int>
  </lst>
  </response>
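A sketch of a check that tolerates the 1.2 format (the response body is hard-coded here from the example above; a real script would capture curl's output instead):

```shell
# Example /update response in the 1.2 format (from the message above):
RESPONSE='<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">44</int>
</lst>
</response>'

# Success if the responseHeader reports status 0:
if echo "$RESPONSE" | grep -q '<int name="status">0</int>'; then
  echo "commit OK"
else
  echo "commit failed"
fi
```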


ryan


Re: Multiple doc types in schema

2007-06-21 Thread Frédéric Glorieux

Thanks Yonik for sharing your thoughts.

This doesn't sound like true federated search, 


I'm afraid I don't understand "federated search"; you seem to have a 
precise idea in mind.



since you have a number
of fields that are the same in each index that you search across, and
you treat them all the same.  This is functionally equivalent to
having a single schema and a single index.  You can still have
multiple applications that query the single collection differently.


Before you give a pointer or a web example, what you describe sounds to 
me like implementing a complete database with a single table (not easy to 
understand and maintain, but possible). In my experience, a collection 
is a schema, with thousands or millions of XML documents; there could be 10, 20 
or more fields, and the search configuration is generated from a kind of 
data schema (there's no real standard for explaining, for example, that a 
title or a subject needs one field for exact match and another for word 
search). If an index got too big (happily I have never hit this limit 
with Lucene), I guess there are solutions. My problem is to maintain 
different collections, each with its own intellectual logic, some shared 
field names (like Dublin Core, or at least fulltext), but also specific 
ones for each.



Depending on update patterns and index sizes, you can probably get
better efficiency with multiple indexes, but not really more
functionality (in your case), right?


Maybe keeping things understandable could be accepted as a functionality? 
Perhaps less so now, but there was a time when a Lucene index could become 
corrupted, so keeping them separate was important.


I guess these specific problems will not be Solr priorities, but 
until I have been corrected, I still feel that multiple indexes are 
useful.



--
Frédéric Glorieux
École nationale des chartes
direction des nouvelles technologies et de l'informatique


Re: Multiple doc types in schema

2007-06-21 Thread Frédéric Glorieux


After further reading, especially 
http://people.apache.org/~hossman/apachecon2006us/faceted-searching-with-solr.pdf

(Thanks Hoss)


Depending on update patterns and index sizes, you can probably get
better efficiency with multiple indexes, but not really more
functionality (in your case), right?


Maybe I'm approaching your point of view: "Loose Schema with Dynamic 
Fields" is probably my solution. There's something strange to me about 
treating a Lucene index as a blob, but if it works for installations bigger 
than mine, I should follow. So, it means one field type per analyzer, and 
the data-model logic lives only on the collection side. I think I have my 
idea for September, but I would be very glad if you have something to add.


--
Frédéric Glorieux
École nationale des chartes
direction des nouvelles technologies et de l'informatique


Re: commit script with solr 1.2 response format

2007-06-21 Thread James liu

aha, same question I found a few days ago.

I'm sorry I forgot to submit it.

2007/6/22, Yonik Seeley [EMAIL PROTECTED]:


On 6/21/07, Ryan McKinley [EMAIL PROTECTED] wrote:
 I just started running the scripts and

 The commit script seems to run fine, but it says there was an error.  I
 looked into it, and the scripts expect 1.1 style response:

    <result status="0"></result>

 1.2 /update returns:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">44</int>
    </lst>
    </response>

I guess we should look for 'status=0' ?

Or,  if you get a response code of 200, it's a success unless
you see status=nonzero

-Yonik





--
regards
jl


Re: commit script with solr 1.2 response format

2007-06-21 Thread Chris Hostetter

: I guess we should look for 'status=0' ?

that wouldn't quite work.

: Or,  if you get a response code of 200, it's a success unless
: you see status=nonzero

we could always make it an option in the scripts.conf file -- what
substring to match on ... just in case people want to write their own
crazy commit handler and still use the script ... but that may be
overkill.
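Such an option might look like this in scripts.conf (the variable name below is hypothetical; no such setting exists in Solr 1.2):

```shell
# Hypothetical scripts.conf entry -- not an actual Solr 1.2 setting.
# The substring the commit script would grep for in the /update response:
commit_success_pattern='<int name="status">0</int>'
echo "$commit_success_pattern"
```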



-Hoss



RE: Faceted Search!

2007-06-21 Thread Chris Hostetter

: generating an XML feed file and feeding it to the Solr server.  However, I was
: also looking into implementing sub-categories within the
: categories, if that makes sense.  For example, on shopper.com we have
: the categories of price, manufacturers and so on, and within them there
: are sub-categories (price is sub-categorized into $100, 100-200, 200-300 etc).
: I don't have constraints in terms of technology.  If I have to implement
: a db server I won't mind implementing it.  Anyway, please shine a light on
: how you would handle this issue.  Any suggestion will be appreciated.

the shopper.com solution is very VERY specialized and specific to the
datamodel used to manage the category metadata ... if i had to do it
over again i would do it a lot differently.

way way back there was a thread about complex faceting where i included
some ideas on a possible facet configuration xml syntax which could
then be parsed by a request handler, with different types of faceting
(simple query, ranges, based on terms, prefix) delegated to helper
classes.  there was also the idea of being able to group facets or make
facets depend on other facets (ie: don't show the "author" facet until a
value has been picked from the "author_initial" facet)

nothing ever really came of it, but it's how i'd probably approach trying
to tackle something like the shopper.com functionality if CNET threw away
our product metadata data model and started from scratch.

http://www.nabble.com/metadata-about-result-sets--t1243321.html#a3334244
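For flavor, a purely hypothetical sketch of the kind of facet-configuration syntax Hoss alludes to (none of these elements exist in Solr; the price ranges echo the shopper.com example above):

```xml
<!-- Hypothetical facet configuration; not real Solr syntax. -->
<facets>
  <facet name="price" type="range">
    <range to="100"/>
    <range from="100" to="200"/>
    <range from="200" to="300"/>
  </facet>
  <facet name="manufacturer" type="terms"/>
  <!-- only shown after a value is picked from author_initial -->
  <facet name="author" type="terms" dependsOn="author_initial"/>
</facets>
```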



-Hoss