Re: All facet.fields for a given facet.query?

2007-06-20 Thread Martin Grotzke
On Tue, 2007-06-19 at 11:09 -0700, Chris Hostetter wrote:
 I solve this problem by having metadata stored in my index which tells
 my custom request handler what fields to facet on for each category ...
How do you define this metadata?

Cheers,
Martin


 but i've also got several thousand categories.  If you've got fewer than
 100 categories, you could easily enumerate them all with default
 facet.field params in your solrconfig using separate requesthandler
 instances.
 
 : What do the experts think about this?
 
 you may want to read up on the past discussion of this in SOLR-247 ... in
 particular note the link to the mail archive where there was additional
 discussion about it as well.  Where we left things is that it
 might make sense to support true globbing in both fl and facet.field, so
 you can use naming conventions and say things like facet.field=facet_*
 but that in general trying to do something like facet.field=* would be a
 very bad idea even if it was supported.
 
 http://issues.apache.org/jira/browse/SOLR-247
 
 
 -Hoss
 
-- 
Martin Grotzke
http://www.javakaffee.de/blog/


signature.asc
Description: This is a digitally signed message part


Re: All facet.fields for a given facet.query?

2007-06-20 Thread Martin Grotzke
On Tue, 2007-06-19 at 19:16 +0200, Thomas Traeger wrote:
 Hi,
 
 I'm also just at that point where I think I need a wildcard facet.field 
 parameter (or someone points out another solution for my problem...). 
 Here is my situation:
 
 I have many products of different types with totally different 
 attributes. There are currently more than 300 attributes.
 I use dynamic fields to import the attributes into solr without having 
 to define a specific field for each attribute. Now when I make a query I 
 would like to get back all facet.fields that are relevant for that query.
 
 I think it would be really nice, if I don't have to know which facets 
 fields are there at query time, instead just import attributes into 
 dynamic fields, get the relevant facets back and decide in the frontend 
 which to display and how...
Do you really need all facets in the frontend?

Would it be a solution to have a facet ranking in the field definitions,
and then decide at query time which fields to facet on? This would
need an additional query parameter like facet.query.count.

E.g. if you have a query with q=foo+AND+prop1:bar+AND+prop2:baz
and you have fields
prop1 with facet-ranking 100
prop2 with facet-ranking 90
prop3 with facet-ranking 80
prop4 with facet-ranking 70
prop5 with facet-ranking 60

then you might decide not to facet on prop1 and prop2, as you already
have constraints on them, but to facet on prop3 and prop4 if
facet.query.count is 2.

Just thinking about that... :)

Cheers,
Martin


 
 What do the experts think about this?
 
 Tom
 
-- 
Martin Grotzke
http://www.javakaffee.de/blog/




Re: All facet.fields for a given facet.query?

2007-06-20 Thread Thomas Traeger

Martin Grotzke schrieb:

On Tue, 2007-06-19 at 19:16 +0200, Thomas Traeger wrote:
  

Hi,

I'm also just at that point where I think I need a wildcard facet.field 
parameter (or someone points out another solution for my problem...). 
Here is my situation:


I have many products of different types with totally different 
attributes. There are currently more than 300 attributes.
I use dynamic fields to import the attributes into solr without having 
to define a specific field for each attribute. Now when I make a query I 
would like to get back all facet.fields that are relevant for that query.


I think it would be really nice, if I don't have to know which facets 
fields are there at query time, instead just import attributes into 
dynamic fields, get the relevant facets back and decide in the frontend 
which to display and how...


Do you really need all facets in the frontend?
  

no, only the subset with matches for the current query.

Would it be a solution to have a facet ranking in the field definitions,
and then decide at query time which fields to facet on? This would
need an additional query parameter like facet.query.count.

E.g. if you have a query with q=foo+AND+prop1:bar+AND+prop2:baz
and you have fields
prop1 with facet-ranking 100
prop2 with facet-ranking 90
prop3 with facet-ranking 80
prop4 with facet-ranking 70
prop5 with facet-ranking 60

then you might decide not to facet on prop1 and prop2, as you already
have constraints on them, but to facet on prop3 and prop4 if
facet.query.count is 2.

Just thinking about that... :)

Cheers,
Martin

  
One step after the other ;o), the ranking of the facets will be another 
problem I have to solve; facet counts and matching-document counts will be 
a starting point. Another idea is to use the score of the documents 
returned by the query to compute a score for the facet.field...


Tom


Re: All facet.fields for a given facet.query?

2007-06-20 Thread Martin Grotzke
On Wed, 2007-06-20 at 12:59 +0200, Thomas Traeger wrote:
 Martin Grotzke schrieb:
  On Tue, 2007-06-19 at 19:16 +0200, Thomas Traeger wrote:
[...]
  I think it would be really nice, if I don't have to know which facets 
  fields are there at query time, instead just import attributes into 
  dynamic fields, get the relevant facets back and decide in the frontend 
  which to display and how...
  
  Do you really need all facets in the frontend?

 no, only the subset with matches for the current query.
ok, that's somewhat similar to our requirement, but we want to get back
only, e.g., the first 5 relevant facets from solr and not handle this
in the frontend.

  Would it be a solution to have a facet ranking in the field definitions,
  and then decide at query time which fields to facet on? This would
  need an additional query parameter like facet.query.count.
[...]

 One step after the other ;o), the ranking of the facets will be another 
 problem I have to solve; facet counts and matching-document counts will be 
 a starting point. Another idea is to use the score of the documents 
 returned by the query to compute a score for the facet.field...
Yep, this is also different for different applications.

I'm also interested in this problem and would like to help solve
it (though I'm really new to lucene and solr)...

Cheers,
Martin


 
 Tom
 
-- 
Martin Grotzke
http://www.javakaffee.de/blog/




Re: problems getting data into solr index

2007-06-20 Thread Brian Whitman
Mike is talking about solr.py, the python script; I'm talking about
Solr itself.
I think your problem is in the former. You should play around with
unicode in python for a while. Remember that your terminal itself
probably doesn't support utf-8; the biggest problem I run into is doing


 print utf8string

Python forces you to be good about this stuff, but it's a steep  
climb. Google for python unicode and read the various tutorials to  
get a handle on it.


-b


On Jun 20, 2007, at 9:38 AM, vanderkerkoff wrote:



Hello Mike, Brian

My brain is approaching saturation point and I'm reading these two
opinions as opposing each other.

I'm sure I'm reading it incorrectly, but they seem to contradict
each other.


Are they?


Brian Whitman wrote:


Solr has no problems with proper utf8 and you don't need to do
anything special to get it to work. Check out the newer solr.py in  
JIRA.





Mike Klaas wrote:


Perhaps this is why: solr.py expects unicode.  You can pass it ascii,
and it will transparently convert to unicode fine because that is the
default codec.  If you end up with utf-8, it will try to convert to
unicode using the ascii codec and fail.
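The failure mode Mike describes can be reproduced in a few lines. This is a standalone illustration of the codec behaviour, not solr.py itself; `to_unicode` is a made-up stand-in for a library that decodes byte input with a default codec, the way Python 2 implicitly did with ascii:

```python
# Demonstrates why passing utf-8 encoded bytes where unicode text is
# expected fails: decoding non-ASCII utf-8 bytes with the ascii codec
# (Python 2's implicit default) raises a UnicodeDecodeError.

def to_unicode(data, codec="ascii"):
    """Mimic a library that silently decodes byte input with a default codec."""
    if isinstance(data, bytes):
        return data.decode(codec)  # fails for non-ASCII bytes under 'ascii'
    return data

print(to_unicode(b"plain ascii"))           # pure-ascii bytes decode fine

utf8_bytes = "smörgåsbord".encode("utf-8")  # non-ASCII content as utf-8 bytes
try:
    to_unicode(utf8_bytes)                  # ascii codec chokes on bytes >= 0x80
except UnicodeDecodeError as e:
    print("failed as expected:", e.reason)

print(to_unicode(utf8_bytes, codec="utf-8"))  # the right codec works
```

The safe pattern is to keep text as unicode internally and encode to utf-8 only at the wire, which is exactly what Mike says solr.py does.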



--
View this message in context: http://www.nabble.com/problems-getting-data-into-solr-index-tf3915542.html#a11213488

Sent from the Solr - User mailing list archive at Nabble.com.



--
http://variogr.am/
[EMAIL PROTECTED]





snapinstaller safety

2007-06-20 Thread Otis Gospodnetic
Hi,

Looking at src/scripts/snapinstaller more closely, I saw this block of code:

# install using hard links into temporary directory
# remove original index and then atomically copy new one into place
logMessage installing snapshot ${name}
cp -lr ${name}/ ${data_dir}/index.tmp$$
/bin/rm -rf ${data_dir}/index
mv -f ${data_dir}/index.tmp$$ ${data_dir}/index


Is there a technical reason why this wasn't written as:

logMessage installing snapshot ${name}
cp -lr ${name}/ ${data_dir}/index.tmp$$ && \
/bin/rm -rf ${data_dir}/index && \
mv -f ${data_dir}/index.tmp$$ ${data_dir}/index

This feels a little safer to me - I'd hate to have the main index rm -rf-ed if 
the cp -lr command failed for some reason (e.g. disk full), but maybe Bill Au & 
Co. have a good reason for not using &&'s.  There may be other places in 
various scripts that this might be applicable to, but this is the first place I 
saw the extra safety possibility.

Thanks,
Otis





Slave/Master swap

2007-06-20 Thread Otis Gospodnetic
Hi,



I saw https://issues.apache.org/jira/browse/SOLR-265 (Make IndexSchema 
updateable in live system) which made me think of something I wished for a 
while back.

Having a single Solr Master and a couple of Solr Slaves is a common setup.  If 
any of the Slaves fails, a decent LB knows not to talk to it until it's back 
up.  What happens when the single Solr Master fails?  One (cheap) way to deal 
with that might be to promote one of the Solr Slaves to the new Master role.  
If the snapshooter script is called manually on the Master, the appropriate 
monitoring tools would need to start the same calls on the new Master (former 
Slave) box.  But if the snapshooter is configured via solrconfig.xml to run 
after commit and/or optimize, we'd have to swap solrconfig.xml and restart Solr 
on the ex-Slave to make it the new Master (and also make some changes in the LB 
VIPs, most likely).

I'm wondering if there are slicker ways to do this, ways that would minimize 
the downtime, for instance.  Perhaps, just like Will Johnson is trying to make 
IndexSchema updateable in a live system, the snapshooter could be turned on/off 
programmatically, say via a special request handler.

Thanks,
Otis





Re: SolrSharp example

2007-06-20 Thread Jeff Rodenburg

Hi Michael -

Moving this conversation to the general solr mailing list...



 1. SolrSharp example solution works with schema.xml from

apache-solr-1.1.0-incubating.  If I'm using schema.xml from
apache-solr-1.2.0, the example program doesn't update the index...

I didn't realize the solr 1.2 release code sample schema.xml was different
from the solr 1.1 version.  In my implementation, I had solr 1.1 already
installed and upgraded to 1.2 by replacing the war file (per the
instructions in solr.)  So, the example code is geared to go against
the 1.1 schema.

For the example code, adding the timestamp field in the
ExampleIndexDocument public constructor such as:

   this.Add(new IndexFieldValue("timestamp", DateTime.Now.ToString("s") + "Z"));

will take care of the solr 1.2 schema invalidation issue.

The addition of the @default attribute on this field in the schema is not
presently accommodated in the validation routine.  If I'm not mistaken, the
default attribute value will be applied for all documents without that field
present in the xml payload.  This would imply that any field with a default
attribute is not required for any implemented UpdateIndexDocument.  I'll
look into this further.



2. When I run the example with schema.xml from

apache-solr-1.1.0-incubating, the program
throws an Exception

Hmmm, can't really help you with this one.  It sounds as if solr is
incurring an error when the xml is posted to the server.  Try the standard
step-through troubleshooting routines to see what messages are being passed
back from the server.



-- j







On 6/19/07, Michael Plax [EMAIL PROTECTED] wrote:


 Hello Jeff,

thank you again for updating files.
I just ran into some problems. I don't know what is the best way to
report them: solr maillist or solrsharp jira.

1.
SolrSharp example solution works with schema.xml from
apache-solr-1.1.0-incubating.
   If I'm using schema.xml from apache-solr-1.2.0, the example program doesn't
update the index because:

   line 33: if (solrSearcher.SolrSchema.IsValidUpdateIndexDocument(iDoc))
returns false.
   the update fails because of the configuration file

schema.xml file:

line 265: <field name="word" type="string" indexed="true"
stored="true"/>
...
line 279: <field name="timestamp" type="date" indexed="true"
stored="true" default="NOW" multiValued="false"/>

those fields (word, timestamp) don't pass validation in SolrSchema.cs line 217.

2.
When I run example with schema.xml from apache-solr-1.1.0-incubating
program throw Exception

System.Exception was unhandled
  Message=Http error in request/response to
http://localhost:8983/solr/update/
  Source=SolrSharp
  StackTrace:
   at org.apache.solr.SolrSharp.Configuration.SolrSearcher.WebPost(String
url, Byte[] bytesToPost, String statusDescription) in
E:\SOLR-CSharp\src\Configuration\SolrSearcher.cs:line 229
   at org.apache.solr.SolrSharp.Update.SolrUpdater.PostToIndex(IndexDocument
oDoc, Boolean bCommit) in E:\SOLR-CSharp\src\Update\SolrUpdater.cs:line 70
   at SolrSharpExample.Program.Main(String[] args) in
E:\SOLR-CSharp\example\Program.cs:line 35
   at System.AppDomain.nExecuteAssembly(Assembly assembly, String[]
args)
   at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence
assemblySecurity, String[] args)
   at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly
()
   at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
   at System.Threading.ExecutionContext.Run(ExecutionContext
executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart()

xmlstring value from oDoc.SerializeToString()

<?xml version="1.0" encoding="utf-8"?>
<add xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<doc><field name="id">101</field><field name="name">One oh one</field>
<field name="manu">Sony</field><field name="cat">Electronics</field>
<field name="cat">Computer</field><field name="features">Good</field>
<field name="features">Fast</field><field name="features">Cheap</field>
<field name="includes">USB cable</field><field name="weight">1.234</field>
<field name="price">99.99</field><field name="popularity">1</field>
<field name="inStock">True</field></doc></add>

I checked all features from the Solr tutorial; they are working. I'm running
solr on Windows XP Pro without a firewall.

Do you know how to solve those problems? Do you recommend handling all
communication via the maillist or jira?

Regards
Michael




page rank

2007-06-20 Thread David Xiao
Hello folks,

 

I am using solr to index web contents. I want to know: is it possible to tell 
solr about rank information for the contents?

For example, I give each content an integer number.

 

And I hope solr takes this number into consideration when it generates search 
results. (larger number, higher priority)

 

Best Regards,

David



Re: page rank

2007-06-20 Thread Daniel Alheiros
Hi David.

Yes you can. 

Just define a field as a slong type field:

<field name="numberField" type="slong"/>

It can be used to sort (sort=numberField desc) or to boost your score (it
will depend on the RequestHandler you are going to use).

In terms of score which RequestHandler are you planning to use?
If using dismax you can define a boost function:
recip(rord(numberField),1,1000,1000)

I hope it helps.
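[An editorial aside: the shape of that boost can be sanity-checked numerically. This sketch assumes FunctionQuery's documented recip(x,m,a,b) = a/(m*x+b), and that rord assigns 1 to the document with the highest numberField value:]

```python
def recip(x, m, a, b):
    """Solr FunctionQuery recip: a / (m*x + b)."""
    return a / (m * x + b)

# recip(rord(numberField),1,1000,1000): rord is 1 for the document with
# the highest numberField value and grows as the value rank drops, so the
# boost decays smoothly from just under 1.0 toward 0.
for rank in (1, 10, 100, 1000, 10000):
    print(rank, round(recip(rank, 1, 1000, 1000), 3))
```

The constants 1000,1000 just set how slowly the boost falls off; larger values flatten the curve.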

Regards,
Daniel Alheiros

On 20/6/07 16:47, David Xiao [EMAIL PROTECTED] wrote:

 Hello folks,
 
  
 
 I am using solr to index web contents. I want to know: is it possible to tell
 solr about rank information for the contents?
 
 For example, I give each content an integer number.
 
 
 
 And I hope solr takes this number into consideration when it generates search
 results. (larger number, higher priority)
 
  
 
 Best Regards,
 
 David
 


http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal 
views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on 
it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.



Re: Slave/Master swap

2007-06-20 Thread Chris Hostetter

: I'm wondering if there are slicker ways to do this, ways that would
: minimize the downtime, for instance.  Perhaps, just like Will Johnson is
: trying to make IndexSchema updateable in a live system, the snapshooter
: could be turned on/off programatically, say via a special request
: handler.

an easy way to do that would be to modify the configuration of
RunExecutableListener in the solrconfig.xml to execute a wrapper script
around snapshooter that only runs it if a flag file exists on disk.

the problem is there are other things you typically want different between
a master and a slave ... uses of the QuerySenderListener (it could also be
modified to check for a flag file i suppose), cache sizes, and cache
autowarming.
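[Editorial sketch: the flag-file gate Hoss describes could look something like this. Paths are hypothetical, and it's written in Python here for illustration, though the real Solr scripts are shell:]

```python
import os
import subprocess
import sys

FLAG_FILE = "/var/solr/master.flag"        # hypothetical: touch/remove to toggle
SNAPSHOOTER = "/opt/solr/bin/snapshooter"  # hypothetical install path

def should_snapshot(flag_path=FLAG_FILE):
    """Only a box flagged as master should take snapshots."""
    return os.path.exists(flag_path)

def main(argv):
    if not should_snapshot():
        return 0  # silently skip on slaves
    # forward any arguments through to the real snapshooter
    return subprocess.call([SNAPSHOOTER] + argv)

if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Promoting a slave to master then becomes "touch the flag file" instead of swapping solrconfig.xml, at least for the snapshot side of the problem.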



-Hoss



Re: Faceted Search!

2007-06-20 Thread Chris Hostetter

: Thanks Chris for replying to my question.  So I'm thinking about using a
: CMS and when somebody publishes a page in CMS, I would generate this
: well structured XML file and feed that xml to Solr to generate the index
: on those data. Then, I can simply do faceted search using the correct
: Lucene query format, right?  Do you have any other ideas or comments on my
: CMS approach?

that sounds fine ... as long as you have well structured data and you
aren't trying to extract it from unstructured HTML.



-Hoss



Re: All facet.fields for a given facet.query?

2007-06-20 Thread Chris Hostetter
: to make it clear, i agree that it doesn't make sense faceting on all
: available fields, I only want faceting on those 300 attributes that are
: stored together with the fields for full text searches. A
: product/document has typically only 5-10 attributes.
:
: I like to decide at index time which attributes of a product might be of
: interest for faceting and store those in dynamic fields with the
: attribute-name and some kind of prefix or suffix to identify them at
: query time as facet.fields. Exactly the naming convention you mentioned.

but if the facet fields are different for every document, and they use a
simple dynamicField prefix (like facet_* for example) how do you know at
query time which fields to facet on? ... even if wildcards work in
facet.field, using facet.field=facet_* would require solr to compute the
counts for *every* field matching that pattern to find out which ones have
positive counts for the current result set -- there may only be 5 that
actually matter, but it's got to try all 300 of them to find out which 5
that is.

this is where custom request handlers that understand the faceting
metadata for your documents become key ... so you can say when
querying across the entire collection, only try to facet on category and
manufacturer.  if the search is constrained by category, then look up other
facet options to offer based on that category name from our metadata
store, etc...



-Hoss



Re: Multi-language indexing and searching

2007-06-20 Thread Chris Hostetter

: So far it sounds good for my needs, now I'm going to try if my other
: features still work (I'm worried about highlighting as I'm going to return a
: different field)...

i'm not really a highlighting guy so i'm not sure ... but if you're okay
with *simple* highlighting you can probably just highlight your title
field (using a whitespace analyzer or something) and get decent results
without needing to worry about the fact that you are using different
languages.



-Hoss



Re: Faceted Search!

2007-06-20 Thread niraj tulachan
Hi Chris,
thank you for the reply.  I was reading other postings regarding faceted 
search and it seems like they are using the filtering capability of Lucene for 
that.  If that's the case, can we have control over the labels of categories?  
For example: in shopper.com, a search for camera gives us clusters by 
price, pixel, manufacturer and so on.  And if we are feeding the xml file to 
the Solr server for faceted search, how can we define the sub-categories?  Let's 
say, from the above example, the category price has different sub-categories 
like less than 100, 100-200?  I'm guessing we explicitly define this in the XML 
feed file, but I could be very wrong.  In any case, can you please give me a 
short example to achieve that implementation.  Well, thanks once again.
  Cheers,
  Niraj

Chris Hostetter [EMAIL PROTECTED] wrote:
  
: Thanks Chris for replying to my question. So I'm thinking about using a
: CMS and when somebody publishes a page in CMS, I would generate this
: well structured XML file and feed that xml to Solr to generate the index
: on those data. Then, I can simply do faceted search using the correct
: Lucene query format, right? Do you have any other ideas or comments on my
: CMS approach?

that sounds fine ... as long as you have well structured data and you
aren't trying to extract it from unstructured HTML.



-Hoss




Re: SolrSharp example

2007-06-20 Thread Jeff Rodenburg

On 6/20/07, Yonik Seeley [EMAIL PROTECTED] wrote:


On 6/20/07, Michael Plax [EMAIL PROTECTED] wrote:
 This is a log that I got after running the SolrSharp example. I think the
 example program posts improperly formatted xml.
 I'm running Solr on Windows XP, Java 1.5. Could those settings be the
 problem?

Solr1.2 is pickier about the Content-type in the HTTP headers.
I bet it's being set incorrectly.




Ahh, good point.  Within SolrSearcher.cs, the WebPost method contains this
setting:

oRequest.ContentType = "application/x-www-form-urlencoded";

Looking through the CHANGES.txt file in the 1.2 tagged release on svn:

9. The example solrconfig.xml maps /update to XmlUpdateRequestHandler using
the new request dispatcher (SOLR-104). This requires posted content to have
a valid contentType: curl -H 'Content-type:text/xml; charset=utf-8'.  The
response format matches that of /select and returns standard error codes. To
enable solr1.1 style /update, do not map /update to any handler in
solrconfig.xml (ryan)

For SolrSearcher.cs, it sounds as though changing the ContentType setting to
text/xml may fix this issue.

I don't have a 1.2 instance to test this against available to me right now,
but can check this later.  Michael, try updating your SolrSearcher.cs file
for this content-type setting to see if that resolves your issue.
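[Editorial sketch: the same header requirement can be checked from any client. This uses Python's urllib with a placeholder URL, purely to show the header Solr 1.2 expects; SolrSharp itself is of course C#:]

```python
import urllib.request

# Solr 1.2's request dispatcher rejects posted updates whose Content-type
# isn't a valid XML type, so build the update request with an explicit header.
def build_update_request(url, xml_payload):
    return urllib.request.Request(
        url,
        data=xml_payload.encode("utf-8"),
        headers={"Content-type": "text/xml; charset=utf-8"},
        method="POST",
    )

req = build_update_request("http://localhost:8983/solr/update",
                           "<add><doc/></add>")
print(req.get_header("Content-type"))
```

This is the urllib equivalent of the `curl -H 'Content-type:text/xml; charset=utf-8'` example quoted from CHANGES.txt above.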


thanks,
jeff r.


Re: problems getting data into solr index

2007-06-20 Thread Mike Klaas



On 20-Jun-07, at 6:38 AM, vanderkerkoff wrote:



Hello Mike, Brian

My brain is approaching saturation point and I'm reading these two
opinions as opposing each other.

I'm sure I'm reading it incorrectly, but they seem to contradict
each other.


Are they?


solr.py takes unicode and encodes it as utf-8 to send to Solr.

-Mike


Re: Faceted Search!

2007-06-20 Thread Chris Hostetter

: define the sub-categories.  let's say from the above example, the
: category price has different sub-categories like less than 100
: ,100-200?  I'm guessing, we explicit define this in XML feed file, but
: I could be very wrong.  In any case, can you please give me the short
: example achieve that implementation.  Well, thanks once again.

there's nothing out of the box from Solr that will do this, it's
something you would need to implement either in the client or in a custom
request handler ... Solr's Simple Faceting support is designed to be just
that: simple.  but the underlying methods/mechanisms of computing DocSet
intersections can be used by any custom request handler to generate
application specific results.

I've got 3 or 4 indexes that use the out of the box SimpleFacet support
Solr provides, but the major faceting we do (product based facets) all
uses custom request handlers so we can have very exact control on all of
this kind of stuff driven by our data management tools.



-Hoss



Re: Slave/Master swap

2007-06-20 Thread Otis Gospodnetic
Hi,

Yes, I thought of flag file + wrapper script tricks, but that didn't sound 
super elegant either, and the other differences in behaviour between master and 
slave are also true.

Hmm, I've always wanted to try DRBD (http://www.drbd.org/).  
Master-(Master+Slaves) replication via DRBD?  I imagine it would be 
expensive...

So if I want to turn a Slave into a Master, the best thing to do is to swap 
solrconfigs and restart the ex-Slave to turn it into a Master.  The more 
expensive solution might be to have Solr instances run on top of a SAN and then 
one could really have multiple Master instances, one in stand-by mode and ready 
to be started as the new Master if the current Master decides to go on 
vacation.  Any flaws there?  Out of curiosity, how does CNet handle Master 
redundancy?

Otis


- Original Message 
From: Chris Hostetter [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, June 20, 2007 9:40:51 PM
Subject: Re: Slave/Master swap


: I'm wondering if there are slicker ways to do this, ways that would
: minimize the downtime, for instance.  Perhaps, just like Will Johnson is
: trying to make IndexSchema updateable in a live system, the snapshooter
: could be turned on/off programatically, say via a special request
: handler.

an easy way to do that would be to modify the configuration of
RunExecutableListener in the solrconfig.xml to execute a wrapper script
around snapshooter that only runs it if a flag file exists on disk.

the problem is there are other things you typically want different between
a master and a slave ... uses of the QuerySenderListener (it could also be
modified to check for a flag file i suppose), cache sizes, and cache
autowarming.



-Hoss






Re: Slave/Master swap

2007-06-20 Thread Chris Hostetter


: The more expensive solution might be to have Solr instances run on top
: of a SAN and then one could really have multiple Master instances, one
: in stand-by mode and ready to be started as the new Master if the

i *believe* that if you have two solr instances pointed at the same
physical data directory (SAN or otherwise) but you only send update/commit
commands to one, they won't interfere with each other.  so conceivably you
can have both masters up and running and your failover approach if the
primary goes down is just to start sending updates to the secondary.
you'll lose any unflushed changes that the primary had in memory, but
those are lost anyway.

don't trust me on that though, test it out yourself.

: curiosity, how does CNet handle Master redundancy?

I don't know how much i'm allowed to talk about our processes and systems
for redundancy, disaster recovery, failover, etc... but i don't think
i'll upset anyone if i tell you: as far as i know, we've never needed to
take advantage of them with a solr master.  ie: we've never had a solr
master crash so hard we had to bring up another one in its place ...
knock on wood.  (that probably has more to do with having good hardware
than anything else though).

(and no, i honestly don't know what hardware we use ... i don't bother
paying attention, i let the hardware guys worry about that)


-Hoss



Re: Slave/Master swap

2007-06-20 Thread Otis Gospodnetic
Right, that SAN with 2 Masters sounds good.  Lucky you with your lonely Master!  
Where I work, hw failures are pretty common.

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

- Original Message 
From: Chris Hostetter [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, June 20, 2007 11:43:02 PM
Subject: Re: Slave/Master swap



: The more expensive solution might be to have Solr instances run on top
: of a SAN and then one could really have multiple Master instances, one
: in stand-by mode and ready to be started as the new Master if the

i *believe* that if you have two solr instances pointed at the same
physical data directory (SAN or otherwise) but you only send update/commit
commands to one, they won't interfere with each other.  so conceivably you
can have both masters up and running and your failover approach if the
primary goes down is just to start sending updates to the secondary.
you'll lose any unflushed changes that the primary had in memory, but
those are lost anyway.

don't trust me on that though, test it out yourself.

: curiosity, how does CNet handle Master redundancy?

I don't know how much i'm allowed to talk about our processes and systems
for redundancy, disaster recovery, failover, etc... but i don't think
i'll upset anyone if i tell you: as far as i know, we've never needed to
take advantage of them with a solr master.  ie: we've never had a solr
master crash so hard we had to bring up another one in its place ...
knock on wood.  (that probably has more to do with having good hardware
than anything else though).

(and no, i honestly don't know what hardware we use ... i don't bother
paying attention, i let the hardware guys worry about that)


-Hoss






Re: All facet.fields for a given facet.query?

2007-06-20 Thread Thomas Traeger

Chris Hostetter schrieb:

: to make it clear, i agree that it doesn't make sense faceting on all
: available fields, I only want faceting on those 300 attributes that are
: stored together with the fields for full text searches. A
: product/document has typically only 5-10 attributes.
:
: I like to decide at index time which attributes of a product might be of
: interest for faceting and store those in dynamic fields with the
: attribute-name and some kind of prefix or suffix to identify them at
: query time as facet.fields. Exactly the naming convention you mentioned.

but if the facet fields are different for every document, and they use a
simple dynamicField prefix (like facet_* for example) how do you know at
query time which fields to facet on? ... even if wildcards work in
facet.field, using facet.field=facet_* would require solr to compute the
counts for *every* field matching that pattern to find out which ones have
positive counts for the current result set -- there may only be 5 that
actually matter, but it's got to try all 300 of them to find out which 5
that is.
I just made a quick test by building a facet query with those 300
attributes. I realized that the facets are built from the whole index, not
the subset returned by the initial query. Therefore I have a large number
of empty facets, which I simply ignore. In my case the QueryTime is
somewhat higher (of course) but it is still only some milliseconds.
(wow!!!) :o)

So at this stage of my investigation, and in my use case, I don't have to
worry about performance even if I use the system in a way that uses more
resources than necessary.
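[Editorial sketch: ignoring the empty facets client-side is straightforward. Solr returns each facet field as a flat value,count list, so a small filter can keep only the fields with non-zero counts. Field names here are made up:]

```python
def live_facets(facet_fields, min_count=1):
    """Collapse Solr's flat [value, count, value, count, ...] facet lists
    into dicts, dropping values (and whole fields) below min_count."""
    result = {}
    for field, flat in facet_fields.items():
        pairs = {flat[i]: flat[i + 1] for i in range(0, len(flat), 2)}
        pairs = {value: count for value, count in pairs.items()
                 if count >= min_count}
        if pairs:  # drop facet fields that are entirely empty
            result[field] = pairs
    return result

# Hypothetical response fragment: many dynamic facet_* fields, most empty.
response = {
    "facet_color": ["red", 3, "blue", 0],
    "facet_size": ["xl", 0],
    "facet_brand": ["acme", 5],
}
print(live_facets(response))
```

With facet.mincount (facet.zeros in 1.1) unavailable or insufficient, this kind of post-filter is the pragmatic fallback Thomas describes.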

this is where custom request handlers that understand the faceting
metadata for your documents become key ... so you can say when
querying across the entire collection, only try to facet on category and
manufacturer.  if the search is constrained by category, then look up other
facet options to offer based on that category name from our metadata
store, etc...

Faceting on manufacturers and categories first and then presenting the
corresponding facets might be used under some circumstances, but in my case
the category structure is quite deep, detailed and complex. So when
the user enters a query I'd like to say to him: Look, here are the
manufacturers and categories with matches for your query; choose one if you
want, but maybe there is another one with products that better fit your
needs, or products that you didn't even know about. So maybe you'd like to
filter based on the following attributes. Something like this ;o)

The point is, that i currently don't want to know too much about the data,
I just want to feed it into solr, follow some conventions and get the most
out of it as quickly as possible. Optimizations can and will take place at
a later time.

I hope to find some time to dig into solr SimpleFacets this weekend.

Regards,

Tom


Re: page rank

2007-06-20 Thread Nick Jenkin

Also if you are using the standard request handler you can use the _val_ hack:

foo:bar _val_:"recip(rord(numberField),1,1000,1000)"

You can find more info about this here:
http://wiki.apache.org/solr/FunctionQuery

-Nick

On 6/21/07, Daniel Alheiros [EMAIL PROTECTED] wrote:

Hi David.

Yes you can.

Just define a field as a slong type field:

<field name="numberField" type="slong" />

It can be used to sort (sort=numberField desc) or to boost your score (it
will depend on the RequestHandler you are going to use).

In terms of score which RequestHandler are you planning to use?
If using dismax you can define a boost function:
recip(rord(numberField),1,1000,1000)
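
That boost function can be wired into a dismax handler via the bf (boost function) parameter in solrconfig.xml; a sketch (the handler name and other defaults here are illustrative, not from the original message):

```xml
<requestHandler name="dismax" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <!-- boost documents by their rank field at query time -->
    <str name="bf">recip(rord(numberField),1,1000,1000)</str>
  </lst>
</requestHandler>
```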

I hope it helps.

Regards,
Daniel Alheiros

On 20/6/07 16:47, David Xiao [EMAIL PROTECTED] wrote:

 Hello folks,



 I am using solr to index web contents. I want to know: is it possible to
 tell solr about rank information for the contents?

 For example, I give each content an integer number.

 And I hope solr takes this number into consideration when it generates
 search results (larger number, higher priority).



 Best Regards,

 David



http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal 
views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on 
it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.




Re: Multiple doc types in schema

2007-06-20 Thread Otis Gospodnetic
This sounds like a potentially good use-case for SOLR-215!
See https://issues.apache.org/jira/browse/SOLR-215

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

- Original Message 
From: Chris Hostetter [EMAIL PROTECTED]
To: solr-user@lucene.apache.org; Jack L [EMAIL PROTECTED]
Sent: Wednesday, June 6, 2007 6:58:10 AM
Subject: Re: Multiple doc types in schema


: This is based on my understanding that solr/lucene does not
: have the concept of document type. It only sees fields.
:
: Is my understanding correct?

it is.

: It seems a bit unclean to mix fields of all document types
: in the same schema though. Or, is there a way to allow multiple
: document types in the schema, and specify what type to use
: when indexing and searching?

it's really just an issue of semantics ... the schema.xml is where you
list all of the fields you need in your index, any notion of doctype is
entirely artificial ... you could group all of the
fields relating to doctypeA in one section of the schema.xml, then have a
big <!-- ##...## --> line and then list the fields in doctypeB, etc... but
what if there are fields you use in both doctypes? .. how much you mix
them is entirely up to you.
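
For illustration, such a grouped schema.xml section might look like this (the field names are made up):

```xml
<!-- ######## fields for doctypeA ######## -->
<field name="title"  type="string" indexed="true" stored="true"/>
<field name="author" type="string" indexed="true" stored="true"/>

<!-- ######## fields for doctypeB ######## -->
<field name="sku"    type="string" indexed="true" stored="true"/>

<!-- shared by both doctypes -->
<field name="id"     type="string" indexed="true" stored="true"/>
```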



-Hoss






Rejecting fields with null values

2007-06-20 Thread Thiago Jackiw

I'm not sure if this is possible or not, but, is there a way to do a
search and reject fields that are empty or have null values like the
pseudo code below?

?q=test+AND+(NOT+field_b:NULL)

If this is not currently supported, does anyone think it would not be a
good idea to implement?

Thanks,

--
Thiago Jackiw
acts_as_solr = http://acts-as-solr.railsfreaks.com


Re: RAMDirecotory instead of FSDirectory for SOLR

2007-06-20 Thread Otis Gospodnetic
Hi Jeryl,

Three weeks later - any luck with Solr + Terracotta?

Thanks,
Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

- Original Message 
From: Jeryl Cook [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Friday, June 1, 2007 3:59:21 AM
Subject: RE: RAMDirecotory instead of FSDirectory for SOLR

i have Terracotta working with Lucene, and it works fine with the
RAMDirectory... i am trying to get it to work with SOLR (hooking in the
RAMDirectory)... when i do, i'll post the findings, problems, etc. Thanks
for the feedback from everyone. Jeryl Cook



/^\ Pharaoh /^\ 

http://pharaohofkush.blogspot.com/ 



..Act your age, and not your shoe size..

-Prince(1986)

Date: Thu, 31 May 2007 18:24:26 -0700
From: [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Subject: RE: RAMDirecotory instead of FSDirectory for SOLR

Jeryl,

If you need any help getting Terracotta to work under Lucene or if you have
any questions about performance tuning and/or load testing, you can also use
the Terracotta community resources (mailing lists, forums, IRC, whatnot):
http://www.terracotta.org/confluence/display/orgsite/Community.  We'd be
more than happy to help you get this stuff working.

Cheers, Orion

Jeryl Cook wrote:
That's the thing: Terracotta persists everything it has in memory to the
disk when it overflows (you can set how much you want to use in memory), or
when the server goes offline.  When the server comes back, the master
terracotta simply loads it back into the memory of the once-offline
worker... identical to the approach SOLR already uses to handle
scalability.  This allows unlimited storage of the items in memory; you
just need to cluster the RAMDirectory according to the sample given by
Terracotta.  However, i read some of the posts here... some say "i wonder
how performance will be", etc.  i was trying to get it working and load
test the hell out of it, and see how it acts with large amounts of data,
and how it compares with SOLR using the typical FSDirectory approach.  i
plan to post findings.  Jeryl Cook

Date: Thu, 31 May 2007 13:51:53 -0700
From: [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Subject: RE: RAMDirecotory instead of FSDirectory for SOLR

: board, looks like i can achieve this with the embedded version of SOLR
: uses the lucene RAMDirectory to store the index..Jeryl Cook

yeah ... adding a solrconfig.xml option for using a RAMDirectory would be
possible ... but almost meaningless for most people (the directory would go
away when the server shuts down) ... even for use cases like what you
describe (hooking in terracotta) it wouldn't be enough in itself, because
there would be no hook to give terracotta access to it.

-Hoss

--
View this message in context:
http://www.nabble.com/RAMDirecotory-instead-of-FSDirectory-for-SOLR-tf3843377.html#a10905062
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Rejecting fields with null values

2007-06-20 Thread Chris Hostetter
: I'm not sure if this is possible or not, but, is there a way to do a
: search and reject fields that are empty or have null values like the
: pseudo code below?

As an inverted index, the Lucene index Solr uses doesn't know when
documents have an empty value ... it stores the inverted mapping of
value=documents, so there is no way to query for field_b:NULL, let alone
NOT field_b:NULL

you can however query for things like:  field_b:[* TO *] which requires
field_b to have some value (that seems to be the use case you are after)

as a general rule, if you really want to be able to support searches for
things like "find all docs where there is no value in field X" the easiest
way to achieve something like that in Solr is to configure the field with
a default value in the schema ... something that would never normally
appear in your data (a placeholder for 'null' so to speak) and query on
that.
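
In schema.xml terms, the placeholder approach might look like this (the field name is from the thread; the sentinel value is illustrative):

```xml
<!-- "__NULL__" is a sentinel that never appears in real data -->
<field name="field_b" type="string" indexed="true" stored="true"
       default="__NULL__"/>
```

A query such as q=test AND (NOT field_b:__NULL__) then rejects documents that were indexed without a field_b value.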


-Hoss



Re: All facet.fields for a given facet.query?

2007-06-20 Thread Martin Grotzke
On Wed, 2007-06-20 at 12:49 -0700, Chris Hostetter wrote:
 :  I solve this problem by having metadata stored in my index which tells
 :  my custom request handler what fields to facet on for each category ...
 : How do you define this metadata?
 
 this might be a good place to start, note that this message is almost two
 years old, and predates the opensourcing of Solr ... the Servlet referred
 to in this thread is Solr.
 
 http://www.nabble.com/Announcement%3A-Lucene-powering-CNET.com-Product-Category-Listings-p748420.html
 
 ...i think i also talked a bit about the metadata documents in my
 apachecon slides from last year ... but i don't really remember, and i
 haven't looked at them in a while...
 
 http://people.apache.org/~hossman/apachecon2006us/

thx, I'll have a look at these resources.

cheers,
martin


 
 
 -Hoss
 



signature.asc
Description: This is a digitally signed message part


Re: All facet.fields for a given facet.query?

2007-06-20 Thread Chris Hostetter

: I realized, that the facets are built from the whole index, not the
: subset returned by the initial query. Therefore I have a large number of
: empty facets which I simply ignore. In my case the QueryTime is somewhat

facet.mincount is a way to tell solr not to bother giving you those 0
counts ... you will still get the name of the field though, so that you
know it tried it.
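
For example, a faceting request with facet.mincount set can be sketched like this (the field names are made up; only the query string is built here):

```python
from urllib.parse import urlencode

# Facet on two hypothetical dynamic fields, but ask Solr to suppress
# constraints whose count for the current result set is zero.
params = urlencode(
    {
        "q": "test",
        "facet": "true",
        "facet.field": ["facet_manu", "facet_category"],
        "facet.mincount": 1,
    },
    doseq=True,  # emit one facet.field pair per listed field
)
print(params)
```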

: Faceting on manufacturers and categories first and than present the
: corresponding facets might be used under some circumstances, but in my case
: the category structure is quite deep, detailed and complex. So when
: the user enters a query I like to say to him Look, here are the
: manufacturers and categories with matches to your query, choose one if you
: want, but maybe there is another one with products that better fit your
: needs or products that you didn't even know about. So maybe you like to
: filter based on the following attributes. Something like this ;o)

categories was just an example i used because it tends to be a common use
case ... my point is the decision about which facets qualify for the
"maybe there is another one with products that better fit your needs" part
of the response either requires computing counts for *every* facet
constraint and then looking at them to see which ones provide good
distribution, or knowing something more about your metadata (ie: having
stats that show the majority of people who search on the word "canon" want
to facet on megapixels) .. this is where custom biz logic comes in,
because in a lot of situations computing counts for every possible facet
may not be practical (even if the syntax to request it was easier)


-Hoss



Re: Rejecting fields with null values

2007-06-20 Thread Yonik Seeley

Keep in mind filters too... they can be much more efficient if used often:
?q=testfq=field_b:[* TO *]

-Yonik

On 6/20/07, Thiago Jackiw [EMAIL PROTECTED] wrote:

Hoss,

 As an inverted index, the Lucene index Solr uses doesn't know when
 documents have an empty value ... it stores the inverted mapping of
 value=documents, so there is no way to query for field_b:NULL, let alone
 NOT field_b:NULL

I see what you mean.  I guess searching for fields that require to
have a value like the way you explained is a good way to go.

Thanks!

--
Thiago Jackiw
acts_as_solr = http://acts-as-solr.railsfreaks.com


On 6/20/07, Chris Hostetter [EMAIL PROTECTED] wrote:
 : I'm not sure if this is possible or not, but, is there a way to do a
 : search and reject fields that are empty or have null values like the
 : pseudo code below?

 As an inverted index, the Lucene index Solr uses doesn't know when
 documents have an empty value ... it stores the inverted mapping of
 value=documents, so there is no way to query for field_b:NULL, let alone
 NOT field_b:NULL

 you can however query for things like:  field_b:[* TO *] which requires
 field_b to have some value (that seems to be the use case you are after)

 as a general rule, if you really want to be able to support searches for
 things like "find all docs where there is no value in field X" the easiest
 way to achieve something like that in Solr is to configure the field with
 a default value in the schema ... something that would never normally
 appear in your data (a placeholder for 'null' so to speak) and query on
 that.


 -Hoss





Re: All facet.fields for a given facet.query?

2007-06-20 Thread Yonik Seeley

On 6/20/07, Chris Hostetter [EMAIL PROTECTED] wrote:

facet.mincount is a way to tell solr not to bother giving you those 0
counts ...


An aside: shouldn't that be the default?  All of the people using
facets that I have seen always have to set facet.mincount=1 (or
facet.zeros=false)

-Yonik


RE: Faceted Search!

2007-06-20 Thread Mike Austin
Niraj: What environment are you using? SQL Server/.NET/Windows? or something
else?

-Mike

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 20, 2007 4:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Faceted Search!



: define the sub-categories.  let's say from the above example, the
: category price has different sub-categories like less than 100
: ,100-200?  I'm guessing, we explicit define this in XML feed file, but
: I could be very wrong.  In any case, can you please give me the short
: example achieve that implementation.  Well, thanks once again.

there's nothing out of the box from Solr that will do this, it's
something you would need to implement either in the client or in a custom
request handler ... Solr's Simple Faceting support is designed to be just
that: simple.  but the underlying methods/mechanisms of computing DocSet
intersections can be used by any custom request handler to generate
application specific results.

I've got 3 or 4 indexes that use the out of the box SimpleFacet support
Solr provides, but the major faceting we do (product based facets) all
uses custom request handlers so we can have very exact control on all of
this kind of stuff driven by our data management tools.



-Hoss



Re: Slave/Master swap

2007-06-20 Thread James liu

If just one master or one slave server fails, i think you may be able to
use the master index server.

shell controlled by a program is easy for me. i use php and shell_exec.


2007/6/21, Otis Gospodnetic [EMAIL PROTECTED]:


Right, that SAN with 2 Masters sounds good.  Lucky you with your lonely
Master!  Where I work, hw failures are pretty common.

Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

- Original Message 
From: Chris Hostetter [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, June 20, 2007 11:43:02 PM
Subject: Re: Slave/Master swap



: The more expensive solution might be to have Solr instances run on top
: of a SAN and then one could really have multiple Master instances, one
: in stand-by mode and ready to be started as the new Master if the

i *believe* that if you have two solr instances pointed at the same
physical data directory (SAN or otherwise) but you only send update/commit
commands to one, they won't interfere with each other.  so conceivably you
can have both masters up and running, and your failover approach if the
primary goes down is just to start sending updates to the secondary.
you'll lose any unflushed changes that the primary had in memory, but
those are lost anyway.

don't trust me on that though, test it out yourself.

: curiosity, how does CNet handle Master redundancy?

I don't know how much i'm allowed to talk about our processes and systems
for redundancy, disaster recovery, failover, etc... but i don't think
i'll upset anyone if i tell you: as far as i know, we've never needed to
take advantage of them with a solr master.  ie: we've never had a solr
master crash so hard we had to bring up another one in its place ...
knock on wood.  (that probably has more to do with having good hardware
than anything else though).

(and no, i honestly don't know what hardware we use ... i don't bother
paying attention, i let the hardware guys worry about that)


-Hoss








--
regards
jl


Re: SolrSharp example

2007-06-20 Thread Michael Plax

Hello,

Yonik and Jeff thank you for your help.
You are right this was content-type issue.

in order to run the example, the following things need to be done:

1. Code (SolrSharp) should be changed
from:
src\Configuration\SolrSearcher.cs(217): oRequest.ContentType =
"application/x-www-form-urlencoded";
to:
src\Configuration\SolrSearcher.cs(217): oRequest.ContentType =
"text/xml";


2. In order to take care of the solr 1.2 schema invalidation issue, in
schema.xml:
comment out line 265:
<!-- <field name="word" type="string" indexed="true" stored="true"/> -->
comment out line 279:
<!-- <field name="timestamp" type="date" indexed="true" stored="true"
default="NOW" multiValued="false"/> -->

or, as Jeff suggested, for the example code, add the timestamp field in the
ExampleIndexDocument public constructor, such as:
this.Add(new IndexFieldValue("timestamp", DateTime.Now.ToString("s") + "Z"));


Regards
Michael




- Original Message - 
From: Jeff Rodenburg [EMAIL PROTECTED]

To: solr-user@lucene.apache.org
Sent: Wednesday, June 20, 2007 1:56 PM
Subject: Re: SolrSharp example



On 6/20/07, Yonik Seeley [EMAIL PROTECTED] wrote:


On 6/20/07, Michael Plax [EMAIL PROTECTED] wrote:
 This is a log that I got after running the SolrSharp example. I think the
 example program posts improperly formatted xml.
 I'm running Solr on Windows XP, Java 1.5. Could those settings be the
 problem?

Solr1.2 is pickier about the Content-type in the HTTP headers.
I bet it's being set incorrectly.




Ahh, good point.  Within SolrSearcher.cs, the WebPost method contains this
setting:

oRequest.ContentType = "application/x-www-form-urlencoded";

Looking through the CHANGES.txt file in the 1.2 tagged release on svn:

9. The example solrconfig.xml maps /update to XmlUpdateRequestHandler using
the new request dispatcher (SOLR-104). This requires posted content to have
a valid contentType: curl -H 'Content-type:text/xml; charset=utf-8'.  The
response format matches that of /select and returns standard error codes.
To enable solr1.1 style /update, do not map /update to any handler in
solrconfig.xml (ryan)

For SolrSearcher.cs, it sounds as though changing the ContentType setting
to "text/xml" may fix this issue.

I don't have a 1.2 instance to test this against available to me right now,
but can check this later.  Michael, try updating your SolrSearcher.cs file
for this content-type setting to see if that resolves your issue.


thanks,
jeff r.





Re: Multiple doc types in schema

2007-06-20 Thread James liu

I see SOLR-215 from this mail.

Does it now really support multiple indexes, and will a search return
merged data?

for example:

i wanna search "aaa", and i have index1, index2, index3, index4. it should
return results from index1, index2, index3, index4 and merge them by
score, datetime, or other criteria.

Does it support NFS, and how is its performance?



2007/6/21, Otis Gospodnetic [EMAIL PROTECTED]:


This sounds like a potentially good use-case for SOLR-215!
See https://issues.apache.org/jira/browse/SOLR-215

Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

- Original Message 
From: Chris Hostetter [EMAIL PROTECTED]
To: solr-user@lucene.apache.org; Jack L [EMAIL PROTECTED]
Sent: Wednesday, June 6, 2007 6:58:10 AM
Subject: Re: Multiple doc types in schema


: This is based on my understanding that solr/lucene does not
: have the concept of document type. It only sees fields.
:
: Is my understanding correct?

it is.

: It seems a bit unclean to mix fields of all document types
: in the same schema though. Or, is there a way to allow multiple
: document types in the schema, and specify what type to use
: when indexing and searching?

it's really just an issue of semantics ... the schema.xml is where you
list all of the fields you need in your index, any notion of doctype is
entirely artificial ... you could group all of the
fields relating to doctypeA in one section of the schema.xml, then have a
big <!-- ##...## --> line and then list the fields in doctypeB, etc... but
what if there are fields you use in both doctypes? .. how much you mix
them is entirely up to you.



-Hoss








--
regards
jl


Re: SolrSharp example

2007-06-20 Thread Jeff Rodenburg

Thanks for checking, Michael -- great find.  I'm in the process of readying
this same fix for inclusion in the source code (I'm verifying against a
full 1.2 install.)

The SolrField class is now also being extended to incorporate an IsDefaulted
property, which will permit the SolrSchema.IsValidUpdateIndexDocument to
yield true when default value fields aren't present in the update request.

thanks,
jeff r.



On 6/20/07, Michael Plax [EMAIL PROTECTED] wrote:


Hello,

Yonik and Jeff thank you for your help.
You are right this was content-type issue.

in order to run the example, the following things need to be done:

1. Code (SolrSharp) should be changed
from:
src\Configuration\SolrSearcher.cs(217): oRequest.ContentType =
"application/x-www-form-urlencoded";
to:
src\Configuration\SolrSearcher.cs(217): oRequest.ContentType =
"text/xml";

2. In order to take care of the solr 1.2 schema invalidation issue, in
schema.xml:
comment out line 265:
<!-- <field name="word" type="string" indexed="true" stored="true"/> -->
comment out line 279:
<!-- <field name="timestamp" type="date" indexed="true" stored="true"
default="NOW" multiValued="false"/> -->
or, as Jeff suggested, for the example code, add the timestamp field in the
ExampleIndexDocument public constructor, such as:
this.Add(new IndexFieldValue("timestamp", DateTime.Now.ToString("s") + "Z"));

Regards
Michael




- Original Message -
From: Jeff Rodenburg [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, June 20, 2007 1:56 PM
Subject: Re: SolrSharp example


 On 6/20/07, Yonik Seeley [EMAIL PROTECTED] wrote:

 On 6/20/07, Michael Plax [EMAIL PROTECTED] wrote:
  This is a log that I got after running the SolrSharp example. I think the
  example program posts improperly formatted xml.
  I'm running Solr on Windows XP, Java 1.5. Could those settings be the
  problem?

 Solr1.2 is pickier about the Content-type in the HTTP headers.
 I bet it's being set incorrectly.



 Ahh, good point.  Within SolrSearcher.cs, the WebPost method contains this
 setting:

 oRequest.ContentType = "application/x-www-form-urlencoded";

 Looking through the CHANGES.txt file in the 1.2 tagged release on svn:

 9. The example solrconfig.xml maps /update to XmlUpdateRequestHandler using
 the new request dispatcher (SOLR-104). This requires posted content to have
 a valid contentType: curl -H 'Content-type:text/xml; charset=utf-8'.  The
 response format matches that of /select and returns standard error codes.
 To enable solr1.1 style /update, do not map /update to any handler in
 solrconfig.xml (ryan)

 For SolrSearcher.cs, it sounds as though changing the ContentType setting
 to "text/xml" may fix this issue.

 I don't have a 1.2 instance to test this against available to me right now,
 but can check this later.  Michael, try updating your SolrSearcher.cs file
 for this content-type setting to see if that resolves your issue.


 thanks,
 jeff r.





Re: Multiple doc types in schema

2007-06-20 Thread Jim Dow

Ignore the poor segmentation scheme (document types combined with
categorizing), but this is working quite well as we get close to going live
with a product.

This static IndexDocKey class contains an enumeration that generates Catalog
keys for each type of document (POJO / Model object) that gets indexed.  The
indexing process assigns a Catalog key to each document type, and extracts
the catid (doc-type) as well as other information that is put into the
index doc.

Just an idea for you:


/**
 * This drives much of the categorization of indexes and the subsequent query
 * filters. Lots of logic is built into these enumerations, and really they are
 * rules that may better be injected or looked up in a true rules engine. This
 * is the start of system generated markup and metadata.
 *
 * @author jdow
 * @version %I%, %G%
 * @since 0.90
 * <p>
 *
 * <pre>
 * TODO: Review to see if we want to keep this in the index doc, but make the
 * enumeration of Categories, SubCat, etc. more meaningful and order them in
 * the right way to facilitate getting back filtered docs in a controlled
 * sort order.
 * </pre>
 */
public class IndexDocKey implements Serializable
{

   // STATICS
   public static final long serialVersionUID = 1L;

   public static long getSerialVersionUID()
   {
   return serialVersionUID;
   }

    @SuppressWarnings("unused")
   protected Category category;


   public IndexDocKey()
   {
   }

   public IndexDocKey(Category category)
   {
   this.category = category;

   }

   public void setCategory(Category category)
   {
   this.category = category;

   }

   /*
* public void setCatDoc(CatDoc catdoc) { this.catdoc = catdoc; }
*/

    public enum Category implements Serializable
    {
        SYSTEM("S", "System", null, null),
        SYSPING("S0010", "System", null, null),
        APPCNTEXAMPLES("AC01L", "Example", ZExample.class, Person.class),
        APPCNTPEOPLE("AC02P", "Example", ZExample.class, Person.class),
        APPCNTDISCUSS("AC03D", "Example", ZExample.class, Person.class),
        APPCNTIMAGE("AC04I", "Example", ZExample.class, Person.class),
        APPCNTFILES("AC05F", "Example", ZExample.class, Person.class),
        APPCNTEVENT("AC02E", "Example", ZExample.class, Person.class),
        EXAMPLE("L", "Example", ZExample.class, Person.class),
        EXAMPLECNTPEOPLE("L00CP", "Example", ZExample.class, Person.class),
        EXAMPLECNTDISCUSS("L00CD", "Example", ZExample.class, Person.class),
        EXAMPLECNTIMAGE("L00CI", "Example", ZExample.class, Person.class),
        EXAMPLECNTFILES("L00CF", "Example", ZExample.class, Person.class),
        EXAMPLECNTEVENT("L00CE", "Example", ZExample.class, Person.class),
        EXAMPLEIDENTITY("L00LI", "Identity", Identity.class, ZExample.class),

        EVENT("LCE10", "Content", Content.class, Content.class),
        EVENTLABEL("LCE11", "ContentLabel", ContentLabel.class, Content.class),
        EVENTCOMM("LCE12", "ContentComment", ContentComment.class, Content.class),
        EVENTPROP("LCE13", "ContentProperty", ContentProperty.class, Content.class),

        DISCUSS("LCD10", "Content", Content.class, Content.class),
        DISCUSSLABEL("LCD11", "ContentLabel", ContentLabel.class, Content.class),
        DISCUSSCOMM("LCD12", "ContentComment", ContentComment.class, Content.class),
        DISCUSSPROP("LCD13", "ContentProperty", ContentProperty.class, Content.class),

        IMAGE("LCI10", "Content", Content.class, Content.class),
        IMAGELABEL("LCI11", "ContentLabel", ContentLabel.class, Content.class),
        IMAGECOMM("LCI12", "ContentComment", ContentComment.class, Content.class),
        IMAGEPROP("LCI13", "ContentProperty", ContentProperty.class, Content.class),

        FILE("LCF10", "Content", Content.class, Content.class),
        FILELABEL("LCF11", "ContentLabel", ContentLabel.class, Content.class),
        FILECOMM("LCF12", "ContentComment", ContentComment.class, Content.class),
        FILEPROP("LCF13", "ContentProperty", ContentProperty.class, Content.class);

        private String catid;
        private String catname;
        private String catdoc;
        private Class<?> catclass;
        private Class<?> catparentclass;

        private Category(String catid, String catdoc, Class<?> catclass,
                Class<?> catparentclass)
        {
            this.catid = catid;
            this.catdoc = catdoc;
            // assign the class metadata so the getters below don't return null
            this.catclass = catclass;
            this.catparentclass = catparentclass;
            this.catname = this.name();
        }

   public String getCatId()
   {
   return this.catid;
   }

   public String getCatName()
   {
   return this.catname;
   }

   public String getCatDoc()
   {
   return this.catdoc;
   }

        public Class<?> getCatClass()
        {
            return this.catclass;
        }

        public Class<?> getCatParentClass()
        {
            return this.catparentclass;
        }
   }

}


On 6/20/07, Otis Gospodnetic [EMAIL PROTECTED] wrote:


This sounds like a potentially good use-case for SOLR-215!
See https://issues.apache.org/jira/browse/SOLR-215

Otis
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

- Original 

delete changed?

2007-06-20 Thread James liu

solr:1.2

curl http://192.168.7.6:8080/solr0/update --data-binary
'<delete><query>nodeid:20</query></delete>'

i remember it is ok when i use solr 1.1

does it change?

it show me:
HTTP Status 400 - missing content stream
--

*type* Status report

*message* *missing content stream*

*description* *The request sent by the client was syntactically incorrect
(missing content stream).*
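
Solr 1.2's new request dispatcher rejects posts that lack an explicit XML content type (see the CHANGES.txt excerpt quoted in the SolrSharp thread above), which matches this error. A sketch of the likely fix: the same delete with a Content-type header set. Host, port, and core path come from the message above; the request is only constructed here, not sent.

```python
import urllib.request

# Solr 1.2 requires an explicit XML content type on /update posts.
body = b"<delete><query>nodeid:20</query></delete>"
req = urllib.request.Request(
    "http://192.168.7.6:8080/solr0/update",
    data=body,
    headers={"Content-type": "text/xml; charset=utf-8"},
)
print(req.get_header("Content-type"))
```

The equivalent curl invocation adds -H 'Content-type:text/xml; charset=utf-8' before --data-binary.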


--
regards
jl


RE: Faceted Search!

2007-06-20 Thread niraj tulachan
Hi Mike,
Currently, I'm just running the demo example provided on the Solr web site
on my local Windows machine.  I was purely looking into generating the XML feed
file and feeding it to the Solr server.  However, I was also looking into
implementing sub-categories within the categories, if that makes sense.
For example, on shopper.com we have categories like price,
manufacturers and so on, and within them there are sub-categories (price is
sub-categorized into <$100, 100-200, 200-300 etc).  I don't have constraints
in terms of technology.  If I have to implement a db server I won't mind
implementing it.  Anyway, please shine a light on how you would handle this
issue.  Any suggestion will be appreciated.
  Thanks,
  Niraj
Mike Austin [EMAIL PROTECTED] wrote:
  Niraj: What environment are you using? SQL Server/.NET/Windows? or something
else?

-Mike

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 20, 2007 4:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Faceted Search!



: define the sub-categories. let's say from the above example, the
: category price has different sub-categories like less than 100
: ,100-200? I'm guessing, we explicit define this in XML feed file, but
: I could be very wrong. In any case, can you please give me the short
: example achieve that implementation. Well, thanks once again.

there's nothing out of the box from Solr that will do this, it's
something you would need to implement either in the client or in a custom
request handler ... Solr's Simple Faceting support is designed to be just
that: simple. but the underlying methods/mechanisms of computing DocSet
intersections can be used by any custom request handler to generate
application specific results.

I've got 3 or 4 indexes that use the out of the box SimpleFacet support
Solr provides, but the major faceting we do (product based facets) all
uses custom request handlers so we can have very exact control on all of
this kind of stuff driven by our data management tools.



-Hoss



   

Recent updates to Solrsharp

2007-06-20 Thread Jeff Rodenburg

Thanks to Yonik, Michael, Ryan, (and others) for some recent help on various
issues discovered with Solrsharp.  We were able to discover a few issues
with the library relative to the Solr 1.2 release.  Those issues have been
remedied and have been pushed into source control.

The Solrsharp source code can be obtained at:
http://solrstuff.org/svn/solrsharp.

Recent fixes include:
- Fix for broken DeleteIndexDocument xml serialization
- Update to correct document posting content-type to solr 1.2 instance
- Identifying schema fields with new IsDefaulted property
- Updates to the example application to incorporate these fixes and the solr
1.2 sample schema
- Updated documentation consistent with these changes

As an aside, it would be nice to record these issues more granularly in
JIRA.  Could we get a component created for our client library, similar to
java/php/ruby?

cheers,
j


Re: Recent updates to Solrsharp

2007-06-20 Thread Yonik Seeley

On 6/21/07, Jeff Rodenburg [EMAIL PROTECTED] wrote:

As an aside, it would be nice to record these issues more granularly in
JIRA.  Could we get a component created for our client library, similar to
java/php/ruby?


Done.

-Yonik