Re: Dismax + Dynamic fields

2008-06-17 Thread Daniel Papasian
Norberto Meijome wrote:
 Thanks Yonik. OK, that matches what I've seen - if I know the actual
 name of the field I'm after, I can use it in a query, but I can't
 use the dynamic_field_name_* (with wildcard) in the config.
 
 Is adding support for this something that is desirable / needed
 (doable?), and is it being worked on?

You can use a wildcard with copyField to copy the dynamic fields that
match the pattern into another field that you can then query on. It seems
like that would cover your needs, no?
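
For reference, a schema.xml sketch of what I mean (the field and type
names here are made up -- adjust to your own schema):

```xml
<!-- catch-all field to receive copies of the dynamic fields -->
<field name="all_dynamic" type="text" indexed="true" stored="false"
       multiValued="true"/>

<!-- copy every field matching the dynamic pattern into it -->
<copyField source="dynamic_field_name_*" dest="all_dynamic"/>
```

Then a query like all_dynamic:foo searches across all of them at once.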

Daniel


Re: expression in an fq parameter fails

2008-05-21 Thread Daniel Papasian

Ezra Epstein wrote:

  <str name="fq">storeAvailableDate:[* TO NOW]</str>
  <str name="fq">storeExpirationDate:[NOW TO *]</str>

...


This works perfectly.  The only trouble is that the two date fields may
actually be empty, in which case this filters out such records, and we
want to include them.


I think the easiest thing to do would be one of the following: either use
a zero date for storeAvailableDate and a far-future date for
storeExpirationDate, instead of leaving them empty, for things you want to
be always available or never expiring (if I've understood your problem);
or add another field, alwaysAvailable or neverExpiring, and then do an OR
off of that.
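
With the second approach, and assuming boolean alwaysAvailable /
neverExpiring fields (untested sketch), the filter queries would become
something like:

```
fq=storeAvailableDate:[* TO NOW] OR alwaysAvailable:true
fq=storeExpirationDate:[NOW TO *] OR neverExpiring:true
```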


Maybe that's cheating?

HTH,
Daniel


Re: Extending XmlRequestHandler

2008-05-09 Thread Daniel Papasian

Alexander Ramos Jardim wrote:

Ok,

Thanks for the advice!

I got the XmlRequestHandler code. I see it uses StAX directly on the XML
it receives. There isn't anything to plug in or out that offers an easy
way to change the XML format.


To maybe save you from reinventing the wheel: when I asked a similar 
question a couple weeks back, hossman pointed me towards SOLR-285 and 
SOLR-370.  285 does XSLT, 370 does STX.


Daniel


Re: SOLR-470 default value in schema with NOW (update)

2008-05-07 Thread Daniel Papasian
Chris Hostetter wrote:
 The two exceptions you cited both indicate there was at least one date 
 instance with no millis included -- NOW can't do that.  It always includes 
 millis (even though it shouldn't). 

I've seen people suggest, for performance reasons, reducing the
granularity of the timestamps they store down to what they need -
i.e. minute, hour, or day instead of millisecond.  But it seems that
functionality will break if you don't store it with millis.

I'm just trying to make sure I'm reconciling these here -- is the goal of
reducing the granularity simply to reduce the cardinality of the indexed
date terms?  If so, is the best practice, when you don't need
significance beyond the date, to fill the rest of the date with zeros
and index, say, 2008-07-05T00:00:00.000Z?
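
To illustrate what I mean by zero-filling (a sketch): any two timestamps
that fall on the same day would then index the identical term, so the
number of unique date terms drops dramatically:

```
2008-07-05T09:13:27.421Z  ->  2008-07-05T00:00:00.000Z
2008-07-05T16:45:02.913Z  ->  2008-07-05T00:00:00.000Z
```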

(Hope this doesn't count as a threadjack!)

Daniel


Re: XSLT transform before update?

2008-04-17 Thread Daniel Papasian

Shalin Shekhar Mangar wrote:

Hi Daniel,

Maybe if you can give us a sample of what your XML looks like, we can suggest
how to use SOLR-469 (Data Import Handler) to index it. Most of the use cases
we have encountered so far are solvable using the XPathEntityProcessor in
DataImportHandler without using XSLT; for details look at
http://wiki.apache.org/solr/DataImportHandler#head-e68aa93c9ca7b8d261cede2bf1d6110ab1725476


I think even if it is possible to use SOLR-469 for my needs, I'd still 
prefer the XSLT approach, because it's going to be a bit of configuration 
either way, and I'd rather it be an XSLT stylesheet than solrconfig.xml.  
In addition, I haven't yet decided whether I want to apply any patches to 
the version we will deploy.  If I go down the route of the XSLT transform 
patch and end up having to back it out, the amount of work to do the 
transform at the XML source instead would be negligible, whereas going 
from using the DataImportHandler to not using it at all would leave quite 
a bit of work ahead of me.


Because both the solr instance and the XML source are in house, I have 
the ability to apply the XSLT at the source instead of at solr. 
However, there are different teams of people that control the XML source 
and solr, so it would require a bit more office coordination to do it on 
the backend.


The data is a filemaker XML export (DTD fmresultset) and it looks 
roughly like this:

<fmresultset>
  <resultset>
    <field name="ID"><data>125</data></field>
    <field name="organization"><data>Ford Foundation</data></field>
    ...
    <relatedset table="Employees">
      <record>
        <field name="ID"><data>Y5-A</data></field>
        <field name="Name"><data>John Smith</data></field>
      </record>
      <record>
        <field name="ID"><data>Y5-B</data></field>
        <field name="Name"><data>Jane Doe</data></field>
      </record>
    </relatedset>
  </resultset>
</fmresultset>

I'm taking the product of the resultset and the relatedset, using both 
IDs concatenated as a unique identifier, like so:


<doc>
  <field name="ID">125Y5-A</field>
  <field name="organization">Ford Foundation</field>
  <field name="Name">John Smith</field>
</doc>
<doc>
  <field name="ID">125Y5-B</field>
  <field name="organization">Ford Foundation</field>
  <field name="Name">Jane Doe</field>
</doc>

I can do the transform pretty simply with XSLT.  I suppose it is 
possible to get the DataImportHandler to do this, but I'm not yet 
convinced that it's easier.
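
Roughly like this, for the curious -- an untested sketch, and the XPath
expressions are guesses against the structure above:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/fmresultset/resultset">
    <add>
      <!-- remember the parent record's values -->
      <xsl:variable name="id"  select="field[@name='ID']/data"/>
      <xsl:variable name="org" select="field[@name='organization']/data"/>
      <!-- emit one solr doc per related record -->
      <xsl:for-each select="relatedset[@table='Employees']/record">
        <doc>
          <field name="ID">
            <xsl:value-of select="concat($id, field[@name='ID']/data)"/>
          </field>
          <field name="organization"><xsl:value-of select="$org"/></field>
          <field name="Name">
            <xsl:value-of select="field[@name='Name']/data"/>
          </field>
        </doc>
      </xsl:for-each>
    </add>
  </xsl:template>
</xsl:stylesheet>
```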


Daniel


XSLT transform before update?

2008-04-16 Thread Daniel Papasian
Hey everyone,

I'm experimenting with updating solr from a remote XML source, using an
XSLT transform to get it into the solr XML syntax, to let me maintain an
index.  (And yes, I've looked into SOLR-469, but disregarded it, as I
need to do quite a bit with XSLT to get the data into something I can
index.)

I'm looking at using stream.url, but I need to do the XSLT at some point
in there.  I would prefer to do the XSLT on the client (solr) side of
the transfer, for various reasons.

Is there a way to implement a custom request handler or similar to get
solr to apply an XSLT transform to the content stream before it attempts
to parse it?  If not possible OOTB, where would be the right place to
add said functionality?

Thanks much for your help,

Daniel


Re: how to suppress result

2008-04-07 Thread Daniel Papasian

Evgeniy Strokin wrote:

I'm sorry, I didn't explain my case clearly. My index base should
stay the same. The user runs a query, and each time he runs it he
wants to suppress his own IDs. An example would be a merchant who
sells books. He sells only fantasy books, and he wants to see all the
fantasy books in the wholesaler's stock except the books he already
has in his own stock. So he provides a list of books he already has
and wants them excluded from his search result. So the suppression is
actually per query (it would be better to say per user session, but
since Solr has no sessions I'd say per query). Obviously another book
shop has its own book list and its own query, and it wants to search
and suppress from the same wholesaler index base.


What I would do is index book-merchant pairs, instead of books and 
merchants separately.  Each document would have the merchant's ID in 
it, so you can just add an fq clause to exclude the current merchant.


It's a far cry from normalized data, but this is an index, not an 
RDBMS.  Denormalize the data into documents, and index that.
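
A sketch of what a denormalized document might look like (field names
invented):

```xml
<doc>
  <field name="book_id">1234</field>
  <field name="merchant_id">42</field>
  <field name="title">Some Fantasy Book</field>
</doc>
```

Then the merchant with ID 42 just adds fq=-merchant_id:42 to his query
to exclude his own stock.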


Daniel


Re: matching exact/whole phrase

2008-04-01 Thread Daniel Papasian

Sandeep Shetty wrote:

Hi people,

I am looking to provide exact phrase matching, along with full text
search, with solr.  I want to achieve this in solr rather than use a
separate SQL query. As an example:

The indexed field has the text "car repair" (without the double
quotes) for a document, and I want this document to come up in the
search results only if someone searches for "car repair". The document
should not show up for "repair" or "car" searches.

Is it possible to do this type of exact phrase matching with solr
itself?


It sounds like you want to do an exact string match, not a text 
match, so I don't think there's anything complex you'd need to do... 
just store the field containing "car repair" as type="string" and do 
all of the literal searches you want.


But if you are working with a field that contains something beyond the 
exact string you want to match, you'll need to define a new field type 
and use only the analysis filters you need -- and you'll have to think 
more about your requirements if that's the case.
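
For the simple case, something like this in schema.xml (sketch, field
name invented):

```xml
<!-- "string" is unanalyzed: the whole value is one term -->
<field name="service" type="string" indexed="true" stored="true"/>
```

A query for service:"car repair" then matches only documents whose field
holds exactly that value; "car" or "repair" alone won't match.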


Daniel


Re: Multiple schemas?

2008-03-27 Thread Daniel Papasian

tim robertson wrote:

Hi,
Would I be correct in thinking that for each schema I want, I need a new
SOLR instance running?


Hey Tim,

Documents aren't required to have all of the fields (it's not a 
database), so what I would do is just have all of the field definitions 
in a single schema.xml file.


That approach would only be a problem if you needed a field name to 
mean one thing some of the time and something else at other times -- I'd 
suggest using consistent naming, so that fields named the same way are 
treated the same way, and then using a single solr instance.
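
E.g., a single schema.xml can carry the fields for several document
types, and each document just omits the fields that don't apply to it
(names invented):

```xml
<field name="title"  type="text"   indexed="true" stored="true"/>
<field name="author" type="string" indexed="true" stored="true"/> <!-- articles -->
<field name="venue"  type="string" indexed="true" stored="true"/> <!-- events -->
```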


Daniel


Re: Update schema.xml without restarting Solr?

2008-03-26 Thread Daniel Papasian

[EMAIL PROTECTED] wrote:

Quoting Daniel Papasian [EMAIL PROTECTED]:

Or if you're adding a new field to the schema (perhaps the most common
need for editing schema.xml), you don't need to reindex any documents at
all, right?  Unless I'm missing something?


Well, it all depends on whether that field (not a solr/lucene field) 
exists in the already indexed material but was never indexed. Let's say 
we have a bunch of articles with an author field that someone decided 
didn't need to be in the index. But then later he changes his mind and 
adds the author field to the schema. In this case all articles that have 
a populated author field should now be reindexed.


Yeah, the use case I was thinking of was someone who had multiple 
different types of content in their index (say, articles, events, 
organizations).  When they add a new content type (book review) and find 
the need to add a new field for that content type (say, publisher) that 
is only relevant for that type, then -- since they're adding the field 
before any data that would have it was indexed -- I believe they'd be 
fine making that schema change without reindexing anything.



I suppose if you add a new dynamic field specification that conflicts
with existing fields, reindexing is probably a good idea, but if you're
doing that... well, I probably don't want to know.


I must say that I'm a bit confused by these dynamic fields. Can someone 
tell me if there is any reasonable use of dynamic fields without having 
the type (for example, "_i" for int/sint) in the name?


Well, perhaps this fulfills your requirement on a technicality, but 
there are always higher-order types...  Offhand, I can think of cases 
where you might want to define a dynamic field like *_propername or 
*_cost, and then you'd be able to use fields like author_propername or 
editor_propername, or book_cost or volume_cost, or what have you.
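
For instance (a sketch -- the "propername" fieldType would be one you
define yourself in schema.xml; "sfloat" is the sortable float type):

```xml
<dynamicField name="*_propername" type="propername" indexed="true" stored="true"/>
<dynamicField name="*_cost"       type="sfloat"     indexed="true" stored="true"/>
```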


Daniel