Multi-word searches in multi-valued fields

2011-09-22 Thread Olson, Ron
Hi all-

I'm not clear on how to allow a user to search a multi-valued field with 
multiple words and return only those documents where all the words are together 
in one value, and not spread over multiple values.

If I do a literal search on the "company name" field for "smith trucking" (with 
the quotes), then it works because it's looking for only "smith trucking", and 
it finds it, great. However, if I put in "trucking smith", then I get no 
results. If I try using something like (+trucking +smith), then I get documents 
where one document might have "joe's trucking" and "bob smith" in the resulting 
array of names.

So I guess what I need is an exact match, regardless of word positioning (i.e. 
"smith trucking" and "trucking smith" should find only those documents that 
have those two words in one value of the resulting array).

I've been going through the wiki and it seems like this is probably a 
super-simple thing, but I'm clearly just not getting it; I just can't figure 
out the right syntax to make this work.

Thanks for any info.

Ron

DISCLAIMER: This electronic message, including any attachments, files or 
documents, is intended only for the addressee and may contain CONFIDENTIAL, 
PROPRIETARY or LEGALLY PRIVILEGED information.  If you are not the intended 
recipient, you are hereby notified that any use, disclosure, copying or 
distribution of this message or any of the information included in or with it 
is  unauthorized and strictly prohibited.  If you have received this message in 
error, please notify the sender immediately by reply e-mail and permanently 
delete and destroy this message and its attachments, along with any copies 
thereof. This message does not create any contractual obligation on behalf of 
the sender or Law Bulletin Publishing Company.
Thank you.
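
A commonly suggested approach to this kind of question, sketched here under assumptions and not taken from a reply in this thread: give the multi-valued field a positionIncrementGap in schema.xml (the example schema uses 100) and send a "sloppy" phrase query whose slop is smaller than that gap. The words may then match in either order, but not across two different values. The field name company_name, the slop of 10, and the helper class below are hypothetical.

```java
final class SloppyPhraseQuery {
    /**
     * Builds field:"word1 word2"~slop from raw user input.
     * Assumption: slop is kept below the field's positionIncrementGap
     * so that a match cannot span two values of a multi-valued field.
     */
    static String build(String field, String userInput, int slop) {
        if (slop < 0) {
            throw new IllegalArgumentException("slop must not be negative");
        }
        String phrase = userInput.trim()
                .replaceAll("\\s+", " ")   // collapse runs of whitespace
                .replace("\\", "\\\\")     // escape backslashes first
                .replace("\"", "\\\"");    // then escape embedded quotes
        return field + ":\"" + phrase + "\"~" + slop;
    }
}
```

With this helper, both "smith trucking" and "trucking smith" produce a phrase query with the same slop, so word order no longer matters as long as the slop is at least 2 for two adjacent words.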


RE: Two unrelated questions

2011-09-21 Thread Olson, Ron
Thanks for the reply. As far as #1, my table that I'm indexing via DIH has a PK 
field, generated by a sequence, so there are records with ID of 1, 2, 3, etc. 
That same id is the one I use in my unique id field in the document 
(ID).

I've noticed that the table has, say, 10 rows. My index only has 8. I don't 
know why that is, but I'd like to figure out which records are missing and add 
them (and hopefully understand why they weren't added in the first place). I 
was just wondering if there was some way to compare the two as part of a sql 
query, but on reflection, it does seem like an absurd request, so I apologize; 
I think what I'll have to do is write a solrj program that gets every ID in the 
table, then does a search on that ID in the index, and add the ones that are 
missing.
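
The comparison step of that plan can be sketched as a plain set difference. This is only an illustration: how the two ID lists are fetched (JDBC for the table, SolrJ for the index) is left out, and the class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.SortedSet;
import java.util.TreeSet;

final class MissingIds {
    /**
     * Returns the IDs present in the table but absent from the index,
     * in ascending order. Both inputs are assumed to be already loaded.
     */
    static SortedSet<Long> missingFromIndex(Collection<Long> tableIds,
                                            Collection<Long> indexIds) {
        SortedSet<Long> missing = new TreeSet<>(tableIds);
        // A HashSet keeps the membership checks cheap for large indexes.
        missing.removeAll(new HashSet<>(indexIds));
        return missing;
    }
}
```

Fetching all IDs from the index in one pass and diffing locally avoids issuing one search per table row.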

Regarding the second item, yes, it's crazy but I'm not sure what to do; there 
really are that many options and some searches will be extremely specific, yet 
broad enough in terms for this to be a problem.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, September 21, 2011 3:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Two unrelated questions

for <1> I don't quite get what you're driving at. Your DIH
query assigns the uniqueKey, it's not like it's something
auto-generated. Perhaps a concrete example would
help.

<2> There's a limit you can adjust that defaults to
1024 (maxBooleanClauses in solrconfig.xml). You can
 bump this very high, but you're right, if anyone actually
does something absurd it'll slow *that* query down. But
just bumping this limit higher won't change performance
absent someone actually putting a ton of items in it...

Best
Erick
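
A minimal sketch of building such an OR query on the client while guarding against the clause limit mentioned above. The field name "model" and the helper class are hypothetical; the 1024 constant mirrors the default Erick cites, and the real limit is whatever maxBooleanClauses is set to in solrconfig.xml.

```java
import java.util.List;
import java.util.stream.Collectors;

final class OrQueryBuilder {
    // Default cited in the thread; an assumption for any given install.
    static final int DEFAULT_MAX_BOOLEAN_CLAUSES = 1024;

    /** Joins the values into field:"v1" OR field:"v2" ... */
    static String build(String field, List<String> values, int maxClauses) {
        if (values.size() > maxClauses) {
            throw new IllegalArgumentException(
                    values.size() + " clauses exceeds the limit of " + maxClauses);
        }
        return values.stream()
                .map(v -> field + ":\""
                        + v.replace("\\", "\\\\").replace("\"", "\\\"")
                        + "\"")
                .collect(Collectors.joining(" OR "));
    }
}
```

Failing fast on the client gives a clearer error than letting the server reject an oversized query.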

On Mon, Sep 19, 2011 at 9:12 AM, Olson, Ron  wrote:
> Hi all-
>
> I'm not sure if I should break this out into two separate questions to the 
> list for searching purposes, or if one is more acceptable (don't want to 
> flood).
>
> I have two (hopefully) straightforward questions:
>
> 1. Is it possible to expose the unique ID of a document to a DIH query? The 
> reason I want to do this is because I use the unique ID of the row in the 
> table as the unique ID of the Lucene document, but I've noticed that the 
> count of documents doesn't match the count in the table; I'd like to add 
> these rows and was hoping to avoid writing a custom SolrJ app to do it.
>
> 2. Is there any limit to the number of conditions in a Boolean search? We're 
> working on a new project where the user can choose either, for example, "Ford 
> Vehicles", in which case I can simply search for "Ford", but if the user 
> chooses specific makes and models, then I have to say something like "Crown 
> Vic OR Focus OR Taurus OR F-150", etc., where they could theoretically choose 
> every model of Ford ever made except one. This could lead to a *very* large 
> query, and I was worried both about whether it was even possible and about 
> the impact on performance.
>
>
> Thanks, and I apologize if this really should be two separate messages.
>
> Ron
>




Two unrelated questions

2011-09-19 Thread Olson, Ron
Hi all-

I'm not sure if I should break this out into two separate questions to the list 
for searching purposes, or if one is more acceptable (don't want to flood).

I have two (hopefully) straightforward questions:

1. Is it possible to expose the unique ID of a document to a DIH query? The 
reason I want to do this is because I use the unique ID of the row in the table 
as the unique ID of the Lucene document, but I've noticed that the count of 
documents doesn't match the count in the table; I'd like to add these rows and 
was hoping to avoid writing a custom SolrJ app to do it.

2. Is there any limit to the number of conditions in a Boolean search? We're 
working on a new project where the user can choose either, for example, "Ford 
Vehicles", in which case I can simply search for "Ford", but if the user 
chooses specific makes and models, then I have to say something like "Crown Vic 
OR Focus OR Taurus OR F-150", etc., where they could theoretically choose every 
model of Ford ever made except one. This could lead to a *very* large query, 
and I was worried both about whether it was even possible and about the impact 
on performance.


Thanks, and I apologize if this really should be two separate messages.

Ron



Add copyTo Field without re-indexing?

2011-09-16 Thread Olson, Ron
Hi all-

I have an 11 GB index to which I need to add another field, populated not by 
the DIH query itself but via copyTo (copyField).

Is there any way to re-parse an existing index, adding the new copyTo field, 
without having to basically start all over again with DIH?

Thanks,

Ron



Parent delta query, but no child delta query?

2011-09-02 Thread Olson, Ron
Hi all-

I'm trying to set up a delta query for a parent entity query that has many 
sub-queries. The table referenced in the parent query has a "last updated" 
field, but none of the children do. The way the data is set up is that when a 
child table is updated, the "last updated" field of the parent is set.

So my question is whether I need to set up delta queries for my child entities; 
if the parent is updated I just want the whole record replaced. Do I need to 
set up the delta queries for the children? The DIH page doesn't seem to be 
specific on this point; they talk about what happens when a child record is 
updated and notifying the parent.

Thanks for any info,

Ron



RE: Exact matching on names?

2011-08-17 Thread Olson, Ron
Thank you Sujit and Rob for your help; I took the "easy" way and created a new 
field type that is identical to text, but with the stemmer removed. This seems, 
so far, to work exactly as needed.

To help anyone else who comes across this issue, this is the field type I used:

[fieldType XML definition not preserved in the archive]


-Original Message-
From: Sujit Pal [mailto:sujit@comcast.net]
Sent: Tuesday, August 16, 2011 12:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Exact matching on names?

Hi Ron,

There was a discussion about this some time back, which I implemented
(with great success btw) in my own code...basically you store both the
analyzed and non-analyzed versions (use string type) in the index, then
send in a query like this:

+name:clarke name_s:"clarke"^100

The name field is text so it will analyze down "clarke" to "clark" but
it will match both "clark" and "clarke" and the second clause would
boost the entry with "clarke" up to the top, which you then select with
rows=1.

-sujit
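
The query shape Sujit describes can be assembled on the client roughly as follows. This is a sketch only: the helper class is hypothetical, and it assumes the search term is a single word with no query-syntax characters.

```java
final class ExactBoostQuery {
    /**
     * Builds "+analyzed:term exact:"term"^boost": the first clause is
     * required and matches stemmed variants, the second is optional and
     * lifts exact matches on the non-analyzed field to the top.
     */
    static String build(String analyzedField, String exactField,
                        String term, int boost) {
        return "+" + analyzedField + ":" + term
                + " " + exactField + ":\"" + term + "\"^" + boost;
    }
}
```

Called with ("name", "name_s", "clarke", 100), this reproduces the query string shown in the message above.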

On Tue, 2011-08-16 at 10:20 -0500, Olson, Ron wrote:
> Hi all-
>
> I'm missing something fundamental yet I've been unable to find the definitive 
> answer for exact name matching. I'm indexing names using the standard "text" 
> field type and my search is for the name "clarke". My results include 
> "clark", which is incorrect, it needs to match clarke exactly (case 
> insensitive).
>
> I tried textType but that doesn't work because I believe it needs to be 
> *really* exact, whereas I'm looking for things like "clark oil", "bob, frank, 
> and clark", etc.
>
> Thanks for any help,
>
> Ron
>





Exact matching on names?

2011-08-16 Thread Olson, Ron
Hi all-

I'm missing something fundamental yet I've been unable to find the definitive 
answer for exact name matching. I'm indexing names using the standard "text" 
field type and my search is for the name "clarke". My results include "clark", 
which is incorrect, it needs to match clarke exactly (case insensitive).

I tried textType but that doesn't work because I believe it needs to be 
*really* exact, whereas I'm looking for things like "clark oil", "bob, frank, 
and clark", etc.

Thanks for any help,

Ron



RE: Dates off by 1 day?

2011-08-10 Thread Olson, Ron
Ah, great! I knew the problem was between the keyboard and the chair. Thanks!

-Original Message-
From: Sethi, Parampreet [mailto:parampreet.se...@teamaol.com]
Sent: Wednesday, August 10, 2011 10:25 AM
To: solr-user@lucene.apache.org
Subject: Re: Dates off by 1 day?


The date difference comes from the two sides using different time zones.

Solr stores the date in UTC (Zulu time), while SolrJ returns a java.util.Date,
which prints in the JVM's default time zone (CDT here, picked up from the system).

> 2002-05-13T00:00:00Z

> I get:
>
> --> Sun May 12 19:00:00 CDT 2002

You can format the Date in a different time zone using the java.util date
classes if required.

Hope it helps!

-param
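
A small sketch of the conversion described above, using only JDK classes. The class and method names are hypothetical; the point is that the java.util.Date holds the same instant either way, and only the formatter's time zone changes what is printed.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class SolrDateDisplay {
    /** Formats the instant in UTC, matching the form shown in the Solr admin page. */
    static String toUtcString(Date date) {
        return toZoneString(date, "UTC") + "Z";
    }

    /** Formats the same instant as wall-clock time in the given zone. */
    static String toZoneString(Date date, String timeZoneId) {
        SimpleDateFormat fmt =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone(timeZoneId));
        return fmt.format(date);
    }
}
```

Formatting the value from the example in UTC gives back 2002-05-13T00:00:00Z, while the zone America/Chicago gives 2002-05-12T19:00:00, which is the one-day shift seen in the SolrJ output.
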
On 8/10/11 11:20 AM, "Olson, Ron"  wrote:

> Hi all-
>
> I apologize in advance if this turns out to be a problem between the keyboard
> and the chair, but I'm confused about why my date field is correct in the
> index, but wrong in SolrJ.
>
> I have a field defined as a date in the index:
>
> [field definition XML not preserved in the archive]
>
> And if I use the admin site to query the data, I get the right date:
>
> 2002-05-13T00:00:00Z
>
> But in my SolrJ code:
>
> Iterator<SolrDocument> iter = queryResponse.getResults().iterator();
>
> while (iter.hasNext())
> {
>     SolrDocument resultDoc = iter.next();
>
>     System.out.println("--> " + resultDoc.getFieldValue("FILE_DATE"));
> }
>
> I get:
>
> --> Sun May 12 19:00:00 CDT 2002
>
> I've been searching around through the wiki and other places, but can't seem
> to find anything that either mentions this problem or talks about date
> handling in Solr/SolrJ that might refer to something like this.
>
> Thanks for any info,
>
> Ron
>
>
>





Dates off by 1 day?

2011-08-10 Thread Olson, Ron
Hi all-

I apologize in advance if this turns out to be a problem between the keyboard 
and the chair, but I'm confused about why my date field is correct in the 
index, but wrong in SolrJ.

I have a field defined as a date in the index:

[field definition XML not preserved in the archive]

And if I use the admin site to query the data, I get the right date:

2002-05-13T00:00:00Z

But in my SolrJ code:

Iterator<SolrDocument> iter = queryResponse.getResults().iterator();

while (iter.hasNext())
{
    SolrDocument resultDoc = iter.next();

    System.out.println("--> " + resultDoc.getFieldValue("FILE_DATE"));
}

I get:

--> Sun May 12 19:00:00 CDT 2002

I've been searching around through the wiki and other places, but can't seem to 
find anything that either mentions this problem or talks about date handling in 
Solr/SolrJ that might refer to something like this.

Thanks for any info,

Ron





RE: deleting index directory/files

2011-08-04 Thread Olson, Ron
I ran into a problem when I deleted just the "index" directory; I deleted the 
entire data directory and it was recreated on the next load. BTW, if you're 
using the DIH, its default behavior is to remove all records on a full import, 
so you can save yourself having to remove any actual files.

-Original Message-
From: Mark juszczec [mailto:mark.juszc...@gmail.com]
Sent: Thursday, August 04, 2011 4:01 PM
To: solr-user@lucene.apache.org
Subject: deleting index directory/files

Hello all

I'm using multiple cores.  There's a directory named by the core and it
contains a subdir named data that contains a subdir named index that
contains a bunch of files that contain the data for my index.

Let's say I want to completely rebuild the index from scratch.

Can I delete the dir named index?  I know the next thing I'd have to do is a
full data import, and that's ok.  I want to blow away any traces of the
core's previous existence.

Mark



RE: Strategies for sorting by array, when you can't sort by array?

2011-08-04 Thread Olson, Ron
For anyone who comes across this topic in the future, I "solved" the problem 
this way: by agreement with the stakeholders, on the presumption that no one 
would look at more than 5000 records, I modified my search code so that, if the 
user selected to sort by the name, I set the row count to return 
(query.setRows) to 5000. I then put all the result records into a list, sort 
it, then, depending on what page they're on, extract that subset of the 5000 
and return it.

There is a small performance hit on initial searching for common names (e.g. 
Smith, Jones, etc.), but the performance is still far more acceptable than the 
legacy system Solr is meant to replace (a few seconds as opposed to twenty(!) 
minutes).

Most certainly there are better ways, but this one worked for me, and I wanted 
to make sure it was added to the pool of options for anyone who comes across 
this problem in the future.

Thanks to everyone who offered suggestions!

Ron
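
The sort-then-page step of the workaround described above can be sketched like this. It is an illustration, not the code actually used: the class name is hypothetical, pages are assumed to be zero-based, and the list passed in stands for the (at most 5000) records already fetched with query.setRows.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortedPager {
    /**
     * Sorts a copy of the fetched results and returns one page of it.
     * A page past the end yields an empty list.
     */
    static <T> List<T> page(List<T> results, Comparator<? super T> order,
                            int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("invalid page request");
        }
        List<T> sorted = new ArrayList<>(results);
        sorted.sort(order);
        // Use long arithmetic so a huge page number cannot overflow.
        int from = (int) Math.min((long) pageNumber * pageSize, sorted.size());
        int to = (int) Math.min((long) from + pageSize, sorted.size());
        return new ArrayList<>(sorted.subList(from, to));
    }
}
```

Because the whole capped result set is re-sorted for every page request, caching the sorted list per search would be a natural refinement if the sort cost becomes noticeable.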

-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Wednesday, August 03, 2011 11:36 AM
To: solr-user@lucene.apache.org
Cc: Olson, Ron
Subject: Re: Strategies for sorting by array, when you can't sort by array?

Not so much that it's a corner case in the sense of being unusual
necessarily (I'm not sure), it's just something that fundamentally
doesn't fit well into Lucene's architecture.

I'm not sure that filing a JIRA will be much use, it's really unclear
how one would get Lucene to do this, it would be significant work to do,
and it's unlikely any Solr developer is going to decide to spend
significant time on it unless they need it for their own clients.

On 8/3/2011 11:40 AM, Olson, Ron wrote:
> *Sigh*...I had thought maybe reversing it would work, but that would require 
> creating a whole new index, on a separate core, as the existing index is used 
> for other purposes. Plus, given the volume of data, that would be a big deal, 
> update-wise. What would be better would be to remove that particular sort 
> option-button on the webpage. ;)
>
> I'll create a Jira issue, but in the meanwhile I'll have to come up with 
> something else. I guess I didn't realize how much of a "corner case" this 
> problem is. :)
>
> Thanks for the suggestions!
>
> Ron
>
> -Original Message-
> From: Smiley, David W. [mailto:dsmi...@mitre.org]
> Sent: Wednesday, August 03, 2011 10:26 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Strategies for sorting by array, when you can't sort by array?
>
> Hi Ron.
> This is an interesting problem you have. One idea would be to create an index 
> with the entity relationship going in the other direction.  So instead of one 
> to many, go many to one.  You would end up with multiple documents with 
> varying names but repeated parent entity information -- perhaps simply using 
> just an ID which is used as a lookup. Do a search on this name field, sorting 
> by a non-tokenized variant of the name field. Use Result-Grouping to 
> consolidate multiple matches of a name to the same parent document. This 
> whole idea might very well be academic since duplicating all the parent 
> entity information for searching on that too might be a bit more than you 
> care to bother with. And I don't think Solr 4's join feature addresses this 
> use case. In the end, I think Solr could be modified to support this, with 
> some work. It would make a good feature request in JIRA.
>
> ~ David Smiley
>
> On Aug 3, 2011, at 10:39 AM, Olson, Ron wrote:
>
>> Hi all-
>>
>> Well, this is a problem. I have a list of names as a multi-valued field and 
>> I am searching on this field and need to return the results sorted. I know 
>> from searching and reading the documentation (and getting the error) that 
>> sorting on a multi-valued field isn't possible. Okay, so, what I haven't 
>> found is any real good solution/workaround to the problem. I was wondering 
>> what strategies others have done to overcome this particular situation; 
>> collapsing the individual names into a single field with copyField doesn't 
>> work because the name searched may not be the first name in the field.
>>
>> Thanks for any hints/tips/tricks.
>>
>> Ron
>>

RE: Strategies for sorting by array, when you can't sort by array?

2011-08-03 Thread Olson, Ron
*Sigh*...I had thought maybe reversing it would work, but that would require 
creating a whole new index, on a separate core, as the existing index is used 
for other purposes. Plus, given the volume of data, that would be a big deal, 
update-wise. What would be better would be to remove that particular sort 
option-button on the webpage. ;)

I'll create a Jira issue, but in the meanwhile I'll have to come up with 
something else. I guess I didn't realize how much of a "corner case" this 
problem is. :)

Thanks for the suggestions!

Ron

-Original Message-
From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Wednesday, August 03, 2011 10:26 AM
To: solr-user@lucene.apache.org
Subject: Re: Strategies for sorting by array, when you can't sort by array?

Hi Ron.
This is an interesting problem you have. One idea would be to create an index 
with the entity relationship going in the other direction.  So instead of one 
to many, go many to one.  You would end up with multiple documents with varying 
names but repeated parent entity information -- perhaps simply using just an ID 
which is used as a lookup. Do a search on this name field, sorting by a 
non-tokenized variant of the name field. Use Result-Grouping to consolidate 
multiple matches of a name to the same parent document. This whole idea might 
very well be academic since duplicating all the parent entity information for 
searching on that too might be a bit more than you care to bother with. And I 
don't think Solr 4's join feature addresses this use case. In the end, I think 
Solr could be modified to support this, with some work. It would make a good 
feature request in JIRA.

~ David Smiley

On Aug 3, 2011, at 10:39 AM, Olson, Ron wrote:

> Hi all-
>
> Well, this is a problem. I have a list of names as a multi-valued field and I 
> am searching on this field and need to return the results sorted. I know from 
> searching and reading the documentation (and getting the error) that sorting 
> on a multi-valued field isn't possible. Okay, so, what I haven't found is any 
> real good solution/workaround to the problem. I was wondering what strategies 
> others have done to overcome this particular situation; collapsing the 
> individual names into a single field with copyField doesn't work because the 
> name searched may not be the first name in the field.
>
> Thanks for any hints/tips/tricks.
>
> Ron
>





RE: Strategies for sorting by array, when you can't sort by array?

2011-08-03 Thread Olson, Ron
Right, the search term is the sort field. I can manually sort an individual 
page, but when the user clicks on the next page, the sort is "reset", visually.

-Original Message-
From: Mike Sokolov [mailto:soko...@ifactory.com]
Sent: Wednesday, August 03, 2011 9:52 AM
To: solr-user@lucene.apache.org
Cc: Olson, Ron
Subject: Re: Strategies for sorting by array, when you can't sort by array?

Although you weren't very clear about it, it sounds as if you want the
results to be sorted by a name that actually matched the query?  In
general that is not going to be easy, since it is not something that can
be computed in advance and thus indexed.


-Mike

On 08/03/2011 10:39 AM, Olson, Ron wrote:
> Hi all-
>
> Well, this is a problem. I have a list of names as a multi-valued field and I 
> am searching on this field and need to return the results sorted. I know from 
> searching and reading the documentation (and getting the error) that sorting 
> on a multi-valued field isn't possible. Okay, so, what I haven't found is any 
> real good solution/workaround to the problem. I was wondering what strategies 
> others have used to overcome this particular situation; collapsing the 
> individual names into a single field with copyField doesn't work because the 
> name searched may not be the first name in the field.
>
> Thanks for any hints/tips/tricks.
>
> Ron
>



Strategies for sorting by array, when you can't sort by array?

2011-08-03 Thread Olson, Ron
Hi all-

Well, this is a problem. I have a list of names as a multi-valued field and I 
am searching on this field and need to return the results sorted. I know from 
searching and reading the documentation (and getting the error) that sorting on 
a multi-valued field isn't possible. Okay, so, what I haven't found is any real 
good solution/workaround to the problem. I was wondering what strategies others 
have used to overcome this particular situation; collapsing the individual 
names into a single field with copyField doesn't work because the name searched 
may not be the first name in the field.

Thanks for any hints/tips/tricks.

Ron


RE: Determine which field term was found?

2011-07-21 Thread Olson, Ron
Hmm, okay, well, if that's the way it works, then I'll loop through the arrays, 
as the query is pretty much as described.

Related to what you said about how lucene works, do you think this 
functionality is something worth opening an enhancement request for, or is it 
such a tiny corner-case as to not be worth it?

Thanks a lot for the help!

Ron

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Thursday, July 21, 2011 4:27 PM
To: solr-user@lucene.apache.org
Subject: Re: Determine which field term was found?

On Thu, Jul 21, 2011 at 4:47 PM, Olson, Ron wrote:
> Is there an easy way to find out which field matched a term in an OR query 
> using Solr? I have a document with names in two multi-valued fields and I am 
> searching for "Smith", using the query "A_NAMES:smith OR B_NAMES:smith". I 
> figure I could loop through both result arrays, but that seems weird to me to 
> have to search again for the value in a result.

That's pretty much the way lucene currently works - you don't know
what fields match a query.
If the query is simple, looping over the returned stored fields is
probably your best bet.

There are a couple other tricks you could use (although they are not
necessarily better):
1) with grouping by query (a trunk feature) you can essentially return
both queries with one request:
  q=*:*&group=true&group.query=A_NAMES:smith&group.query=B_NAMES:smith
  and optionally add a "group.query=A_NAMES:smith OR B_NAMES:smith" if
you need the combined list
2) use pseudo-fields (also trunk) in conjunction with the termfreq
function (the number of times a term appears in a field).  This
obviously only works with term queries.
  fl=*,count1:termfreq(A_NAMES,'smith'),count2:termfreq(B_NAMES,'smith')
  You can use parameter substitution to pull out the actual term and
simplify the query:
  fl=*,count1:termfreq(A_NAMES,$term),count2:termfreq(B_NAMES,$term)&term=smith


-Yonik
http://www.lucidimagination.com
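
The "loop over the returned stored fields" approach suggested above can be sketched roughly as follows. This is only an illustration: it assumes each result document has already been parsed into a dict whose multi-valued fields are lists of strings, and it uses a simple lowercase whole-word comparison as a stand-in for whatever analyzer the real schema applies. The function name `matched_fields` is hypothetical.

```python
def matched_fields(doc, term, fields=("A_NAMES", "B_NAMES")):
    """Return the names of the multi-valued stored fields that contain `term`."""
    needle = term.lower()
    hits = []
    for field in fields:
        values = doc.get(field, [])
        # Lowercase whole-word comparison; an assumption standing in for the
        # index-time analysis, which may tokenize differently.
        if any(needle in value.lower().split() for value in values):
            hits.append(field)
    return hits


# Example document shaped like a parsed Solr result (made-up data).
doc = {
    "id": "1",
    "A_NAMES": ["Joe's Trucking", "Bob Smith"],
    "B_NAMES": ["Acme Freight"],
}
print(matched_fields(doc, "Smith"))
```

For a simple term query this costs one pass over the stored values of each returned document, which is usually small next to the cost of the search itself.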



Determine which field term was found?

2011-07-21 Thread Olson, Ron
Hi all-

Is there an easy way to find out which field matched a term in an OR query 
using Solr? I have a document with names in two multi-valued fields and I am 
searching for "Smith", using the query "A_NAMES:smith OR B_NAMES:smith". I 
figure I could loop through both result arrays, but that seems weird to me to 
have to search again for the value in a result.

Thanks for any info,

Ron


Unique document count from index?

2011-06-27 Thread Olson, Ron
Hi all-

I have a problem that I'm not sure how it can be (if it can be) solved in Solr. 
I am using Solr 3.2 with patch 2524 installed to provide grouping. I need to 
return the count of unique records that match a particular query.

For an example of what I'm talking about, imagine I have an index of music CD 
orders, created from a SQL database using the DataImportHandler. It's possible 
that the person ordered multiple records by the same artist (e.g. order #1234 
contains Pink Floyd "Wish You Were Here", Pink Floyd "Meddle", Pink Floyd 
"Obscured by Clouds"). One of the indexed and stored fields in the document is 
"Artist". If I do a search for Pink Floyd, using the order above, I'd get three 
documents, all with the same order number, for each of the Pink Floyd records. 
What I'd like to find out is how many unique orders have Pink Floyd across the 
entire index. The index has millions of documents.

I have been trying to see if the result grouping functionality provided by 
patch 2524 will help, but while it does collapse the query above into one 
document, the matches field is still the same as without the grouping (which I 
guess makes sense insofar as it is still reporting the number of documents it 
found for the query). I have also thought a subquery in my DataImportHandler 
might work, though I'm not sure how I'd structure it.

Thanks for any guidance on how to solve this problem; I know Solr isn't meant 
to be a data-mining tool and I'm guessing I'm skating perilously close to using 
it for that purpose, but anything I can do to take load from the actual 
database is considered a Good Thing by all concerned.

Ron


RE: XML Stripping from DIH

2011-02-22 Thread Olson, Ron
Thanks a lot! I thought I'd looked on this page but didn't see this one, not 
sure why.

I greatly appreciate it!

Ron

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Sunday, February 20, 2011 5:59 AM
To: solr-user@lucene.apache.org
Subject: Re: XML Stripping from DIH

Ron,

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory


Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



- Original Message 
> From: "Olson, Ron" 
> To: "solr-user@lucene.apache.org" 
> Sent: Fri, February 18, 2011 4:05:15 PM
> Subject: XML Stripping from DIH
>
> Hi all-
>
> I have some XML in a database that I am trying to index and store; I am
> interested in the various pieces of text, but none of the tags. I've been
> trying to figure out a way to strip all the tags out, but haven't found
> anything within Solr to do so; the XML parser seems to want XPath to get the
> various element values, when all I want is to turn the whole thing into one
> blob of text, regardless of whether it makes any "contextual" sense.
>
> Is there something in Solr to do this, or is it something I'd have to write
> myself (which I'm willing to do if necessary)?
>
> Thanks for any info,
>
> Ron
>



DIH and updating specific record

2011-02-22 Thread Olson, Ron
Hi all-

I am trying to determine if there is a way to tell Solr to update its index 
for a specific record in the database, identified by its ID. All the examples 
and documentation seem to discuss using a "last updated" date/time field, but 
in this case modifying the table would not be an option. Instead, I'd like to 
invoke Solr's DIH delta query with a specific ID to say "here's something new 
or updated, please update your index with it".

I apologize if this is a trivial thing, but I can't seem to find any 
documentation on how to do it.

Thanks,

Ron
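
One possible way to do this, sketched under assumptions rather than taken from the thread: DIH makes request parameters available to the queries in data-config.xml as `${dataimporter.request.<name>}`, so an entity query could filter on a passed-in ID (for example `WHERE id = '${dataimporter.request.id}'`, where the column name is hypothetical) and be run as a full-import with `clean=false` so the rest of the index is left alone. The handler path `/dataimport`, the host, and the helper name `dih_update_url` are all assumptions for illustration.

```python
from urllib.parse import urlencode


def dih_update_url(base_url, record_id):
    """Build a DIH request that re-imports one record without wiping the index."""
    params = {
        "command": "full-import",
        "clean": "false",   # keep existing documents; only add/replace
        "commit": "true",
        "id": record_id,    # read in data-config.xml as ${dataimporter.request.id}
    }
    return base_url + "?" + urlencode(params)


# Hypothetical host and handler path.
print(dih_update_url("http://localhost:8983/solr/dataimport", 1234))
```

Because the document's uniqueKey matches the existing entry, the re-imported record replaces the old one instead of duplicating it.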



XML Stripping from DIH

2011-02-18 Thread Olson, Ron
Hi all-

I have some XML in a database that I am trying to index and store; I am 
interested in the various pieces of text, but none of the tags. I've been 
trying to figure out a way to strip all the tags out, but haven't found 
anything within Solr to do so; the XML parser seems to want XPath to get the 
various element values, when all I want is to turn the whole thing into one 
blob of text, regardless of whether it makes any "contextual" sense.

Is there something in Solr to do this, or is it something I'd have to write 
myself (which I'm willing to do if necessary)?

Thanks for any info,

Ron


RE: Setting up Solr for PDFs on JBoss

2011-01-04 Thread Olson, Ron
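This is what I have; I didn't alter it so I believe it's the default:

  <requestHandler name="/update/extract"
                  class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
                  startup="lazy">
    <lst name="defaults">
      <str name="fmap.content">text</str>
      <str name="lowernames">true</str>
      <str name="uprefix">ignored_</str>

      <str name="captureAttr">true</str>
      <str name="fmap.a">links</str>
      <str name="fmap.div">ignored_</str>
    </lst>
  </requestHandler>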

-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Monday, January 03, 2011 8:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Setting up Solr for PDFs on JBoss

What's your solrconfig.xml look like for setting up the ExtractingReqHandler?

-Grant

On Jan 3, 2011, at 4:44 PM, Olson, Ron wrote:

> Hi all-
>
> After testing the PDF import functionality in my local copy of Solr 1.4.1 
> with the included Jetty app server, I tried replicating it using my copy of 
> Solr running in JBoss 5.1.0 (which uses Tomcat as its servlet container). When 
> I try to add a PDF, I get an error buried in the stack trace:
>
> Caused by: org.apache.solr.common.SolrException: Error Instantiating Request 
> Handler, org.apache.solr.handler.extraction.ExtractingRequestHandler is not a 
> org.apache.solr.request.SolrRequestHandler
>
>
> I am using multiple cores, but they all use the common "lib" directory, 
> instead of the "core/lib" directory. This lib directory is what is added to 
> the classpath when JBoss starts ($JBOSS_HOME/server/solr_test/lib), so all 
> the jars in this directory should be available to anything in the "deploy" 
> directory (just mentioning in case people aren't familiar with JBoss). I've 
> added all the jars from the contrib/extraction/lib directory, as well as the 
> jars from dist.
>
> My lib directory is effectively:
>
> apache-solr-cell-1.4.1.jar
> apache-solr-clustering-1.4.1.jar
> apache-solr-core-1.4.1.jar
> apache-solr-solrj-1.4.1.jar
> asm-3.1.jar
> bcmail-jdk14-136.jar
> bcprov-jdk14-136.jar
> commons-codec-1.3.jar
> commons-compress-1.0.jar
> commons-csv-1.0-SNAPSHOT-r609327.jar
> commons-fileupload-1.2.1.jar
> commons-httpclient-3.1.jar
> commons-io-1.4.jar
> commons-lang-2.1.jar
> commons-logging-1.1.1.jar
> dom4j-1.6.1.jar
> easymock.jar
> fontbox-0.1.0.jar
> geronimo-stax-api_1.0_spec-1.0.1.jar
> geronimo-stax-api_1.0_spec-1.0.jar
> icu4j-3.8.jar
> jcl-over-slf4j-1.5.5.jar
> jempbox-0.2.0.jar
> junit-4.3.jar
> log4j-1.2.14.jar
> lucene-analyzers-2.9.3.jar
> lucene-core-2.9.3.jar
> lucene-highlighter-2.9.3.jar
> lucene-memory-2.9.3.jar
> lucene-misc-2.9.3.jar
> lucene-queries-2.9.3.jar
> lucene-snowball-2.9.3.jar
> lucene-spellchecker-2.9.3.jar
> nekohtml-1.9.9.jar
> ojdbc14.jar
> ooxml-schemas-1.0.jar
> pdfbox-0.7.3.jar
> poi-3.5-beta6.jar
> poi-ooxml-3.5-beta6.jar
> poi-scratchpad-3.5-beta6.jar
> slf4j-api-1.5.5.jar
> slf4j-jdk14-1.5.5.jar
> tika-core-0.4.jar
> tika-parsers-0.4.jar
> wstx-asl-3.2.7.jar
> xercesImpl-2.8.1.jar
> xml-apis-1.0.b2.jar
> xmlbeans-2.3.0.jar
>
> I know several of these jars are already essentially present in JBoss (log4j, 
> for example), but I'm at a loss as to what to remove/add to get it to work. 
> Anyone have any ideas of configuring it under JBoss? The other cores are 
> database-based (thus the use of ojdbc14.jar), and they work fine.
>
> Thanks for any help,
>
> Ron
>

--
Grant Ingersoll
http://www.lucidimagination.com




Setting up Solr for PDFs on JBoss

2011-01-03 Thread Olson, Ron
Hi all-

After testing the PDF import functionality in my local copy of Solr 1.4.1 with 
the included Jetty app server, I tried replicating it using my copy of Solr 
running in JBoss 5.1.0 (which uses Tomcat as its servlet container). When I try 
to add a PDF, I get an error buried in the stack trace:

Caused by: org.apache.solr.common.SolrException: Error Instantiating Request 
Handler, org.apache.solr.handler.extraction.ExtractingRequestHandler is not a 
org.apache.solr.request.SolrRequestHandler


I am using multiple cores, but they all use the common "lib" directory, instead 
of the "core/lib" directory. This lib directory is what is added to the 
classpath when JBoss starts ($JBOSS_HOME/server/solr_test/lib), so all the jars 
in this directory should be available to anything in the "deploy" directory 
(just mentioning in case people aren't familiar with JBoss). I've added all the 
jars from the contrib/extraction/lib directory, as well as the jars from dist.

My lib directory is effectively:

apache-solr-cell-1.4.1.jar
apache-solr-clustering-1.4.1.jar
apache-solr-core-1.4.1.jar
apache-solr-solrj-1.4.1.jar
asm-3.1.jar
bcmail-jdk14-136.jar
bcprov-jdk14-136.jar
commons-codec-1.3.jar
commons-compress-1.0.jar
commons-csv-1.0-SNAPSHOT-r609327.jar
commons-fileupload-1.2.1.jar
commons-httpclient-3.1.jar
commons-io-1.4.jar
commons-lang-2.1.jar
commons-logging-1.1.1.jar
dom4j-1.6.1.jar
easymock.jar
fontbox-0.1.0.jar
geronimo-stax-api_1.0_spec-1.0.1.jar
geronimo-stax-api_1.0_spec-1.0.jar
icu4j-3.8.jar
jcl-over-slf4j-1.5.5.jar
jempbox-0.2.0.jar
junit-4.3.jar
log4j-1.2.14.jar
lucene-analyzers-2.9.3.jar
lucene-core-2.9.3.jar
lucene-highlighter-2.9.3.jar
lucene-memory-2.9.3.jar
lucene-misc-2.9.3.jar
lucene-queries-2.9.3.jar
lucene-snowball-2.9.3.jar
lucene-spellchecker-2.9.3.jar
nekohtml-1.9.9.jar
ojdbc14.jar
ooxml-schemas-1.0.jar
pdfbox-0.7.3.jar
poi-3.5-beta6.jar
poi-ooxml-3.5-beta6.jar
poi-scratchpad-3.5-beta6.jar
slf4j-api-1.5.5.jar
slf4j-jdk14-1.5.5.jar
tika-core-0.4.jar
tika-parsers-0.4.jar
wstx-asl-3.2.7.jar
xercesImpl-2.8.1.jar
xml-apis-1.0.b2.jar
xmlbeans-2.3.0.jar

I know several of these jars are already essentially present in JBoss (log4j, 
for example), but I'm at a loss as to what to remove/add to get it to work. 
Anyone have any ideas of configuring it under JBoss? The other cores are 
database-based (thus the use of ojdbc14.jar), and they work fine.

Thanks for any help,

Ron


RE: Testing/packaging question

2010-11-04 Thread Olson, Ron
I believe it should point to the directory above, where conf and lib are 
located (though I have a multi-core setup).

Mine is set to:

/usr/local/jboss-5.1.0.GA/server/solr/solr_data/

And in solr_data the solr.xml defines the two cores; each core directory has 
conf, data, and lib directories, and the conf directory contains the schema.xml.
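
To make the point concrete, here is a small sketch of deriving the value for `-Dsolr.solr.home` from the location of a schema.xml, assuming the usual single-core layout where the schema lives at `<solr home>/conf/schema.xml`. The helper name `solr_home_for` and the example path are hypothetical.

```python
from pathlib import PurePosixPath


def solr_home_for(schema_path):
    """Return the directory that holds conf/, given a path to conf/schema.xml."""
    schema = PurePosixPath(schema_path)
    if schema.name != "schema.xml" or schema.parent.name != "conf":
        raise ValueError("expected a path ending in conf/schema.xml")
    # solr.solr.home is the parent of conf/, not the schema file itself.
    return str(schema.parent.parent)


home = solr_home_for("/tmp/solrpy-test/conf/schema.xml")
print("-Dsolr.solr.home=" + home)
```

So the property should name a directory, and the schema is then found under its conf subdirectory.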



-Original Message-
From: Bernhard Reiter [mailto:ock...@raz.or.at]
Sent: Thursday, November 04, 2010 3:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Testing/packaging question

Hi,

I'm now trying to

export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml"

and restarting tomcat (v6 package from ubuntu maverick) via

sudo /etc/init.d/tomcat6 restart

but solr still doesn't seem to find that schema.xml, as it complains
about unknown fields when running the tests that require that schema.xml

Can someone please tell me what I'm doing wrong -- and what I should be
doing?

TIA again,
Bernhard

Am Montag, den 01.11.2010, 19:01 +0100 schrieb Bernhard Reiter:
> Hi,
>
> I'm pretty much of a Solr newbie currently packaging solrpy for Debian;
> see
> http://svn.debian.org/viewsvn/python-modules/packages/python-solrpy/trunk/
>
> In order to run solrpy's supplied tests at build time, I'd need Solr to
> know about the schema.xml that comes with the tests.
> Can anyone tell me how do that properly? I'd basically need Solr to
> temporarily recognize that schema.xml without permanently installing it
> -- is there any way to do this, eg via environment variables?
>
> TIA
> Bernhard Reiter





Using setStart in solrj

2010-11-04 Thread Olson, Ron
Hi all-

First, thanks to all the folks who have helped me so far getting the hang of 
Solr; I promise to give back when I think my contributions will be useful :)

I am at the point where I'm trying to return results back from a search in a 
war file, using Java with solrj. On the result page of the website I'd want to 
limit the actual results to probably around 20 or so, with the usual "next/prev 
page" paradigm. The issue I've been wrestling with is keeping the SolrQuery 
object around so that I don't need to transmit the entire thing back to the 
client, especially if they search for something like "truck", which could 
return a lot of results.

I was thinking that one solution would be to do a "query.setRows(20);" for the 
query, then return the results back with some sort of an identifier so that on 
subsequent queries, I could also include "query.setStart(someCounter + 1);" to 
get the next set of 20. In theory, that would work at the cost of having to 
re-execute the query.

I've been looking for information about setStart() and haven't found much more 
than Javadoc that says "sets the starting row for the result set". My question 
is, how do I know what the starting row is? Maybe, based on the search 
parameters, it will always return the results in an implicit order in which 
case is it just like executing a fixed query in a database and then grabbing 
the next 20 rows from the result set? Because the user would be pressing the 
prev/next buttons, even though the query is being re-executed, the parameters 
would not be changing.

That's the theory, anyway. It seems excessive to keep executing the same query 
over and over again just because the user wants to see the next set of results, 
especially if the original SolrQuery object has them all, but maybe that's just 
what needs to be done, given the stateless nature of the web.

Any info on this method/strategy would be most appreciated.

Thanks,

Ron
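
The paging arithmetic described above can be sketched as follows. It assumes Solr's `start` parameter is a zero-based offset into the full result list, so page n (counting from 0) begins at n * rows; the two values would then be passed to `query.setStart(...)` and `query.setRows(...)` in solrj. The helper names `page_params` and `page_count` are hypothetical, and the sketch only covers the offset math, not the query itself.

```python
def page_params(page, rows=20):
    """Return the start/rows pair for a zero-based page number."""
    if page < 0 or rows <= 0:
        raise ValueError("page must be >= 0 and rows must be > 0")
    return {"start": page * rows, "rows": rows}


def page_count(num_found, rows=20):
    """Number of pages needed to show num_found results, rows per page."""
    return (num_found + rows - 1) // rows


print(page_params(0))   # first page
print(page_params(2))   # third page
print(page_count(41))   # pages needed for 41 hits at 20 per page
```

Only the page number has to travel between client and server, so nothing needs to be kept in the session. Paging stays consistent across requests as long as the query and sort parameters are unchanged and the index is not modified in between; adding an explicit tie-breaking sort field is one way to make the order fully deterministic.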


RE: Stored or indexed?

2010-11-02 Thread Olson, Ron
Thanks for the great info! I appreciate everybody's help in getting started 
with Solr, hopefully I'll be able to get my stuff working and move on to more 
difficult questions. :)

-Original Message-
From: Elizabeth L. Murnane [mailto:emurn...@architexa.com]
Sent: Friday, October 29, 2010 12:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Stored or indexed?

Hi Ron,

In a nutshell - an indexed field is searchable, and a stored field has its 
content stored in the index so it is retrievable. Here are some examples that 
will hopefully give you a feel for how to set the indexed and stored options:

indexed="true" stored="true"
Use this for information you want to search on and also display in search 
results - for example, book title or author.

indexed="false" stored="true"
Use this for fields that you want displayed with search results but that don't 
need to be searchable - for example, destination URL, file system path, time 
stamp, or icon image.

indexed="true" stored="false"
Use this for fields you want to search on but don't need to get their values in 
search results. Here are some of the common reasons you would want this:

Large fields and a database: Storing a field makes your index larger, so set 
stored to false when possible, especially for big fields. For this case a 
database is often used, as the previous responder said. Use a separate 
identifier field to get the field's content from the database.

Ordering results: Say you define field name="bookName" type="text" 
indexed="true" stored="true" that is tokenized and used for searching. If you 
want to sort results based on book name, you could copy the field into a 
separate nonretrievable, nontokenized field that can be used just for sorting -
field name="bookSort" type="string" indexed="true" stored="false"
copyField source="bookName" dest="bookSort"

Easier searching: If you define the field  you can use it as a 
catch-all field that contains all of the other text fields. Since solr looks in 
a default field when given a text query without field names, you can support 
this type of general phrase query by making the catch-all the default field.

indexed="false" stored="false"
Use this when you want to ignore fields. For example, the following will ignore 
unknown fields that don't match a defined field rather than throwing an error 
by default.
fieldtype name="ignored" stored="false" indexed="false"
dynamicField name="*" type="ignored"
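
The four combinations above boil down to two yes/no questions per field: does it need to be searchable, and does it need to come back in results? A rough sketch of that decision, generating the corresponding schema.xml line, is shown here. The helper name `field_xml` is hypothetical, and the field names are taken from the examples above.

```python
from xml.sax.saxutils import quoteattr


def field_xml(name, field_type, searchable, displayed):
    """Render a schema.xml <field> line from the two decisions."""
    attrs = [
        ("name", name),
        ("type", field_type),
        ("indexed", "true" if searchable else "false"),  # can it be searched or sorted on?
        ("stored", "true" if displayed else "false"),    # can its value be returned?
    ]
    return "<field " + " ".join(k + "=" + quoteattr(v) for k, v in attrs) + "/>"


# Searchable and shown in results.
print(field_xml("bookName", "text", searchable=True, displayed=True))
# Used only for sorting, never returned.
print(field_xml("bookSort", "string", searchable=True, displayed=False))
```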


Elizabeth Murnane
emurn...@architexa.com
Architexa Lead Developer - www.architexa.com
Understand & Document Code In Seconds


--- On Thu, 10/28/10, Savvas-Andreas Moysidis wrote:

From: Savvas-Andreas Moysidis 
Subject: Re: Stored or indexed?
To: solr-user@lucene.apache.org
Date: Thursday, October 28, 2010, 4:25 AM

In our case, we just store a database id and do a secondary db query when
displaying the results.
This is handy and leads to a more centralised architecture when you need to
display properties of a domain object which you don't index/search.

On 28 October 2010 05:02, kenf_nc  wrote:

>
> Interesting wiki link, I hadn't seen that table before.
>
> And to answer your specific question about indexed=true, stored=false, this
> is most often done when you are using analyzers/tokenizers on your field.
> This field is for search only, you would never retrieve its contents for
> display. It may in fact be an amalgam of several fields into one 'content'
> field. You have your display copy stored in another field marked
> indexed=false, stored=true and optionally compressed. I also have simple
> string fields set to lowercase so searching is case-insensitive, and have a
> duplicate field where the string is normal case. The first one is
> indexed/not stored, the second is stored/not indexed.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Stored-or-indexed-tp1782805p1784315.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
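
To make the indexed/stored split in the quoted message concrete, here is a toy model in Python. It is not how Solr is implemented; `TinyIndex` and the field names `name_search` and `name_display` are made up for illustration. An indexed field feeds the lookup structure, a stored field keeps the original value for display:

```python
from collections import defaultdict


class TinyIndex:
    """Toy illustration of indexed vs. stored fields (not Solr's real internals)."""

    def __init__(self, schema):
        self.schema = schema              # field -> {"indexed": bool, "stored": bool}
        self.postings = defaultdict(set)  # (field, token) -> set of doc ids
        self.stored = {}                  # doc id -> {field: original value}

    def add(self, doc_id, doc):
        self.stored[doc_id] = {}
        for field, value in doc.items():
            opts = self.schema[field]
            if opts["indexed"]:
                # Lowercase and split on whitespace, like a very simple analyzer.
                for token in value.lower().split():
                    self.postings[(field, token)].add(doc_id)
            if opts["stored"]:
                self.stored[doc_id][field] = value

    def search(self, field, term):
        ids = sorted(self.postings.get((field, term.lower()), set()))
        # Only stored fields can be returned for display.
        return [self.stored[i] for i in ids]


schema = {
    "name_search": {"indexed": True, "stored": False},   # search only
    "name_display": {"indexed": False, "stored": True},  # display only
}
idx = TinyIndex(schema)
idx.add(1, {"name_search": "Smith Trucking", "name_display": "Smith Trucking"})

print(idx.search("name_search", "SMITH"))   # finds the doc, returns only the display field
print(idx.search("name_display", "smith"))  # nothing: this field was never indexed
```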



Stored or indexed?

2010-10-27 Thread Olson, Ron
Hi all-

I've read through the documentation, but I'm still a little confused about the 
<field> tag, in terms of the indexed and stored attributes. If I have 
something marked as indexed="true", why would I ever want stored="false"? Are 
there any good tips-n-tricks anywhere about how to properly set the field tag? 
I've been finding bits and pieces both on the wiki and a couple of other 
websites, but there doesn't seem to be a good definitive how-to on this.

Thanks for any info,

Ron



RE: Confusion about entities and documents

2010-10-22 Thread Olson, Ron
Hmm, okay, I guess I wasn't taking the hierarchy-flattening aspect of Solr 
seriously enough. :)

Based on your reply from the other thread, I guess the best solution, as far as 
I can tell, is to maintain the multiple value lists and take advantage of the 
fact that the arrays will always be in the right order:


1
2


ABC Corp 
XYZ Inc  


So I guess the problem isn't really *sooo* bad...I just need to make sure that 
I have the appropriate names defined so I can link between two arrays in my 
client code. I suppose I could keep things straight by preserving the hierarchy 
within the name attribute.
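
Roughly what I have in mind for the client side, as a sketch only. It assumes the response has already been parsed into a dict, and that the two multi-valued fields really do come back the same length and in the same order. `part_id` is a made-up field name, `parts_manu` is the one from my config, and `pair_parts` is just a helper for illustration:

```python
def pair_parts(doc, id_field="part_id", manu_field="parts_manu"):
    """Link two parallel multi-valued fields by position."""
    ids = doc.get(id_field, [])
    manus = doc.get(manu_field, [])
    if len(ids) != len(manus):
        # If a value is missing on one side the arrays no longer line up,
        # so fail loudly rather than pair the wrong items.
        raise ValueError(
            f"{id_field} has {len(ids)} values but {manu_field} has {len(manus)}"
        )
    return list(zip(ids, manus))


# Hypothetical document as it might look after parsing the response.
doc = {"part_id": ["1", "2"], "parts_manu": ["ABC Corp", "XYZ Inc"]}
for part_id, manufacturer in pair_parts(doc):
    print(part_id, manufacturer)
```

The length check is there because positional linking only works while every part contributes exactly one value to each array.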



-----Original Message-----
From: harrysmith [mailto:harrysmith...@gmail.com]
Sent: Friday, October 22, 2010 4:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Confusion about entities and documents


> What I get when I search for, say, "XYZ", is a document that has XYZ Corp as
> a manufacturer name, but the array of parts_manu appears to be a child of
> the document, not the parts array.
>
> Is this the correct behavior, insofar as a document has a single level of
> elements, and that's it? If so, what might be a better strategy for being
> able to maintain the hierarchy of information within a document?
>

Yes, this is the correct behavior. I still struggle with the same issue, and
there are no 'best practices' (that I have found, at least) for maintaining
relationships within a Solr doc. The argument is that Solr is not the correct
place for these representations and should only hold a flat version of
your document.

For a similar question see:
http://lucene.472066.n3.nabble.com/Schema-Definition-Question-td1049966.html#a1105593

A few possible solutions are posted there, and I'm interested in how others
have tackled this issue.
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Confusion-about-entities-and-documents-tp1753926p1755152.html
Sent from the Solr - User mailing list archive at Nabble.com.




Confusion about entities and documents

2010-10-22 Thread Olson, Ron
Hi all-

I've been checking the online docs about this, but I haven't found a suitable 
explanation about how entities and sub-entities work within a document. I am 
loading records from a SQL database and everything seems to be getting 
flattened in a way I was not expecting.

For example, I have a document that defines, say, "engine". The engine is made 
up of parts, which are manufactured by various companies. A hypothetical, 
abbreviated config would be:





...





What I get when I search for, say, "XYZ", is a document that has XYZ Corp as a 
manufacturer name, but the array of parts_manu appears to be a child of the 
document, not the parts array.

Is this the correct behavior, insofar as a document has a single level of 
elements, and that's it? If so, what might be a better strategy for being able 
to maintain the hierarchy of information within a document?

Thanks for any info,

Ron



Documents and Cores, take 2

2010-10-19 Thread Olson, Ron
Hi all-

I have a newbie design question about documents, especially with SQL databases. 
I am trying to set up Solr to go against a database that, for example, has 
"items" and "people". The way I see it (and I don't know if this is right, 
thus the question), both are separate documents: an item may contain a list 
of parts, which the user may want to search, and, as part of the "item", the 
user may want to view the list of people who have ordered it.

Then there's the actual "people", who the user might want to search to find a 
name and, consequently, what items they ordered. To me they are both "top 
level" things, with some overlap of fields. If I'm searching for "people", I'm 
likely not going to be interested in the parts of the item, while if I'm 
searching for "items" the likelihood is that I may want to search for "42532" 
which is, in this instance, a SKU, and not get hits on the zip code section of 
the "people".

Does it make sense, then, to separate these two out as separate documents? I 
believe so because the documentation I've read suggests that a document should 
be analogous to a row in a table (in this case, very de-normalized). What is 
tripping me up is that, as far as I can tell, you can have only one document type 
per index, and thus one document type per core. So in this example, I would have two 
cores, "items" and "people". Is this correct? Should I embrace the idea of 
having many cores, or am I supposed to have a single, unified index with all 
documents (which Solr doesn't seem to support)?
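
For comparison, this is a sketch of what the single-index alternative might look like on the loading side. It assumes one schema whose fields cover both kinds of record; the `doc_type` field, the type-prefixed field names, and the helper `to_solr_doc` are all hypothetical:

```python
def to_solr_doc(doc_type, row):
    """Flatten a database row into one document for a shared index."""
    # Prefix the id so an item and a person with the same primary key don't collide.
    doc = {"id": f"{doc_type}-{row['id']}", "doc_type": doc_type}
    for key, value in row.items():
        if key != "id":
            # Prefix field names so an item SKU and a person's zip code
            # land in different fields.
            doc[f"{doc_type}_{key}"] = value
    return doc


item = to_solr_doc("item", {"id": 7, "sku": "42532"})
person = to_solr_doc("person", {"id": 7, "zip": "42532"})
print(item)
print(person)
```

With documents shaped like this, a filter query such as fq=doc_type:item could restrict a search to one kind of document, and leaving the filter off would search both.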

The ultimate question comes down to the search interface. I don't necessarily 
want to have the user explicitly state which document they want to search; I'd 
like them to simply type "42532" and get documents from both cores, and then 
possibly allow for filtering results after the fact, not before. As I've only 
used the admin site so far (which is core-specific), does the client API allow 
for unified searching across all cores? Assuming it does, I'd think my idea of 
multiple-documents is okay, but I'd love to hear from people who actually know 
what they're doing. :)
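
From what I can tell, Solr's distributed search takes a `shards` request parameter listing the indexes to query, so something like the sketch below might be the shape of a cross-core request. The host, port, and the helper `build_cross_core_query` are assumptions on my part, and as far as I understand it this only works if the cores share a compatible schema and a unique key that does not repeat across cores:

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_cross_core_query(base_url, shard_cores, text, rows=10):
    """Build a select URL that asks one core to fan the query out to several."""
    params = {
        "q": text,
        # Comma-separated host:port/path entries, without the http:// scheme.
        "shards": ",".join(shard_cores),
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


url = build_cross_core_query(
    "http://localhost:8983/solr/items",  # assumed host, port, and core path
    ["localhost:8983/solr/items", "localhost:8983/solr/people"],
    "42532",
)
print(url)
# Decode the query string again to check what would be sent.
print(parse_qs(urlparse(url).query)["shards"])
```

This only builds the URL; sending it and merging or filtering the results afterwards is left out.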

Thanks,

Ron

BTW: Sorry about the problem with the previous message; I didn't know about 
thread hijacking.


