Multi-word searches in multi-valued fields
Hi all- I'm not clear on how to allow a user to search a multi-valued field with multiple words and return only those documents where all the words are together in one value, and not spread over multiple values. If I do a literal search on the "company name" field for "smith trucking" (with the quotes), then it works because it's looking for only "smith trucking", and it finds it, great. However, if I put in "trucking smith", then I get no results. If I try using something like (+trucking +smith), then I get documents where one document might have "joe's trucking" and "bob smith" in the resulting array of names. So I guess what I need is an exact match, regardless of word positioning (i.e. "smith trucking" and "trucking smith" should find only those documents that have those two words in one value of the resulting array). I've been going through the wiki and it seems like this is probably a super-simple thing, but I'm clearly just not getting it; I just can't figure out the right syntax to make this work. Thanks for any info. Ron
DISCLAIMER: This electronic message, including any attachments, files or documents, is intended only for the addressee and may contain CONFIDENTIAL, PROPRIETARY or LEGALLY PRIVILEGED information. If you are not the intended recipient, you are hereby notified that any use, disclosure, copying or distribution of this message or any of the information included in or with it is unauthorized and strictly prohibited. If you have received this message in error, please notify the sender immediately by reply e-mail and permanently delete and destroy this message and its attachments, along with any copies thereof. This message does not create any contractual obligation on behalf of the sender or Law Bulletin Publishing Company. Thank you.
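One approach often suggested for this kind of problem, sketched here under assumptions rather than taken from this thread: if the field type declares a positionIncrementGap (assumed to be 100 below), the values of a multi-valued field are indexed with a large position gap between them, so a phrase query whose slop is smaller than that gap can match the words in either order within one value but should not match across two values. The field name company_name, the gap of 100, the slop of 50, and the helper class itself are all hypothetical; the slop also has to be large enough to cover the reordering (swapping two adjacent words needs a slop of at least 2).

```java
import java.util.Arrays;
import java.util.List;

class SloppyPhraseQuery {
    // Assumption: the field type in schema.xml uses positionIncrementGap="100".
    static final int ASSUMED_POSITION_INCREMENT_GAP = 100;

    // Builds a query string such as company_name:"trucking smith"~50.
    static String build(String field, List<String> words, int slop) {
        if (slop < 0 || slop >= ASSUMED_POSITION_INCREMENT_GAP) {
            // A slop as large as the gap could let a match span two values.
            throw new IllegalArgumentException("slop must be in [0, gap)");
        }
        // Escape backslashes and double quotes so the phrase stays well-formed.
        String phrase = String.join(" ", words)
                .replace("\\", "\\\\")
                .replace("\"", "\\\"");
        return field + ":\"" + phrase + "\"~" + slop;
    }

    public static void main(String[] args) {
        // Prints: company_name:"trucking smith"~50
        System.out.println(build("company_name", Arrays.asList("trucking", "smith"), 50));
    }
}
```

The resulting string would be passed as the q parameter; whether it behaves as hoped depends on the analyzer and the actual gap configured for the field, so it is worth checking against a document like the "joe's trucking" / "bob smith" case above.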
RE: Two unrelated questions
Thanks for the reply. As for #1, the table that I'm indexing via DIH has a PK field, generated by a sequence, so there are records with IDs of 1, 2, 3, etc. That same ID is the one I use in the unique ID field of the document (ID). I've noticed that the table has, say, 10 rows, but my index only has 8. I don't know why that is, but I'd like to figure out which records are missing and add them (and hopefully understand why they weren't added in the first place). I was just wondering if there was some way to compare the two as part of a SQL query, but on reflection, it does seem like an absurd request, so I apologize; I think what I'll have to do is write a SolrJ program that gets every ID in the table, then does a search on that ID in the index, and adds the ones that are missing. Regarding the second item, yes, it's crazy but I'm not sure what to do; there really are that many options and some searches will be extremely specific, yet broad enough in terms for this to be a problem.
-Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, September 21, 2011 3:55 PM To: solr-user@lucene.apache.org Subject: Re: Two unrelated questions
for <1> I don't quite get what you're driving at. Your DIH query assigns the uniqueKey, it's not like it's something auto-generated. Perhaps a concrete example would help. <2> There's a limit you can adjust that defaults to 1024 (maxBooleanClauses in solrconfig.xml). You can bump this very high, but you're right, if anyone actually does something absurd it'll slow *that* query down. But just bumping this limit higher won't change performance absent someone actually putting a ton of items in a query... Best Erick
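The comparison step described in the reply above (find which table IDs never made it into the index) can be sketched as a plain set difference. This is a minimal sketch that assumes both ID lists have already been fetched, for example with a SELECT on the table's PK column and an index query that returns only the ID field; the fetching itself is left out, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.SortedSet;
import java.util.TreeSet;

class MissingIds {
    // Returns the IDs present in the table but absent from the index, in ascending order.
    static SortedSet<Long> missingFromIndex(Collection<Long> tableIds, Collection<Long> indexIds) {
        SortedSet<Long> missing = new TreeSet<>(tableIds);
        // Copy into a HashSet so each membership check is constant time.
        missing.removeAll(new HashSet<>(indexIds));
        return missing;
    }

    public static void main(String[] args) {
        Collection<Long> tableIds = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L);
        Collection<Long> indexIds = Arrays.asList(1L, 2L, 3L, 5L, 6L, 7L, 8L, 10L);
        // Prints: [4, 9]
        System.out.println(missingFromIndex(tableIds, indexIds));
    }
}
```

Doing the difference in memory avoids one index lookup per row; the missing IDs could then be looked at in the source table to see why the import skipped them.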
Two unrelated questions
Hi all- I'm not sure if I should break this out into two separate questions to the list for searching purposes, or if one is more acceptable (don't want to flood). I have two (hopefully) straightforward questions:
1. Is it possible to expose the unique ID of a document to a DIH query? The reason I want to do this is that I use the unique ID of the row in the table as the unique ID of the Lucene document, but I've noticed that the count of documents doesn't match the count in the table; I'd like to add the missing rows and was hoping to avoid writing a custom SolrJ app to do it.
2. Is there any limit to the number of conditions in a Boolean search? We're working on a new project where the user can choose either, for example, "Ford Vehicles", in which case I can simply search for "Ford", but if the user chooses specific makes and models, then I have to say something like "Crown Vic OR Focus OR Taurus OR F-150", etc., where they could theoretically choose every model of Ford ever made except one. This could lead to a *very* large query, and I was worried both about whether it was even possible and about the impact on performance.
Thanks, and I apologize if this really should be two separate messages. Ron
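For the second question, one possible way to stay under a clause limit (the reply in this thread gives 1024 as the default for maxBooleanClauses) is to split the selected values into batches and build one OR query per batch. This is only a sketch under assumptions: the field name model and the helper class are hypothetical, and the caller would still have to run each batch and merge the results, which changes paging and scoring compared with a single query. Raising the limit in solrconfig.xml is the simpler alternative when the lists stay reasonably small.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OrQueryBatches {
    // Builds queries like model:("Crown Vic" OR "Focus"), each with at most maxClauses values.
    static List<String> build(String field, List<String> values, int maxClauses) {
        if (maxClauses < 1) {
            throw new IllegalArgumentException("maxClauses must be at least 1");
        }
        List<String> queries = new ArrayList<>();
        for (int start = 0; start < values.size(); start += maxClauses) {
            int end = Math.min(start + maxClauses, values.size());
            StringBuilder sb = new StringBuilder(field).append(":(");
            for (int i = start; i < end; i++) {
                if (i > start) {
                    sb.append(" OR ");
                }
                // Quote each value so multi-word names stay together as a phrase.
                sb.append('"').append(values.get(i).replace("\"", "\\\"")).append('"');
            }
            queries.add(sb.append(')').toString());
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String> models = Arrays.asList("Crown Vic", "Focus", "Taurus", "F-150");
        // With a (deliberately tiny) limit of 3 this prints two queries:
        // model:("Crown Vic" OR "Focus" OR "Taurus")
        // model:("F-150")
        for (String q : build("model", models, 3)) {
            System.out.println(q);
        }
    }
}
```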
Add copyField without re-indexing?
Hi all- I have an 11 gig index that I realize I need to add another field to, not from the actual DIH query, but via copyField. Is there any way to re-parse an existing index, adding the new copyField target, without having to basically start all over again with DIH? Thanks, Ron
Parent delta query, but no child delta query?
Hi all- I'm trying to set up a delta query for a parent entity query that has many sub-queries. The table referenced in the parent query has a "last updated" field, but none of the children do. The way the data is set up is that when a child table is updated, the "last updated" field of the parent is set. So my question is whether I need to set up delta queries for my child entities; if the parent is updated I just want the whole record replaced. Do I need to set up the delta queries for the children? The DIH page doesn't seem to be specific on this point; they talk about what happens when a child record is updated and notifying the parent. Thanks for any info, Ron
RE: Exact matching on names?
Thank you Sujit and Rob for your help; I took the "easy" way and created a new field type that is identical to text, but with the stemmer removed. This seems, so far, to work exactly as needed. To help anyone else who comes across this issue, this is the field type I used:
-Original Message- From: Sujit Pal [mailto:sujit@comcast.net] Sent: Tuesday, August 16, 2011 12:18 PM To: solr-user@lucene.apache.org Subject: Re: Exact matching on names?
Hi Ron, There was a discussion about this some time back, which I implemented (with great success btw) in my own code...basically you store both the analyzed and non-analyzed versions (use string type) in the index, then send in a query like this:
+name:clarke name_s:"clarke"^100
The name field is text so it will analyze down "clarke" to "clark" but it will match both "clark" and "clarke" and the second clause would boost the entry with "clarke" up to the top, which you then select with rows=1. -sujit
Exact matching on names?
Hi all- I'm missing something fundamental, yet I've been unable to find the definitive answer for exact name matching. I'm indexing names using the standard "text" field type and my search is for the name "clarke". My results include "clark", which is incorrect; it needs to match clarke exactly (case insensitive). I tried textType but that doesn't work because I believe it needs to be *really* exact, whereas I'm looking for things like "clark oil", "bob, frank, and clark", etc. Thanks for any help, Ron
RE: Dates off by 1 day?
Ah, great! I knew the problem was between the keyboard and the chair. Thanks!
-Original Message- From: Sethi, Parampreet [mailto:parampreet.se...@teamaol.com] Sent: Wednesday, August 10, 2011 10:25 AM To: solr-user@lucene.apache.org Subject: Re: Dates off by 1 day?
The date difference comes from the different time zones. In Solr the date is stored in the Zulu (UTC) time zone, and SolrJ is returning the date in the CDT time zone (the JVM is picking up the system time zone).
> 2002-05-13T00:00:00Z
> I get:
> --> Sun May 12 19:00:00 CDT 2002
You can convert the Date to a different time zone using the java.util date functions if required. Hope it helps! -param
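The conversion mentioned in the reply can be sketched with the standard java.text and java.util classes. The instant is the same in both printouts; only the zone used for display differs. The class and method names here are hypothetical, and the epoch value is simply 2002-05-13T00:00:00Z expressed in milliseconds.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class UtcDateDemo {
    // Formats a Date in UTC, in the same style the Solr admin page shows.
    static String formatUtc(Date date) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(date);
    }

    public static void main(String[] args) {
        // 2002-05-13T00:00:00Z as milliseconds since the epoch.
        Date fileDate = new Date(1021248000000L);
        // Date.toString() uses the JVM's default zone, so this line varies by machine.
        System.out.println("default zone: " + fileDate);
        // Prints: UTC: 2002-05-13T00:00:00Z
        System.out.println("UTC: " + formatUtc(fileDate));
    }
}
```

In the code from the original question, the value returned by getFieldValue("FILE_DATE") would be cast to Date and passed to a helper like this before printing.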
Dates off by 1 day?
Hi all- I apologize in advance if this turns out to be a problem between the keyboard and the chair, but I'm confused about why my date field is correct in the index, but wrong in SolrJ. I have a field defined as a date in the index: And if I use the admin site to query the data, I get the right date:
2002-05-13T00:00:00Z
But in my SolrJ code:
    Iterator<SolrDocument> iter = queryResponse.getResults().iterator();
    while (iter.hasNext()) {
        SolrDocument resultDoc = iter.next();
        System.out.println("--> " + resultDoc.getFieldValue("FILE_DATE"));
    }
I get:
--> Sun May 12 19:00:00 CDT 2002
I've been searching around through the wiki and other places, but can't seem to find anything that either mentions this problem or talks about date handling in Solr/SolrJ that might refer to something like this. Thanks for any info, Ron
RE: deleting index directory/files
I ran into a problem when I deleted just the "index" directory; I deleted the entire data directory and it was recreated on the next load. BTW, if you're using the DIH, its default behavior is to remove all records on a full import, so you can save yourself having to remove any actual files.
-Original Message- From: Mark juszczec [mailto:mark.juszc...@gmail.com] Sent: Thursday, August 04, 2011 4:01 PM To: solr-user@lucene.apache.org Subject: deleting index directory/files
Hello all I'm using multiple cores. There's a directory named by the core and it contains a subdir named data that contains a subdir named index that contains a bunch of files that contain the data for my index. Let's say I want to completely rebuild the index from scratch. Can I delete the dir named index? I know the next thing I'd have to do is a full data import, and that's ok. I want to blow away any traces of the core's previous existence. Mark
RE: Strategies for sorting by array, when you can't sort by array?
For anyone who comes across this topic in the future, I "solved" the problem this way: by agreement with the stakeholders, on the presumption that no one would look at more than 5000 records, I modified my search code so that, if the user chooses to sort by name, I set the row count to return (query.setRows) to 5000. I then put all the result records into a list, sort it, then, depending on what page they're on, extract that subset of the 5000 and return it. There is a small performance hit on initial searching for common names (e.g. Smith, Jones, etc.), but the performance is still far more acceptable than the legacy system Solr is meant to replace (a few seconds as opposed to twenty(!) minutes). Most certainly there are better ways, but this one worked for me, and I wanted to make sure it was added to the pool of options for anyone who comes across this problem in the future. Thanks to everyone who offered suggestions! Ron
-Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Wednesday, August 03, 2011 11:36 AM To: solr-user@lucene.apache.org Cc: Olson, Ron Subject: Re: Strategies for sorting by array, when you can't sort by array?
Not so much that it's a corner case in the sense of being unusual necessarily (I'm not sure), it's just something that fundamentally doesn't fit well into Lucene's architecture. I'm not sure that filing a JIRA will be much use, it's really unclear how one would get Lucene to do this, it would be significant work to do, and it's unlikely any Solr developer is going to decide to spend significant time on it unless they need it for their own clients.
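The workaround described at the top of this message (fetch a capped result set, sort it in memory, then cut out the requested page) can be sketched as follows. This is a simplified illustration, not the code from the thread: it assumes each record has already been reduced to the single name it should sort by, it uses a case-insensitive ordering, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortedPager {
    // Sorts a copy of the names and returns the zero-based page of the given size.
    static List<String> page(List<String> names, int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize < 1) {
            throw new IllegalArgumentException("pageNumber must be >= 0 and pageSize >= 1");
        }
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(String.CASE_INSENSITIVE_ORDER);
        // Clamp both bounds so a page past the end yields an empty list.
        // long arithmetic avoids int overflow for very large page numbers.
        int from = (int) Math.min((long) pageNumber * pageSize, sorted.size());
        int to = Math.min(from + pageSize, sorted.size());
        return new ArrayList<>(sorted.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "smith trucking", "Jones Oil", "clarke & sons", "Bob Smith", "adams freight");
        // Prints: [clarke & sons, Jones Oil]
        System.out.println(page(names, 1, 2));
    }
}
```

Because the whole capped list is re-sorted for every page request, caching the sorted list per search would be a natural refinement if the initial hit for common names becomes a concern.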
RE: Strategies for sorting by array, when you can't sort by array?
*Sigh*...I had thought maybe reversing it would work, but that would require creating a whole new index, on a separate core, as the existing index is used for other purposes. Plus, given the volume of data, that would be a big deal, update-wise. What would be better would be to remove that particular sort option-button on the webpage. ;) I'll create a Jira issue, but in the meanwhile I'll have to come up with something else. I guess I didn't realize how much of a "corner case" this problem is. :) Thanks for the suggestions! Ron
-Original Message- From: Smiley, David W. [mailto:dsmi...@mitre.org] Sent: Wednesday, August 03, 2011 10:26 AM To: solr-user@lucene.apache.org Subject: Re: Strategies for sorting by array, when you can't sort by array?
Hi Ron. This is an interesting problem you have. One idea would be to create an index with the entity relationship going in the other direction. So instead of one to many, go many to one. You would end up with multiple documents with varying names but repeated parent entity information -- perhaps simply using just an ID which is used as a lookup. Do a search on this name field, sorting by a non-tokenized variant of the name field. Use Result-Grouping to consolidate multiple matches of a name to the same parent document. This whole idea might very well be academic since duplicating all the parent entity information for searching on that too might be a bit more than you care to bother with. And I don't think Solr 4's join feature addresses this use case. In the end, I think Solr could be modified to support this, with some work. It would make a good feature request in JIRA. ~ David Smiley
RE: Strategies for sorting by array, when you can't sort by array?
Right, the search term is the sort field. I can manually sort an individual page, but when the user clicks on the next page, the sort is "reset", visually.
-Original Message- From: Mike Sokolov [mailto:soko...@ifactory.com] Sent: Wednesday, August 03, 2011 9:52 AM To: solr-user@lucene.apache.org Cc: Olson, Ron Subject: Re: Strategies for sorting by array, when you can't sort by array?
Although you weren't very clear about it, it sounds as if you want the results to be sorted by a name that actually matched the query? In general that is not going to be easy, since it is not something that can be computed in advance and thus indexed. -Mike
Strategies for sorting by array, when you can't sort by array?
Hi all-

Well, this is a problem. I have a list of names as a multi-valued field; I am searching on this field and need to return the results sorted. I know from searching and reading the documentation (and from getting the error) that sorting on a multi-valued field isn't possible. What I haven't found is any really good solution or workaround. I was wondering what strategies others have used to overcome this particular situation; collapsing the individual names into a single field with copyField doesn't work because the name searched may not be the first name in the field.

Thanks for any hints/tips/tricks.

Ron
RE: Determine which field term was found?
Hmm, okay, well, if that's the way it works, then I'll loop through the arrays, as the query is pretty much as described. Related to what you said about how Lucene works, do you think this functionality is worth opening an enhancement request for, or is it such a tiny corner case as to not be worth it?

Thanks a lot for the help!

Ron

-----Original Message-----
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Thursday, July 21, 2011 4:27 PM
To: solr-user@lucene.apache.org
Subject: Re: Determine which field term was found?

On Thu, Jul 21, 2011 at 4:47 PM, Olson, Ron wrote:
> Is there an easy way to find out which field matched a term in an OR query
> using Solr? I have a document with names in two multi-valued fields and I am
> searching for "Smith", using the query "A_NAMES:smith OR B_NAMES:smith". I
> figure I could loop through both result arrays, but that seems weird to me to
> have to search again for the value in a result.

That's pretty much the way lucene currently works - you don't know what fields match a query. If the query is simple, looping over the returned stored fields is probably your best bet.

There are a couple other tricks you could use (although they are not necessarily better):

1) With grouping by query (a trunk feature) you can essentially return both queries with one request:

q=*:*&group=true&group.query=A_NAMES:smith&group.query=B_NAMES:smith

and optionally add a "group.query=A_NAMES:smith OR B_NAMES:smith" if you need the combined list.

2) Use pseudo-fields (also trunk) in conjunction with the termfreq function (the number of times a term appears in a field). This obviously only works with term queries.

fl=*,count1:termfreq(A_NAMES,'smith'),count2:termfreq(B_NAMES,'smith')

You can use parameter substitution to pull out the actual term and simplify the query:

fl=*,count1:termfreq(A_NAMES,$term),count2:termfreq(B_NAMES,$term)&term=smith

-Yonik
http://www.lucidimagination.com
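The "loop over the returned stored fields" suggestion can be sketched roughly as below. This is only an illustration, assuming each result document has already been parsed into a plain dict of field name to list of stored values; the helper name `fields_containing` is made up for this example, and the case-insensitive substring test is a crude stand-in for whatever analysis the real field type performs.

```python
def fields_containing(doc, term, field_names):
    """Return the names of the fields in `doc` that hold a value containing `term`.

    `doc` is assumed to be one search result parsed into a dict. The match is a
    case-insensitive substring test, which only approximates Solr's analysis.
    """
    needle = term.lower()
    matched = []
    for name in field_names:
        values = doc.get(name, [])
        if isinstance(values, str):
            # A single-valued field may come back as a bare string.
            values = [values]
        if any(needle in str(value).lower() for value in values):
            matched.append(name)
    return matched


# Hypothetical result document for the query A_NAMES:smith OR B_NAMES:smith.
doc = {"id": "42", "A_NAMES": ["Bob Smith", "Joe Jones"], "B_NAMES": ["Acme Trucking"]}
print(fields_containing(doc, "smith", ["A_NAMES", "B_NAMES"]))
```

For simple term queries this costs one pass over each returned document, which is usually small next to the search itself.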
Determine which field term was found?
Hi all-

Is there an easy way to find out which field matched a term in an OR query using Solr? I have a document with names in two multi-valued fields and I am searching for "Smith", using the query "A_NAMES:smith OR B_NAMES:smith". I figure I could loop through both result arrays, but that seems weird to me to have to search again for the value in a result.

Thanks for any info,

Ron
Unique document count from index?
Hi all-

I have a problem that I'm not sure can be solved in Solr. I am using Solr 3.2 with patch 2524 installed to provide grouping. I need to return the count of unique records that match a particular query.

For an example of what I'm talking about, imagine I have an index of music CD orders, created from a SQL database using the DataImportHandler. It's possible that a person ordered multiple records by the same artist (e.g. order #1234 contains Pink Floyd "Wish You Were Here", Pink Floyd "Meddle", Pink Floyd "Obscured by Clouds"). One of the indexed and stored fields in the document is "Artist". If I do a search for Pink Floyd, using the order above, I'd get three documents, all with the same order number, one for each of the Pink Floyd records. What I'd like to find out is how many unique orders have Pink Floyd across the entire index. The index has millions of documents.

I have been trying to see if the result grouping functionality provided by patch 2524 will help, but while it does collapse the query above into one document, the matches field is still the same as without the grouping (which I guess makes sense insofar as it is still reporting the number of documents it found for the query). I have also thought a subquery in my DataImportHandler might work, though I'm not sure how I'd structure it.

Thanks for any guidance on how to solve this problem; I know Solr isn't meant to be a data-mining tool and I'm guessing I'm skating perilously close to using it for that purpose, but anything I can do to take load from the actual database is considered a Good Thing by all concerned.

Ron
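To pin down the difference between the "matches" count and the number being asked for, here is a tiny in-memory illustration. It is not a Solr feature and would not scale to millions of documents; the field names `order_id` and `artist` and the sample rows are invented for the example.

```python
def count_matches_and_orders(docs, artist):
    """Return (matching documents, distinct orders among them) for one artist."""
    matching = [d for d in docs if d["artist"] == artist]
    distinct_orders = {d["order_id"] for d in matching}
    return len(matching), len(distinct_orders)


# One document per ordered CD, as produced by the import described above.
docs = [
    {"order_id": 1234, "artist": "Pink Floyd", "title": "Wish You Were Here"},
    {"order_id": 1234, "artist": "Pink Floyd", "title": "Meddle"},
    {"order_id": 1234, "artist": "Pink Floyd", "title": "Obscured by Clouds"},
    {"order_id": 1300, "artist": "Pink Floyd", "title": "Animals"},
    {"order_id": 1300, "artist": "Genesis", "title": "Foxtrot"},
]

matches, orders = count_matches_and_orders(docs, "Pink Floyd")
print(matches, orders)
```

The first number corresponds to what the matches field reports; the second is the unique-order count the question is after.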
RE: XML Stripping from DIH
Thanks a lot! I thought I'd looked on this page but didn't see this one, not sure why. I greatly appreciate it!

Ron

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Sunday, February 20, 2011 5:59 AM
To: solr-user@lucene.apache.org
Subject: Re: XML Stripping from DIH

Ron,

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
DIH and updating specific record
Hi all-

I am trying to determine if there is a way to tell Solr to update its index for the database record with a specific ID. All the examples and documentation seem to discuss using a "last updated" date/time field, but in this case modifying the table is not an option. Instead, I'd like to invoke Solr's DIH delta query with a specific ID to say "here's something new or updated, please update your index with it".

I apologize if this is a trivial thing, but I can't seem to find any documentation on how to do it.

Thanks,

Ron
XML Stripping from DIH
Hi all-

I have some XML in a database that I am trying to index and store; I am interested in the various pieces of text, but none of the tags. I've been trying to figure out a way to strip all the tags out, but haven't found anything within Solr to do so; the XML parser seems to want XPath to get the various element values, when all I want is to turn the whole thing into one blob of text, regardless of whether it makes any "contextual" sense.

Is there something in Solr to do this, or is it something I'd have to write myself (which I'm willing to do if necessary)?

Thanks for any info,

Ron
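For the "write it myself" route, the idea of flattening an XML document into one blob of text can be sketched outside Solr as a preprocessing step. This is only a sketch, assuming the stored XML is well-formed; `xml_to_text` is a made-up helper name and the sample document is invented.

```python
import xml.etree.ElementTree as ET


def xml_to_text(xml_string):
    """Drop all tags and return the text content as one space-separated string."""
    root = ET.fromstring(xml_string)
    # itertext() walks the tree in document order and yields every text chunk.
    pieces = (chunk.strip() for chunk in root.itertext())
    return " ".join(piece for piece in pieces if piece)


sample = "<order><item>Blue <b>widget</b></item><note>rush</note></order>"
print(xml_to_text(sample))
```

Malformed XML makes `ET.fromstring` raise `ParseError`, so real data may need a more forgiving approach.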
RE: Setting up Solr for PDFs on JBoss
This is what I have; I didn't alter it so I believe it's the default: text true ignored_ true links ignored_

-----Original Message-----
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Monday, January 03, 2011 8:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Setting up Solr for PDFs on JBoss

What's your solrconfig.xml look like for setting up the ExtractingReqHandler?

-Grant

--
Grant Ingersoll
http://www.lucidimagination.com
Setting up Solr for PDFs on JBoss
Hi all-

After testing the PDF import functionality in my local copy of Solr 1.4.1 with the included Jetty app server, I tried replicating it using my copy of Solr running in JBoss 5.1.0 (which uses Tomcat as its servlet container). When I try to add a PDF, I get an error buried in the stack trace:

Caused by: org.apache.solr.common.SolrException: Error Instantiating Request Handler, org.apache.solr.handler.extraction.ExtractingRequestHandler is not a org.apache.solr.request.SolrRequestHandler

I am using multiple cores, but they all use the common "lib" directory, instead of the "core/lib" directory. This lib directory is what is added to the classpath when JBoss starts ($JBOSS_HOME/server/solr_test/lib), so all the jars in this directory should be available to anything in the "deploy" directory (just mentioning in case people aren't familiar with JBoss). I've added all the jars from the contrib/extraction/lib directory, as well as the jars from dist.

My lib directory is effectively:

apache-solr-cell-1.4.1.jar            easymock.jar                          lucene-spellchecker-2.9.3.jar
apache-solr-clustering-1.4.1.jar      fontbox-0.1.0.jar                     nekohtml-1.9.9.jar
apache-solr-core-1.4.1.jar            geronimo-stax-api_1.0_spec-1.0.1.jar  ojdbc14.jar
apache-solr-solrj-1.4.1.jar           geronimo-stax-api_1.0_spec-1.0.jar    ooxml-schemas-1.0.jar
asm-3.1.jar                           icu4j-3.8.jar                         pdfbox-0.7.3.jar
bcmail-jdk14-136.jar                  jcl-over-slf4j-1.5.5.jar              poi-3.5-beta6.jar
bcprov-jdk14-136.jar                  jempbox-0.2.0.jar                     poi-ooxml-3.5-beta6.jar
commons-codec-1.3.jar                 junit-4.3.jar                         poi-scratchpad-3.5-beta6.jar
commons-compress-1.0.jar              log4j-1.2.14.jar                      slf4j-api-1.5.5.jar
commons-csv-1.0-SNAPSHOT-r609327.jar  lucene-analyzers-2.9.3.jar            slf4j-jdk14-1.5.5.jar
commons-fileupload-1.2.1.jar          lucene-core-2.9.3.jar                 tika-core-0.4.jar
commons-httpclient-3.1.jar            lucene-highlighter-2.9.3.jar          tika-parsers-0.4.jar
commons-io-1.4.jar                    lucene-memory-2.9.3.jar               wstx-asl-3.2.7.jar
commons-lang-2.1.jar                  lucene-misc-2.9.3.jar                 xercesImpl-2.8.1.jar
commons-logging-1.1.1.jar             lucene-queries-2.9.3.jar              xml-apis-1.0.b2.jar
dom4j-1.6.1.jar                       lucene-snowball-2.9.3.jar             xmlbeans-2.3.0.jar

I know several of these jars are already essentially present in JBoss (log4j, for example), but I'm at a loss as to what to remove or add to get it to work. Does anyone have any ideas about configuring it under JBoss? The other cores are database-based (thus the use of ojdbc14.jar), and they work fine.

Thanks for any help,

Ron
RE: Testing/packaging question
I believe it should point to the directory above, where conf and lib are located (though I have a multi-core setup). Mine is set to:

/usr/local/jboss-5.1.0.GA/server/solr/solr_data/

In solr_data, the solr.xml defines the two cores, and each core directory contains conf, data, and lib directories, with the schema.xml in conf.

-----Original Message-----
From: Bernhard Reiter [mailto:ock...@raz.or.at]
Sent: Thursday, November 04, 2010 3:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Testing/packaging question

Hi,

I'm now trying to

export JAVA_OPTS="$JAVA_OPTS -Dsolr.solr.home=/path/to/my/schema.xml"

and restarting tomcat (v6 package from ubuntu maverick) via

sudo /etc/init.d/tomcat6 restart

but solr still doesn't seem to find that schema.xml, as it complains about unknown fields when running the tests that require that schema.xml.

Can someone please tell me what I'm doing wrong -- and what I should be doing?

TIA again,
Bernhard

On Monday, 01.11.2010, at 19:01 +0100, Bernhard Reiter wrote:
> Hi,
>
> I'm pretty much of a Solr newbie currently packaging solrpy for Debian;
> see
> http://svn.debian.org/viewsvn/python-modules/packages/python-solrpy/trunk/
>
> In order to run solrpy's supplied tests at build time, I'd need Solr to
> know about the schema.xml that comes with the tests.
> Can anyone tell me how do that properly? I'd basically need Solr to
> temporarily recognize that schema.xml without permanently installing it
> -- is there any way to do this, eg via environment variables?
>
> TIA
> Bernhard Reiter
Using setStart in solrj
Hi all-

First, thanks to all the folks who have helped me so far in getting the hang of Solr; I promise to give back when I think my contributions will be useful :)

I am at the point where I'm trying to return results from a search in a war file, using Java with solrj. On the result page of the website I'd want to limit the actual results to probably around 20 or so, with the usual "next/prev page" paradigm. The issue I've been wrestling with is keeping the SolrQuery object around so that I don't need to transmit the entire thing back to the client, especially if they search for something like "truck", which could return a lot of results.

I was thinking that one solution would be to do a "query.setRows(20);" for the query, then return the results with some sort of an identifier so that on subsequent queries, I could also include "query.setStart(someCounter + 1);" to get the next set of 20. In theory, that would work at the cost of having to re-execute the query.

I've been looking for information about setStart() and haven't found much more than Javadoc that says "sets the starting row for the result set". My question is, how do I know what the starting row is? Maybe, based on the search parameters, it will always return the results in an implicit order, in which case is it just like executing a fixed query in a database and then grabbing the next 20 rows from the result set? Because the user would be pressing the prev/next buttons, even though the query is being re-executed, the parameters would not be changing.

That's the theory, anyway. It seems excessive to keep executing the same query over and over again just because the user wants to see the next set of results, especially if the original SolrQuery object has them all, but maybe that's just what needs to be done, given the stateless nature of the web.

Any info on this method/strategy would be most appreciated.

Thanks,

Ron
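The paging arithmetic described above can be sketched as follows. Solr's start parameter is a zero-based offset into the sorted result list, so the second page of 20 begins at offset 20. The helper `page_window` is a made-up name for this illustration, and the sketch assumes the query and sort stay the same between requests; in solrj the two values would be passed to setStart() and setRows().

```python
def page_window(page_number, rows_per_page=20):
    """Return (start, rows) for a 1-based page number.

    `start` is the zero-based offset of the first result on that page.
    """
    if page_number < 1:
        raise ValueError("page_number must be 1 or greater")
    start = (page_number - 1) * rows_per_page
    return start, rows_per_page


for page in (1, 2, 3):
    start, rows = page_window(page)
    print(f"page {page}: start={start}, rows={rows}")
```

Only the page number has to travel between the browser and the server; the query is simply re-run with a different offset.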
RE: Stored or indexed?
Thanks for the great info! I appreciate everybody's help in getting started with Solr; hopefully I'll be able to get my stuff working and move on to more difficult questions. :)

-----Original Message-----
From: Elizabeth L. Murnane [mailto:emurn...@architexa.com]
Sent: Friday, October 29, 2010 12:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Stored or indexed?

Hi Ron,

In a nutshell - an indexed field is searchable, and a stored field has its content stored in the index so it is retrievable. Here are some examples that will hopefully give you a feel for how to set the indexed and stored options:

indexed="true" stored="true"
Use this for information you want to search on and also display in search results - for example, book title or author.

indexed="false" stored="true"
Use this for fields that you want displayed with search results but that don't need to be searchable - for example, destination URL, file system path, time stamp, or icon image.

indexed="true" stored="false"
Use this for fields you want to search on but don't need to get their values in search results. Here are some of the common reasons you would want this:

Large fields and a database: Storing a field makes your index larger, so set stored to false when possible, especially for big fields. For this case a database is often used, as the previous responder said. Use a separate identifier field to get the field's content from the database.

Ordering results: Say you define <field name="bookName" type="text" indexed="true" stored="true"/> that is tokenized and used for searching. If you want to sort results based on book name, you could copy the field into a separate nonretrievable, nontokenized field that can be used just for sorting:
<field name="bookSort" type="string" indexed="true" stored="false"/>
<copyField source="bookName" dest="bookSort"/>

Easier searching: You can define a catch-all field that contains all of the other text fields. Since Solr looks in a default field when given a text query without field names, you can support this type of general phrase query by making the catch-all the default field.

indexed="false" stored="false"
Use this when you want to ignore fields. For example, the following will ignore unknown fields that don't match a defined field, rather than throwing an error as happens by default:
<fieldtype name="ignored" stored="false" indexed="false"/>
<dynamicField name="*" type="ignored"/>

Elizabeth Murnane
emurn...@architexa.com
Architexa Lead Developer - www.architexa.com
Understand & Document Code In Seconds

--- On Thu, 10/28/10, Savvas-Andreas Moysidis wrote:

From: Savvas-Andreas Moysidis
Subject: Re: Stored or indexed?
To: solr-user@lucene.apache.org
Date: Thursday, October 28, 2010, 4:25 AM

In our case, we just store a database id and do a secondary db query when displaying the results. This is handy and leads to a more centralised architecture when you need to display properties of a domain object which you don't index/search.

On 28 October 2010 05:02, kenf_nc wrote:
>
> Interesting wiki link, I hadn't seen that table before.
>
> And to answer your specific question about indexed=true, stored=false, this
> is most often done when you are using analyzers/tokenizers on your field.
> This field is for search only, you would never retrieve its contents for
> display. It may in fact be an amalgam of several fields into one 'content'
> field. You have your display copy stored in another field marked
> indexed=false, stored=true and optionally compressed. I also have simple
> string fields set to lowercase so searching is case-insensitive, and have a
> duplicate field where the string is normal case. The first one is
> indexed/not stored, the second is stored/not indexed.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Stored-or-indexed-tp1782805p1784315.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Stored or indexed?
Hi all-

I've read through the documentation, but I'm still a little confused about the <field> tag, in terms of the indexed and stored attributes. If I have something marked as indexed="true", why would I ever want stored="false"? Are there any good tips-n-tricks anywhere about how to properly set the field tag? I've been finding bits and pieces both on the wiki and a couple of other websites, but there doesn't seem to be a good definitive how-to on this.

Thanks for any info,

Ron
RE: Confusion about entities and documents
Hmm, okay, I guess I wasn't taking the hierarchy-flattening aspect of Solr seriously enough. :) Based on your reply from the other thread, I guess the best solution, as far as I can tell, is to maintain the multiple value lists and take advantage of the fact that the arrays will always be in the right order: 1 2 ABC Corp XYZ Inc So I guess the problem isn't really *sooo* bad...I just need to make sure that I have the appropriate names defined so I can link between two arrays in my client code. I suppose I could keep things straight by preserving the hierarchy within the name attribute. -Original Message- From: harrysmith [mailto:harrysmith...@gmail.com] Sent: Friday, October 22, 2010 4:10 PM To: solr-user@lucene.apache.org Subject: Re: Confusion about entities and documents >What I get when I search for, say, "XYZ", is a document that has XYZ Corp as a manufacturer name, but the >array of parts_manu appears to be a child of the document, not the parts array. > >Is this the correct behavior, insofar as a document has a single level of elements, and that's it? If so, what >might be a better strategy for being able to maintain the hierarchy of information within a document? > Yes, this is the correct behavior. I still struggle with the same issue, and there are no 'best practices' (that I have found at least) for maintaining relationships within a Solr doc. The argument is that Solr is not the correct place for these representations and should only represent a flat version of your document. For a similar question see: http://lucene.472066.n3.nabble.com/Schema-Definition-Question-td1049966.html#a1105593 A few possible solutions are posted there, and I'm interested in how others have tackled this issue. -- View this message in context: http://lucene.472066.n3.nabble.com/Confusion-about-entities-and-documents-tp1753926p1755152.html Sent from the Solr - User mailing list archive at Nabble.com. 
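Linking the two multi-valued arrays by position in client code, as Ron suggests, might look roughly like the sketch below. It assumes the document comes back as a plain dictionary. The field name `part_id` and the helper `pair_parts` are hypothetical; `parts_manu` and the sample values are taken from the thread.

```python
def pair_parts(doc):
    """Rebuild part records from two parallel multi-valued fields.

    Assumes both arrays were filled in the same order and hold exactly
    one value per part; "part_id" is a hypothetical field name.
    """
    ids = doc.get("part_id", [])
    manufacturers = doc.get("parts_manu", [])
    if len(ids) != len(manufacturers):
        # A missing value in either array would shift every later pairing.
        raise ValueError("parallel arrays differ in length; cannot pair by position")
    return [
        {"part_id": part_id, "manufacturer": manufacturer}
        for part_id, manufacturer in zip(ids, manufacturers)
    ]


doc = {"part_id": [1, 2], "parts_manu": ["ABC Corp", "XYZ Inc"]}
for part in pair_parts(doc):
    print(part["part_id"], part["manufacturer"])
```

The length check matters because pairing by position only holds up if every part contributes a value to both fields; a part with no manufacturer would silently misalign everything after it.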
Confusion about entities and documents
Hi all- I've been checking the online docs about this, but I haven't found a suitable explanation about how entities and sub-entities work within a document. I am loading records from a SQL database and everything seems to be getting flattened in a way I was not expecting. For example, I have a document that defines, say, "engine". The engine is made up of parts, which are manufactured by various companies. A hypothetical, abbreviated config would be: ... What I get when I search for, say, "XYZ", is a document that has XYZ Corp as a manufacturer name, but the array of parts_manu appears to be a child of the document, not the parts array. Is this the correct behavior, insofar as a document has a single level of elements, and that's it? If so, what might be a better strategy for being able to maintain the hierarchy of information within a document? Thanks for any info, Ron
Documents and Cores, take 2
Hi all- I have a newbie design question about documents, especially with SQL databases. I am trying to set up Solr to go against a database that, for example, has "items" and "people". The way I see it, and I don't know if this is right or not (thus the question), both are separate documents: an item may contain a list of parts, which the user may want to search, and, as part of the "item", view the list of people who have ordered the item. Then there's the actual "people", who the user might want to search to find a name and, consequently, what items they ordered. To me they are both "top level" things, with some overlap of fields. If I'm searching for "people", I'm likely not going to be interested in the parts of the item, while if I'm searching for "items" the likelihood is that I may want to search for "42532" which is, in this instance, a SKU, and not get hits on the zip code section of the "people". Does it make sense, then, to separate these two out as separate documents? I believe so because the documentation I've read suggests that a document should be analogous to a row in a table (in this case, very de-normalized). What is tripping me up is that, as far as I can tell, you can have only one document type per index, and thus one document type per core. So in this example, I have two cores, "items" and "people". Is this correct? Should I embrace the idea of having many cores, or am I supposed to have a single, unified index with all documents (which Solr doesn't seem to support)? The ultimate question comes down to the search interface. I don't necessarily want to have the user explicitly state which document they want to search; I'd like them to simply type "42532" and get documents from both cores, and then possibly allow for filtering results after the fact, not before. As I've only used the admin site so far (which is core-specific), does the client API allow for unified searching across all cores? Assuming it does, I'd think my idea of multiple-documents is okay, but I'd love to hear from people who actually know what they're doing. :) Thanks, Ron BTW: Sorry about the problem with the previous message; I didn't know about thread hijacking.