Re: facet.field counts when q includes field
No problem, Mike. Glad you got it sorted out.

Trey Grainger
Co-author, Solr in Action
Director of Engineering, Search & Analytics @ CareerBuilder

On Sun, Apr 27, 2014 at 7:23 PM, Michael Sokolov <msoko...@safaribooksonline.com> wrote:
> On 4/27/14 7:02 PM, Michael Sokolov wrote:
>> On 4/27/2014 6:30 PM, Trey Grainger wrote:
>>>> So my question basically is: which restrictions are applied to the docset
>>>> from which (field) facets are computed?
>>>
>>> Facets are generated based upon values found within the documents matching
>>> your "q=" parameter and also all of your "fq=" parameters. Basically, if
>>> you do an intersection of the docsets from all "q=" and "fq=" parameters
>>> then you end up with the docset the facet calculations are based upon.
>>>
>>> When you say "if I add type=book, *no* documents match, but I get facet
>>> counts: { chapter=4 }", I'm not exactly sure what you mean. If you are
>>> adding "q=toto&type=book&facet=true&facet.field=type" then the problem is
>>> that the "type=book" parameter doesn't do anything... it is not a valid
>>> Solr parameter for filtering here. In this case, all 4 of your documents
>>> matching the "q=toto" query are still being returned, which is why the
>>> facet count for chapters is 4.
>>
>> In fact my query looks like:
>>
>> q=fulltext_t%3A%28toto%29+AND+dc_type_s%3A%28book%29+%2Bdirectory_b%3Afalse&start=0&rows=20&fl=uri%2Ctimestamp%2Cdirectory_b%2Csize_i%2Cmeta_ss%2Cmime_type_ss&facet.field=dc_type_s
>>
>> or without url encoding:
>>
>> q=fulltext_t:(toto) AND dc_type_s:(book) (directory_b:false)
>> facet.field=dc_type_s
>>
>> default operator is AND
>>
>> ... so I don't think that the query is broken like you described?
>>
>> -Mike
>
> OK, the problem wasn't with the query, but while I tried to write out a
> clearer explanation, I found it -- an issue in a unit test too boring to
> describe. Facets do seem to work like you said, and how they're documented,
> and as I assumed they did :)
>
> Thanks, and sorry for the noise.
>
> -Mike
Re: Application of different stemmers / stopword lists within a single field
If you can throw money at the problem: http://www.basistech.com/text-analytics/rosette/language-identifier/ -- the Language Boundary Locator at the bottom of the page seems to be part or all of your solution.

Otherwise, specifically for English and Arabic, you could play with Unicode ranges to try detecting text blocks:

1) Create an UpdateRequestProcessor chain that
   a) clones text into field_EN and field_AR.
   b) applies regular expression transformations that strip the English or Arabic Unicode text range correspondingly, so field_EN only has English characters left, etc. Of course, you need to decide what you want to do with occasional English or neutral characters occurring in the middle of Arabic text (numbers: Arabic or Indic? brackets, dashes, etc.). But if you just index the text, it might be OK even if it is not perfect.
   c) deletes empty fields, just in case not all documents have mixed languages.

2) Use eDisMax to search over both fields, each with its own analysis chain.

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Fri, Apr 25, 2014 at 5:34 PM, Timothy Hill wrote:
> This may not be a practically solvable problem, but the company I work for
> has a large number of lengthy mixed-language documents - for example,
> scholarly articles about Islam written in English but containing lengthy
> passages of Arabic. Ideally, we would like users to be able to search both
> the English and Arabic portions of the text, using the full complement of
> language-processing tools such as stemming and stopword removal.
>
> The problem, of course, is that these two languages co-occur in the same
> field. Is there any way to apply different processing to different words or
> paragraphs within a single field through language detection? Is this to all
> intents and purposes impossible within Solr? Or is another approach (using
> language detection to split the single large field into
> language-differentiated smaller fields, for example) possible/recommended?
>
> Thanks,
>
> Tim Hill
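Step 1b above (clone the text, then strip by Unicode range) can be sketched outside Solr to show the idea. This is a hypothetical Python stand-in for the regex transformations, not actual UpdateRequestProcessor code; the character ranges are an approximation, and the handling of digits and punctuation is exactly the open question mentioned above:

```python
import re

# Approximate Arabic ranges: basic block, supplement, presentation forms.
ARABIC = r"\u0600-\u06FF\u0750-\u077F\uFB50-\uFDFF\uFE70-\uFEFF"

def split_by_script(text):
    """Split mixed text into English-only and Arabic-only variants,
    mirroring the clone-then-strip idea for field_EN / field_AR."""
    # field_AR: replace everything that is not Arabic or whitespace
    field_ar = re.sub(rf"[^{ARABIC}\s]+", " ", text)
    # field_EN: strip all Arabic characters
    field_en = re.sub(rf"[{ARABIC}]+", " ", text)
    # collapse the runs of whitespace left behind by the stripping
    squash = lambda s: re.sub(r"\s+", " ", s).strip()
    return squash(field_en), squash(field_ar)

en, ar = split_by_script("The phrase السلام عليكم means peace be upon you")
print(en)  # The phrase means peace be upon you
print(ar)  # السلام عليكم
```

Neutral characters (digits, brackets) end up in field_EN here; whether that is right for your data is the judgment call described above.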
Re: How to sort solr results by foreign id field
So, is there any way to solve the problem above?

-
Lady Cute

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-sort-solr-results-by-foreign-id-field-tp4133263p4133408.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: facet.field counts when q includes field
On 4/27/14 7:02 PM, Michael Sokolov wrote:
> On 4/27/2014 6:30 PM, Trey Grainger wrote:
>>> So my question basically is: which restrictions are applied to the docset
>>> from which (field) facets are computed?
>>
>> Facets are generated based upon values found within the documents matching
>> your "q=" parameter and also all of your "fq=" parameters. Basically, if
>> you do an intersection of the docsets from all "q=" and "fq=" parameters
>> then you end up with the docset the facet calculations are based upon.
>>
>> When you say "if I add type=book, *no* documents match, but I get facet
>> counts: { chapter=4 }", I'm not exactly sure what you mean. If you are
>> adding "q=toto&type=book&facet=true&facet.field=type" then the problem is
>> that the "type=book" parameter doesn't do anything... it is not a valid
>> Solr parameter for filtering here. In this case, all 4 of your documents
>> matching the "q=toto" query are still being returned, which is why the
>> facet count for chapters is 4.
>
> In fact my query looks like:
>
> q=fulltext_t%3A%28toto%29+AND+dc_type_s%3A%28book%29+%2Bdirectory_b%3Afalse&start=0&rows=20&fl=uri%2Ctimestamp%2Cdirectory_b%2Csize_i%2Cmeta_ss%2Cmime_type_ss&facet.field=dc_type_s
>
> or without url encoding:
>
> q=fulltext_t:(toto) AND dc_type_s:(book) (directory_b:false)
> facet.field=dc_type_s
>
> default operator is AND
>
> ... so I don't think that the query is broken like you described?
>
> -Mike

OK, the problem wasn't with the query, but while I tried to write out a clearer explanation, I found it -- an issue in a unit test too boring to describe. Facets do seem to work like you said, and how they're documented, and as I assumed they did :)

Thanks, and sorry for the noise.

-Mike
Re: facet.field counts when q includes field
On 4/27/2014 6:30 PM, Trey Grainger wrote:
>> So my question basically is: which restrictions are applied to the docset
>> from which (field) facets are computed?
>
> Facets are generated based upon values found within the documents matching
> your "q=" parameter and also all of your "fq=" parameters. Basically, if
> you do an intersection of the docsets from all "q=" and "fq=" parameters
> then you end up with the docset the facet calculations are based upon.
>
> When you say "if I add type=book, *no* documents match, but I get facet
> counts: { chapter=4 }", I'm not exactly sure what you mean. If you are
> adding "q=toto&type=book&facet=true&facet.field=type" then the problem is
> that the "type=book" parameter doesn't do anything... it is not a valid
> Solr parameter for filtering here. In this case, all 4 of your documents
> matching the "q=toto" query are still being returned, which is why the
> facet count for chapters is 4.

In fact my query looks like:

q=fulltext_t%3A%28toto%29+AND+dc_type_s%3A%28book%29+%2Bdirectory_b%3Afalse&start=0&rows=20&fl=uri%2Ctimestamp%2Cdirectory_b%2Csize_i%2Cmeta_ss%2Cmime_type_ss&facet.field=dc_type_s

or without url encoding:

q=fulltext_t:(toto) AND dc_type_s:(book) (directory_b:false)
facet.field=dc_type_s

default operator is AND

... so I don't think that the query is broken like you described?

-Mike
Re: facet.field counts when q includes field
>> So my question basically is: which restrictions are applied to the docset
>> from which (field) facets are computed?

Facets are generated based upon values found within the documents matching your "q=" parameter and also all of your "fq=" parameters. Basically, if you do an intersection of the docsets from all "q=" and "fq=" parameters then you end up with the docset the facet calculations are based upon.

When you say "if I add type=book, *no* documents match, but I get facet counts: { chapter=4 }", I'm not exactly sure what you mean. If you are adding "q=toto&type=book&facet=true&facet.field=type" then the problem is that the "type=book" parameter doesn't do anything... it is not a valid Solr parameter for filtering here. In this case, all 4 of your documents matching the "q=toto" query are still being returned, which is why the facet count for chapters is 4.

If instead you specify "q=toto&fq=type:book&facet=true&facet.field=type" then this will filter down to ONLY the documents with a type of book. Since it looks like in your data there are no documents which are both of type book and also match the "q=toto" query, you should get 0 documents, and thus the counts of all your facet values will be zero.

As you mentioned, it is possible to utilize tags and excludes to change the behavior described above, but hopefully this answers your question about the default behavior.

Thanks,

Trey Grainger
Co-author, Solr in Action
Director of Engineering, Search & Analytics @ CareerBuilder

On Sun, Apr 27, 2014 at 4:51 PM, Michael Sokolov <msoko...@safaribooksonline.com> wrote:
> I'm trying to understand the facet counts I'm getting back from Solr when
> the main query includes a term that restricts on a field that is being
> faceted. After reading the docs on the wiki (both wikis) I'm confused.
>
> In my little test dataset, if I facet on "type" and use q=*:*, I get facet
> counts for type: [ chapter=5, book=1 ]
>
> With q=toto, only four of the chapters match, so I get facet counts for
> type: { chapter=4 }.
>
> Now if I add type=book, *no* documents match, but I get facet counts:
> { chapter=4 }.
>
> It's as if the type term from the query is being ignored when the facets
> are computed. This is actually what we want, in general, but the
> documentation doesn't reflect it and I'd like to understand better the
> mechanism so I can tell what I can rely on.
>
> I see that there is the possibility of tagging and excluding filters (fq)
> so they don't affect the facet counting, but there's no mention on the wiki
> of any sort of term exclusion from the main query. I poked around in the
> source a bit, but wasn't able to find an answer quickly, so I thought I'd
> ask here.
>
> So my question basically is: which restrictions are applied to the docset
> from which (field) facets are computed?
>
> -Mike
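The rule above -- facets are computed over the intersection of the "q" docset and every "fq" docset -- can be modeled in miniature. This is a toy Python sketch of the semantics, not Solr code, using made-up documents shaped like the example in this thread (4 chapters matching "toto", plus one more chapter and one book):

```python
# Toy model of Solr facet computation: facet counts are taken over the
# intersection of the "q" docset and all "fq" docsets.
docs = [
    {"id": 1, "type": "chapter", "text": "toto"},
    {"id": 2, "type": "chapter", "text": "toto"},
    {"id": 3, "type": "chapter", "text": "toto"},
    {"id": 4, "type": "chapter", "text": "toto"},
    {"id": 5, "type": "chapter", "text": "other"},
    {"id": 6, "type": "book",    "text": "other"},
]

def docset(pred):
    """The set of doc ids matching a predicate (stand-in for a query)."""
    return {d["id"] for d in docs if pred(d)}

def facet_counts(field, *docsets):
    """Count field values over the intersection of all given docsets."""
    base = set.intersection(*docsets)
    counts = {}
    for d in docs:
        if d["id"] in base:
            counts[d[field]] = counts.get(d[field], 0) + 1
    return counts

q = docset(lambda d: d["text"] == "toto")    # like q=toto
fq = docset(lambda d: d["type"] == "book")   # like fq=type:book

print(facet_counts("type", q))      # {'chapter': 4}
print(facet_counts("type", q, fq))  # {} -- zero matching docs, zero counts
```

A type term inside "q" itself is simply part of the q docset; it is only the tag/ex localParams mechanism on "fq" that can exclude a restriction from faceting.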
facet.field counts when q includes field
I'm trying to understand the facet counts I'm getting back from Solr when the main query includes a term that restricts on a field that is being faceted. After reading the docs on the wiki (both wikis) I'm confused.

In my little test dataset, if I facet on "type" and use q=*:*, I get facet counts for type: [ chapter=5, book=1 ]

With q=toto, only four of the chapters match, so I get facet counts for type: { chapter=4 }.

Now if I add type=book, *no* documents match, but I get facet counts: { chapter=4 }.

It's as if the type term from the query is being ignored when the facets are computed. This is actually what we want, in general, but the documentation doesn't reflect it and I'd like to understand the mechanism better so I can tell what I can rely on.

I see that there is the possibility of tagging and excluding filters (fq) so they don't affect the facet counting, but there's no mention on the wiki of any sort of term exclusion from the main query. I poked around in the source a bit, but wasn't able to find an answer quickly, so I thought I'd ask here.

So my question basically is: which restrictions are applied to the docset from which (field) facets are computed?

-Mike
Wildcard search not working with search term having special characters and digits
Hi,

The query below, without a wildcard, returns results:

http://localhost:8080/solr/master/select?q=page_title_t:"an-138"

But the query below, with a wildcard, does not return results:

http://localhost:8080/solr/master/select?q=page_title_t:"an-13*"

The query below, with a wildcard and no digits, does return results:

http://localhost:8080/solr/master/select?q=page_title_t:"an-*"

I have tried adding the WordDelimiterFilter, but no luck. Please suggest or guide how to make wildcard search work with special characters and digits.

Appreciate an immediate response!!

Thanks,
G. Naresh Kumar

--
View this message in context: http://lucene.472066.n3.nabble.com/Wildcard-search-not-working-with-search-term-having-special-characters-and-digits-tp4133385.html
Sent from the Solr - User mailing list archive at Nabble.com.
Stemming not working with wildcard search
Hi,

I have added the SnowballPorterFilterFactory filter to the field type so that singular and plural search terms return the same results. So the queries below (double quotes around the search term) return similar results, which is fine:

http://localhost:8080/solr/master/select?q=page_title_t:"product*"
http://localhost:8080/solr/master/select?q=page_title_t:"products*"

But when I analyzed the results, in both result sets, documents which don't start with the words "Product" or "Products" didn't come back, though a few such documents exist. So I added * as a prefix and suffix to the search term, without double quotes, to do a wildcard search:

http://localhost:8080/solr/master/select?q=page_title_t:*product*
http://localhost:8080/solr/master/select?q=page_title_t:*products*

Now stemming is not working, as the second query above does not return results similar to query 1. If double quotes are added around the search term then it returns similar results, but the results are not as expected. With double quotes it won't return results like "Old products", "New products", "Cool Product". It will only return results with values like "Product 1", "Product 2", "Products of USA".

Please suggest or guide how to make stemming work with wildcard search.

Appreciate an immediate response!!

Thanks,
G. Naresh Kumar

--
View this message in context: http://lucene.472066.n3.nabble.com/Stemming-not-working-with-wildcard-search-tp4133382.html
Sent from the Solr - User mailing list archive at Nabble.com.
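For context on why this happens: wildcard (multi-term) queries in Solr/Lucene generally bypass the analysis chain, so the stemmer never sees "products*" -- the raw prefix is compared against the already-stemmed index terms. The following toy Python sketch illustrates the effect; the naive plural-stripper is a stand-in for the Snowball stemmer, and none of this is real Solr code:

```python
def naive_stem(token):
    """Stand-in for a real stemmer: strip a trailing 's'."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

def index_terms(doc_text):
    # index-time analysis: lowercase and stem every token
    return {naive_stem(t.lower()) for t in doc_text.split()}

def matches(query, terms, wildcard=False):
    if wildcard:
        # wildcard terms bypass analysis: the raw prefix is compared
        prefix = query.lower().rstrip("*")
        return any(t.startswith(prefix) for t in terms)
    # non-wildcard terms are analyzed just like index-time tokens
    return naive_stem(query.lower()) in terms

terms = index_terms("Old Products")  # indexed as {'old', 'product'}

print(matches("products", terms))                  # True: query is stemmed too
print(matches("products*", terms, wildcard=True))  # False: no term starts with 'products'
print(matches("product*", terms, wildcard=True))   # True
```

So "products*" fails because the index only contains the stemmed form "product", which does not start with the unanalyzed prefix "products".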
Re: Data Import Handler Question
This might be helpful: http://searchhub.org/2012/02/14/indexing-with-solrj/

It combines using Tika for structured documents and using a JDBC connector, but extracting the DB-specific stuff should be quite easy.

Best,
Erick

On Sun, Apr 27, 2014 at 7:24 AM, Yuval Dotan wrote:
> Thanks Shawn
>
> In your opinion, what do you think is easier: writing the importer from
> scratch, or extending the DIH (for example: adding the state etc...)?
>
> Yuval
>
> On Thu, Apr 24, 2014 at 6:47 PM, Shawn Heisey wrote:
>> On 4/24/2014 9:24 AM, Yuval Dotan wrote:
>>> I want to use the DIH component in order to import data from an old
>>> postgresql DB.
>>> I want to be able to recover from errors and crashes.
>>> If an error occurs I should be able to restart and continue indexing
>>> from where it stopped.
>>> Is the DIH good enough for my requirements?
>>> If not, is it possible to extend one of its classes in order to support
>>> the recovery?
>>
>> The entity in the Dataimport Handler (DIH) config has an "onError"
>> attribute.
>>
>> http://wiki.apache.org/solr/DataImportHandler#Schema_for_the_data_config
>> https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-EntityProcessors
>>
>> But honestly, if you want a really robust Java program that indexes to
>> Solr and does precisely what you want, you may be better off writing it
>> yourself using SolrJ and JDBC. DIH is powerful and efficient, but when
>> you write the program yourself, you can do anything you want with your
>> data.
>>
>> You also have the possibility of resuming an import after a Solr crash.
>> Because DIH is embedded in Solr and doesn't save any kind of state data
>> about an import in progress, that's pretty much impossible with DIH.
>> With a SolrJ program, you'd have to handle that yourself, but it would
>> be *possible*.
>>
>> https://cwiki.apache.org/confluence/display/solr/Using+SolrJ
>>
>> Thanks,
>> Shawn
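Shawn's point about resumability can be sketched as a checkpoint loop: persist the last successfully indexed id after each batch, and restart from it after a crash. This is an illustrative Python stand-in, not SolrJ/JDBC code; the checkpoint file name, batch shape, and callback signatures are all invented for the sketch:

```python
import json
import os

STATE_FILE = "import_checkpoint.json"  # hypothetical checkpoint location

def load_checkpoint():
    """Return the last committed id, or 0 on a fresh start."""
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return json.load(f)["last_id"]
    return 0

def save_checkpoint(last_id):
    # write to a temp file and rename, so a crash mid-write
    # cannot leave a corrupt checkpoint behind
    tmp = STATE_FILE + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"last_id": last_id}, f)
    os.replace(tmp, STATE_FILE)

def run_import(fetch_batch, send_to_solr, batch_size=1000):
    """Resume from the last committed id; a restart after a crash
    picks up where the previous run checkpointed."""
    last_id = load_checkpoint()
    while True:
        # e.g. SELECT ... WHERE id > ? ORDER BY id LIMIT ?
        rows = fetch_batch(last_id, batch_size)
        if not rows:
            break
        send_to_solr(rows)
        last_id = rows[-1]["id"]
        save_checkpoint(last_id)
    return last_id
```

The same structure maps directly onto SolrJ (send_to_solr becomes a SolrClient.add call) with JDBC supplying fetch_batch.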
Re: How to sort solr results by foreign id field
Store the sort criteria in the documents you want to sort. Solr is _not_ an RDBMS; trying to do SQL-like things is usually a mistake. The usual approach is to de-normalize your data so you don't need to try.

Best,
Erick

On Sun, Apr 27, 2014 at 6:11 AM, Goosef_Le_Hung wrote:
> help me
>
> -
> Lady Cute
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-sort-solr-results-by-foreign-id-field-tp4133263p4133345.html
> Sent from the Solr - User mailing list archive at Nabble.com.
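The denormalization advice above in miniature: copy the "foreign" sort value into each document at index time, so a plain sort parameter works with no join. This hypothetical Python sketch (the field and data names are invented for illustration) shows the index-time step:

```python
# Hypothetical example: articles reference an author by id, and we want
# to sort articles by author name. Instead of "joining" at query time,
# copy author_name into each article document at index time.
authors = {"a1": "Zoe", "a2": "Adam"}

raw_articles = [
    {"id": "d1", "title": "First",  "author_id": "a1"},
    {"id": "d2", "title": "Second", "author_id": "a2"},
]

# index-time denormalization: add the sort field to every document
solr_docs = [
    dict(a, author_name_s=authors[a["author_id"]]) for a in raw_articles
]

# now a plain sort=author_name_s asc works -- no RDBMS-style join needed
result = sorted(solr_docs, key=lambda d: d["author_name_s"])
print([d["id"] for d in result])  # ['d2', 'd1']
```

The trade-off is that when an author's name changes, the affected article documents must be re-indexed, which is the usual price of denormalization.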
Re: DocValues and StatsComponent
Hi Harish,

I created https://issues.apache.org/jira/browse/SOLR-6024 on your behalf.

Ahmet

On Friday, April 4, 2014 3:13 AM, Ahmet Arslan wrote:

Hi Harish,

I reproduced your problem with the example/default setup. I enabled docValues on the example fields (deleted the original ones) and indexed the example documents. Single-valued fields work fine, but stats on the multi-valued field cat yields:

http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json&indent=true&stats=true&stats.field=cat

"msg": "Type mismatch: cat was indexed as SORTED_SET",
"code": 400

And Confluence does not say anything about this. Can you file a JIRA issue?

Ahmet

On Thursday, April 3, 2014 11:01 PM, Harish Agarwal wrote:

Is there a known issue using the StatsComponent against fields indexed with docValues? My setup is currently throwing this error (against the latest nightly build):

org.apache.solr.common.SolrException: Type mismatch: INTEGER_4 was indexed as SORTED_SET
Re: Data Import Handler Question
Thanks Shawn

In your opinion, what do you think is easier: writing the importer from scratch, or extending the DIH (for example: adding the state etc...)?

Yuval

On Thu, Apr 24, 2014 at 6:47 PM, Shawn Heisey wrote:
> On 4/24/2014 9:24 AM, Yuval Dotan wrote:
>> I want to use the DIH component in order to import data from an old
>> postgresql DB.
>> I want to be able to recover from errors and crashes.
>> If an error occurs I should be able to restart and continue indexing
>> from where it stopped.
>> Is the DIH good enough for my requirements?
>> If not, is it possible to extend one of its classes in order to support
>> the recovery?
>
> The entity in the Dataimport Handler (DIH) config has an "onError"
> attribute.
>
> http://wiki.apache.org/solr/DataImportHandler#Schema_for_the_data_config
> https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-EntityProcessors
>
> But honestly, if you want a really robust Java program that indexes to
> Solr and does precisely what you want, you may be better off writing it
> yourself using SolrJ and JDBC. DIH is powerful and efficient, but when
> you write the program yourself, you can do anything you want with your
> data.
>
> You also have the possibility of resuming an import after a Solr crash.
> Because DIH is embedded in Solr and doesn't save any kind of state data
> about an import in progress, that's pretty much impossible with DIH.
> With a SolrJ program, you'd have to handle that yourself, but it would
> be *possible*.
>
> https://cwiki.apache.org/confluence/display/solr/Using+SolrJ
>
> Thanks,
> Shawn
Re: '0' Status: Communication Error
Hey Naresh, a few things may be wrong:

1) Your application is not pointed to the correct Solr (change config.ini).
2) The new Solr machine is not reachable from your application environment (run this command in a terminal to check the status of the port/IP from the application environment: telnet IP_ADDRESS 8983).

Hope this helps!

On Sat, Apr 26, 2014 at 5:33 PM, Naresh wrote:
> I've got this problem that I can't solve, partly because I can't explain
> it with the right terms. I'm new to this, so sorry for the clumsy
> question.
>
> Below you can see an overview of my goal.
>
> I'm using Magento CE 1.7.0.2 & Solr 4.6.0.
>
> I'm using the Magentix/Solr extension in Magento CE 1.7.0.2. It's working
> fine; I can get the response in a max of 2 secs. (Here I placed the Solr
> server near my Magento.)
>
> But I placed my Solr on a separate server; I don't want to place all
> these things on one server.
>
> Enable Search: Yes
> Enable Index: Yes
> Host: IP address of the server where the Solr files exist
> Port: 8983
> Path: /solr
> Search limit: 100
>
> But Solr is not writing any log details, though it actually should log
> some details & the time taken for re-indexing data, etc.
>
> And the Solr.log file is giving: ERR (3): '0' Status: Communication
> Error.
>
> Anything wrong I did here?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/0-Status-Communication-Error-tp4133265.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to sort solr results by foreign id field
help me

-
Lady Cute

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-sort-solr-results-by-foreign-id-field-tp4133263p4133345.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: How can I convert xml message for updating a Solr index to a javabin file
Look at the SolrJ source code and docs. JavaBin is more of a protocol than a file format.

-- Jack Krupansky

-----Original Message-----
From: Elran Dvir
Sent: Sunday, April 27, 2014 2:16 AM
To: solr-user@lucene.apache.org
Subject: RE: How can I convert xml message for updating a Solr index to a javabin file

Does anyone know a way to do this?

Thanks.

-----Original Message-----
From: Elran Dvir
Sent: Thursday, April 24, 2014 4:11 PM
To: solr-user@lucene.apache.org
Subject: RE: How can I convert xml message for updating a Solr index to a javabin file

I want to measure XML vs. javabin update message indexing performance.

-----Original Message-----
From: Upayavira [mailto:u...@odoko.co.uk]
Sent: Thursday, April 24, 2014 2:04 PM
To: solr-user@lucene.apache.org
Subject: Re: How can I convert xml message for updating a Solr index to a javabin file

Why would you want to do this? Javabin is used by SolrJ to communicate with Solr. XML is good enough for communicating from the command line/curl, as is JSON. Attempting to use javabin just seems to add an unnecessary complication.

Upayavira

On Thu, Apr 24, 2014, at 10:20 AM, Elran Dvir wrote:

Hi all,

Is there a way I can convert an XML Solr update message file to a javabin file? If so, how? How can I use curl to update Solr with a javabin message file?

Thank you very much.