Solr Cloud shard shutdown problem
When I run multiple Solr instances set up in a sharded implementation, I notice that if I shut down one of the shards, I get the following message when a query is executed:

HTTP ERROR 503
Problem accessing /solr/select/. Reason: no servers hosting shard:
Powered by Jetty://

To get this to go away, I need to stop the remaining Solr instances that are running and delete the ZooKeeper data before things will run again. Is there a way to have Solr clean up the ZooKeeper entries when a shard shuts down, so that this error does not occur?
Problem with DictionaryCompoundWordTokenFilterFactory in Solr 3.2
Hello everybody! I am facing a problem with Solr's DictionaryCompoundWordTokenFilterFactory and hope you have some advice for me. I am using the latest version, Solr 3.2 (I had the same problem with Solr 3.1). In the schema, the filter is configured with minSubwordSize=3 and onlyLongestMatch=true. Now, when I analyze the word "lederschuh" (means "leather shoe" in German), I get the following sub-words in the analyzer interface:

1. lederschuh
2. lederschuh
3. der
4. er
5. schuh

Problem 1: I configured minSubwordSize to 3. Why does entry 4 ("er") appear, when it is shorter than 3 chars?

Problem 2: I configured onlyLongestMatch to true, and there is a "lederschuh" entry in my dictionary. So the longest match would be "lederschuh" itself, and I do not expect it to be split up any further. Why is Solr still splitting it?

Is this a bug, or did I misconfigure something? Any advice is very welcome! Thank you, Bernhard
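For reference, a schema.xml analyzer configuration matching the settings described above might look like the following sketch (the field type name and dictionary file name are placeholders, not taken from the original post):

```xml
<fieldType name="text_de_compound" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- dictionary.txt: one word per line, e.g. "leder", "schuh", "lederschuh" -->
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="dictionary.txt"
            minWordSize="5"
            minSubwordSize="3"
            maxSubwordSize="15"
            onlyLongestMatch="true"/>
  </analyzer>
</fieldType>
```

The analysis page (admin/analysis.jsp) is the quickest way to check what such a chain actually produces for "lederschuh".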
Re: Getting all field names in a Solr index via SolrJ
: Yes. So the consensus is that it can only be done via Luke. I would expect
: it to be easier than that.

Luke != LukeRequestHandler

LukeRequestHandler is built into Solr, and is *exactly* how the web UI of Solr lists all the fields.

http://wiki.apache.org/solr/LukeRequestHandler

-Hoss
Re: AW: How to deal with many files using solr external file field
: We took a deeper look at what happened when an "external-file-field" request is sent to SOLR:
:
: * SOLR looks to see if there is a file for the requested query, e.g. "trousers"

Something smells fishy here. ExternalFileField is designed to let you load values for a field (for use in functions) from a file, where the file name is determined from the field name. If ExternalFileField is trying to load a file named "external_trousers", that means your query is attempting to use "trousers" as a *field* ... that doesn't sound right. Based on your description of the memory blow-up you are seeing, it sounds like you are using the user's query string as a (dynamic?) field name, and none of these external_${query} files exist -- that's not really the intended usage.

Can you clarify a bit more what exactly your goal is? This smells like an XY Problem (my gut reaction is that you might actually want to use QueryElevationComponent instead of ExternalFileField)...

http://people.apache.org/~hossman/#xyproblem

XY Problem: Your question appears to be an "XY Problem" ... that is: you are dealing with "X", you are assuming "Y" will help you, and you are asking about "Y" without giving more details about the "X", so that we can understand the full issue. Perhaps the best solution doesn't involve "Y" at all? See also: http://www.perlmonks.org/index.pl?node_id=542341

-Hoss
Re: FastVectorHighlighter and hl.fragsize parameter set to zero causes exception
Hi Tom,

(11/06/11 2:01), Burton-West, Tom wrote:
> According to the documentation on the Solr wiki page, setting the hl.fragsize parameter to "0" indicates that the whole field value should be used (no fragmenting). However the FastVectorHighlighter throws an exception:
>
> java.lang.IllegalArgumentException: fragCharSize(0) is too small. It must be 18 or higher.
>     at org.apache.lucene.search.vectorhighlight.SimpleFragListBuilder.createFieldFragList(SimpleFragListBuilder.java:36)
>     at org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getFieldFragList(FastVectorHighlighter.java:177)
>     at org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getBestFragments(FastVectorHighlighter.java:166)
>
> I noticed that in SOLR-1268 there was an attempt to fix this problem but it apparently did not work.
>
> 1) Is there any plan to implement a feature in FastVectorHighlighter that would behave the same as the regular Highlighter, i.e. return the whole field value when hl.fragsize=0?

There is SingleFragListBuilder for this purpose. Please see:
https://issues.apache.org/jira/browse/LUCENE-2464

> 2) Should I edit the wiki entry to indicate that hl.fragsize=0 does not work with FVH?

Yes, please!

> 3) Are there other parameters listed in the wiki that should have a note indicating whether they apply to only the regular highlighter or the FVH?

FVH doesn't support (or it makes no sense for): fragsize=0, mergeContiguous, maxAnalyzedChars, formatter, simple.pre/simple.post, fragmenter, highlightMultiTerm, regex.*. FVH supports requireFieldMatch, but doesn't support a per-field override for it.

If you update the wiki, it will definitely be very helpful. Thank you!

koji
--
http://www.rondhuit.com/en/
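For anyone finding this thread later, the SingleFragListBuilder mentioned above can be wired up in solrconfig.xml and selected per request. A sketch, assuming the stock Solr 3.x example config (names of the builders are illustrative):

```xml
<!-- inside the <highlighting> section of solrconfig.xml -->
<fragListBuilder name="simple" default="true"
                 class="solr.highlight.SimpleFragListBuilder"/>
<fragListBuilder name="single"
                 class="solr.highlight.SingleFragListBuilder"/>
```

A query would then select it with something like `...&hl=true&hl.useFastVectorHighlighter=true&hl.fragListBuilder=single` to get the whole field value back instead of fragments.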
Re: Getting all field names in a Solr index via SolrJ
I do need all the fields, including the dynamic ones. Thanks very much for all the advice.

On Fri, Jun 10, 2011 at 10:58 PM, Ahmet Arslan wrote:
>
> > Yes. So the consensus is that it can only be done via Luke. I would expect
> > it to be easier than that.
>
> If you use &numTerms=0 things will be fast.
> By the way, if you don't care about dynamic fields, you can extract field
> names from schema.xml too (via ShowFileRequestHandler)
>
Re: Getting all field names in a Solr index via SolrJ
> Yes. So the consensus is that it can only be done via Luke. I would expect
> it to be easier than that.

If you use &numTerms=0 things will be fast. By the way, if you don't care about dynamic fields, you can extract field names from schema.xml too (via ShowFileRequestHandler).
Re: Getting all field names in a Solr index via SolrJ
Yes. So the consensus is that it can only be done via Luke. I would expect it to be easier than that. Thanks!

On Fri, Jun 10, 2011 at 9:05 PM, Ahmet Arslan wrote:
>
> > > Sure, but how does the web admin do it without Luke?
> >
> > admin/analysis.jsp uses luke too.
>
> Sorry, I wrote it wrong. I was referring to /admin/schema.jsp
>
FYI: How to build and start Apache Solr admin app from source with Maven
Hi, guys, FYI: Here is the link to how to build and start Apache Solr admin app from source with Maven just in case you might be interested: http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html Have fun. YH
Re: Document has fields with different update frequencies: how best to model
Take a look at ExternalFileField [1]. It's meant for exactly what you want to do here. FYI, there is an issue with caching of the external values introduced in v1.4 but, thankfully, resolved in v3.2 [2] --jay [1] http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html [2] https://issues.apache.org/jira/browse/SOLR-2536 On Fri, Jun 10, 2011 at 12:54 PM, lee carroll wrote: > Hi, > We have a document type which has fields which are pretty static. Say > they change once every 6 month. But the same document has a field > which changes hourly > What are the best approaches to index this document ? > > Eg > Hotel ID (static) , Hotel Description (static and costly to get from a > url etc), FromPrice (changes hourly) > > Option 1 > Index hourly as a single document and don't worry about the unneeded > field updates > > Option 2 > Split into 2 document types and index independently. This would > require the front end application to query multiple times? > doc1 > ID,Description,DocType > doc2 > ID,HotelID,Price,DocType > > application performs searches based on hotel attributes > for each hotel match issue query to get price > > > Any other options ? Can you query across documents ? > > We run 1.4.1, we could maybe update to 3.2 but I don't think I could > swing to trunk for JOIN feature (if that indeed is JOIN's use case) > > Thanks in advance > > PS Am I just worrying about de-normalised data and should sort the > source data out maybe by caching and get over it ...? > > cheers Lee c >
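To make Jay's suggestion concrete for the hotel example, a schema.xml sketch might look like the following (the type name, key field, and file contents are invented for illustration; see the ExternalFileField javadoc linked above for the authoritative details):

```xml
<!-- schema.xml: prices loaded from a file, not re-indexed with the document -->
<fieldType name="pricefile" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="float"
           stored="false" indexed="false"/>

<field name="FromPrice" type="pricefile"/>
```

The values then live in a file named external_FromPrice in the index data directory, one `key=value` line per document (e.g. `hotel123=79.0`), which can be regenerated hourly without touching the static hotel documents. Note that such a field is only usable in function queries (e.g. for sorting or boosting), not in regular query clauses.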
Re: Getting all field names in a Solr index via SolrJ
> > Sure, but how does the web admin do it without Luke?
>
> admin/analysis.jsp uses luke too.

Sorry, I wrote it wrong. I was referring to /admin/schema.jsp
Re: wildcard search
> I tried to follow this recipe, adapting it to the Solr 3.2
> I am testing right now. The first try gave me a message
>
> [java] !!! Couldn't get license file for
> /Installer/solr/apache-solr-3.2.0/solr/lib/ComplexPhrase-1.0.jar
> [java] At least one file does not have a license, or its license name
> is not in the proper format. See the logs.
>
> BUILD FAILED
>
> so I created a fake license ComplexPhrase-LICENSE-MIT.txt for
> ComplexPhrase and tried again, which ran through successfully.
> I hope this is OK.

I didn't use it with Solr 3.2. I will check on it.

> > IA 300
> > IC 330
> > IA 314
> > IA 318
>
> I didn't have to split them up; they are already separated as a field
> with multiValued="true". But I need to be able to search for
> IA 310 - IA 319 with one call;
> {!complexphrase}GOK:"IA 31?"
> will do this now, or even
> {!complexphrase}GOK:"IA 3*"
> to catch all those in one go.

So your GOK field already contains the list as multivalued values. Then you can use the prefix query parser plugin for this. Just make sure that the field type of GOK is string, not text.

q={!prefix f=GOK}IA 3

should be equivalent to {!complexphrase}GOK:"IA 3*"
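A rough Python illustration (not Solr code, just pattern matching over the sample values above) of what the two query styles select from the multivalued GOK data, and where they differ:

```python
from fnmatch import fnmatch

gok_values = ["IA 300", "IC 330", "IA 314", "IA 318"]

# {!complexphrase}GOK:"IA 31?" -- "?" matches exactly one character
matches_31x = [v for v in gok_values if fnmatch(v, "IA 31?")]

# q={!prefix f=GOK}IA 3 -- a plain prefix match, like GOK:"IA 3*"
matches_3x = [v for v in gok_values if v.startswith("IA 3")]

print(matches_31x)  # ['IA 314', 'IA 318']
print(matches_3x)   # ['IA 300', 'IA 314', 'IA 318']
```

So the prefix parser covers the "IA 3*" case, while the narrower "IA 31?" range still needs the wildcard form.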
Re: Getting all field names in a Solr index via SolrJ
note the URL above (esp LukeRequestHandler). Best Erick On Fri, Jun 10, 2011 at 12:04 PM, Public Network Services wrote: > Sure, but how does the web admin do it without Luke? > > > On Fri, Jun 10, 2011 at 6:53 PM, Stefan Matheis < > matheis.ste...@googlemail.com> wrote: > >> If you really want to have _all_ defined (and therefore possible >> fields) you should have a look to >> http://wiki.apache.org/solr/LukeRequestHandler >> >> Regards >> Stefan >> >> On Fri, Jun 10, 2011 at 5:48 PM, Public Network Services >> wrote: >> > Hi... >> > >> > I would like to get a list of all field names in a Solr index, much like >> the >> > web admin can list all these fields in Schema Browser. It sounds trivial, >> > but still looking around as to how it would best be implemented. >> > >> > If I run a query with the wildcard string ("*:*"), not all field names >> are >> > included in all returned documents (rows), so I have to retrieve several >> > rows (i.e., a SolrDocumentList) and then iterate over all these rows >> > (documents) and use the getFieldNames() method on each. This is a hack >> and >> > does not produce 100% reliable results (e.g., how many rows to retrieve >> in >> > order to get all the field names?). >> > >> > So, how is this properly done? >> > >> > Thanks! >> > >> >
Heads Up - Index File Format Change on Trunk
Hey folks, I just committed LUCENE-3108 (Landing DocValues on Trunk), which adds a byte to FieldInfo. If you are running on trunk, you must/should re-index any trunk indexes once you update to the latest trunk. It's likely that if you open up old trunk (4.0) indexes, you will get an exception related to "Read past EOF". Simon
Re: Getting all field names in a Solr index via SolrJ
> Sure, but how does the web admin do it without Luke?

admin/analysis.jsp uses luke too.
FastVectorHighlighter and hl.fragsize parameter set to zero causes exception
According to the documentation on the Solr wiki page, setting the hl.fragsize parameter to "0" indicates that the whole field value should be used (no fragmenting). However the FastVectorHighlighter throws an exception:

java.lang.IllegalArgumentException: fragCharSize(0) is too small. It must be 18 or higher.
    at org.apache.lucene.search.vectorhighlight.SimpleFragListBuilder.createFieldFragList(SimpleFragListBuilder.java:36)
    at org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getFieldFragList(FastVectorHighlighter.java:177)
    at org.apache.lucene.search.vectorhighlight.FastVectorHighlighter.getBestFragments(FastVectorHighlighter.java:166)

I noticed that in SOLR-1268 there was an attempt to fix this problem, but it apparently did not work.

1) Is there any plan to implement a feature in FastVectorHighlighter that would behave the same as the regular Highlighter, i.e. return the whole field value when hl.fragsize=0?

2) Should I edit the wiki entry to indicate that hl.fragsize=0 does not work with FVH?

3) Are there other parameters listed in the wiki that should have a note indicating whether they apply only to the regular highlighter or to the FVH?

Tom Burton-West
Document has fields with different update frequencies: how best to model
Hi,

We have a document type whose fields are pretty static; say they change once every 6 months. But the same document has a field which changes hourly. What are the best approaches to indexing this document?

E.g.: Hotel ID (static), Hotel Description (static and costly to get from a URL etc.), FromPrice (changes hourly).

Option 1: Index hourly as a single document and don't worry about the unneeded field updates.

Option 2: Split into 2 document types and index independently. This would require the front-end application to query multiple times:
doc1: ID, Description, DocType
doc2: ID, HotelID, Price, DocType
The application performs searches based on hotel attributes, then for each hotel match issues a query to get the price.

Any other options? Can you query across documents?

We run 1.4.1; we could maybe update to 3.2, but I don't think I could swing to trunk for the JOIN feature (if that indeed is JOIN's use case).

Thanks in advance.

PS: Am I just worrying about de-normalised data, and should I sort the source data out (maybe by caching) and get over it...?

cheers Lee c
Re: Default query parser operator
It could, it would be a little bit clunky but that's the direction I'm heading. On Tue, Jun 7, 2011 at 6:05 PM, lee carroll wrote: > Hi Brian could your front end app do this field query logic? > > (assuming you have an app in front of solr) > > > > On 7 June 2011 18:53, Jonathan Rochkind wrote: > > There's no feature in Solr to do what you ask, no. I don't think. > > > > On 6/7/2011 1:30 PM, Brian Lamb wrote: > >> > >> Hi Jonathan, > >> > >> Thank you for your reply. Your point about my example is a good one. So > >> let > >> me try to restate using your example. Suppose I want to apply AND to any > >> search terms within field1. > >> > >> Then > >> > >> field1:foo field2:bar field1:baz field2:bom > >> > >> would by written as > >> > >> http://localhost:8983/solr/?q=field1:foo OR field2:bar OR field1:baz OR > >> field2:bom > >> > >> But if they were written together like: > >> > >> http://localhost:8983/solr/?q=field1:(foo baz) field2:(bar bom) > >> > >> I would want it to be > >> > >> http://localhost:8983/solr/?q=field1:(foo AND baz) OR field2:(bar OR > bom) > >> > >> But it sounds like you are saying that would not be possible. > >> > >> Thanks, > >> > >> Brian Lamb > >> > >> On Tue, Jun 7, 2011 at 11:27 AM, Jonathan Rochkind > >> wrote: > >> > >>> Nope, not possible. > >>> > >>> I'm not even sure what it would mean semantically. If you had default > >>> operator "OR" ordinarily, but default operator "AND" just for "field2", > >>> then > >>> what would happen if you entered: > >>> > >>> field1:foo field2:bar field1:baz field2:bom > >>> > >>> Where the heck would the ANDs and ORs go? The operators are BETWEEN > the > >>> clauses that specify fields, they don't belong to a field. In general, > >>> the > >>> operators are part of the query as a whole, not any specific field. 
> >>> > >>> In fact, I'd be careful of your example query: > >>>q=field1:foo bar field2:baz > >>> > >>> I don't think that means what you think it means, I don't think the > >>> "field1" applies to the "bar" in that case. Although I could be wrong, > >>> but > >>> you definitely want to check it. You need "field1:foo field1:bar", or > >>> set > >>> the default field for the query to "field1", or use parens (although > that > >>> will change the execution strategy and ranking): q=field1:(foo bar) > >>> > >>> > >>> At any rate, even if there's a way to specify this so it makes sense, > no, > >>> Solr/lucene doesn't support any such thing. > >>> > >>> > >>> > >>> > >>> On 6/7/2011 10:56 AM, Brian Lamb wrote: > >>> > I feel like this should be fairly easy to do but I just don't see > anywhere > in the documentation on how to do this. Perhaps I am using the wrong > search > parameters. > > On Mon, Jun 6, 2011 at 12:19 PM, Brian Lamb > wrote: > > Hi all, > > > > Is it possible to change the query parser operator for a specific > field > > without having to explicitly type it in the search field? > > > > For example, I'd like to use: > > > > http://localhost:8983/solr/search/?q=field1:word token field2:parser > > syntax > > > > instead of > > > > http://localhost:8983/solr/search/?q=field1:word AND token > > field2:parser > > syntax > > > > But, I only want it to be applied to field1, not field2 and I want > the > > operator to always be AND unless the user explicitly types in OR. > > > > Thanks, > > > > Brian Lamb > > > > > > >
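Since the consensus in this thread is that Solr won't apply a per-field default operator, the front-end workaround Lee and Brian discuss can be sketched in a few lines of Python. This is only an illustration of the direction (field names and the grouping policy are taken from Brian's example, the helper itself is invented):

```python
def build_query(field_terms, and_fields):
    """Build a Solr query string, joining terms with AND for fields in
    and_fields and with OR for every other field; the per-field clauses
    themselves are OR'ed together."""
    clauses = []
    for field, terms in field_terms.items():
        op = " AND " if field in and_fields else " OR "
        clauses.append("%s:(%s)" % (field, op.join(terms)))
    return " OR ".join(clauses)

q = build_query({"field1": ["foo", "baz"], "field2": ["bar", "bom"]},
                and_fields={"field1"})
print(q)  # field1:(foo AND baz) OR field2:(bar OR bom)
```

The resulting string would then be sent as the q parameter; honoring an explicit user-typed OR would need extra parsing on top of this sketch.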
Re: wildcard search
Hi Ahmet, >>> I don't use it myself (but I will soon), so I >>> may be wrong, but did you try >>> to use the ComplexPhraseQueryParser : >>> >>> ComplexPhraseQueryParser >>> QueryParser which >>> permits complex phrase query syntax eg "(john >>> jon jonathan~) peters*". >>> >>> It seems that you could do such type of queries : >>> >>> GOK:"IA 38*" >> >> yes that sounds interesting. >> But I don't know how to get and install it into solr. Cam >> you give me a hint? > > https://issues.apache.org/jira/browse/SOLR-1604 I tried to follow this recipe, adapting it to the solr 3.2 I am testing right now. The first try gave me a message [java] !!! Couldn't get license file for /Installer/solr/apache-solr-3.2.0/solr/lib/ComplexPhrase-1.0.jar [java] At least one file does not have a license, or it's license name is not in the proper format. See the logs. BUILD FAILED so I created a fake license ComplexPhrase-LICENSE-MIT.txt for ComplexPhrase and tried again, which ran through successfully, I hope this is OK. I registered queryparser not to solrhome/conf/solrconfig.xml (no such thing, I'm running multiple cores) but to solrhome/cores/lit/conf/solrconfig.xml and could search successfully for {!complexphrase}GOK:"IC 62*" > But it seems that you can achieve what you want with vanilla solr. > > I don't follow the multivalued part in your example but you can tokenize > "IA 300; IC 330; IA 317; IA 318" into these 4 tokens > > IA 300 > IC 330 > IA 314 > IA 318 I didn't have to split them up, they are already separated as field with multiValued="true". But I need to be able to search for IA 310 - IA 319 with one call, {!complexphrase}GOK:"IA 31?" will do this now, or even for {!complexphrase}GOK:"IA 3*" to catch all those in one go. Thanks, this helped a lot Thomas
Re: Getting all field names in a Solr index via SolrJ
Sure, but how does the web admin do it without Luke? On Fri, Jun 10, 2011 at 6:53 PM, Stefan Matheis < matheis.ste...@googlemail.com> wrote: > If you really want to have _all_ defined (and therefore possible > fields) you should have a look to > http://wiki.apache.org/solr/LukeRequestHandler > > Regards > Stefan > > On Fri, Jun 10, 2011 at 5:48 PM, Public Network Services > wrote: > > Hi... > > > > I would like to get a list of all field names in a Solr index, much like > the > > web admin can list all these fields in Schema Browser. It sounds trivial, > > but still looking around as to how it would best be implemented. > > > > If I run a query with the wildcard string ("*:*"), not all field names > are > > included in all returned documents (rows), so I have to retrieve several > > rows (i.e., a SolrDocumentList) and then iterate over all these rows > > (documents) and use the getFieldNames() method on each. This is a hack > and > > does not produce 100% reliable results (e.g., how many rows to retrieve > in > > order to get all the field names?). > > > > So, how is this properly done? > > > > Thanks! > > >
Re: Getting all field names in a Solr index via SolrJ
If you really want to have _all_ defined (and therefore possible fields) you should have a look to http://wiki.apache.org/solr/LukeRequestHandler Regards Stefan On Fri, Jun 10, 2011 at 5:48 PM, Public Network Services wrote: > Hi... > > I would like to get a list of all field names in a Solr index, much like the > web admin can list all these fields in Schema Browser. It sounds trivial, > but still looking around as to how it would best be implemented. > > If I run a query with the wildcard string ("*:*"), not all field names are > included in all returned documents (rows), so I have to retrieve several > rows (i.e., a SolrDocumentList) and then iterate over all these rows > (documents) and use the getFieldNames() method on each. This is a hack and > does not produce 100% reliable results (e.g., how many rows to retrieve in > order to get all the field names?). > > So, how is this properly done? > > Thanks! >
Getting all field names in a Solr index via SolrJ
Hi... I would like to get a list of all field names in a Solr index, much like the web admin can list all these fields in Schema Browser. It sounds trivial, but still looking around as to how it would best be implemented. If I run a query with the wildcard string ("*:*"), not all field names are included in all returned documents (rows), so I have to retrieve several rows (i.e., a SolrDocumentList) and then iterate over all these rows (documents) and use the getFieldNames() method on each. This is a hack and does not produce 100% reliable results (e.g., how many rows to retrieve in order to get all the field names?). So, how is this properly done? Thanks!
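To make the unreliability of the row-iteration hack concrete, here is a small Python sketch with plain dicts standing in for SolrJ's SolrDocument and its getFieldNames(): any field that only occurs in documents outside the retrieved rows is simply missed.

```python
# A toy "index": not every document carries every field.
docs = [
    {"id": "1", "title": "a"},
    {"id": "2", "title": "b"},
    {"id": "3", "price": 9.99},   # 'price' appears only in this document
]

def field_names_from_rows(docs, rows):
    """Union of field names over the first `rows` results -- the hack."""
    names = set()
    for doc in docs[:rows]:
        names.update(doc.keys())   # stand-in for SolrDocument.getFieldNames()
    return names

print(field_names_from_rows(docs, rows=2))  # misses 'price'
print(field_names_from_rows(docs, rows=3))  # complete, but only by luck
```

No choice of rows is safe in general, which is why the answers in this thread point at LukeRequestHandler instead.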
Re: Processing/Indexing CSV
Hi, thanks for the Intro, will do next week :-) greetings from berlin On Fri, Jun 10, 2011 at 2:49 PM, Erick Erickson wrote: > Well, here's a place to start if you want to patch the code: > > http://wiki.apache.org/solr/HowToContribute > > If you do want to take this on, hop on over to the dev list > and start a discussion. I'd start with some posts on that list > before entering or working on a JIRA issue, just ask for > some guidance. A good place to start is pretty much what > you've done here, state your problem, and what you think > the correct behavior is. > > Be prepared for things to be brought up you never thought > of ... which is the point of starting the discussion there. > > A very good way to start is to get the code, compile it, and then > run some of the test cases in an IDE, stepping through the test > case in the debugger. Sometimes that doesn't work easily, but > if it does it gives you an idea of how the code works. There are > instructions at the above link for setting things up in an IDE > (Eclipse and Intellij are popular). > > Just loading the project and looking for files that begin with > CSV might be a place to start. Then look for files that begin > with TestCSV. Both of these "look promising". > > Anyway, if you get that far, then go over to the dev list and say > "I'm thinking of XXX, this code appears to be handled in YYY and > I'm thinking of changing it like ZZZ" and it will be well received. > > Of course if you want to go ahead and make your changes and submit > a patch, that's even better, but it's often best to get a bit of guidance > first. > > Best > Erick > > On Thu, Jun 9, 2011 at 5:17 PM, Helmut Hoffer von Ankershoffen > wrote: > > On Thu, Jun 9, 2011 at 11:05 PM, Ken Krugler < > kkrugler_li...@transpac.com>wrote: > > > >> > >> On Jun 9, 2011, at 1:27pm, Helmut Hoffer von Ankershoffen wrote: > >> > >> > Hi, > >> > > >> > ... 
that would be an option if there is a defined set of field names > and > >> a > >> > single column/CSV layout. The scenario however is different csv files > >> (from > >> > different shops) with individual column layouts (separators, encodings > >> > etc.). The idea is to map known field names to defined field names in > the > >> > solr schema. If I understand the capabilities of the CSVLoader > correctly > >> > (sorry, I am completely new to Solr, started work on it today) this is > >> not > >> > possible - is it? > >> > >> As per the documentation on > >> http://wiki.apache.org/solr/UpdateCSV#fieldnames, you can specify the > >> names/positions of fields in the CSV file, and ignore fieldnames. > >> > >> So this seems like it would solve your requirement, as each different > >> layout could specify its own such mapping during import. > >> > >> Sure, but the requirement (to keep the process of integrating new shops > > efficient) is not to have one mapping per import (cp. the Email regarding > > "more or less schema free") but to enhance one mapping that maps common > > field names to defined fields disregarding order of known fields/columns. > As > > far as I understand that is not a problem at all with DIH, however DIH > and > > CSV are not a perfect match ,-) > > > > > >> It could be handy to provide a fieldname map (versus the value map that > >> UpdateCSV supports). > > > > Definitely. Either a fieldname map in CSVLoader or a robust CSVLoader in > DIH > > ... > > > > > >> Then you could use the header, and just provide a mapping from header > >> fieldnames to schema fieldnames. > >> > > That's the idea -) > > > > => what's the best way to progress. Either someone enhances the CSVLoader > by > > a field mapper (with multipel input field names mapping to one field name > in > > the Solr schema) or someone enhances the DIH with a robust CSV loader > ,-). > > As I am completely new to this Community, please give me the direction to > go > > (or wait :-). 
> > > > best regards > > > > > >> -- Ken > >> > >> > On Thu, Jun 9, 2011 at 10:12 PM, Yonik Seeley < > >> yo...@lucidimagination.com>wrote: > >> > > >> >> On Thu, Jun 9, 2011 at 4:07 PM, Helmut Hoffer von Ankershoffen > >> >> wrote: > >> >>> Hi, > >> >>> yes, it's about CSV files loaded via HTTP from shops to be fed into > a > >> >>> shopping search engine. > >> >>> The CSV Loader cannot map fields (only field values) etc. > >> >> > >> >> You can provide your own list of fieldnames and optionally ignore the > >> >> first line of the CSV file (assuming it contains the field names). > >> >> http://wiki.apache.org/solr/UpdateCSV#fieldnames > >> >> > >> >> -Yonik > >> >> http://www.lucidimagination.com > >> >> > >> > >> -- > >> Ken Krugler > >> +1 530-210-6378 > >> http://bixolabs.com > >> custom data mining solutions > >> > >> > >> > >> > >> > >> > >> > > >
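While waiting for a CSVLoader field-name map or a robust CSV handler in DIH, the mapping discussed in this thread can also be done as a small pre-processing step outside Solr. A Python sketch (the header synonyms and schema field names are invented for illustration): read each shop's CSV with its own dialect, map known header names onto the schema's field names, and emit a normalized CSV for UpdateCSV.

```python
import csv
import io

# Many shop-specific header spellings map onto one schema field (examples invented).
HEADER_MAP = {
    "artikelnummer": "id", "sku": "id",
    "titel": "name", "product_name": "name",
    "preis": "price", "price_eur": "price",
}

def normalize(csv_text, delimiter=";"):
    """Rewrite a shop CSV so its header uses the Solr schema's field names.
    Columns with unknown headers are dropped."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["id", "name", "price"])
    writer.writeheader()
    for row in reader:
        writer.writerow({HEADER_MAP[k]: v for k, v in row.items()
                         if k in HEADER_MAP})
    return out.getvalue()

shop_csv = "sku;titel;preis;unused\n123;Schuh;49.90;x\n"
print(normalize(shop_csv))
```

The normalized output can then be posted to /update/csv as-is, with one shared fieldname mapping per deployment rather than one per import.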
Re: WordDelimiter and stemEnglishPossessive doesn't work
Hmmm, that is confusing. the stemEnglishPossessive=0 actually leaves the 's' in the index, just not attached to the word. The admin/analysis page can help show this Setting it equal to 1 removes it entirely from the stream. If you set catenateWords=1, you'll get "mcdonalds" in your index if stemEnglishPosessive=0 but not if you set stemEnglishPosessive=1. Hope that helps Erick On Fri, Jun 10, 2011 at 3:51 AM, roySolr wrote: > Hello, > > I have some problem with the wordDelimiter. My data looks like this: > > mcdonald's#burgerking#Free record shop#h&m > > I want to tokenize this on #. After that it has to split on whitespace. I > use the > wordDelimiter for that(can't use 2 tokenizers) > > Now this works but there is one problem, it removes the '. My index looks > like this: > > mcdonald > burgerking > free > record > shop > h&m > > I don't want this so i use the stemEnglishPossessive. The description from > this part of the filter looks like this: > > stemEnglishPossessive="1" causes trailing "'s" to be removed for each > subword. > "Doug's" => "Doug" > default is true ("1"); set to 0 to turn off > > My Field looks like this: > > > > > > splitOnCaseChange="0" > splitOnNumerics="0" > stemEnglishPossessive="0" > catenateWords="0" > /> > > > > It looks like the stemEnglishPossessive=0 is not working. How can i fix this > problem? Other filter? Did i forget something? > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/WordDelimiter-and-stemEnglishPossessive-doesn-t-work-tp3047678p3047678.html > Sent from the Solr - User mailing list archive at Nabble.com. >
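A rough Python illustration of the interaction Erick describes; this is a toy model for this one case, not Lucene's actual WordDelimiterFilter:

```python
def word_delimiter(token, stem_english_possessive=0, catenate_words=0):
    """Toy model: split on non-alphanumerics; with stemEnglishPossessive=1
    drop a trailing "'s" entirely; with catenateWords=1 also emit the
    subwords joined back together."""
    if stem_english_possessive and token.endswith("'s"):
        token = token[:-2]            # "'s" removed from the stream entirely
    parts = [p for p in token.replace("'", " ").split() if p]
    tokens = list(parts)
    if catenate_words and len(parts) > 1:
        tokens.append("".join(parts))  # e.g. "mcdonalds"
    return tokens

print(word_delimiter("mcdonald's"))                             # ['mcdonald', 's']
print(word_delimiter("mcdonald's", stem_english_possessive=1))  # ['mcdonald']
print(word_delimiter("mcdonald's", catenate_words=1))           # ['mcdonald', 's', 'mcdonalds']
```

So with stemEnglishPossessive=0 the "s" is still in the index as its own token, which matches what Erick says the admin/analysis page will show.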
Re: how to Index and Search non-Eglish Text in solr
Well, no. Specifying both indexed and stored as "false" is essentially a no-op, you'd never find anything! But even with indexed="true", this solution has problems. It's essentially using a single field to store text from different languages. The problem is that tokenization, stemming etc. behave differently in different languages, especially when you contrast CJK and Western languages.

Best
Erick

On Fri, Jun 10, 2011 at 1:05 AM, Mohammad Shariq wrote:
> Thanks Erick for your help.
> I have another silly question.
> Suppose I created multiple fieldTypes e.g. news_English, news_Chinese, news_Japanese etc.
> After creating these fields, can I copy all these via copyField to "defaultquery" like below:
>
> and my "defaultquery" looks like:
> multiValued="true"/>
>
> Is this the right way to deal with multiple-language indexing and searching???
>
> On 9 June 2011 19:06, Erick Erickson wrote:
>> No, you'd have to create multiple fieldTypes, one for each language
>>
>> Best
>> Erick
>>
>> On Thu, Jun 9, 2011 at 5:26 AM, Mohammad Shariq wrote:
>>> Can I specify multiple languages in the filter tag in schema.xml??? Like below:
>>>
>>> words="stopwords.txt" enablePositionIncrements="true"/>
>>> generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>> class="solr.SnowballPorterFilterFactory" language="Hungarian" />
>>>
>>> On 8 June 2011 18:47, Erick Erickson wrote:
>>>
>>>> This page is a handy reference for individual languages...
>>>> http://wiki.apache.org/solr/LanguageAnalysis
>>>>
>>>> But the usual approach, especially for Chinese/Japanese/Korean (CJK), is to index the content in different fields with language-specific analyzers, then spread your search across the language-specific fields (e.g. title_en, title_fr, title_ar). Stemming and stopwords particularly give "surprising" results if you put words from different languages in the same field.
>>>>
>>>> Best
>>>> Erick
>>>>
>>>> On Wed, Jun 8, 2011 at 8:34 AM, Mohammad Shariq wrote:
>>>>> Hi,
>>>>> I had set up Solr (solr-1.4 on Ubuntu 10.10) for indexing news articles in English, but my requirement extends to indexing news in other languages too.
>>>>>
>>>>> This is how my schema looks:
>>>>> required="false"/>
>>>>>
>>>>> And the "text" field in schema.xml looks like:
>>>>> positionIncrementGap="100">
>>>>> words="stopwords.txt" enablePositionIncrements="true"/>
>>>>> generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>>>>> language="English" protected="protwords.txt"/>
>>>>> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>>>>> words="stopwords.txt" enablePositionIncrements="true"/>
>>>>> generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>>>>> language="English" protected="protwords.txt"/>
>>>>>
>>>>> My problem is:
>>>>> Now I want to index the news articles in other languages too, e.g. Chinese, Japanese.
>>>>> How can I modify my text field so that I can index the news in other languages too and make it searchable??
>>>>>
>>>>> Thanks
>>>>> Shariq
>>>>>
>>>>> --
>>>>> View this message in context: http://lucene.472066.n3.nabble.com/how-to-Index-and-Search-non-Eglish-Text-in-solr-tp3038851p3038851.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>> --
>> Thanks and Regards
>> Mohammad Shariq

--
Thanks and Regards
Mohammad Shariq
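The per-language-field approach Erick describes could look roughly like this in schema.xml. This is only a sketch for Solr 1.4/3.x: the type names, field names, analyzer chains, and stopword files are illustrative, not taken from the thread.

```xml
<!-- One fieldType per language family; analyzer chains trimmed for brevity -->
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- CJK bigram tokenization; Western-style stemming does not apply -->
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- One field per language, filled at index time based on the article's known language -->
<field name="news_en"  type="text_en"  indexed="true" stored="true"/>
<field name="news_cjk" type="text_cjk" indexed="true" stored="true"/>
```

A query then spreads across the per-language fields (e.g. dismax with `qf=news_en news_cjk`) rather than copying everything into one mixed-language field, which is exactly the setup the copyField idea above would undermine.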
Re: Processing/Indexing CSV
Well, here's a place to start if you want to patch the code: http://wiki.apache.org/solr/HowToContribute If you do want to take this on, hop on over to the dev list and start a discussion. I'd start with some posts on that list before entering or working on a JIRA issue, just ask for some guidance. A good place to start is pretty much what you've done here, state your problem, and what you think the correct behavior is. Be prepared for things to be brought up you never thought of ... which is the point of starting the discussion there. A very good way to start is to get the code, compile it, and then run some of the test cases in an IDE, stepping through the test case in the debugger. Sometimes that doesn't work easily, but if it does it gives you an idea of how the code works. There are instructions at the above link for setting things up in an IDE (Eclipse and Intellij are popular). Just loading the project and looking for files that begin with CSV might be a place to start. Then look for files that begin with TestCSV. Both of these "look promising". Anyway, if you get that far, then go over to the dev list and say "I'm thinking of XXX, this code appears to be handled in YYY and I'm thinking of changing it like ZZZ" and it will be well received. Of course if you want to go ahead and make your changes and submit a patch, that's even better, but it's often best to get a bit of guidance first. Best Erick On Thu, Jun 9, 2011 at 5:17 PM, Helmut Hoffer von Ankershoffen wrote: > On Thu, Jun 9, 2011 at 11:05 PM, Ken Krugler > wrote: > >> >> On Jun 9, 2011, at 1:27pm, Helmut Hoffer von Ankershoffen wrote: >> >> > Hi, >> > >> > ... that would be an option if there is a defined set of field names and >> a >> > single column/CSV layout. The scenario however is different csv files >> (from >> > different shops) with individual column layouts (separators, encodings >> > etc.). The idea is to map known field names to defined field names in the >> > solr schema. 
If I understand the capabilities of the CSVLoader correctly >> > (sorry, I am completely new to Solr, started work on it today) this is >> not >> > possible - is it? >> >> As per the documentation on >> http://wiki.apache.org/solr/UpdateCSV#fieldnames, you can specify the >> names/positions of fields in the CSV file, and ignore fieldnames. >> >> So this seems like it would solve your requirement, as each different >> layout could specify its own such mapping during import. >> >> Sure, but the requirement (to keep the process of integrating new shops > efficient) is not to have one mapping per import (cp. the email regarding > "more or less schema free") but to enhance one mapping that maps common > field names to defined fields disregarding the order of known fields/columns. As > far as I understand that is not a problem at all with DIH, however DIH and > CSV are not a perfect match ,-) > > >> It could be handy to provide a fieldname map (versus the value map that >> UpdateCSV supports). > > Definitely. Either a fieldname map in CSVLoader or a robust CSVLoader in DIH > ... > > >> Then you could use the header, and just provide a mapping from header >> fieldnames to schema fieldnames. >> > That's the idea -) > > => What's the best way to progress? Either someone enhances the CSVLoader with > a field mapper (with multiple input field names mapping to one field name in > the Solr schema) or someone enhances the DIH with a robust CSV loader ,-). > As I am completely new to this community, please give me the direction to go > (or wait :-). > > best regards > > >> -- Ken >> >> > On Thu, Jun 9, 2011 at 10:12 PM, Yonik Seeley < >> yo...@lucidimagination.com>wrote: >> > >> >> On Thu, Jun 9, 2011 at 4:07 PM, Helmut Hoffer von Ankershoffen >> >> wrote: >> >>> Hi, >> >>> yes, it's about CSV files loaded via HTTP from shops to be fed into a >> >>> shopping search engine. >> >>> The CSV Loader cannot map fields (only field values) etc.
>> >> >> >> You can provide your own list of fieldnames and optionally ignore the >> >> first line of the CSV file (assuming it contains the field names). >> >> http://wiki.apache.org/solr/UpdateCSV#fieldnames >> >> >> >> -Yonik >> >> http://www.lucidimagination.com >> >> >> >> -- >> Ken Krugler >> +1 530-210-6378 >> http://bixolabs.com >> custom data mining solutions >> >> >> >> >> >> >> >
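The fieldname-mapping idea discussed above can be prototyped outside Solr before anyone patches CSVLoader or DIH. A minimal sketch in Python — the alias table, the delimiter, and all field names are hypothetical examples, not part of any Solr API:

```python
import csv
import io

# Map canonical Solr schema fields to known shop-specific header spellings.
# These alias lists are made-up examples for illustration.
FIELD_ALIASES = {
    "price": ["price", "preis", "cost"],
    "name":  ["name", "title", "produktname"],
}


def canonical_fieldnames(header):
    """Translate a shop's CSV header row into canonical schema field names.

    Unknown columns are passed through unchanged so a later step can
    decide to ignore them (like UpdateCSV's ability to skip fields).
    """
    lookup = {alias: field
              for field, aliases in FIELD_ALIASES.items()
              for alias in aliases}
    return [lookup.get(col.strip().lower(), col) for col in header]


def remap_csv(text, delimiter=";"):
    """Parse a shop's CSV export and return rows keyed by canonical names."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = canonical_fieldnames(next(reader))
    return [dict(zip(header, row)) for row in reader]
```

With this, one shared alias table covers every shop's export regardless of column order, which is the "one mapping that maps common field names to defined fields" requirement above; the resulting dicts could then be posted to Solr as ordinary documents.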
Re: how can I return function results in my query?
On Fri, Jun 10, 2011 at 8:31 AM, Markus Jelsma wrote: > Nice! Will SOLR-1298 with aliasing also work with an external file field since > that can be a source of a function query as well? Haven't tried it, but it definitely should! -Yonik http://www.lucidimagination.com
Re: how can I return function results in my query?
Nice! Will SOLR-1298 with aliasing also work with an external file field since that can be a source of a function query as well? > On Thu, Jun 9, 2011 at 9:23 AM, Jason Toy wrote: > > I want to be able to run a query like idf(text, 'term') and have that > > data returned with my search results. I've searched the docs, but I'm > > unable to find how to do it. Is this possible and how can I do that ? > > In trunk, there's a very new feature called pseudo-fields where (among > other things) you can include the results of arbitrary function > queries along with the stored fields for each document. > > fl=id,idf(text,'term'),termfreq(text,'term') > > Or if you want to alias the idf call to a different name: > > fl=id,myidf:idf(text,'term'),mytermfreq:termfreq(text,'term') > > Of course, in this specific case it's a bit of a waste since idf won't > change per document. > > -Yonik > http://www.lucidimagination.com
Re: how can I return function results in my query?
> Ahmet, that doesn't return the idf > data in my results, unless I am > doing something wrong. When you run any function you > get the results > of the function back? I have never used Relevance Functions, but there is an example [1] in the wiki where the result of the function query is reflected in the score. Then, in order to get the computed function-query value, it uses &fl=*,score. [1] http://wiki.apache.org/solr/FunctionQuery#General_Example
Re: how can I return function results in my query?
On Thu, Jun 9, 2011 at 9:23 AM, Jason Toy wrote: > I want to be able to run a query like idf(text, 'term') and have that data > returned with my search results. I've searched the docs, but I'm unable to > find how to do it. Is this possible and how can I do that? In trunk, there's a very new feature called pseudo-fields where (among other things) you can include the results of arbitrary function queries along with the stored fields for each document. fl=id,idf(text,'term'),termfreq(text,'term') Or if you want to alias the idf call to a different name: fl=id,myidf:idf(text,'term'),mytermfreq:termfreq(text,'term') Of course, in this specific case it's a bit of a waste since idf won't change per document. -Yonik http://www.lucidimagination.com
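Assembled into a full request, the pseudo-field syntax Yonik describes looks roughly like this (the host, field name, and term are hypothetical, and this only works on a trunk/4.0-dev build where SOLR-1298 landed):

```
http://localhost:8983/solr/select?q=text:solr&fl=id,score,myidf:idf(text,'solr'),mytf:termfreq(text,'solr')
```

Each document in the response then carries `myidf` and `mytf` entries alongside its stored fields.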
Re: how can I return function results in my query?
Jason, Solr cannot add results of functions to the result set for now. I don't know about a ticket (but I'd think there is one), and it's also highly desired for returning distances in spatial queries. Check Jira if you need to know more. Take care: debugQuery can add significant delay. If you need idf frequently then consider the TermVectorComponent. Cheers, > Markus, > Thanks for this info, I'll use debugQuery to test for now. It seems strange > that I can't have arbitrary function results returned with my data. Is > this an obstacle on the Lucene or Solr side? > > Jason > > On Fri, Jun 10, 2011 at 5:59 AM, Markus Jelsma > > wrote: > > Ah, function results are not returned in the result set. You must either > > use > > debugQuery to get the value or the TermVectorComponent to get idf for > > existing > > terms. > > > > > Ahmet, that doesn't return the idf data in my results, unless I am > > > doing something wrong. When you run any function you get the results > > > of the function back? > > > > > > Can you show me an example query you run? > > > > > > > > > > > > //http://wiki.apache.org/solr/FunctionQuery#idf > > > > > > On Thu, Jun 9, 2011 at 9:23 AM, Jason Toy wrote: > > > > I want to be able to run a query like idf(text, 'term') and have > > > > that data returned with my search results. I've searched the > > > > docs, but I'm unable to find how to do it. Is this possible and how > > > > can I do that?
Re: how can I return function results in my query?
Markus, Thanks for this info, I'll use debugQuery to test for now. It seems strange that I can't have arbitrary function results returned with my data. Is this an obstacle on the Lucene or Solr side? Jason On Fri, Jun 10, 2011 at 5:59 AM, Markus Jelsma wrote: > Ah, function results are not returned in the result set. You must either > use > debugQuery to get the value or the TermVectorComponent to get idf for > existing > terms. > > > Ahmet, that doesn't return the idf data in my results, unless I am > > doing something wrong. When you run any function you get the results > > of the function back? > > > > Can you show me an example query you run? > > > > > > > > //http://wiki.apache.org/solr/FunctionQuery#idf > > > > On Thu, Jun 9, 2011 at 9:23 AM, Jason Toy wrote: > > > I want to be able to run a query like idf(text, 'term') and have that > > > data returned with my search results. I've searched the docs, but I'm > > > unable to find how to do it. Is this possible and how can I do that ? > -- - sent from my mobile 6176064373
Re: getTransformer error
OK, I guess it is nonetheless a stylesheet problem, as a basic hello-world stylesheet works. thanks, Bryan Rasmussen On Fri, Jun 10, 2011 at 10:12 AM, bryan rasmussen wrote: > Hi, > I am trying to transform the results using XSLT - I store my XSLTs in > conf/xslt/ > > I call them in the query string with the parameters > > &wt=xslt&tr=result.xsl > > And get back an error: > > getTransformer fails in getContentType > > java.lang.RuntimeException: getTransformer fails in getContentType > ... > Caused by: java.io.IOException: Unable to initialize Templates 'result.xsl' > ... > Caused by: javax.xml.transform.TransformerConfigurationException: > Could not compile stylesheet > > I'm supposing it is not an XSLT issue as I am able to run the > transformation via the command line with Xalan. > > > Thanks, > Bryan Rasmussen >
replication/search on separate LANs
Hi All, I'm wondering if anyone has experience replicating and searching over separate LANs? Currently we do both over the same one. So each slave would have two Ethernet cards, one per LAN, and the master just one. We're currently building and replicating a daily index; it is quite large, about 15M docs, and during replication we see high CPU load and searching becomes slow, so we're trying to mitigate this. Has anyone set this up? Did it help? Cheers, Dan
Re: how can I return function results in my query?
Ah, function results are not returned in the result set. You must either use debugQuery to get the value or the TermVectorComponent to get idf for existing terms. > Ahmet, that doesn't return the idf data in my results, unless I am > doing something wrong. When you run any function you get the results > of the function back? > > Can you show me an example query you run? > > > > //http://wiki.apache.org/solr/FunctionQuery#idf > > On Thu, Jun 9, 2011 at 9:23 AM, Jason Toy wrote: > > I want to be able to run a query like idf(text, 'term') and have that > > data returned with my search results. I've searched the docs, but I'm > > unable to find how to do it. Is this possible and how can I do that ?
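The TermVectorComponent route mentioned above needs a little wiring in solrconfig.xml. A hedged sketch — the handler name is arbitrary, and the field in question must be indexed with termVectors="true" in the schema:

```xml
<!-- Sketch: expose term vectors (tf, df, tf*idf) per document -->
<searchComponent name="tvComponent" class="solr.TermVectorComponent"/>

<requestHandler name="/tvrh" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>
```

A request such as `/tvrh?q=text:term&tv.tf_idf=true` then returns per-document term statistics without the overhead of debugQuery.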
Re: how can I return function results in my query?
Ahmet, that doesn't return the idf data in my results, unless I am doing something wrong. When you run any function you get the results of the function back? Can you show me an example query you run? //http://wiki.apache.org/solr/FunctionQuery#idf On Thu, Jun 9, 2011 at 9:23 AM, Jason Toy wrote: > I want to be able to run a query like idf(text, 'term') and have that data > returned with my search results. I've searched the docs, but I'm unable to > find how to do it. Is this possible and how can I do that ? > > > -- - sent from my mobile 6176064373
getTransformer error
Hi, I am trying to transform the results using XSLT - I store my XSLTs in conf/xslt/ I call them in the query string with the parameters &wt=xslt&tr=result.xsl And get back an error: getTransformer fails in getContentType java.lang.RuntimeException: getTransformer fails in getContentType ... Caused by: java.io.IOException: Unable to initialize Templates 'result.xsl' ... Caused by: javax.xml.transform.TransformerConfigurationException: Could not compile stylesheet I'm supposing it is not an XSLT issue as I am able to run the transformation via the command line with Xalan. Thanks, Bryan Rasmussen
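For anyone hitting the same "Could not compile stylesheet" error, a known-good baseline dropped into conf/xslt/ helps separate Solr wiring problems from stylesheet problems. A minimal sketch (assumes a stored field named "id"; adjust the select paths to your schema):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output media-type="text/html" encoding="UTF-8"/>
  <xsl:template match="/">
    <html>
      <body>
        <!-- Solr's XML response shape: <response><result><doc>... -->
        <xsl:for-each select="response/result/doc">
          <p><xsl:value-of select="str[@name='id']"/></p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

If this compiles but your real stylesheet doesn't, the detailed compilation error usually appears in the Solr log rather than in the HTTP response.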
WordDelimiter and stemEnglishPossessive doesn't work
Hello, I have a problem with the WordDelimiterFilter. My data looks like this: mcdonald's#burgerking#Free record shop#h&m I want to tokenize this on #. After that it has to split on whitespace. I use the wordDelimiter for that (can't use 2 tokenizers). Now this works but there is one problem: it removes the '. My index looks like this: mcdonald burgerking free record shop h&m I don't want this, so I use stemEnglishPossessive. The description of this part of the filter looks like this: stemEnglishPossessive="1" causes trailing "'s" to be removed for each subword. "Doug's" => "Doug" default is true ("1"); set to 0 to turn off My field looks like this: It looks like stemEnglishPossessive=0 is not working. How can I fix this problem? Another filter? Did I forget something? -- View this message in context: http://lucene.472066.n3.nabble.com/WordDelimiter-and-stemEnglishPossessive-doesn-t-work-tp3047678p3047678.html Sent from the Solr - User mailing list archive at Nabble.com.
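A possible analyzer chain for this case (a sketch only — the original message's fieldType was stripped by the mail archive, so the names here are made up). One point worth noting: even with stemEnglishPossessive="0", WordDelimiterFilter still *splits* at the apostrophe (mcdonald's → mcdonald, s); the flag only controls whether the trailing "'s" is dropped. Keeping a searchable whole token is what preserveOriginal="1" or catenateWords="1" is for:

```xml
<fieldType name="shopNames" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- split only on '#' between shop names -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="#"/>
    <!-- whitespace and punctuation splitting happens here -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            catenateWords="1"
            preserveOriginal="1"
            stemEnglishPossessive="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this sketch, "mcdonald's" yields tokens like mcdonald's (preserved), mcdonald, s, and mcdonalds (catenated); multi-word names such as "free record shop" also gain a catenated "freerecordshop" token, which may or may not be desirable.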
Re: SolrCloud questions
Mohammad, There are two sides to using SolrCloud in production - the SolrCloud code, and the Solr 4.0 code that it is a part of. You can reduce the risk of being caught out by Solr/Lucene 4.0 changes (e.g. index structure changes) by using a Lucene 3.0 index format within Solr 4.0. While there's still risk involved in using an unreleased product, you'll have increased your chances of stability. Still hoping someone has answers to my original questions... Upayavira On Fri, 10 Jun 2011 10:55 +0530, "Mohammad Shariq" wrote: > I am also planning to move to SolrCloud; > since it's still under development, I am not sure about its behavior in > production. > Please update us once you find it stable. > > > On 10 June 2011 03:56, Upayavira wrote: > > > I'm exploring SolrCloud for a new project, and have some questions based > > upon what I've found so far. > > > > The setup I'm planning is going to have a number of multicore hosts, > > with cores being moved between hosts, and potentially with cores merging > > as they get older (cores are time based, so once today has passed, they > > don't get updated). > > > > First question: The solr/conf dir gets uploaded to Zookeeper when you > > first start up, and using system properties you can specify a name to be > > associated with those conf files. How do you handle it when you have a > > multicore setup, and different configs for each core on your host? > > > > Second question: Can you query collections when using multicore? On > > single core, I can query: > > > > http://localhost:8983/solr/collection1/select?q=blah > > > > On a multicore system I can query: > > > > http://localhost:8983/solr/core1/select?q=blah > > > > but I cannot work out a URL to query collection1 when I have multiple > > cores. > > > > Third question: For replication, I'm assuming that replication in > > SolrCloud is still managed in the same way as non-cloud Solr, that is as > > ReplicationHandler config in solrconfig?
In which case, I need a > > different config setup for each slave, as each slave has a different > > master (or can I delegate the decision as to which host/core is its > > master to zookeeper?) > > > > Thanks for any pointers. > > > > Upayavira > > --- > > Enterprise Search Consultant at Sourcesense UK, > > Making Sense of Open Source > > > > > > > -- > Thanks and Regards > Mohammad Shariq > --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
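On the third question, one common way to avoid a distinct solrconfig.xml per slave is to parameterize the master URL with a system property, so all slaves share one config. A sketch — the property name and poll interval here are illustrative:

```xml
<!-- Slave-side replication config; start each slave with -Dmaster.url=http://masterhost:8983/solr/core1 -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">${master.url}/replication</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>
```

This doesn't delegate the master decision to ZooKeeper, but it does reduce the per-slave difference to a single startup property rather than a whole config file.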