Re: [jira] Commented: (SOLR-20) A simple Java client for updating and searching
: IMO, we should strive to be nice and not repeat keys when the : NamedList is more of the Map variety than the List. we should try .. but we can't guarantee .. i don't have any compelling cases where i've needed to reuse the same name, but i've certainly written plenty of code that puts multiple items in a list that have no name. : > mechanism from the client code like this one -- using XML really is the : > safest bet since it's the most expressive of all the formats we currently : > have) : : JSON responses are smaller and can be quite a bit faster to parse. i won't argue it's not faster or smaller -- just that it's not as expressive :) i'm guessing that if you generated enough JSON markup to be equally expressive and safe, the size would go up considerably (but still probably not be as big as XML) and the speed would be affected to some degree as well -- not to mention the ease of use, accessing the new more complex JSON structures you would have. -Hoss
Re: [jira] Commented: (SOLR-104) Update Plugins
On 1/25/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: Um... just for the record, some of these comments -- i'm not sure what they are in response to :) Someone on the lucene list had a tip about quoting in JIRA... they composed their reply as a response to the email sent to the dev list, then just pasted it into JIRA. -Yonik
Re: [jira] Commented: (SOLR-20) A simple Java client for updating and searching
On 1/25/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: : I'm using a slightly modified version of the json.org code. It stores : things in a LinkedHashMap (to maintain order) and formats dates : explicitly. Uh... watch out with that ... a LinkedHashMap is first and foremost a Map, so it doesn't support repeated keys. I currently have some (ugly) code in the response writer that handles repeated keys, just not nicely. IMO, we should strive to be nice and not repeat keys when the NamedList is more of the Map variety than the List. (I suspect for a client API that's going to completely hide the transport mechanism from the client code like this one -- using XML really is the safest bet since it's the most expressive of all the formats we currently have) JSON responses are smaller and can be quite a bit faster to parse. -Yonik
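To make the repeated-key concern above concrete, here is a minimal sketch using only java.util. It does not use Solr's NamedList; the class name RepeatedKeysDemo, its methods, and the sample parameter names are assumptions made up for illustration. It copies an ordered list of name/value pairs into a LinkedHashMap and shows that the second value stored under a repeated name replaces the first.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of Solr or the json.org code.
final class RepeatedKeysDemo {

    /** Copies pairs into a LinkedHashMap; a later value replaces an earlier one with the same name. */
    static Map<String, Object> toLinkedHashMap(List<Map.Entry<String, Object>> pairs) {
        Map<String, Object> map = new LinkedHashMap<>();
        for (Map.Entry<String, Object> pair : pairs) {
            map.put(pair.getKey(), pair.getValue());
        }
        return map;
    }

    /** Builds an ordered list of pairs in which the name "fq" appears twice (sample data only). */
    static List<Map.Entry<String, Object>> samplePairs() {
        List<Map.Entry<String, Object>> pairs = new ArrayList<>();
        pairs.add(new SimpleImmutableEntry<>("fq", "inStock:true"));
        pairs.add(new SimpleImmutableEntry<>("rows", 10));
        pairs.add(new SimpleImmutableEntry<>("fq", "price:[0 TO 100]"));
        return pairs;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Object>> pairs = samplePairs();
        Map<String, Object> map = toLinkedHashMap(pairs);
        System.out.println("pairs in list:  " + pairs.size());
        System.out.println("entries in map: " + map.size());
        System.out.println("fq in map:      " + map.get("fq"));
    }
}
```

The list keeps all three pairs while the map ends up with two entries, which is why a client that parses into a map can silently lose data when names repeat.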
[jira] Commented: (SOLR-112) Hierarchical Handler Config
[ https://issues.apache.org/jira/browse/SOLR-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467712 ] Hoss Man commented on SOLR-112: --- I think you're dead on JJ ... any generic NamedList merging won't necessarily do "the right thing" in all cases when talking about RequestHandler init args -- very special logic would need to be used to deal with the defaults/appends/invariants in a logical manner, and that logic may not be able to take into account other init params that other RequestHandlers (subclasses of the core ones perhaps) might add. a cleaner way to deal with this might just be to have the individual RequestHandlers manage this themselves -- using SolrCore.getRequestHandler(String) and protected methods they explicitly support to allow other instances to get access to their SolrParams. ie... category 0.01 text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 search/products/all price:[0 TO 100] price:[100 TO *] inStock:true ...where DisMaxRequestHandler (or most likely the new Base class Ryan has written) has methods like... protected SolrParams getDefaults() protected SolrParams getInvarients() protected SolrParams getAppends() ...and the init method looks for an "extends" arg, if it's there fetches it from the SolrCore, tests its class and casts it, then calls the methods above and builds up its own SolrParams using a combination of those and the ones explicitly specified in its config. > Hierarchical Handler Config > --- > > Key: SOLR-112 > URL: https://issues.apache.org/jira/browse/SOLR-112 > Project: Solr > Issue Type: Improvement > Components: update >Affects Versions: 1.2 >Reporter: Ryan McKinley >Priority: Minor > Fix For: 1.2 > > Attachments: SOLR-112.patch > > > From J.J. Larrea on SOLR-104 > 2. 
What would make this even more powerful would be the ability to "subclass" > (meaning refine and/or extend) request handler configs: If the requestHandler > element allowed an attribute extends="" and > chained the SolrParams, then one could do something like: >class="solr.DisMaxRequestHandler" > > > 0.01 > > text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 > > ... much more, per the "dismax" example in the sample solrconfig.xml ... > > ... and replacing the "partitioned" example ... >extends="search/products/all" > > > inStock:true > > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [jira] Commented: (SOLR-104) Update Plugins
: I'll comment on points i have answers or questions. The rest will go : on the TODO list. Um... just for the record, some of these comments -- i'm not sure what they are in response to :) : To trigger raw request reading you *must* have a parameter on the URL. : This was my design in response to Yonik's observation that curl puts : "application/x-www-form-urlencoded" in the header even if it is not : form-urlencoded encoded. : : As written, it does not rely on clients putting accurate headers : (except for multipart) - it relies on a URL convention. Yonik's point was that we need to do that when supporting *legacy* requests ... since we are designing a new mechanism which will live at a new URL in the example (and which clients would have to explicitly map to the old URL in their configs) we have the freedom to be more strict in our parsing. we should require that people send us a Content-Type if they want to post data to us -- and we should use that Content-Type to determine how to parse the content. (this is one of the reasons i'm suggesting we keep the existing Servlets around -- that way we don't have to be worried about them so much as we tweak the new classes) : I only put it in there to make you happy! I'll take it out and we can : deal with it later if necessary. : : I didn't think i could get that past you! I'll take it out and save : the pleading for another time. these are the comments where i'm not sure what they are referring to. : for a local file, you can use stream.url=file:///C:/pathtofile.txt, : for remote ones, you use stream.url=http://... oh yeah ... why didn't i think of that? : We should have a good notice in the config warning people to have some : security running before enabling streaming. yeah ... you had me convinced of that before, but i'm leaning more towards yonik's point now: Solr has a lot of inherent trust in anyone that can hit it directly. 
if/when we allow the list of RequestParsers to be configurable in solrconfig.xml, then the STREAM_URL support could be another RequestParser that they either refer to directly, or register as a "hook" on other RequestParsers. In the meantime though: having that option might mislead people into a false sense of security. : I had implemented it the normal way, BUT it broke many tests (since : they never call init). The better solution is to make sure the tests : call init a standard way, but that got me into editing many files I : don't quite understand, so i opted for lazy init. i don't understand ... why weren't your tests calling init? ... if you were doing everything via the TestHarness it inits the SolrCore which inits all the requesthandlers -- if you were constructing the RequestHandler yourself you could just call the init(NamedList) method directly. BTW: something i keep forgetting to mention, is that it would be helpful if you could set up your IDE to use 2 spaces per tab-stop, and never use tab characters ... it'll make the patches easier to apply. (not every Solr source file is like that right now ... but it's the goal) -Hoss
Re: [jira] Commented: (SOLR-20) A simple Java client for updating and searching
: > > * I'm using wt=JSON rather than XML. (It maps to a hash easier) : I'm using a slightly modified version of the json.org code. It stores : things in a LinkedHashMap (to maintain order) and formats dates : explicitly. Uh... watch out with that ... a LinkedHashMap is first and foremost a Map, so it doesn't support repeated keys. (I suspect for a client API that's going to completely hide the transport mechanism from the client code like this one -- using XML really is the safest bet since it's the most expressive of all the formats we currently have) -Hoss
[jira] Commented: (SOLR-84) New Solr logo?
[ https://issues.apache.org/jira/browse/SOLR-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467699 ] Yonik Seeley commented on SOLR-84: -- What, no kryptonians around here? I like the rounded letters too. w.r.t. the "r" comment, perhaps try making it the same width as the "s"? > New Solr logo? > -- > > Key: SOLR-84 > URL: https://issues.apache.org/jira/browse/SOLR-84 > Project: Solr > Issue Type: Improvement >Reporter: Bertrand Delacretaz >Priority: Minor > Attachments: logo-solr-source-files-take2.zip, > solr-84-source-files.zip, solr-logo-20061214.jpg, solr-logo-20061218.JPG, > solr-logo-20070124.JPG, solr.jpg, solr.jpg > > > Following up on SOLR-76, our trainee Nicolas Barbay (nicolas (put at here) > sarraux-dessous.ch) has reworked his logo proposal to be more "solar". > This can either be the start of a logo contest, or if people like it we could > adopt it. The gradients can make it a bit hard to integrate, not sure if this > is really a problem. > WDYT?
[jira] Resolved: (SOLR-107) Iterable NamedList with java5 generics
[ https://issues.apache.org/jira/browse/SOLR-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-107. --- Resolution: Fixed > Iterable NamedList with java5 generics > -- > > Key: SOLR-107 > URL: https://issues.apache.org/jira/browse/SOLR-107 > Project: Solr > Issue Type: Improvement >Reporter: Ryan McKinley >Priority: Trivial > Attachments: IterableNamedList.patch, IterableNamedList.patch > > > Iterators and generics are nice! > this patch adds both to NamedList.java
[jira] Commented: (SOLR-107) Iterable NamedList with java5 generics
[ https://issues.apache.org/jira/browse/SOLR-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467661 ] Yonik Seeley commented on SOLR-107: --- Looks good, I just committed this. Thanks again Ryan! ps: if patches start in the trunk, it's easier for someone to commit it. > Iterable NamedList with java5 generics > -- > > Key: SOLR-107 > URL: https://issues.apache.org/jira/browse/SOLR-107 > Project: Solr > Issue Type: Improvement >Reporter: Ryan McKinley >Priority: Trivial > Attachments: IterableNamedList.patch, IterableNamedList.patch > > > Iterators and generics are nice! > this patch adds both to NamedList.java
[jira] Commented: (SOLR-84) New Solr logo?
[ https://issues.apache.org/jira/browse/SOLR-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467662 ] Hoss Man commented on SOLR-84: -- I dig the version Ryan posted ... the rounded letters really make a huge difference -- as for the colors of the sun, i think that red is a bit too ... red. I think the color palette of the current Solr logo would look better. The white in the center of the "o" makes it an obvious "o", but a less obvious sun ... which made me think about the "o" in previous versions. I think a solid orange circle would make a good Sun/"o", and in the context of the other letters would be an obvious character ... it could then have a radiating gradient of light orange to yellow ... it seems like that would be a good balance between the two designs posted so far. one other personal opinion: with the crescent, the "r" looks okay, but without it, we probably want to narrow it a bit -- it sticks way out there. As for a logo for "Flare" ... i think reusing the current "Solr on fire" logo might be perfect :) > New Solr logo? > -- > > Key: SOLR-84 > URL: https://issues.apache.org/jira/browse/SOLR-84 > Project: Solr > Issue Type: Improvement >Reporter: Bertrand Delacretaz >Priority: Minor > Attachments: logo-solr-source-files-take2.zip, > solr-84-source-files.zip, solr-logo-20061214.jpg, solr-logo-20061218.JPG, > solr-logo-20070124.JPG, solr.jpg, solr.jpg > > > Following up on SOLR-76, our trainee Nicolas Barbay (nicolas (put at here) > sarraux-dessous.ch) has reworked his logo proposal to be more "solar". > This can either be the start of a logo contest, or if people like it we could > adopt it. The gradients can make it a bit hard to integrate, not sure if this > is really a problem. > WDYT?
Re: facet response
: Of course we might stumble across cases where ordering isn't important, : but multiple values with the same key is... While not the case for : facet counts, if it ever came up that could be handled (in a different : json.nl variant) by serializing with all same-keyed values collected : into an Array, at the expense of strict ordering: : : [["DupKey", [Val1, Val3, Val4]], ["AnotherKey", Val2]] It comes down to a question of intent -- if the code producing the NamedList intended multiple uses of the same key to be interpreted as a single key referring to a list of values, it could have created a Map where the values of some keys were arrays. : The SOLR NamedList is a simple analog of the element-tree part of XML : (no attributes or mixed content). This article gives a very thorough : summary of the mappings between XML and JSON which can be applied to the : NamedList->JSON issue as well: : : http://www.xml.com/lpt/a/1658 Hmmm... i only skimmed this but it seems that according to their rules of thumb, it's impossible to make either a reversible or semantically equivalent JSON structure out of a NamedList since neither of the following rules is guaranteed to be true... #1 all subelement names occur exactly once, or subelements with identical names are in sequence. #2 multiple homonymous subelements occur nonsequentially, and element order doesn't matter. : I stumbled across a Ruby OrderedHash which looks to be an analog of NamedList: : : http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/20551 i don't know Ruby, but the cases i imagine that doesn't cover are keys appearing more than once, and values with no keys. -Hoss
Re: [Solr Wiki] Update of "SchemaXml" by JürgenHermann
is this addition really necessary? ... the paragraph directly before it just said "Individual fields can override the various options (indexed, stored, etc...) that they inherit from their fieldtype" ... this just seems a bit redundant? : + Common options that fields can have are... : +* `indexed=true|false` : +* `stored=true|false` : +* `compressed=true|false` : +* `multiValued=true|false` : +* `omitNorms=true|false` -Hoss
Re: facet.missing?
: Sorry Hoss if I came down too hard against the view that "*" should mean : "all docs". With the renewed clarity that getting a little sleep : brings, I better appreciate the merits of your position. And that it's : dangerous trying to decide what makes sense for other people. no worries ... i didn't take it as being harsh. I don't actually have any opinion about what a plain q=* should do ... I just worry that if we add zero width prefixes, and that results in q=* meaning q=defaultSearchField:* that might confuse a lot of people who expect it to be something else. if 50% of the people think q=* means one thing, and 50% think q=* means something else, then the safest thing to do is probably to make sure that q=* is an error. : Obviously from a practical perspective, a MatchAllDocsQuery is quite a : bit faster than matching all docs having some value for a particular : field, and should be encouraged unless the latter is explicitly desired. FYI: there is special syntax in the trunk Lucene QueryParser for generating a MatchAllDocs, it's: *:* ... we just don't use that version of Lucene in Solr yet. : Now I think there is fair agreement that it would be great if field:* : could be made to work. So if some portion of users want unqualified * i actually have no opinion on that either ... i think field:[* TO *] is just as easy to use, and less prone to confusion about what it means (ie: you have to understand some of the syntax to know to try and use it, you aren't likely to inadvertently use it and think you're getting a different result set than you really are) : processed as the equivalent of :*, then perhaps there : should be a configuration directive which controls whether unqualified * : (or perhaps any defined string) is trapped? I haven't come across : existing SOLR code to handle "*" as a special case of PrefixQuery, so I : assume the trapping should be at the level of SolrQueryParser, similar : to how [* TO *] is trapped for range queries? 
there is a QueryParser.setAllowLeadingWildcard method which we could set based on a schema value (much like we do for the defaultOperator) ... i think it would be perfectly fine making that an option which defaults to false and the result of enabling it being that q=* uses the defaultSearchField ... at least then people need to consider the issue and turn it on first. -Hoss
Re: [jira] Commented: (SOLR-104) Update Plugins
On 1/24/07, Ryan McKinley (JIRA) <[EMAIL PROTECTED]> wrote: Ryan McKinley commented on SOLR-104: the 2MB limit is set in solrconfig.xml maybe 2MB is too small as the default, but i figured it should be configurable. Yeah, but I was lacking knowledge about the reasons behind a limit. (potential denial of service attacks if this is exposed to the outside, keeping people from accidentally hurting themselves somehow?) -Yonik
[jira] Resolved: (SOLR-117) constrain field faceting to a prefix
[ https://issues.apache.org/jira/browse/SOLR-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-117. --- Resolution: Fixed committed. > constrain field faceting to a prefix > > > Key: SOLR-117 > URL: https://issues.apache.org/jira/browse/SOLR-117 > Project: Solr > Issue Type: New Feature > Components: search >Reporter: Yonik Seeley > Attachments: facet_prefix.patch, facet_prefix.patch > > > Useful for faceting as someone is typing, autocompletion, etc
Re: facet response
Some peanut-gallery comments (apologies if they repeat ideas already discussed, I haven't read the full thread): >> > Chris Hostetter <[EMAIL PROTECTED]> wrote: as i said, i'd rather invert the use case set to find where "ordering >> >> isn't important" and change those to Maps It makes sense to use NamedLists only where they are truly needed so data that can be serialized to JSON as a hash always is, and doesn't get affected by json.nl. Of course we might stumble across cases where ordering isn't important, but multiple values with the same key is... While not the case for facet counts, if it ever came up that could be handled (in a different json.nl variant) by serializing with all same-keyed values collected into an Array, at the expense of strict ordering: [["DupKey", [Val1, Val3, Val4]], ["AnotherKey", Val2]] >On 1/24/07, Erik Hatcher <[EMAIL PROTECTED]> wrote: >>Faceting is the only thing I've come upon. After playing with this >>more and contemplating all the messages on this thread, I can't say >>that it's "broken", but telling solr to sort things and then when >>pulling them back out on the other end in seemingly random order it >>sure feels that way. Re-sorting on the client is the easiest >>solution and I've gone that route for now. I agree that having to do client-side sorting of a pre-sorted dataset is horrible. But the problem has nothing to do with SOLR, it is due to the limitations of JavaScript and thus JSON. The SOLR NamedList is a simple analog of the element-tree part of XML (no attributes or mixed content). This article gives a very thorough summary of the mappings between XML and JSON which can be applied to the NamedList->JSON issue as well: http://www.xml.com/lpt/a/1658 >>Having the facet_counts area output as an ordered list in all cases >>seems the most sensible to me, since it is unlikely that the facets >>would be accessed by key. Agreed. 
In JavaScript-land one can use array[array.indexOf(key,fromIndex)][1] to extract values by name, if so desired. I stumbled across a Ruby OrderedHash which looks to be an analog of NamedList: http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/20551 I imagine it would be fairly simple to extend that OrderedHash to construct from a JSON-read array-of-pairs, especially since faceting should not emit duplicate keys. Yonik wrote: >I think a nice compromise between >efficiency and human readability might be to just alternate key,val in >the array: >['foo',10,'bar',20] >That would even allow representing a null key as a null in the array. I'm not sure there is an advantage to the current json.nl=arrarr, but it doesn't hurt to allow that to be specified as an alternate non-default format. >But I'm leaning on keeping the current format for XML for both >slightly better readability, and backward compatibility. > 10 20 >As opposed to: > foo 10 bar 20 I agree re: not dumbing-down the XML serialization to match the lowest-common-denominator. - J.J.
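The "collect same-keyed values into an Array" variant that J.J. describes can be sketched in plain Java as follows. This is an illustration only: GroupByKeyDemo and its methods are hypothetical names, nothing here is Solr's response writer, and unlike J.J.'s example (which leaves a single value unwrapped) this sketch wraps every value in a list for simplicity.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; it only illustrates the grouping idea from the thread.
final class GroupByKeyDemo {

    /**
     * Groups values by key. Keys keep the order of their first appearance,
     * but the original interleaving of pairs is lost.
     */
    static Map<String, List<Object>> groupByKey(List<Map.Entry<String, Object>> pairs) {
        Map<String, List<Object>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, Object> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Sample pairs mirroring the DupKey/AnotherKey example in the message above. */
    static List<Map.Entry<String, Object>> samplePairs() {
        List<Map.Entry<String, Object>> pairs = new ArrayList<>();
        pairs.add(new SimpleImmutableEntry<>("DupKey", "Val1"));
        pairs.add(new SimpleImmutableEntry<>("AnotherKey", "Val2"));
        pairs.add(new SimpleImmutableEntry<>("DupKey", "Val3"));
        pairs.add(new SimpleImmutableEntry<>("DupKey", "Val4"));
        return pairs;
    }

    public static void main(String[] args) {
        // prints {DupKey=[Val1, Val3, Val4], AnotherKey=[Val2]}
        System.out.println(groupByKey(samplePairs()));
    }
}
```

Val2 originally sat between Val1 and Val3, and that position cannot be recovered from the grouped form, which is the "expense of strict ordering" mentioned above.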
Re: facet response
On 1/24/07, Erik Hatcher <[EMAIL PROTECTED]> wrote: On Jan 22, 2007, at 6:14 PM, Yonik Seeley wrote: > Chris Hostetter <[EMAIL PROTECTED]> wrote: >> as i said, i'd rather invert the use case set to find where "ordering >> isn't important" and change those to Maps > > That might be a *lot* of changes... > What's currently broken, just faceting or anything else? Faceting is the only thing I've come upon. After playing with this more and contemplating all the messages on this thread, I can't say that it's "broken", but telling solr to sort things and then when pulling them back out on the other end in seemingly random order it sure feels that way. Re-sorting on the client is the easiest solution and I've gone that route for now. I plan on digging into the JSON option a bit and seeing if order is preserved, though I doubt it would be any difference since it will surely parse back into a Hash by default. Though the json.nl.arr=arr would surely preserve order, though that changes the access to things all over the place on the client. Having the facet_counts area output as an ordered list in all cases seems the most sensible to me, since it is unlikely that the facets would be accessed by key. For JSON and friends, I agree. I think a nice compromise between efficiency and human readability might be to just alternate key,val in the array: ['foo',10,'bar',20] That would even allow representing a null key as a null in the array. But I'm leaning on keeping the current format for XML for both slightly better readability, and backward compatibility. 10 20 As opposed to: foo 10 bar 20 -Yonik
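On the client side, the alternating key,val layout proposed above could be read back into an ordered list of pairs roughly as follows. This is a sketch under assumptions: the JSON is taken to be already parsed into a java.util.List, and FlatPairsDemo and fromFlatArray are hypothetical names, not part of any Solr client.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical demo class for the ['foo',10,'bar',20] layout discussed above.
final class FlatPairsDemo {

    /**
     * Turns [key0, val0, key1, val1, ...] into an ordered list of pairs.
     * Order and repeated keys are preserved; a null element in a key position
     * becomes a pair with a null key.
     */
    static List<Map.Entry<String, Object>> fromFlatArray(List<?> flat) {
        if (flat.size() % 2 != 0) {
            throw new IllegalArgumentException("expected an even number of elements, got " + flat.size());
        }
        List<Map.Entry<String, Object>> pairs = new ArrayList<>(flat.size() / 2);
        for (int i = 0; i < flat.size(); i += 2) {
            Object rawKey = flat.get(i);
            String key = (rawKey == null) ? null : rawKey.toString();
            pairs.add(new SimpleImmutableEntry<>(key, flat.get(i + 1)));
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Object> flat = Arrays.asList("foo", 10, "bar", 20, null, 5);
        for (Map.Entry<String, Object> pair : fromFlatArray(flat)) {
            System.out.println(pair.getKey() + " -> " + pair.getValue());
        }
    }
}
```

Because the result is a list rather than a map, the sort order requested from the server survives parsing, at the cost of a linear scan when looking a value up by name.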
[jira] Commented: (SOLR-69) PATCH:MoreLikeThis support
[ https://issues.apache.org/jira/browse/SOLR-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467399 ] Yonik Seeley commented on SOLR-69: -- > MoreLikeThis queries should work irrelevant of whether fields are stored or > not, as it's based on what's indexed I haven't looked at the lucene-code for more-like-this, but it's just like highlighting... to get the tokens for a specific document, you need to either get it's stored field and re-analyze or store term vectors and use them. Looking up those terms in other documents is then fast (that's where the inverted index comes in) > PATCH:MoreLikeThis support > -- > > Key: SOLR-69 > URL: https://issues.apache.org/jira/browse/SOLR-69 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Bertrand Delacretaz >Priority: Minor > Attachments: lucene-queries-2.0.0.jar, SOLR-69.patch, SOLR-69.patch, > SOLR-69.patch > > > Here's a patch that implements simple support of Lucene's MoreLikeThis class. > The MoreLikeThisHelper code is heavily based on (hmm..."lifted from" might be > more appropriate ;-) Erik Hatcher's example mentioned in > http://www.mail-archive.com/solr-user@lucene.apache.org/msg00878.html > To use it, add at least the following parameters to a standard or dismax > query: > mlt=true > mlt.fl=list,of,fields,which,define,similarity > See the MoreLikeThisHelper source code for more parameters. > Here are two URLs that work with the example config, after loading all > documents found in exampledocs in the index (just to show that it seems to > work - of course you need a larger corpus to make it interesting): > http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=standard&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score > http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=dismax&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score > Results are added to the output like this: > > ... 
> > > > 1.5293242 > SOLR1000 > > > > > 1.5293242 > UTF8TEST > > > > I haven't tested this extensively yet, will do in the next few days. But > comments are welcome of course.
Re: Stop words and phrases
On 1/25/07, Manuel Albela Miranda <[EMAIL PROTECTED]> wrote: Does anybody know how to ignore stopwords when searching with a phrase? I've been searching for information about this but find nothing. The thing is, i want to use stopwords when searching: /field:this is a house /and not to use them when searching like: /field:"this is a house"/. The easiest way might be to index the field twice, once with the stop filter and once without. See copyField in the schema for an easy way to copy one field to another when indexing. -Yonik
RE: Stop words and phrases
You have to pass a set of stop words (Strings) as a "java.util.Set" to the constructor of the StandardAnalyzer (default)... Jeryl Cook -Original Message- From: Manuel Albela Miranda [mailto:[EMAIL PROTECTED] Sent: Thursday, January 25, 2007 5:00 AM To: solr-dev@lucene.apache.org Subject: Stop words and phrases Hello everybody, Does anybody know how to ignore stopwords when searching with a phrase? I've been searching for information about this but found nothing. The thing is, i want to use stopwords when searching: /field:this is a house /and not to use them when searching like: /field:"this is a house"/. Hope you can help me. Thank you! Regards. Manu
[jira] Commented: (SOLR-84) New Solr logo?
[ https://issues.apache.org/jira/browse/SOLR-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467388 ] Clay Webster commented on SOLR-84: -- The rounded characters are nice. The sun's red color and open middle aren't so nice IMHO. Erik: perhaps an animated gif that fires out from the Solr's "o" and engulfs our planet? ;-) > New Solr logo? > -- > > Key: SOLR-84 > URL: https://issues.apache.org/jira/browse/SOLR-84 > Project: Solr > Issue Type: Improvement >Reporter: Bertrand Delacretaz >Priority: Minor > Attachments: logo-solr-source-files-take2.zip, > solr-84-source-files.zip, solr-logo-20061214.jpg, solr-logo-20061218.JPG, > solr-logo-20070124.JPG, solr.jpg, solr.jpg > > > Following up on SOLR-76, our trainee Nicolas Barbay (nicolas (put at here) > sarraux-dessous.ch) has reworked his logo proposal to be more "solar". > This can either be the start of a logo contest, or if people like it we could > adopt it. The gradients can make it a bit hard to integrate, not sure if this > is really a problem. > WDYT?
[jira] Commented: (SOLR-104) Update Plugins
[ https://issues.apache.org/jira/browse/SOLR-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467330 ] Ryan McKinley commented on SOLR-104: Thanks for going through this! I'll comment on points where i have answers or questions. The rest will go on the TODO list. Ok, so we should make sure to put the charset into ContentStream.getContentType() and open the Reader with: String charset = getCharset( stream.getContentType() ); new InputStreamReader( stream.getStream(), charset ); Sounds reasonable. I took them out because (at the time) it seemed clearer and had less duplicated code. yes. At some point it would also be good to make a stronger name distinction between UpdateHandler (the thing that handles the nitty-gritty lucene indexing) and the UpdateRequestHandler -- but let's save that for another day! As written, the StandardRequestParser: 1) checks if multipart 2) checks if it has parameters in the URL (?xxx=yyy) if it has parameters (?xxx=yyy) then use the RawRequestParser otherwise it pulls parameters from the map. (SimpleRequestParser) To trigger raw request reading you *must* have a parameter on the URL. This was my design in response to Yonik's observation that curl puts "application/x-www-form-urlencoded" in the header even if it is not form-urlencoded encoded. As written, it does not rely on clients putting accurate headers (except for multipart) - it relies on a URL convention. I only put it in there to make you happy! I'll take it out and we can deal with it later if necessary. I didn't think i could get that past you! I'll take it out and save the pleading for another time. for a local file, you can use stream.url=file:///C:/pathtofile.txt, for remote ones, you use stream.url=http://... We should have a good notice in the config warning people to have some security running before enabling streaming. I had implemented it the normal way, BUT it broke many tests (since they never call init). 
The better solution is to make sure the tests call init in a standard way, but that got me into editing many files I don't quite understand, so i opted for lazy init. That sounds fine. Since it is a tentative private interface, i was not too worried about it. > Update Plugins > -- > > Key: SOLR-104 > URL: https://issues.apache.org/jira/browse/SOLR-104 > Project: Solr > Issue Type: Improvement > Components: update >Affects Versions: 1.2 >Reporter: Ryan McKinley > Fix For: 1.2 > > Attachments: commons-fileupload-20070107.jar, commons-io-1.2.jar, > DispatchFilter.patch, DispatchFilter.patch, DispatchFilter.patch, > DispatchFilter.patch, DispatchFilter.patch, DispatchFilter.patch, > HandlerRefactoring-DRAFT-SRC.zip, HandlerRefactoring-DRAFT-SRC.zip, > HandlerRefactoring.DRAFT.patch, HandlerRefactoring.DRAFT.patch, > HandlerRefactoring.DRAFT.zip > > > The plugin framework should work for 'update' actions in addition to 'search' > actions. > For more discussion on this, see: > http://www.nabble.com/Re%3A-Handling-disparate-data-sources-in-Solr-tf2918621.html#a8305828
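Ryan's comment above suggests opening the Reader with a charset taken from the stream's content type. A minimal sketch of what such a helper might look like is below. The getCharset in the patch is not shown in this thread, so everything here is an assumption: the class name, the parsing rules, and the UTF-8 fallback (which follows the "probably default to UTF-8" suggestion made elsewhere in this thread). It uses only java.io and java.nio.charset, not Solr's ContentStream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the real getCharset(...) from the patch may differ.
final class ContentTypeCharsetDemo {

    /**
     * Extracts the charset parameter from a Content-Type value such as
     * "text/xml; charset=ISO-8859-1". Falls back to UTF-8 when the value is
     * null or has no charset parameter. Charset.forName throws an unchecked
     * exception if the named charset is illegal or unsupported.
     */
    static Charset getCharset(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String trimmed = part.trim();
                // Case-insensitive check for the "charset=" prefix (8 characters).
                if (trimmed.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = trimmed.substring(8).trim();
                    // Strip optional surrounding double quotes.
                    if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                        name = name.substring(1, name.length() - 1);
                    }
                    if (!name.isEmpty()) {
                        return Charset.forName(name);
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    /** Wraps the raw stream in a Reader that decodes with the declared charset. */
    static Reader openReader(InputStream in, String contentType) {
        return new InputStreamReader(in, getCharset(contentType));
    }

    public static void main(String[] args) throws IOException {
        byte[] latin1 = {(byte) 0xE9}; // 'é' encoded as ISO-8859-1
        try (Reader reader = openReader(new ByteArrayInputStream(latin1), "text/xml; charset=ISO-8859-1")) {
            System.out.println((char) reader.read());
        }
    }
}
```

Passing a Charset (rather than a charset name) to InputStreamReader avoids the checked UnsupportedEncodingException, though an unknown name still fails in Charset.forName.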
Stop words and phrases
Hello everybody, Does anybody know how to ignore stopwords when searching with a phrase? I've been searching for information about this but found nothing. The thing is, i want to use stopwords when searching: /field:this is a house /and not to use them when searching like: /field:"this is a house"/. Hope you can help me. Thank you! Regards. Manu
[jira] Commented: (SOLR-104) Update Plugins
[ https://issues.apache.org/jira/browse/SOLR-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467305 ]

Hoss Man commented on SOLR-104:
---

Woot! ... i think we're really close to committing this.

I made a hodgepodge list of comments as i read through everything, and then tried to organize them. I agree with yonik that we should feel free to commit new functionality without being afraid of needing to change the api of that functionality before the next release, but i'm not 100% comfortable with how backwards compatible this patch is for the existing /select and /update URLs ... this may just be an issue of me being paranoid (and tired) but there's at least one code path difference.

Anyway, here are my notes...

Comments regarding backwards compatibility of the patch...

- SolrCore.update(Reader,Writer) was a public method that's been removed ... this is probably fine, just pointing it out for the record.
- SolrUpdateServlet used HttpServletRequest.getReader, the new UpdateRequestHandler uses an InputStreamReader around HttpServletRequest.getInputStream() ... this seems bad for legacy update support from a char encoding standpoint.
- While i think it's important to refactor the XML Update parsing out of SolrCore, I'm still not clear what is gained by eliminating SolrServlet and SolrUpdate. The big advantage of the new dispatcher being a Filter is that it can pass requests on that it doesn't want to deal with, so why not leave the existing servlets around with only the minimum necessary changes...
    - move SolrCore's init to Dispatcher
    - use 3 arg core.execute in SolrServlet
    - have SolrUpdateServlet call UpdateRequestHandler.update(Reader) and generate the legacy response XML
  ...in order to reduce the possibility of introducing bugs (particularly since the existing Servlets are the one area where we don't have *any* unit tests)

Comments regarding functionality that i think we *may* want to address before committing (but i won't fight over if i'm the only one that cares)...

- UpdateRequestHandler should probably be renamed XmlUpdateRequestHandler (particularly since i expect Yonik to commit a CsvUpdateRequestHandler real soon now)
- StandardRequestParser can't assume that a POST which isn't multipart/* should be handled by a RawRequestParser ... if the content type is "application/x-www-form-urlencoded" then SimpleRequestParser should be used (so all params from query string and body are included)
- What should the expectations of ContentStream.getInputStream().close() be? Should the Dispatcher iterate over any Iterable streams when writing the output and try to close them, ignoring any Exceptions?
- I'm really not fond of having SolrParams.STREAM_TYPE. Can we please, please leave it out for now and rely on content-type detection? We can add it back in if/when we make RequestParser a public interface and let people register them in solrconfig.
- I really don't think we want to open the Pandora's box of putting the HttpServletRequest in the SolrQueryRequest ... i'd hate to put that in and then have to support it forever.

Things in the current patch that aren't strictly necessary for the current issue and can (should?) be committed separately...

- are we definitely deprecating SolrQueryResponse.getException?
- StandardRequestHandler and DisMaxRequestHandler have only been changed to subclass the new base class.
- only whitespace changes in SolrRequestHandler.java
- SolrServletRequest has only imports rearranged

Things which definitely shouldn't block up the patch, but should go on a short term todo list...

- see backwards compatibility comment about (Xml)UpdateRequestHandler using InputStreamReader without specifying a charset ... in general the handler should look at the ContentStream's content type to determine the encoding of the InputStream (and probably default to UTF-8)
- need to work out what kind of NamedList should be returned by (Xml)UpdateRequestHandler.update(Reader)
- some of the new files are missing the Apache boilerplate.
- a use case we talked about that still isn't covered is opening local files as a stream ... this should be easy to add later right alongside STREAM_URL.
- we should fill in the getURL methods for DisMax/Standard to point at wiki
- CommitRequestHandler should use UpdateParams.OPTIMIZE
- the init semantics for (Xml)UpdateRequestHandler are odd: as a RequestHandler it's guaranteed that init(NamedList) will be called, but instead it uses its own private init() that's called lazily.
- DumpRequestHandler should dump ContentStream.getSize().
- doFilter should call parsers.parse( path, req ) as soon as it has the path, and then delegate to a helper method that doesn't have access to the HttpServletRequest ... this reduces both the
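
Two of the content-type points in the notes above (picking a parser for a POST body, and picking a charset for the Reader instead of relying on the platform default) could be sketched roughly like this. It is only an illustration in plain Java, not code from the patch: the class ContentTypeSketch, the ParserKind enum and both method names are hypothetical, and the enum values merely stand in for the multipart, simple and raw parsers mentioned in the comment.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ContentTypeSketch {

    // Hypothetical labels standing in for the parsers named in the comment.
    enum ParserKind { MULTIPART, SIMPLE, RAW }

    // Decide how a POST body should be parsed, based only on its Content-Type.
    static ParserKind parserFor(String contentType) {
        if (contentType == null) {
            return ParserKind.RAW;
        }
        // Drop parameters such as "; charset=UTF-8" and normalise case.
        String mediaType = contentType.split(";")[0].trim().toLowerCase(Locale.ROOT);
        if (mediaType.startsWith("multipart/")) {
            return ParserKind.MULTIPART;
        }
        if (mediaType.equals("application/x-www-form-urlencoded")) {
            // Form posts carry ordinary params in the body, so they are not raw streams.
            return ParserKind.SIMPLE;
        }
        return ParserKind.RAW;
    }

    // Extract the charset parameter from a Content-Type value,
    // defaulting to UTF-8 when it is absent, malformed or unsupported.
    static Charset charsetFor(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // bad charset name: fall through to the default
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Wrap the raw stream with an explicit charset rather than the platform default.
    static Reader readerFor(InputStream in, String contentType) {
        return new InputStreamReader(in, charsetFor(contentType));
    }
}
```

Whether UTF-8 is the right fallback for legacy /update clients is exactly the open question raised above; the sketch just makes the default explicit in one place.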
[jira] Commented: (SOLR-69) PATCH:MoreLikeThis support
[ https://issues.apache.org/jira/browse/SOLR-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467282 ] mrball commented on SOLR-69: Yep, doesn't seem to work with non-stored fields (if you only use non-stored fields in mlt.fl). I believe the stored field values are used to build the query. > PATCH:MoreLikeThis support > -- > > Key: SOLR-69 > URL: https://issues.apache.org/jira/browse/SOLR-69 > Project: Solr > Issue Type: Improvement > Components: search >Reporter: Bertrand Delacretaz >Priority: Minor > Attachments: lucene-queries-2.0.0.jar, SOLR-69.patch, SOLR-69.patch, > SOLR-69.patch > > > Here's a patch that implements simple support of Lucene's MoreLikeThis class. > The MoreLikeThisHelper code is heavily based on (hmm..."lifted from" might be > more appropriate ;-) Erik Hatcher's example mentioned in > http://www.mail-archive.com/solr-user@lucene.apache.org/msg00878.html > To use it, add at least the following parameters to a standard or dismax > query: > mlt=true > mlt.fl=list,of,fields,which,define,similarity > See the MoreLikeThisHelper source code for more parameters. > Here are two URLs that work with the example config, after loading all > documents found in exampledocs in the index (just to show that it seems to > work - of course you need a larger corpus to make it interesting): > http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=standard&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score > http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=dismax&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score > Results are added to the output like this: > > ... > > > > 1.5293242 > SOLR1000 > > > > > 1.5293242 > UTF8TEST > > > > I haven't tested this extensively yet, will do in the next few days. But > comments are welcome of course.
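
For anyone trying the mlt parameters from a client, a request URL like the ones quoted above could be assembled roughly as follows. This is a small sketch in plain Java using only the JDK; the class MltUrlSketch and the method buildMltUrl are hypothetical names, and the parameter values are simply the ones from the example URLs in the issue description.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class MltUrlSketch {

    // Build a select URL that turns on MoreLikeThis for the given similarity fields.
    static String buildMltUrl(String selectUrl, String query, String handler, String mltFields) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", query);
        params.put("qt", handler);        // "standard" or "dismax", per the issue
        params.put("mlt", "true");        // required to enable MoreLikeThis
        params.put("mlt.fl", mltFields);  // comma-separated fields that define similarity
        params.put("mlt.mindf", "1");     // value taken from the example URLs
        params.put("fl", "id,score");

        StringBuilder url = new StringBuilder(selectUrl).append('?');
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                url.append('&');
            }
            first = false;
            url.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return url.toString();
    }

    // URL-encode one key or value as UTF-8 (commas become %2C).
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

Calling buildMltUrl("http://localhost:8983/solr/select/", "apache", "standard", "manu,cat") gives the same parameters as the first example URL, with the commas percent-encoded. Given the comment above about non-stored fields, the fields passed in mlt.fl would presumably need to be stored ones.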