Re: [jira] Commented: (SOLR-20) A simple Java client for updating and searching

2007-01-25 Thread Chris Hostetter

: IMO, we should strive to be nice and not repeat keys when the
: NamedList is more of the Map variety than the List.

we should try .. but we can't guarantee .. i don't have any compelling
cases where i've needed to reuse the same name, but i've certainly written
plenty of code that puts multiple items in a list that have no name.

: > mechanism from the client code like this one -- using XML really is the
: > safest bet since it's the most expressive of all the formats we currently
: > have)
:
: JSON responses are smaller and can be quite a bit faster to parse.

i won't argue it's not faster or smaller -- just that it's not as
expressive :)

i'm guessing that if you generated enough JSON markup to be equally
expressive and safe, the size would go up considerably (but still probably
not be as big as XML) and the speed would be affected to some degree as
well -- not to mention the ease of use, accessing the new more complex
JSON structures you would have.


-Hoss



Re: [jira] Commented: (SOLR-104) Update Plugins

2007-01-25 Thread Yonik Seeley

On 1/25/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:

Um... just for the record, some of these comments -- i'm not sure what
they are in response to :)


Someone on the lucene list had a tip about quoting in JIRA... they
composed their reply as a response to the email sent to the dev list,
then just pasted it in to JIRA.

-Yonik


Re: [jira] Commented: (SOLR-20) A simple Java client for updating and searching

2007-01-25 Thread Yonik Seeley

On 1/25/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:

: I'm using a slightly modified version of the json.org code.  It stores
: things in a LinkedHashMap (to maintain order) and formats dates
: explicitly.

Uh... watch out with that ... a LinkedHashMap is first and foremost a Map,
so it doesn't support repeated keys.


I currently have some (ugly) code in the response writer that handles
repeated keys, just not nicely.

IMO, we should strive to be nice and not repeat keys when the
NamedList is more of the Map variety than the List.


(I suspect for a client API that's going to completely hide the transport
mechanism from the client code like this one -- using XML really is the
safest bet since it's the most expressive of all the formats we currently
have)


JSON responses are smaller and can be quite a bit faster to parse.

-Yonik


[jira] Commented: (SOLR-112) Hierarchical Handler Config

2007-01-25 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467712
 ] 

Hoss Man commented on SOLR-112:
---

I think you're dead on JJ ... any generic NamedList merging won't necessarily 
do "the right thing" in all cases when talking about RequestHandler init args 
-- very special logic would need to be used to deal with the 
defaults/appends/invariants in a logical manner, and that logic may not be able 
to take into account other init params that other RequestHandlers (subclasses 
of the core ones perhaps) might add.

a cleaner way to deal with this might just be to have the individual 
RequestHandlers manage this themselves -- using 
SolrCore.getRequestHandler(String) and protected methods they explicitly 
support to allow other instances to get access to their SolrParams.

ie...

  

 category
 0.01
 
text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
 
  

  
search/products/all

   price:[0 TO 100]
   price:[100 TO *]


  inStock:true

   

...where DisMaxRequestHandler (or most likely the new Base class Ryan has 
written) has methods like...

   protected SolrParams getDefaults()
   protected SolrParams getInvarients()
   protected SolrParams getAppends()

...and the init method looks for an "extends" arg, if it's there fetches it 
from the SolrCore, tests its class and casts it, then calls the methods above 
and builds up its own SolrParams using a combination of those and the ones 
explicitly specified in its config.
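
A rough sketch of how such a merge might look, purely for illustration. `HandlerParams` and `extend` are hypothetical names, plain maps stand in for SolrParams, and the merge rules assumed here (child defaults replace parent defaults per key, appends from both are concatenated, parent invariants cannot be overridden) are only one plausible reading of the idea, not something the comment above specifies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a handler's init params; not Solr code.
class HandlerParams {
  final Map<String, List<String>> defaults = new LinkedHashMap<>();
  final Map<String, List<String>> appends = new LinkedHashMap<>();
  final Map<String, List<String>> invariants = new LinkedHashMap<>();

  // Builds the effective params of a child handler that "extends" a parent.
  // Assumed rules: child defaults replace parent defaults per key, appends
  // from both are concatenated, and parent invariants cannot be overridden.
  static HandlerParams extend(HandlerParams parent, HandlerParams child) {
    HandlerParams merged = new HandlerParams();

    copy(parent.defaults, merged.defaults);
    for (Map.Entry<String, List<String>> e : child.defaults.entrySet()) {
      merged.defaults.put(e.getKey(), new ArrayList<>(e.getValue()));
    }

    copy(parent.appends, merged.appends);
    for (Map.Entry<String, List<String>> e : child.appends.entrySet()) {
      merged.appends.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
          .addAll(e.getValue());
    }

    copy(child.invariants, merged.invariants);
    for (Map.Entry<String, List<String>> e : parent.invariants.entrySet()) {
      merged.invariants.put(e.getKey(), new ArrayList<>(e.getValue()));
    }
    return merged;
  }

  // Deep-copies the value lists so the merged params share no state.
  private static void copy(Map<String, List<String>> from,
                           Map<String, List<String>> to) {
    for (Map.Entry<String, List<String>> e : from.entrySet()) {
      to.put(e.getKey(), new ArrayList<>(e.getValue()));
    }
  }
}
```

Keeping the three groups separate until the end means a subclass that adds its own init params only has to decide which group each one belongs to.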


> Hierarchical Handler Config
> ---
>
> Key: SOLR-112
> URL: https://issues.apache.org/jira/browse/SOLR-112
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.2
>Reporter: Ryan McKinley
>Priority: Minor
> Fix For: 1.2
>
> Attachments: SOLR-112.patch
>
>
> From J.J. Larrea on SOLR-104
> 2. What would make this even more powerful would be the ability to "subclass" 
> (meaning refine and/or extend) request handler configs: If the requestHandler 
> element allowed an attribute extends="" and 
> chained the SolrParams, then one could do something like:
>class="solr.DisMaxRequestHandler" >
> 
>  0.01
>  
> text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
>  
>  ... much more, per the "dismax" example in the sample solrconfig.xml ...
>   
>   ... and replacing the "partitioned" example ...
>extends="search/products/all" >
> 
>   inStock:true
> 
>   

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [jira] Commented: (SOLR-104) Update Plugins

2007-01-25 Thread Chris Hostetter

: I'll comment on points i have answers or questions.  The rest will go
: on the TODO list.

Um... just for the record, some of these comments -- i'm not sure what
they are in response to :)

: To trigger raw request reading you *must* have a parameter on the URL.
:  This was my design in response to Yonik's observation that curl puts
: "application/x-www-form-urlencoded" in the header even if it is not
: form-urlencoded encoded.
:
: As written, it does not rely on clients putting accurate headers
: (except for multipart) - it relies on a URL convention.

Yonik's point was that we need to do that when supporting *legacy*
requests ... since we are designing a new mechanism which will live at a
new URL in the example (and which clients would have to explicitly map to
the old URL in their configs) we have the freedom to be more strict in our
parsing.  we should require that people send us a Content-Type if
they want to post data to us -- and we should use that content-Type to
determine how to parse the content.

(this is one of the reasons i'm suggesting we keep the existing Servlets
around -- that way we don't have to be worried about them so much as we
tweak the new classes)


: I only put it in there to make you happy!  I'll take it out and we can
: deal with it later if necessary.
:
: I didn't think i could get that past you!  I'll take it out and save
: the pleading for another time.

these are the comments where i'm not sure what they are referring to.

: for a local file, you can use stream.url=file:///C:/pathtofile.txt,
: for remote ones, you use stream.url=http://...

oh yeah ... why didn't i think of that?

: We should have a good notice in the config warning people to have some
: security running before enabling streaming.

yeah ... you had me convinced of that before, but i'm leaning more
towards yonik's point now: Solr has a lot of inherent trust in anyone
that can hit it directly.  if/when we allow the list of RequestParsers to
be configurable in solrconfig.xml, then the STREAM_URL support could be
another RequestParser that they either refer to
directly, or register as a "hook" on other RequestParsers.

In the meantime though: having that option might mislead people into a
false sense of security.

: I had implemented it the normal way, BUT it broke many tests (since
: they never call init).   The better solution is to make sure the tests
: call init a standard way, but that got me into editing many files I
: don't quite understand, so i opted for lazy init.

i don't understand ... why weren't your tests calling init? ... if you
were doing everything via the TestHarness it inits the SolrCore which
inits all the requesthandlers -- if you were constructing the
RequestHandler yourself you could just call the init(NamedList) method
directly.


BTW: something i keep forgetting to mention, is that it would be helpful
if you could setup your IDE to use 2 spaces per tab-stop, and never use
tab characters ... it'll make the patches easier to apply.

(not every Solr source file is like that right now ... but it's the goal)

-Hoss



Re: [jira] Commented: (SOLR-20) A simple Java client for updating and searching

2007-01-25 Thread Chris Hostetter

: > > * I'm using wt=JSON rather than XML. (It maps to a hash easier)

: I'm using a slightly modified version of the json.org code.  It stores
: things in a LinkedHashMap (to maintain order) and formats dates
: explicitly.

Uh... watch out with that ... a LinkedHashMap is first and foremost a Map,
so it doesn't support repeated keys.
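
For what it's worth, the repeated-key problem is easy to see in a few lines of Java. This is only a sketch: `RepeatedKeyDemo` and its methods are made-up names, not Solr code, and a plain list of map entries stands in for a NamedList. The sample parameter names are illustrative as well.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of Solr.
class RepeatedKeyDemo {

  // Copies name/value pairs into a LinkedHashMap; a later pair with the
  // same name silently replaces the earlier value.
  static Map<String, Object> toMap(List<Map.Entry<String, Object>> pairs) {
    Map<String, Object> map = new LinkedHashMap<>();
    for (Map.Entry<String, Object> pair : pairs) {
      map.put(pair.getKey(), pair.getValue());
    }
    return map;
  }

  // Builds a small NamedList-style list of pairs with a repeated name.
  static List<Map.Entry<String, Object>> samplePairs() {
    List<Map.Entry<String, Object>> pairs = new ArrayList<>();
    pairs.add(new SimpleEntry<String, Object>("fq", "inStock:true"));
    pairs.add(new SimpleEntry<String, Object>("rows", 10));
    pairs.add(new SimpleEntry<String, Object>("fq", "price:[0 TO 100]"));
    return pairs;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Object>> pairs = samplePairs();
    System.out.println("pairs in list: " + pairs.size());
    System.out.println("keys in map:   " + toMap(pairs).size());
  }
}
```

The list keeps all three pairs, while the map ends up with two keys and only the last value stored under the repeated name.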

(I suspect for a client API that's going to completely hide the transport
mechanism from the client code like this one -- using XML really is the
safest bet since it's the most expressive of all the formats we currently
have)


-Hoss



[jira] Commented: (SOLR-84) New Solr logo?

2007-01-25 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467699
 ] 

Yonik Seeley commented on SOLR-84:
--

What, no kryptonians around here?

I like the rounded letters too.  w.r.t. the "r" comment, perhaps try making it 
the same width as the "s"?


> New Solr logo?
> --
>
> Key: SOLR-84
> URL: https://issues.apache.org/jira/browse/SOLR-84
> Project: Solr
>  Issue Type: Improvement
>Reporter: Bertrand Delacretaz
>Priority: Minor
> Attachments: logo-solr-source-files-take2.zip, 
> solr-84-source-files.zip, solr-logo-20061214.jpg, solr-logo-20061218.JPG, 
> solr-logo-20070124.JPG, solr.jpg, solr.jpg
>
>
> Following up on SOLR-76, our trainee Nicolas Barbay (nicolas (put at here) 
> sarraux-dessous.ch) has reworked his logo proposal to be more "solar".
> This can either be the start of a logo contest, or if people like it we could 
> adopt it. The gradients can make it a bit hard to integrate, not sure if this 
> is really a problem.
> WDYT?




[jira] Resolved: (SOLR-107) Iterable NamedList with java5 generics

2007-01-25 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-107.
---

Resolution: Fixed

> Iterable NamedList with java5 generics
> --
>
> Key: SOLR-107
> URL: https://issues.apache.org/jira/browse/SOLR-107
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ryan McKinley
>Priority: Trivial
> Attachments: IterableNamedList.patch, IterableNamedList.patch
>
>
> Iterators and generics are nice!
> this patch adds both to NamedList.java




[jira] Commented: (SOLR-107) Iterable NamedList with java5 generics

2007-01-25 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467661
 ] 

Yonik Seeley commented on SOLR-107:
---

Looks good, I just committed this.
Thanks again Ryan!

ps: if patches start in the trunk, it's easier for someone to commit it.




[jira] Commented: (SOLR-84) New Solr logo?

2007-01-25 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467662
 ] 

Hoss Man commented on SOLR-84:
--

I dig the version Ryan posted ... the rounded letters really make a huge 
difference -- as for the colors of the sun, i think that red is a bit too ... 
red.  I think the color palette of the current Solr logo would look better.

The white in the center of the "o" makes it an obvious "o", but a less obvious 
sun ... which made me think about the "o" in previous versions.  I think a 
solid orange circle would make a good Sun/"o", and in the context of the other 
letters would be an obvious character ... it could then have a radiating 
gradient of light orange to yellow ... it seems like that would be a good 
balance between the two designs posted so far.

one other personal opinion: with the crescent, the "r" looks okay, but without 
it, we probably want to narrow it a bit -- it sticks way out there.

As for a logo for "Flare" ... i think reusing the current "Solr on fire" logo 
might be perfect :)




Re: facet response

2007-01-25 Thread Chris Hostetter

: Of course we might stumble across cases where ordering isn't important,
: but multiple values with the same key is...  While not the case for
: facet counts, if it ever came up that could be handled (in a different
: json.nl variant) by serializing with all same-keyed values collected
: into an Array, at the expense of strict ordering:
:
: [["DupKey", [Val1, Val3, Val4]], ["AnotherKey", Val2]]

It comes down to a question of intent -- if the code producing the
NamedList intended multiple uses of the same key to be interpreted as a
single key referring to a list of values, it could have created a Map where
the values of some keys were arrays.

: The SOLR NamedList is a simple analog of the element-tree part of XML
: (no attributes or mixed content).  This article gives a very thorough
: summary of the mappings between XML and JSON which can be applied to the
: NamedList->JSON issue as well:
:
:   http://www.xml.com/lpt/a/1658

Hmmm... i only skimmed this but it seems that according to their rules
of thumb, it's impossible to make either a reversible or semantically
equivalent JSON structure out of a NamedList since neither of the
following rules is guaranteed to be true...

  #1 all subelement names occur exactly once, or
 subelements with identical names are in sequence.

  #2 multiple homonymous subelements occur nonsequentially, and
 element order doesn't matter.

: I stumbled across a Ruby OrderedHash which looks to be an analog of NamedList:
:
: http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/20551

i don't know Ruby, but the cases i imagine that doesn't cover are keys
appearing more than once, and values with no keys.


-Hoss



Re: [Solr Wiki] Update of "SchemaXml" by JürgenHermann

2007-01-25 Thread Chris Hostetter

is this addition really necessary? ... the paragraph directly before it
just said "Individual fields can override the various options (indexed,
stored, etc...) that they inherit from their fieldtype"  ... this just
seems a bit redundant.

?

: +   Common options that fields can have are...
: +* `indexed=true|false`
: +* `stored=true|false`
: +* `compressed=true|false`
: +* `multiValued=true|false`
: +* `omitNorms=true|false`


-Hoss



Re: facet.missing?

2007-01-25 Thread Chris Hostetter

: Sorry Hoss if I came down too hard against the view that "*" should mean
: "all docs".  With the renewed clarity that getting a little sleep
: brings, I better appreciate the merits of your position.  And that it's
: dangerous trying to decide what makes sense for other people.

no worries ... i didn't take it as being harsh.

I don't actually have any opinion about what a plain q=* should do ... I
just worry that if we add zero width prefixes, and that results in q=*
meaning q=defaultSearchField:* that might confuse a lot of people who
expect it to be something else.

if 50% of the people think q=* means one thing, and 50% think q=* means
something else, then the safest thing to do is probably to make sure
that q=* is an error.

: Obviously from a practical perspective, a MatchAllDocsQuery is quite a
: bit faster than matching all docs having some value for a particular
: field, and should be encouraged unless the latter is explicitly desired.

FYI: there is special syntax in the trunk Lucene QueryParser for
generating a MatchAllDocs, it's:  *:* ... we just don't use that version
of Lucene in Solr yet.

: Now I think there is fair agreement that it would be great if field:*
: could be made to work.  So if some portion of users want unqualified *

i actually have no opinion on that either ... i think field:[* TO *] is
just as easy to use, and less prone to confusion about what it means (ie:
you have to understand some of the syntax to know to try and use it, you
aren't likely to inadvertently use it and think you're getting a
different result set than you really are)

: processed as the equivalent of :*, then perhaps there
: should be a configuration directive which controls whether unqualified *
: (or perhaps any defined string) is trapped?  I haven't come across
: existing SOLR code to handle "*" as a special case of PrefixQuery, so I
: assume the trapping should be at the level of SolrQueryParser, similar
: to how [* TO *] is trapped for range queries?

there is a QueryParser.setAllowLeadingWildcard method which we could set
based on a schema value (much like we do for the defaultOperator) ... i
think it would be perfectly fine making that an option which defaults to
false and the result of enabling it being that q=* uses the
defaultSearchField ... at least then people need to consider the issue and
turn it on first.



-Hoss



Re: [jira] Commented: (SOLR-104) Update Plugins

2007-01-25 Thread Yonik Seeley

On 1/24/07, Ryan McKinley (JIRA) <[EMAIL PROTECTED]> wrote:

Ryan McKinley commented on SOLR-104:
the 2MB limit is set in solrconfig.xml

  

maybe 2MB is too small as the default, but i figured it should be configurable.


Yeah, but I was lacking knowledge about the reasons behind a limit.
(potential denial of service attacks if this is exposed to the
outside, keeping people from accidentally hurting themselves somehow?)

-Yonik


[jira] Resolved: (SOLR-117) constrain field faceting to a prefix

2007-01-25 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-117.
---

Resolution: Fixed

committed.

> constrain field faceting to a prefix
> 
>
> Key: SOLR-117
> URL: https://issues.apache.org/jira/browse/SOLR-117
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Reporter: Yonik Seeley
> Attachments: facet_prefix.patch, facet_prefix.patch
>
>
> Useful for faceting as someone is typing, autocompletion, etc




Re: facet response

2007-01-25 Thread J.J. Larrea
Some peanut-gallery comments (apologies if they repeat ideas already discussed, 
I haven't read the full thread):

>> > Chris Hostetter <[EMAIL PROTECTED]> wrote:
>> >> as i said, i'd rather invert the use case set to find where "ordering
>> >> isn't important" and change those to Maps

It makes sense to use NamedLists only where they are truly needed so data that 
can be serialized to JSON as a hash always is, and doesn't get affected by 
json.nl.

Of course we might stumble across cases where ordering isn't important, but 
multiple values with the same key is...  While not the case for facet counts, 
if it ever came up that could be handled (in a different json.nl variant) by 
serializing with all same-keyed values collected into an Array, at the expense 
of strict ordering:

[["DupKey", [Val1, Val3, Val4]], ["AnotherKey", Val2]]
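
A minimal Java sketch of that kind of grouping, in case it helps the discussion. `DupKeyGrouper` is a hypothetical helper, not anything in Solr, and two parallel lists stand in for a NamedList's names and values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Solr: groups the values of a
// NamedList-style pair list by key.
class DupKeyGrouper {

  // names.get(i) goes with values.get(i). Keys keep first-seen order, but
  // the interleaving of values across different keys is lost.
  static Map<String, List<Object>> group(List<String> names, List<Object> values) {
    Map<String, List<Object>> grouped = new LinkedHashMap<>();
    for (int i = 0; i < names.size(); i++) {
      grouped.computeIfAbsent(names.get(i), k -> new ArrayList<>())
          .add(values.get(i));
    }
    return grouped;
  }
}
```

Fed the pairs DupKey/Val1, AnotherKey/Val2, DupKey/Val3, DupKey/Val4, this groups Val1, Val3 and Val4 under DupKey, and the fact that Val2 originally sat between Val1 and Val3 can no longer be recovered.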

>On 1/24/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:
>>Faceting is the only thing I've come upon.  After playing with this
>>more and contemplating all the messages on this thread, I can't say
>>that it's "broken", but telling solr to sort things and then when
>>pulling them back out on the other end in seemingly random order it
>>sure feels that way.  Re-sorting on the client is the easiest
>>solution and I've gone that route for now.

I agree that having to do client-side sorting of a pre-sorted dataset is 
horrible.  But the problem has nothing to do with SOLR, it is due to the 
limitations of JavaScript and thus JSON.

The SOLR NamedList is a simple analog of the element-tree part of XML (no 
attributes or mixed content).  This article gives a very thorough summary of 
the mappings between XML and JSON which can be applied to the NamedList->JSON 
issue as well:

  http://www.xml.com/lpt/a/1658

>>Having the facet_counts area output as an ordered list in all cases
>>seems the most sensible to me, since it is unlikely that the facets
>>would be accessed by key.

Agreed.

In JavaScript-land one can use
array[array.indexOf(key,fromIndex)][1]
to extract values by name, if so desired.

I stumbled across a Ruby OrderedHash which looks to be an analog of NamedList:

http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/20551

I imagine it would be fairly simple to extend that OrderedHash to construct 
from a JSON-read array-of-pairs, especially since faceting should not emit 
duplicate keys.

Yonik wrote:
>I think a nice compromise between
>efficiency and human readability might be to just alternate key,val in
>the array:
>['foo',10,'bar',20]
>That would even allow representing a null key as a null in the array.

I'm not sure there is an advantage to the current json.nl=arrarr, but it 
doesn't hurt to allow that to be specified as an alternate non-default format.

>But I'm leaning on keeping the current format for XML for both
>slightly better readability, and backward compatibility.
> 10 20 
>As opposed to:
> foo 10 bar 20 

I agree re: not dumbing-down the XML serialization to match the 
lowest-common-denominator.

- J.J.


Re: facet response

2007-01-25 Thread Yonik Seeley

On 1/24/07, Erik Hatcher <[EMAIL PROTECTED]> wrote:

On Jan 22, 2007, at 6:14 PM, Yonik Seeley wrote:

> Chris Hostetter <[EMAIL PROTECTED]> wrote:
>> as i said, i'd rather invert the use case set to find where "ordering
>> isn't important" and change those to Maps
>
> That might be a *lot* of changes...
> What's currently broken, just faceting or anything else?

Faceting is the only thing I've come upon.  After playing with this
more and contemplating all the messages on this thread, I can't say
that it's "broken", but telling solr to sort things and then when
pulling them back out on the other end in seemingly random order it
sure feels that way.  Re-sorting on the client is the easiest
solution and I've gone that route for now.

I plan on digging into the JSON option a bit and seeing if order is
preserved, though I doubt it would be any difference since it will
surely parse back into a Hash by default.  Though the json.nl.arr=arr
would surely preserve order, though that changes the access to things
all over the place on the client.

Having the facet_counts area output as an ordered list in all cases
seems the most sensible to me, since it is unlikely that the facets
would be accessed by key.


For JSON and friends, I agree.  I think a nice compromise between
efficiency and human readability might be to just alternate key,val in
the array:
['foo',10,'bar',20]
That would even allow representing a null key as a null in the array.
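
To make the flat layout concrete, here is a small Java sketch. `FlatPairs`, `flatten` and `firstValue` are hypothetical names rather than existing Solr code, and parallel lists again stand in for a NamedList.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr.
class FlatPairs {

  // Flattens parallel name/value lists into [name0, val0, name1, val1, ...].
  // A null name is kept as a null element, so unnamed values survive.
  static List<Object> flatten(List<String> names, List<Object> values) {
    List<Object> flat = new ArrayList<>(names.size() * 2);
    for (int i = 0; i < names.size(); i++) {
      flat.add(names.get(i));
      flat.add(values.get(i));
    }
    return flat;
  }

  // Linear scan for the first value stored under the given name.
  static Object firstValue(List<Object> flat, String name) {
    for (int i = 0; i + 1 < flat.size(); i += 2) {
      Object key = flat.get(i);
      if (name == null ? key == null : name.equals(key)) {
        return flat.get(i + 1);
      }
    }
    return null;
  }
}
```

Order and repeated names both survive, at the cost of a linear scan whenever a client wants to look something up by name.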

But I'm leaning on keeping the current format for XML for both
slightly better readability, and backward compatibility.
 10 20 
As opposed to:
 foo 10 bar 20 

-Yonik


[jira] Commented: (SOLR-69) PATCH:MoreLikeThis support

2007-01-25 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467399
 ] 

Yonik Seeley commented on SOLR-69:
--

> MoreLikeThis queries should work irrelevant of whether fields are stored or 
> not, as it's based on what's indexed

I haven't looked at the lucene-code for more-like-this, but it's just like 
highlighting... to get the tokens for a specific document, you need to either 
get its stored field and re-analyze or store term vectors and use them.
Looking up those terms in other documents is then fast (that's where the 
inverted index comes in)


> PATCH:MoreLikeThis support
> --
>
> Key: SOLR-69
> URL: https://issues.apache.org/jira/browse/SOLR-69
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Bertrand Delacretaz
>Priority: Minor
> Attachments: lucene-queries-2.0.0.jar, SOLR-69.patch, SOLR-69.patch, 
> SOLR-69.patch
>
>
> Here's a patch that implements simple support of Lucene's MoreLikeThis class.
> The MoreLikeThisHelper code is heavily based on (hmm..."lifted from" might be 
> more appropriate ;-) Erik Hatcher's example mentioned in 
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg00878.html
> To use it, add at least the following parameters to a standard or dismax 
> query:
>   mlt=true
>   mlt.fl=list,of,fields,which,define,similarity
> See the MoreLikeThisHelper source code for more parameters.
> Here are two URLs that work with the example config, after loading all 
> documents found in exampledocs in the index (just to show that it seems to 
> work - of course you need a larger corpus to make it interesting):
> http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=standard&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score
> http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=dismax&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score
> Results are added to the output like this:
> 
>   ...
>   
> 
>   
> 1.5293242
> SOLR1000
>   
> 
> 
>   
> 1.5293242
> UTF8TEST
>   
> 
>   
> I haven't tested this extensively yet, will do in the next few days. But 
> comments are welcome of course.




Re: Stop words and phrases

2007-01-25 Thread Yonik Seeley

On 1/25/07, Manuel Albela Miranda <[EMAIL PROTECTED]> wrote:

Does anybody know how to ignore stopwords when searching with a phrase?
I've been searching for information about this but find nothing. The
thing is, i want to use stopwords when searching: /field:this is a house
/and not to use them when searching like: /field:"this is a house"/.


The easiest way might be to index the field twice, once with the stop
filter and once without.
See copyField in the schema for an easy way to copy one field to
another when indexing.

-Yonik


RE: Stop words and phrases

2007-01-25 Thread Cook, Jeryl
You have to pass a set of stop words (Strings) as a "java.util.Set"
to the constructor of the StandardAnalyzer (default)...



Jeryl Cook
-Original Message-
From: Manuel Albela Miranda [mailto:[EMAIL PROTECTED] 
Sent: Thursday, January 25, 2007 5:00 AM
To: solr-dev@lucene.apache.org
Subject: Stop words and phrases

Hello everybody,

Does anybody know how to ignore stopwords when searching with a phrase? 
I've been searching for information about this but find nothing. The 
thing is, i want to use stopwords when searching: /field:this is a house

/and not to use them when searching like: /field:"this is a house"/.

Hope you can help me.

Thank you!

Regards.

Manu


[jira] Commented: (SOLR-84) New Solr logo?

2007-01-25 Thread Clay Webster (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467388
 ] 

Clay Webster commented on SOLR-84:
--

The rounded characters are nice.  The sun's red color and open middle aren't so 
nice IMHO.

Erik: perhaps an animated gif that fires out from the Solr's "o" and engulfs 
our planet?  ;-)




[jira] Commented: (SOLR-104) Update Plugins

2007-01-25 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467330
 ] 

Ryan McKinley commented on SOLR-104:



Thanks for going through this!

I'll comment on points i have answers or questions.  The rest will go
on the TODO list.


Ok, so we should make sure to put the charset into
ContentStream.getContentType() and open the Reader with:

  String charset = getCharset( stream.getContentType() );
  new InputStreamReader( stream.getStream(),  charset );
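
The getCharset helper is not shown in this comment, so here is one possible sketch of it. The class name `ContentTypeUtil`, the UTF-8 fallback and the handling of quoted values are all assumptions for illustration, not what the patch necessarily does.

```java
import java.util.Locale;

// Hypothetical helper; the method name mirrors the getCharset(...) call above.
class ContentTypeUtil {
  static final String DEFAULT_CHARSET = "UTF-8"; // assumed fallback

  // Extracts the charset parameter from a Content-Type value such as
  // "text/xml; charset=ISO-8859-1", falling back to DEFAULT_CHARSET.
  static String getCharset(String contentType) {
    if (contentType == null) {
      return DEFAULT_CHARSET;
    }
    for (String part : contentType.split(";")) {
      String param = part.trim();
      if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
        String value = param.substring("charset=".length()).trim();
        // Strip optional surrounding quotes, as in charset="utf-8"
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
          value = value.substring(1, value.length() - 1);
        }
        if (!value.isEmpty()) {
          return value;
        }
      }
    }
    return DEFAULT_CHARSET;
  }
}
```

The returned name can be passed straight to the InputStreamReader constructor that takes a charset name, which throws UnsupportedEncodingException for names the JVM does not know.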



Sounds reasonable.  I took them out because (at the time) it seemed
clearer and has less duplicated code.



yes.  At some point it would also be good to make a stronger name
distinction between UpdateHandler (the thing that handles the nitty-gritty
lucene indexing) and the UpdateRequestHandler -- but let's save
that for another day!



As written, the StandardRequestParser:
1) checks if multipart
2) checks if it has parameters in the URL (?xxx=yyy)
  if it has parameters (?xxx=yyy) then use the RawRequestParser
  otherwise it pulls parameters from the map. (SimpleRequestParser)

To trigger raw request reading you *must* have a parameter on the URL.
 This was my design in response to Yonik's observation that curl puts
"application/x-www-form-urlencoded" in the header even if it is not
form-urlencoded encoded.

As written, it does not rely on clients putting accurate headers
(except for multipart) - it relies on a URL convention.
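
The dispatch rules above could be sketched roughly as follows. This is not the code from the patch: `ParserChooser` and its `Kind` enum are hypothetical, and the two string arguments are assumed to be what a servlet reports through getContentType() and getQueryString().

```java
import java.util.Locale;

// Hypothetical sketch of the dispatch rules listed above; not the patch itself.
class ParserChooser {
  enum Kind { MULTIPART, RAW, SIMPLE }

  // contentType and queryString are what a servlet would report via
  // getContentType() and getQueryString(); either may be null.
  static Kind choose(String contentType, String queryString) {
    // 1) multipart is the one case that trusts the Content-Type header
    if (contentType != null
        && contentType.toLowerCase(Locale.ROOT).startsWith("multipart/")) {
      return Kind.MULTIPART;
    }
    // 2) parameters on the URL (?xxx=yyy) mean the body is read raw
    if (queryString != null && !queryString.isEmpty()) {
      return Kind.RAW;
    }
    // otherwise parameters come from the parsed parameter map
    return Kind.SIMPLE;
  }
}
```

Under these rules a curl POST that carries a misleading form-urlencoded header is still read raw as long as the URL has a query string.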



I only put it in there to make you happy!  I'll take it out and we can
deal with it later if necessary.



I didn't think i could get that past you!  I'll take it out and save
the pleading for another time.



for a local file, you can use stream.url=file:///C:/pathtofile.txt,
for remote ones, you use stream.url=http://...

We should have a good notice in the config warning people to have some
security running before enabling streaming.



I had implemented it the normal way, BUT it broke many tests (since
they never call init).   The better solution is to make sure the tests
call init a standard way, but that got me into editing many files I
don't quite understand, so i opted for lazy init.


That sounds fine.  Since it is a tentative private interface, i was not
too worried about it.


> Update Plugins
> --
>
> Key: SOLR-104
> URL: https://issues.apache.org/jira/browse/SOLR-104
> Project: Solr
>  Issue Type: Improvement
>  Components: update
>Affects Versions: 1.2
>Reporter: Ryan McKinley
> Fix For: 1.2
>
> Attachments: commons-fileupload-20070107.jar, commons-io-1.2.jar, 
> DispatchFilter.patch, DispatchFilter.patch, DispatchFilter.patch, 
> DispatchFilter.patch, DispatchFilter.patch, DispatchFilter.patch, 
> HandlerRefactoring-DRAFT-SRC.zip, HandlerRefactoring-DRAFT-SRC.zip, 
> HandlerRefactoring.DRAFT.patch, HandlerRefactoring.DRAFT.patch, 
> HandlerRefactoring.DRAFT.zip
>
>
> The plugin framework should work for 'update' actions in addition to 'search' 
> actions.
> For more discussion on this, see:
> http://www.nabble.com/Re%3A-Handling-disparate-data-sources-in-Solr-tf2918621.html#a8305828

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Stop words and phrases

2007-01-25 Thread Manuel Albela Miranda

Hello everybody,

Does anybody know how to ignore stopwords when searching with a phrase?
I've been searching for information about this but have found nothing.
The thing is, I want to use stopwords when searching like
field:this is a house
and not to use them when searching with a phrase like
field:"this is a house"


Hope you can help me.

Thank you!

Regards.

Manu


[jira] Commented: (SOLR-104) Update Plugins

2007-01-25 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467305
 ] 

Hoss Man commented on SOLR-104:
---

Woot! ... i think we're really close to committing this.

I made a hodgepodge list of comments as i read through everything, and then
tried to organize them.  I agree with Yonik that we should feel free to commit
new functionality without being afraid of needing to change the api of that
functionality before the next release, but i'm not 100% comfortable with how
backwards compatible this patch is for the existing /select and /update URLs
... this may just be an issue of me being paranoid (and tired) but there's at
least one code path difference.

Anyway, here are my notes...





Comments regarding backwards compatibility of the patch...

 - SolrCore.update(Reader,Writer) was a public method that's been 
   removed ... this is probably fine, just pointing it out for the 
   record.
 - SolrUpdateServlet used HttpServletRequest.getReader, the new
   UpdateRequestHandler uses an InputStreamReader around
   HttpServletRequest.getInputStream() ... this seems bad for legacy
   update support from a char encoding standpoint.
 - While i think it's important to refactor the XML Update
   parsing out of SolrCore - I'm still not clear what is gained by
   eliminating SolrServlet and SolrUpdate.  The big advantage of
   the new dispatcher being a Filter is that it can pass requests on
   that it doesn't want to deal with, so why not leave the existing
   servlets around with only the minimum necessary changes...
     - move SolrCore's init to Dispatcher
     - use 3 arg core.execute in SolrServlet
     - have SolrUpdateServlet call UpdateRequestHandler.update(Reader)
       and generate the legacy response XML
   ...in order to reduce the possibility of introducing bugs
   (particularly since the existing Servlets are the one area where we
   don't have *any* unit tests)

Comments regarding functionality that i think we *may* want to address
before committing (but i won't fight over if i'm the only one that cares)...

 - UpdateRequestHandler should probably be renamed XmlUpdateRequestHandler
   (particularly since i expect Yonik to commit a
   CsvUpdateRequestHandler real soon now)
 - StandardRequestParser can't assume that a POST which isn't
   multipart/* should be handled by a RawRequestParser ... if the
   content type is "application/x-www-form-urlencoded" then
   SimpleRequestParser should be used (so all params from query string
   and body are included)
 - What should the expectations of
   ContentStream.getInputStream().close() be? Should the Dispatcher
   iterate over any Iterable streams when writing the output and try
   to close them, ignoring any Exceptions?
 - I'm really not fond of having SolrParams.STREAM_TYPE. Can we please,
   please leave it out for now and rely on content-type detection?
   We can add it back in if/when we make RequestParser a public
   interface and let people register them in solrconfig.
 - I really don't think we want to open the Pandora's box of putting
   the HttpServletRequest in the SolrQueryRequest ... i'd hate to put
   that in and then have to support it forever.
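
On the stream-closing question in the list above: if the answer is yes, the cleanup could look something like the sketch below. It assumes the streams can be handed over as java.io.Closeable; CloseStreamsSketch and closeAllQuietly are hypothetical names, not part of the patch.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical sketch of a dispatcher-side cleanup step.
class CloseStreamsSketch {
    // Close every stream, swallowing IOExceptions so that one bad stream
    // cannot prevent the rest from being closed. Returns how many failed,
    // which a caller could log.
    static int closeAllQuietly(Iterable<? extends Closeable> streams) {
        int failures = 0;
        if (streams == null) {
            return failures;
        }
        for (Closeable stream : streams) {
            if (stream == null) {
                continue;
            }
            try {
                stream.close();
            } catch (IOException e) {
                failures++;
            }
        }
        return failures;
    }
}
```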

Things in the current patch that aren't strictly necessary
for the current issue and can (should?) be committed separately...

 - are we definitely deprecating SolrQueryResponse.getException ?
 - StandardRequestHandler and DisMaxRequestHandler have only been
   changed to subclass the new base class.
 - only whitespace changes in SolrRequestHandler.java
 - SolrServletRequest has only imports rearranged

Things which definitely shouldn't block up the patch, but should go on
a short term todo list...

 - see backwards compatibility comment about (Xml)UpdateRequestHandler 
   using InputStreamReader without specifying a charset ... in general
   the handler should look at the ContentStream's content type to determine
   the encoding of the InputStream (and probably default to UTF-8)
 - need to work out what kind of NamedList should be returned by
   (Xml)UpdateRequestHandler.update(Reader)
 - some of the new files are missing the Apache boilerplate.
 - a use case we talked about that still isn't covered is opening local
   files as a stream ... this should be easy to add later right along 
   side STREAM_URL.
 - we should fill in the getURL methods for DisMax/Standard to point at wiki
 - CommitRequestHandler should use UpdateParams.OPTIMIZE
 - the init semantics for (Xml)UpdateRequestHandler are odd: as a
   RequestHandler it's guaranteed that init(NamedList) will be called, but
   instead it uses its own private init() that's called lazily.
 - DumpRequestHandler should dump ContentStream.getSize().
 - doFilter should call parsers.parse( path, req ) as soon as it has 
   the path, and then delegate to a helper method that doesn't have 
   access to the HttpServletRequest ... this reduces both the

[jira] Commented: (SOLR-69) PATCH:MoreLikeThis support

2007-01-25 Thread mrball (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12467282
 ] 

mrball commented on SOLR-69:


Yep, doesn't seem to work with non-stored fields. (if you only use non stored 
fields in mlt.fl)

I believe the stored field values are used to build the query

> PATCH:MoreLikeThis support
> --
>
> Key: SOLR-69
> URL: https://issues.apache.org/jira/browse/SOLR-69
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Reporter: Bertrand Delacretaz
>Priority: Minor
> Attachments: lucene-queries-2.0.0.jar, SOLR-69.patch, SOLR-69.patch, 
> SOLR-69.patch
>
>
> Here's a patch that implements simple support of Lucene's MoreLikeThis class.
> The MoreLikeThisHelper code is heavily based on (hmm..."lifted from" might be 
> more appropriate ;-) Erik Hatcher's example mentioned in 
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg00878.html
> To use it, add at least the following parameters to a standard or dismax 
> query:
>   mlt=true
>   mlt.fl=list,of,fields,which,define,similarity
> See the MoreLikeThisHelper source code for more parameters.
> Here are two URLs that work with the example config, after loading all 
> documents found in exampledocs in the index (just to show that it seems to 
> work - of course you need a larger corpus to make it interesting):
> http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=standard&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score
> http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=dismax&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score
> Results are added to the output like this:
> 
>   ...
>   
> 
>   
> 1.5293242
> SOLR1000
>   
> 
> 
>   
> 1.5293242
> UTF8TEST
>   
> 
>   
> I haven't tested this extensively yet, will do in the next few days. But 
> comments are welcome of course.
