RE: facet sort by ranking

2008-11-23 Thread Amit
Hi,

We have 100 categories, and each category has its own internal ranking.
Suppose I search for a product that falls under 30 categories, and we show
the top 10 categories in the filter so that users can filter their
results.

Consider a hypothetical example (we don't have real data yet and are still
testing Solr features):
Category values and internal ranking:

Category   Ranking
Cat1       1
Cat2       2
Cat3       3
Cat4       4
Cat5       5
Cat6       6
Cat7       7
Cat8       8
Cat9       9
Cat10      10
Cat11      11
Cat12      12
Cat13      13
Cat14      14
Cat15      15

If I search for a product, it will return this result (sorted by count):

Category   Count
Cat2       20
Cat3       17
Cat4       15
Cat1       14
Cat7       13
Cat8       12
Cat9       10
Cat15      9
Cat13      8
Cat10      7
Cat11      6
Cat12      5

Now we want to show only the top 10 values, so we will miss Cat11 and Cat12,
because the facet is sorted by count and not by ranking.

We would like the result below:

Cat15
Cat13
Cat12
Cat11
Cat10
Cat9
Cat8
Cat7
Cat4
Cat3
Cat2
Cat1
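
For illustration only, here is a minimal Python sketch of the ordering we are after, done on the client side. It assumes the application can get every matching facet value with its count (not just the top 10) and already knows each category's internal ranking. The names `counts`, `ranking`, `top_by_count` and `top_by_ranking` are made up for this example and are not Solr features.

```python
# Facet counts for the example search, as listed above.
counts = {
    "Cat2": 20, "Cat3": 17, "Cat4": 15, "Cat1": 14, "Cat7": 13, "Cat8": 12,
    "Cat9": 10, "Cat15": 9, "Cat13": 8, "Cat10": 7, "Cat11": 6, "Cat12": 5,
}

# Internal ranking per category (Cat1 -> 1 ... Cat15 -> 15), kept by the application.
ranking = {f"Cat{i}": i for i in range(1, 16)}


def top_by_count(counts, limit=10):
    """Current behaviour: keep the `limit` categories with the highest counts."""
    return sorted(counts, key=counts.get, reverse=True)[:limit]


def top_by_ranking(counts, ranking, limit=10):
    """Desired behaviour: keep the `limit` matching categories with the highest ranking."""
    matching = [cat for cat, n in counts.items() if n > 0]
    return sorted(matching, key=ranking.get, reverse=True)[:limit]


by_count = top_by_count(counts)
by_rank = top_by_ranking(counts, ranking, limit=len(counts))
print(by_count)  # Cat11 and Cat12 are cut off here
print(by_rank)   # Cat15 first, Cat1 last, as in the list above
```

With `limit=10` the second function keeps Cat15 down to Cat3, so Cat11 and Cat12 are no longer lost.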

I hope this conveys what we want.

Have a great day. :)

Thanks and Regards,
Amit


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: 22 November 2008 22:51
To: solr-user@lucene.apache.org
Subject: Re: facet sort by ranking

On Sat, Nov 22, 2008 at 12:05 PM, Amit <[EMAIL PROTECTED]> wrote:
> Actually we have some ranking associated to field on which we are faceting
> and we want to show only top 10 facet value now which is sort by count but
> we want to sort by it ranking.

I think you're going to have to give some concrete examples of what
your documents look like, and what results you want back.

-Yonik


Query for Distributed search -

2008-11-23 Thread souravm
Hi,

Looking for some insight on distributed search.

Say I have an index distributed across 3 boxes, and the index contains time and
text data (typical log files). Each box holds the index for a different time
range: say Box 1 for January to April, Box 2 for May to August, and Box 3 for
September to December.

Now if I search for a text string, will the search happen in parallel on all
3 boxes, or sequentially?

Regards,
Sourav



Re: Please Help !! Question about Query Phrase Slop (qs) in dismax

2008-11-23 Thread Yonik Seeley
If you boost the phrase queries by enough, you could tell when you hit
the less relevant documents by the score.

-Yonik

On Mon, Nov 24, 2008 at 12:07 AM, anuvenk <[EMAIL PROTECTED]> wrote:
>
> Thanks for the response. Well my current ps setting works great for most
> search terms. But say this typical example, north dakota 1031 exchange
> lawyers - we don't have any relevant docs in the index. Solr is returning
> the irrelevant doc, just because it found 'lawyer', exchange, north & dakota
> somewhere. I thought if there is a way to just not return any results if
> they are not within close proximity, it would be great.
>
> Yonik Seeley wrote:
>>
>> On Sun, Nov 23, 2008 at 11:51 PM, anuvenk <[EMAIL PROTECTED]>
>> wrote:
>>> Please help someone...i've been waiting for an answer for the last couple
>>> of
>>> days & no one seems to be helping out here. I did search the wiki & this
>>> forum for an answer. But couldn't find an answer. I know if ps is set to
>>> 5
>>> words within 5 words of one another receive a boost in score. But is
>>> there a
>>> way to not return results that have the words in search terms more than 5
>>> words apart. ?
>>
>> Not with dismax.  I'm not sure why it's a problem, given that with
>> enough boost you should be able to ensure that all of the results with
>> a slop less than 5 appear before other results.
>> Anyway, if you want to restrict results to those with a slop of 5, use
>> the standard query parser with an explicit sloppy phrase query:
>>
>> "north dakota 1031 exchange lawyers"~5
>>
>> -Yonik
>>
>>
>>> Typical example: north dakota 1031 exchange lawyers
>>> My first result is absolutely ir-relevant. It returned a north dakota doc
>>> though but had an occurrence of attorney somewhere & an occurrence of
>>> exchange (not related to 1031 exchange though). They were not within 5
>>> words
>>> of one another. My guys have been hammering me reg this relevancy issue.
>>> Please help someone.
>>>
>>> anuvenk wrote:

>>>> From the solr wiki, it sounded like if qs is set to 5 for example, & if
>>>> the search term is 'child custody', only docs with 'child' & 'custody'
>>>> within 5 words of one another would be returned in results. Is this
>>>> correct? If so, it doesn't seem to be working for me. I see docs with
>>>> 'child' & 'custody' more than 5 words of one another (excluding stop
>>>> words) which is resulting in bad user experience as those docs are not
>>>> so
>>>> relevant. What more could i do to improve quality in the results?
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Please-Help-%21%21-Question-about-Query-Phrase-Slop-%28qs%29-in-dismax-tp20643003p20654906.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>
> --
> View this message in context: 
> http://www.nabble.com/Please-Help-%21%21-Question-about-Query-Phrase-Slop-%28qs%29-in-dismax-tp20643003p20655014.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Please Help !! Question about Query Phrase Slop (qs) in dismax

2008-11-23 Thread anuvenk

Thanks for the response. Well my current ps setting works great for most
search terms. But say this typical example, north dakota 1031 exchange
lawyers - we don't have any relevant docs in the index. Solr is returning
the irrelevant doc, just because it found 'lawyer', exchange, north & dakota
somewhere. I thought if there is a way to just not return any results if
they are not within close proximity, it would be great. 

Yonik Seeley wrote:
> 
> On Sun, Nov 23, 2008 at 11:51 PM, anuvenk <[EMAIL PROTECTED]>
> wrote:
>> Please help someone...i've been waiting for an answer for the last couple
>> of
>> days & no one seems to be helping out here. I did search the wiki & this
>> forum for an answer. But couldn't find an answer. I know if ps is set to
>> 5
>> words within 5 words of one another receive a boost in score. But is
>> there a
>> way to not return results that have the words in search terms more than 5
>> words apart. ?
> 
> Not with dismax.  I'm not sure why it's a problem, given that with
> enough boost you should be able to ensure that all of the results with
> a slop less than 5 appear before other results.
> Anyway, if you want to restrict results to those with a slop of 5, use
> the standard query parser with an explicit sloppy phrase query:
> 
> "north dakota 1031 exchange lawyers"~5
> 
> -Yonik
> 
> 
>> Typical example: north dakota 1031 exchange lawyers
>> My first result is absolutely ir-relevant. It returned a north dakota doc
>> though but had an occurrence of attorney somewhere & an occurrence of
>> exchange (not related to 1031 exchange though). They were not within 5
>> words
>> of one another. My guys have been hammering me reg this relevancy issue.
>> Please help someone.
>>
>> anuvenk wrote:
>>>
>>> From the solr wiki, it sounded like if qs is set to 5 for example, & if
>>> the search term is 'child custody', only docs with 'child' & 'custody'
>>> within 5 words of one another would be returned in results. Is this
>>> correct? If so, it doesn't seem to be working for me. I see docs with
>>> 'child' & 'custody' more than 5 words of one another (excluding stop
>>> words) which is resulting in bad user experience as those docs are not
>>> so
>>> relevant. What more could i do to improve quality in the results?
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Please-Help-%21%21-Question-about-Query-Phrase-Slop-%28qs%29-in-dismax-tp20643003p20654906.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 




Re: Please Help !! Question about Query Phrase Slop (qs) in dismax

2008-11-23 Thread Yonik Seeley
On Sun, Nov 23, 2008 at 11:51 PM, anuvenk <[EMAIL PROTECTED]> wrote:
> Please help someone...i've been waiting for an answer for the last couple of
> days & no one seems to be helping out here. I did search the wiki & this
> forum for an answer. But couldn't find an answer. I know if ps is set to 5
> words within 5 words of one another receive a boost in score. But is there a
> way to not return results that have the words in search terms more than 5
> words apart. ?

Not with dismax.  I'm not sure why it's a problem, given that with
enough boost you should be able to ensure that all of the results with
a slop less than 5 appear before other results.
Anyway, if you want to restrict results to those with a slop of 5, use
the standard query parser with an explicit sloppy phrase query:

"north dakota 1031 exchange lawyers"~5

-Yonik
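
As a small illustration of the query form above, this Python sketch builds the sloppy phrase string and URL-encodes it as a request parameter. The helper name `sloppy_phrase` and the extra `rows` parameter are assumptions made for the example; which handler and host to send it to depends on the setup.

```python
from urllib.parse import urlencode


def sloppy_phrase(terms, slop):
    """Quote the terms as one phrase and append ~slop (Lucene query syntax)."""
    phrase = " ".join(terms)
    return f'"{phrase}"~{slop}'


q = sloppy_phrase(["north", "dakota", "1031", "exchange", "lawyers"], 5)
print(q)  # "north dakota 1031 exchange lawyers"~5

# Hypothetical request parameters for the standard query parser.
params = urlencode({"q": q, "rows": 10})
print(params)
```

Documents whose terms are further apart than the slop allows do not match this query at all, rather than just scoring lower.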


> Typical example: north dakota 1031 exchange lawyers
> My first result is absolutely ir-relevant. It returned a north dakota doc
> though but had an occurrence of attorney somewhere & an occurrence of
> exchange (not related to 1031 exchange though). They were not within 5 words
> of one another. My guys have been hammering me reg this relevancy issue.
> Please help someone.
>
> anuvenk wrote:
>>
>> From the solr wiki, it sounded like if qs is set to 5 for example, & if
>> the search term is 'child custody', only docs with 'child' & 'custody'
>> within 5 words of one another would be returned in results. Is this
>> correct? If so, it doesn't seem to be working for me. I see docs with
>> 'child' & 'custody' more than 5 words of one another (excluding stop
>> words) which is resulting in bad user experience as those docs are not so
>> relevant. What more could i do to improve quality in the results?
>>
>
> --
> View this message in context: 
> http://www.nabble.com/Please-Help-%21%21-Question-about-Query-Phrase-Slop-%28qs%29-in-dismax-tp20643003p20654906.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Please Help !! Question about Query Phrase Slop (qs) in dismax

2008-11-23 Thread anuvenk

Please help someone...i've been waiting for an answer for the last couple of
days & no one seems to be helping out here. I did search the wiki & this
forum for an answer. But couldn't find an answer. I know if ps is set to 5
words within 5 words of one another receive a boost in score. But is there a
way to not return results that have the words in search terms more than 5
words apart. ?
Typical example: north dakota 1031 exchange lawyers
My first result is absolutely ir-relevant. It returned a north dakota doc
though but had an occurrence of attorney somewhere & an occurrence of
exchange (not related to 1031 exchange though). They were not within 5 words
of one another. My guys have been hammering me reg this relevancy issue.
Please help someone.

anuvenk wrote:
> 
> From the solr wiki, it sounded like if qs is set to 5 for example, & if
> the search term is 'child custody', only docs with 'child' & 'custody'
> within 5 words of one another would be returned in results. Is this
> correct? If so, it doesn't seem to be working for me. I see docs with
> 'child' & 'custody' more than 5 words of one another (excluding stop
> words) which is resulting in bad user experience as those docs are not so
> relevant. What more could i do to improve quality in the results?
> 




Re: Newbie Question - getting search results from dataimport request handler

2008-11-23 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Mon, Nov 24, 2008 at 7:25 AM, Chris Hostetter
<[EMAIL PROTECTED]> wrote:
>
> : > Logging an error and returning successfully (without adding any docs) is
> : > still inconsistent with the way all other RequestHandlers work: fail the
> : > request.
> : >
> : > I know DIH isn't a typical RequestHandler, but some things (like failing
> : > on failure) seem like they should be a given.
> : SOLR-842 .
> : DIH is an ETL tool pretending to be a RequestHandler. Originally it
> : was built to run outside of Solr using SolrJ. For better integration
> : and ease of use we changed it later.
> :
> : SOLR-853 aims to achieve the oroginal goal
> :
> : The goal of DIH is to become a full featured ETL tool.
>
> Understood ... but shouldn't ETL Tools "fail on failure" ?
>
> I mean forget Solr for a minute:   If i've got a standalone ETL Tool that
> runs as a daemon, and on startup it logs some error messages because i've
> got bad configs (and it can tell the fields i've listed for my
> 'target' system don't exist there) should it report "success" everytime i
> push data to it?
>
> Based on this thread, that's what it sounds like DIH is doing right now in
> situations like this.
>
> If nothing else, we could give DIH a way to check the global
>  value from solrconfig.xml and make it's
> decisison that way
We considered these. The severity of an error is very much specific to
the source of the data. It is very unlikely that a DB source throws
errors. With XML data sources, say 1 or 2 out of x URLs are wrong:
would the user wish to ignore them, or abort the entire import?


So we decided to give more options, and the implementation is left to
the EntityProcessor. Moreover, the default is set to onError=abort.
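
To make the trade-off concrete, here is a rough Python sketch. It is not DIH code: `run_import`, `fake_fetch` and the `skip` policy name are invented for this illustration, and only `abort` as the default comes from the message above.

```python
def run_import(urls, fetch, on_error="abort"):
    """Fetch each URL; on failure either abort the whole import or skip that URL."""
    docs = []
    for url in urls:
        try:
            docs.append(fetch(url))
        except Exception as exc:
            if on_error == "abort":
                raise RuntimeError(f"import aborted at {url}") from exc
            if on_error == "skip":
                continue  # ignore the bad source and carry on
            raise ValueError(f"unknown on_error policy: {on_error}")
    return docs


def fake_fetch(url):
    """Stand-in for a real XML fetch: URLs containing 'bad' fail."""
    if "bad" in url:
        raise IOError(f"cannot read {url}")
    return {"source": url}


urls = [
    "http://example.com/a.xml",
    "http://example.com/bad.xml",
    "http://example.com/c.xml",
]
imported = run_import(urls, fake_fetch, on_error="skip")
print(len(imported))  # 2
```

With the default `on_error="abort"` the same call raises on the second URL and nothing is returned.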


>
>
>
> -Hoss
>
>



-- 
--Noble Paul


Re: Using Solr for indexing emails

2008-11-23 Thread Norberto Meijome
On Sun, 23 Nov 2008 16:02:16 +0200
Timo Sirainen <[EMAIL PROTECTED]> wrote:

> Hi,

Hi Timo,

> 
[...]

> The main problem is that before doing the search, I first have to check
> if there are any unindexed messages and then add them to Solr. This is
> done using a query like:
>  - fl=uid
>  - rows=1
>  - sort=uid desc
>  - q=uidv: box: user:

So, if I understand correctly, the process is :

1. user sends search query Q to search interface
2. interface checks highest indexed uidv in SOLR
3. checks in IMAP store for mailbox if there are any objects ('emails') newer
than uidv from 2.
4. anything found in 3. is processed, submitted to SOLR, committed.
5. interface submits search query Q to index, gets results
6. results are presented / returned to user

It strikes me that this may work OK in some situations but may not scale. I
would decouple the { find new documents / submit / commit } process from the
{ search / presentation } layer, ESPECIALLY if you plan to have several
mailboxes in play.

> So it returns the highest IMAP UID field (which is an always-ascending
> integer) for the given mailbox (you can ignore the uidvalidity). I can
> then add all messages with higher UIDs to Solr before doing the actual
> search.
> 
> When searching multiple mailboxes the above query would have to be sent
> to every mailbox separately. 

hmm...not sure what you mean by "query would have to be sent to every
MAILBOX" ... 

> That really doesn't seem like the best
> solution, especially when there are a lot of mailboxes. But I don't
> think Solr has a way to return "highest uid field for each
> box:"?

hmmm... maybe you can use facets on 'box' ... ? though you'd still have to
query for each box, i think...

> Is that above query even efficient for a single mailbox? 

i don't think so.

>I did consider
> using separate documents for storing the highest UID for each mailbox,
> but that causes annoying desynchronization possibilities. Especially
> because currently I can just keep sending documents to Solr without
> locking and let it drop duplicates automatically (should be rare). With
> per-mailbox highest-uid documents I can't really see a way to do this
> without locking or allowing duplicate fields to be added and later some
> garbage collection deleting all but the one highest value (annoyingly
> complex).

I have a feeling the issues arise from serialising the whole process (as I
described above). It makes more sense (to me) to implement something
similar to DIH, where you load data as needed (even a 'delta query', which
would only return new data). I am not sure whether you could use DIH itself
(RSS feed from the IMAP store?).

> I could of course also keep track of what's indexed on Dovecot's side,
> but that could also lead to desynchronization issues and I'd like to
> avoid them.
> 
> I guess the ideal solution would be if it was somehow possible to create
> a SQL-like trigger that updates the per-mailbox highest-uid document
> whenever adding a new document with a higher UID value.

I am not sure how much effort you want to put into this, but I would think
that writing a lean app that periodically (at an interval that makes sense for
your hardware and your users' expectations: 5 minutes? 10? 1?) crawls the IMAP
stores for new UIDs, processes the messages and submits them to SOLR, and keeps
its own state (dbm or sqlite) may be a more flexible approach. Or, if Dovecot
supports this, a plugin or hook that sends a message to your indexing app every
time a new document is created.
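
A minimal sketch of that "lean app keeping its own state" idea, in Python with sqlite. Everything here is an assumption for illustration: the table layout, the function names, and the `submit` callback (which stands in for parsing a message and posting it to SOLR). Reading UIDs from the IMAP store is left out.

```python
import sqlite3


def open_state(path=":memory:"):
    """Open (or create) the indexer's own state store."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS indexed "
        "(mailbox TEXT PRIMARY KEY, last_uid INTEGER NOT NULL)"
    )
    return db


def last_indexed_uid(db, mailbox):
    """Highest UID already submitted for this mailbox, or 0 if none."""
    row = db.execute(
        "SELECT last_uid FROM indexed WHERE mailbox = ?", (mailbox,)
    ).fetchone()
    return row[0] if row else 0


def index_new_messages(db, mailbox, uids_in_store, submit):
    """Submit every message above the recorded UID, then record the new maximum."""
    last = last_indexed_uid(db, mailbox)
    new_uids = sorted(uid for uid in uids_in_store if uid > last)
    for uid in new_uids:
        submit(mailbox, uid)  # hypothetical: parse the message and post it to SOLR
    if new_uids:
        db.execute(
            "INSERT OR REPLACE INTO indexed (mailbox, last_uid) VALUES (?, ?)",
            (mailbox, new_uids[-1]),
        )
        db.commit()
    return new_uids


db = open_state()
sent = []
record = lambda box, uid: sent.append((box, uid))
index_new_messages(db, "INBOX", [1, 2, 3], record)
index_new_messages(db, "INBOX", [1, 2, 3, 4, 5], record)
print(sent)
```

The second run only submits UIDs 4 and 5, so the search path never has to ask SOLR for the highest indexed UID.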

I am interested to hear what you decide to go with, and why.

cheers,
B

_
{Beto|Norberto|Numard} Meijome

"All parts should go together without forcing. You must remember that the parts
you are reassembling were disassembled by you. Therefore, if you can't get them
together again, there must be a reason. By all means, do not use hammer." IBM
maintenance manual, 1975

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.


Re: Wait Flush, Wait Searcher and commit Scenarios

2008-11-23 Thread Yonik Seeley
On Tue, Nov 18, 2008 at 10:55 PM, Mark Miller <[EMAIL PROTECTED]> wrote:
> Does waitFlush do anything now? I only see it being set if eclipse is not
> missing a reference...

Not currently.  The idea was that if waitFlush== false that the call
would be totally asynchronous and return immediately.  If
waitFlush==true, then the call would return only after everything was
flushed to stable storage (which is always the case now).

-Yonik

p.s. late replies since I'm getting back from a week of travel.


RE: [VOTE] Community Logo Preferences

2008-11-23 Thread Vinu Kumar
https://issues.apache.org/jira/secure/attachment/12394282/solr2_maho_impression.png
https://issues.apache.org/jira/secure/attachment/12394353/solr.s5.jpg
https://issues.apache.org/jira/secure/attachment/12394265/apache_solr_b_blue.jpg
https://issues.apache.org/jira/secure/attachment/12394167/solrlogo.jpg
https://issues.apache.org/jira/secure/attachment/12394376/solr_sp.png

- Vinu


-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED]
Sent: Sunday, November 23, 2008 10:30 PM
To: solr-user@lucene.apache.org
Subject: [VOTE] Community Logo Preferences

Please submit your preferences for the solr logo.

For full voting details, see:
   http://wiki.apache.org/solr/LogoContest#Voting

The eligible logos are:
   http://people.apache.org/~ryan/solr-logo-options.html

Any and all members of the Solr community are encouraged to reply to
this thread and list (up to) 5 ranked choices by listing the Jira
attachment URLs. Votes will be assigned a point value based on rank.
For each vote, 1st choice has a point value of 5, 5th place has a
point value of 1, and all others follow a similar pattern.

https://issues.apache.org/jira/secure/attachment/12345/yourfrstchoice.jpg
https://issues.apache.org/jira/secure/attachment/34567/yoursecondchoice.jpg
...

This poll will be open until Wednesday November 26th, 2008 @ 11:59PM GMT

When the poll is complete, the solr committers will tally the
community preferences and take a final vote on the logo.
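
For anyone who wants to check the arithmetic, the point scheme described above can be sketched in Python as follows. The function name `tally` and the two sample ballots (with shortened file names) are only for illustration; the real tally is up to the committers.

```python
from collections import Counter


def tally(ballots, max_choices=5):
    """Sum points per logo: 1st choice = 5 points, 2nd = 4, ... 5th = 1."""
    totals = Counter()
    for ballot in ballots:
        for position, logo in enumerate(ballot[:max_choices]):
            totals[logo] += max_choices - position
    return totals


# Two sample ballots, each listed in order of preference.
ballots = [
    ["solr2_maho_impression.png", "apache_solr_b_red.jpg"],
    ["apache_solr_c_blue.jpg", "apache_solr_c_red.jpg", "solr2_maho_impression.png"],
]
totals = tally(ballots)
print(totals.most_common())
```

Here solr2_maho_impression.png gets 5 + 3 = 8 points: first place on one ballot and third place on the other.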

A big thanks to everyone who submitted possible logos -- it's great
to see so many good options.


Re: [VOTE] Community Logo Preferences

2008-11-23 Thread Nick Jenkin
https://issues.apache.org/jira/secure/attachment/12394366/solr3_maho.png
https://issues.apache.org/jira/secure/attachment/12394282/solr2_maho_impression.png
https://issues.apache.org/jira/secure/attachment/12392306/apache_solr_sun.png
https://issues.apache.org/jira/secure/attachment/12394267/apache_solr_c_blue.jpg

Good work to all the people who contributed.
-Nick

On Mon, Nov 24, 2008 at 3:06 PM, Norberto Meijome <[EMAIL PROTECTED]> wrote:
> On Sun, 23 Nov 2008 11:59:50 -0500
> Ryan McKinley <[EMAIL PROTECTED]> wrote:
>
>> Please submit your preferences for the solr logo.
>
> https://issues.apache.org/jira/secure/attachment/12394267/apache_solr_c_blue.jpg
> https://issues.apache.org/jira/secure/attachment/12394263/apache_solr_a_blue.jpg
> https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png
> https://issues.apache.org/jira/secure/attachment/12394376/solr_sp.png
> https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg
>
> thanks!!
> B
>
> _
> {Beto|Norberto|Numard} Meijome
>
> "Tell a person you're the Metatron and they stare at you blankly. Mention 
> something out of a Charleton Heston movie and suddenly everyone's a Theology 
> scholar!"
>   Dogma
>
> I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
> Reading disclaimers makes you go blind. Writing them is worse. You have been 
> Warned.
>


Re: How can i protect the SOLR Cores?

2008-11-23 Thread Chris Hostetter

: 1) modify web.xml (part of the sources of solr.war, which you'll have to 
: rebuild)  to define the authentication constraints you want.

for many servlet containers, this isn't necessary.  Jetty, for example, 
also lets you define security realms in the jetty.xml (there's an example 
of this commented out in the example jetty.xml)



-Hoss



Re: [VOTE] Community Logo Preferences

2008-11-23 Thread Norberto Meijome
On Sun, 23 Nov 2008 11:59:50 -0500
Ryan McKinley <[EMAIL PROTECTED]> wrote:

> Please submit your preferences for the solr logo.

https://issues.apache.org/jira/secure/attachment/12394267/apache_solr_c_blue.jpg
https://issues.apache.org/jira/secure/attachment/12394263/apache_solr_a_blue.jpg
https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png
https://issues.apache.org/jira/secure/attachment/12394376/solr_sp.png
https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg

thanks!!
B

_
{Beto|Norberto|Numard} Meijome

"Tell a person you're the Metatron and they stare at you blankly. Mention 
something out of a Charleton Heston movie and suddenly everyone's a Theology 
scholar!"
   Dogma

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: WordDelimeterFilter and its Factory: access to charTypeTable

2008-11-23 Thread Chris Hostetter

: I was wondering if it is possible to access and modify the charTypeTable
: of the WordDelimeterFilter. 

FWIW: WordDelimiterFilter has a static package protected 
defaultWordDelimTable but there is no need to modify it -- you can 
pass your own charTypeTable directly to the WordDelimiterFilter 
constructor ... this might mean writing your own Factory, but you don't 
need to muck with the guts of WDF itself.


-Hoss



Re: not string or text fields and shards

2008-11-23 Thread Yonik Seeley
On Thu, Nov 20, 2008 at 7:41 AM, Marc Sturlese <[EMAIL PROTECTED]> wrote:
> I have started working with an index divided in 3 shards. When I did a
> distributed search I got an error with the fields that were not string or
> text. I read that the error was due to BinaryResponseWriter and not
> string/text empty fields.

I think it's more the case that if you have an invalid field value, it
could blow up at different points in different code paths.  The root
cause is still an invalid value in the field.

-Yonik


Re: Newbie Question - getting search results from dataimport request handler

2008-11-23 Thread Chris Hostetter

: > Logging an error and returning successfully (without adding any docs) is
: > still inconsistent with the way all other RequestHandlers work: fail the
: > request.
: >
: > I know DIH isn't a typical RequestHandler, but some things (like failing
: > on failure) seem like they should be a given.
: SOLR-842 .
: DIH is an ETL tool pretending to be a RequestHandler. Originally it
: was built to run outside of Solr using SolrJ. For better integration
: and ease of use we changed it later.
: 
: SOLR-853 aims to achieve the original goal
: 
: The goal of DIH is to become a full featured ETL tool.

Understood ... but shouldn't ETL Tools "fail on failure" ?

I mean forget Solr for a minute:   If i've got a standalone ETL Tool that 
runs as a daemon, and on startup it logs some error messages because i've 
got bad configs (and it can tell the fields i've listed for my 
'target' system don't exist there) should it report "success" everytime i 
push data to it?

Based on this thread, that's what it sounds like DIH is doing right now in 
situations like this.

If nothing else, we could give DIH a way to check the global
 value from solrconfig.xml and make its 
decision that way.



-Hoss



RE: Updating schema.xml without deleting index?

2008-11-23 Thread Chris Hostetter

: of myfield as the same result.  I wish there was an option to just
: completely reindex all data..i suppose optimize may do that a little
: bit?

"optimize" is just a low level lucene call to purge all deleted docs and 
merge all index segments into a single segment.

and there is an option to reindex all data: take whatever you used to 
index the data the first time, and do it again. :)

seriously though, if you use something like DataImportHandler this is 
fairly easy; if you don't use something like DIH, it's a matter of 
designing whatever system you do use so that it's easy to reindex later as 
needed (unless you're certain that your schema is perfect and never needs 
to change)

The way you solved your use case (exclude things that don't have a value) 
is exactly how i go about dealing with situations like this routinely.



-Hoss



Re: [VOTE] Community Logo Preferences

2008-11-23 Thread phil cryer
https://issues.apache.org/jira/secure/attachment/12394282/solr2_maho_impression.png
https://issues.apache.org/jira/secure/attachment/12394475/solr2_maho-vote.png
https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg

On Sun, Nov 23, 2008 at 10:59 AM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> Please submit your preferences for the solr logo.
>
> For full voting details, see:
>  http://wiki.apache.org/solr/LogoContest#Voting
>
> The eligible logos are:
>  http://people.apache.org/~ryan/solr-logo-options.html
>
> Any and all members of the Solr community are encouraged to reply to this
> thread and list (up to) 5 ranked choices by listing the Jira attachment
> URLs. Votes will be assigned a point value based on rank. For each vote, 1st
> choice has a point value of 5, 5th place has a point value of 1, and all
> others follow a similar pattern.
>
> https://issues.apache.org/jira/secure/attachment/12345/yourfrstchoice.jpg
> https://issues.apache.org/jira/secure/attachment/34567/yoursecondchoice.jpg
> ...
>
> This poll will be open until Wednesday November 26th, 2008 @ 11:59PM GMT
>
> When the poll is complete, the solr committers will tally the community
> preferences and take a final vote on the logo.
>
> A big thanks to everyone would submitted possible logos -- its great to see
> so many good options.



-- 
http://fak3r.com dim high beams for oncoming traffic
http://lefttochance.com know your rights, don't lose them


Re: filtering on blank OR specific range

2008-11-23 Thread Chris Hostetter

: I'm having difficultly filtering my documents when a field is either
: blank or set to a specific value.  I would have thought this would work
: 
:   fq=-Type:[* TO *] OR Type:blue

Rule #1: don't try to mix AND/OR syntax with +/- syntax ... it never works 
the way you want.

"a OR b" is just syntactic sugar for "a b" ... "-a OR b" is equivalent to 
"-a b" ... if you use debugQuery=true and look at the 
parsed_filter_queries you'll see that your fq is being parsed as...

   -Type:[* TO *]  Type:blue

...looking at it that way, does it make sense why it doesn't match any 
documents?  there is only one "positive" clause, which is that Type == 
blue.  But then you are excluding any docs where Type has a value, so you 
get the empty set.


you could have a special "Type_empty" boolean field and use...

fq = Type_empty:true Type:blue

...or you can play tricks with the syntax, and do something like this...

fq = (*:* -Type:[* TO *]) Type:blue
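
If it helps, the clause logic above can be modelled with plain Python sets. This is only a toy model on a made-up five-document index: optional clauses are unioned, then anything matching a prohibited clause is removed. The names `docs` and `boolean_query` are invented for the example.

```python
# Toy index: document id -> value of the Type field (None means no value).
docs = {1: "blue", 2: "red", 3: None, 4: "blue", 5: None}

ALL = set(docs)                                            # *:*
HAS_TYPE = {d for d, t in docs.items() if t is not None}   # Type:[* TO *]
BLUE = {d for d, t in docs.items() if t == "blue"}         # Type:blue


def boolean_query(should=(), must_not=()):
    """Union of the optional clauses, minus anything a prohibited clause matches."""
    matched = set().union(*should) if should else set()
    excluded = set().union(*must_not) if must_not else set()
    return matched - excluded


# -Type:[* TO *] Type:blue
broken = boolean_query(should=[BLUE], must_not=[HAS_TYPE])

# (*:* -Type:[* TO *]) Type:blue
no_type = boolean_query(should=[ALL], must_not=[HAS_TYPE])
fixed = boolean_query(should=[no_type, BLUE])

print(sorted(broken))  # []
print(sorted(fixed))   # [1, 3, 4, 5]
```

The first form throws away the blue docs because they all have a Type; the second form gives the negative clause something to subtract from before it is OR'ed with Type:blue.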


-Hoss



Re: [VOTE] Community Logo Preferences

2008-11-23 Thread Jon Baer

https://issues.apache.org/jira/secure/attachment/12394282/solr2_maho_impression.png
https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg


Re: [VOTE] Community Logo Preferences

2008-11-23 Thread Chris Haggstrom

https://issues.apache.org/jira/secure/attachment/12394267/apache_solr_c_blue.jpg
https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg
https://issues.apache.org/jira/secure/attachment/12394282/solr2_maho_impression.png
https://issues.apache.org/jira/secure/attachment/12394366/solr3_maho.png
https://issues.apache.org/jira/secure/attachment/12393936/logo_remake.jpg

Re: [VOTE] Community Logo Preferences

2008-11-23 Thread Mark Miller

https://issues.apache.org/jira/secure/attachment/12394218/solr-solid.png
https://issues.apache.org/jira/secure/attachment/12394376/solr_sp.png
https://issues.apache.org/jira/secure/attachment/12393951/sslogo-solr-classic.png
https://issues.apache.org/jira/secure/attachment/12391946/apache_solr_burning.png
https://issues.apache.org/jira/secure/attachment/12392306/apache_solr_sun.png

- Mark


Re: Pagination with Solr

2008-11-23 Thread lupiss

 OK! Thanks ryguasu for your answer. Now that I think of it, there is indeed a
setStart and a setRows; I'll try those and hopefully I can finish my project.
Thanks a million =)
-- 
View this message in context: 
http://www.nabble.com/Pagination-with-Solr-tp13847908p20650529.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: QueryElevationComponent

2008-11-23 Thread Erik Hatcher


On Nov 23, 2008, at 3:06 PM, Paolo Ruscitti wrote:

> Thanks Ryan for your answer.
>
>> The only thing that may be weird is that if you id field is named "myid",
>> your elevate.xml file still refers to "id" as the unique key.  Is that what
>> you are refering to?
>
> yes, my id field is named "myid", but elevate.xml expects its name is "id" .
>
> Please find below more info:
>
> I' using the very last revision (720030)
>
> I also tried both








As Ryan said, that is incorrect - it must be id="..." regardless of what
your uniqueKey field is.










Remove "myid:" from that value and you should be in good shape.

Granted it is confusing.  But what's the alternative?  Maybe calling every
attribute that needs to refer to a uniqueKey literally "uniqueKey"?  I don't
think we want to have attributes changing their name based on the uniqueKey
field name.
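
To make the point about the attribute name concrete, here is a minimal sketch that generates an elevate.xml in the usual elevate/query/doc layout. The function name and the document id "post-42" are hypothetical; the only thing it is meant to show is that each doc element carries an attribute literally named "id", with the bare key value and no "myid:" prefix.

```python
import xml.etree.ElementTree as ET


def build_elevate_xml(query_text, doc_ids):
    """Return elevate.xml content that elevates doc_ids for query_text."""
    root = ET.Element("elevate")
    query = ET.SubElement(root, "query", {"text": query_text})
    for doc_id in doc_ids:
        # The attribute is always called "id", whatever the schema's
        # uniqueKey field is named, and its value is the bare key value.
        ET.SubElement(query, "doc", {"id": doc_id})
    return ET.tostring(root, encoding="unicode")


# Hypothetical usage: elevate one document for the query "cars".
xml_text = build_elevate_xml("cars", ["post-42"])
```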


Erik



Re: QueryElevationComponent

2008-11-23 Thread Paolo Ruscitti
Thanks Ryan for your answer.

> The only thing that may be weird is that if your id field is named "myid",
> your elevate.xml file still refers to "id" as the unique key.  Is that what
> you are referring to?

Yes, my id field is named "myid", but elevate.xml expects its name to be "id".

Please find below more info:

I'm using the very latest revision (720030).

I also tried both








and








In the former case I got a Tomcat error:

HTTP Status 500 - Severe errors in solr configuration. Check your log files
for more detailed information on what may be wrong. If you want solr to
continue after configuration errors, change:
<abortOnConfigurationError>false</abortOnConfigurationError> in solr.xml
-
org.apache.solr.common.SolrException: Error initializing QueryElevationComponent.
at org.apache.solr.handler.component.QueryElevationComponent.inform(QueryElevationComponent.java:200)
at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:319)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:563)
...

In the latter case solr works but the QueryElevation does not.

The query I'm using is:
http://localhost:8080/solr/post1/select/?q=cars&version=2.2&start=0&rows=10&indent=on&enableElevation=true

thanks
Paolo

On Sun, Nov 23, 2008 at 12:29 AM, Ryan McKinley <[EMAIL PROTECTED]> wrote:

> hymm -- that *should* not be the case.  The id field in
> QueryElevationComponent uses the globally defined field:
>
>SchemaField sf = core.getSchema().getUniqueKeyField();
>...
>idField = sf.getName().intern();
>
> The only thing that may be weird is that if your id field is named "myid",
> your elevate.xml file still refers to "id" as the unique key.  Is that what
> you are referring to?
>
> I have not tested this, so it may very well be broken.
>
> ryan
>
>
>
>
> On Nov 22, 2008, at 5:31 PM, Paolo Ruscitti wrote:
>
>  I have a question about QueryElevationComponent.
>>
>> I'm trying to use it but it seems it works properly if, and only if, the id
>> field name in   definition is '*id*'.
>>
>> so if I have *myid*, it does not work.
>>
>>
>> Could you please tell me what I'm doing wrong?
>> thanks a lot
>>
>> Paolo
>>
>> - this is my elevate.xml
>>
>> 
>> 
>> 
>> 
>> 
>>
>> - I added at the tail of solrconfig.xml file
>> ...
>>
>> 
>>  
>>   
>>   string
>>   elevate.xml
>>  
>>
>>  
>>  > startup="lazy">
>>   
>> explicit
>>   
>>   
>> elevator
>>   
>>  
>>
>> 
>>
>> - in my schema I have
>>
>> > required="true"
>> />
>> ...
>> myid
>>
>
>


Re: [VOTE] Community Logo Preferences

2008-11-23 Thread Tricia Williams

https://issues.apache.org/jira/secure/attachment/12394282/solr2_maho_impression.png
https://issues.apache.org/jira/secure/attachment/12394366/solr3_maho.png
https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg
https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg
https://issues.apache.org/jira/secure/attachment/12394218/solr-solid.png


Compiling Solr 1.3.0 + KStem

2008-11-23 Thread Chris Haggstrom
I was hoping to try using KStem with Solr 1.3.0, but am having trouble
getting it to compile.

With a fresh Solr 1.3.0 that will build successfully, I unzipped the
KStemSolr.zip within the apache-solr-1.3.0 directory, but when I then try to
build (using Ant 1.7.1 and Sun HotSpot JDK 1.6.0 update 10), I get:


[EMAIL PROTECTED]:/usr/local/build/apache-solr-1.3.0$ ant compile
Buildfile: build.xml

init-forrest-entities:
[mkdir] Created dir: /usr/local/build/apache-solr-1.3.0/build
[mkdir] Created dir: /usr/local/build/apache-solr-1.3.0/build/web

compile-common:
[mkdir] Created dir: /usr/local/build/apache-solr-1.3.0/build/common
[javac] Compiling 36 source files to /usr/local/build/apache-solr-1.3.0/build/common
[javac] Note: /usr/local/build/apache-solr-1.3.0/src/java/org/apache/solr/common/util/FastInputStream.java uses or overrides a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile:
[mkdir] Created dir: /usr/local/build/apache-solr-1.3.0/build/core
[javac] Compiling 350 source files to /usr/local/build/apache-solr-1.3.0/build/core
[javac] /usr/local/build/apache-solr-1.3.0/src/java/org/apache/solr/analysis/KStemFilterFactory.java:63: cannot find symbol
[javac] symbol  : method init(org.apache.solr.core.SolrConfig,java.util.Map)
[javac] location: class org.apache.solr.analysis.BaseTokenFilterFactory
[javac] super.init(solrConfig, args);
[javac]  ^
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 1 error

BUILD FAILED
/usr/local/build/apache-solr-1.3.0/build.xml:125: The following error
occurred while executing this line:
/usr/local/build/apache-solr-1.3.0/common-build.xml:149: Compile failed;
see the compiler error output for details.


I've also tried to build the KStem filter factory using the KStem.jar via
the instructions on the Wiki, but I am not sure I'm doing the right things
in steps 3 and 5:

3.  Modify the package name on the source files to match your install

Does that mean to change package org.apache.lucene.analysis; to
org.apache.solr.analysis?

5.  Build the jar file and drop that into your Solr /lib directory.

Nothing I've tried here gives me any .class files, just more "cannot find
symbol" errors.

Any suggestions would be much appreciated.  I am definitely a novice in
building Java apps, so I could be missing something very simple here.
Thanks,


-Chris


Re: [VOTE] Community Logo Preferences

2008-11-23 Thread Mark Lindeman

https://issues.apache.org/jira/secure/attachment/12394267/apache_solr_c_blue.jpg
https://issues.apache.org/jira/secure/attachment/12394265/apache_solr_b_blue.jpg
https://issues.apache.org/jira/secure/attachment/12394263/apache_solr_a_blue.jpg

By the way, two logos are missing:

https://issues.apache.org/jira/secure/attachment/12394270/apache_solr_d_blue.jpg
and
https://issues.apache.org/jira/secure/attachment/12394271/apache_solr_d_red.jpg

Ryan McKinley wrote on 11/23/2008 05:59 PM:

Please submit your preferences for the solr logo.

For full voting details, see:
  http://wiki.apache.org/solr/LogoContest#Voting

The eligible logos are:
  http://people.apache.org/~ryan/solr-logo-options.html

Any and all members of the Solr community are encouraged to reply to 
this thread and list (up to) 5 ranked choices by listing the Jira 
attachment URLs. Votes will be assigned a point value based on rank. For 
each vote, 1st choice has a point value of 5, 5th place has a point 
value of 1, and all others follow a similar pattern.


https://issues.apache.org/jira/secure/attachment/12345/yourfrstchoice.jpg
https://issues.apache.org/jira/secure/attachment/34567/yoursecondchoice.jpg
...

This poll will be open until Wednesday November 26th, 2008 @ 11:59PM GMT

When the poll is complete, the solr committers will tally the community 
preferences and take a final vote on the logo.


A big thanks to everyone who submitted possible logos -- it's great to
see so many good options.




[VOTE] Community Logo Preferences

2008-11-23 Thread Ryan McKinley

Please submit your preferences for the solr logo.

For full voting details, see:
  http://wiki.apache.org/solr/LogoContest#Voting

The eligible logos are:
  http://people.apache.org/~ryan/solr-logo-options.html

Any and all members of the Solr community are encouraged to reply to this
thread and list (up to) 5 ranked choices by listing the Jira attachment
URLs. Votes will be assigned a point value based on rank. For each vote,
1st choice has a point value of 5, 5th place has a point value of 1, and
all others follow a similar pattern.


https://issues.apache.org/jira/secure/attachment/12345/yourfrstchoice.jpg
https://issues.apache.org/jira/secure/attachment/34567/yoursecondchoice.jpg
...

This poll will be open until Wednesday November 26th, 2008 @ 11:59PM GMT

When the poll is complete, the solr committers will tally the community
preferences and take a final vote on the logo.
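
As an illustration only, the point scheme could be tallied along these lines. The sketch assumes that "a similar pattern" means 4, 3, and 2 points for the 2nd, 3rd, and 4th choices, and that anything past a voter's fifth entry is ignored. The function name is made up, and this is not the committers' actual tallying procedure.

```python
from collections import Counter

MAX_CHOICES = 5  # each voter may rank up to five logos


def tally(ballots):
    """Sum points per logo; each ballot is a list of logo names, best first."""
    scores = Counter()
    for ballot in ballots:
        # Entries beyond the fifth are ignored (an assumption).
        for rank, logo in enumerate(ballot[:MAX_CHOICES], start=1):
            # Rank 1 earns 5 points, rank 5 earns 1 point.
            scores[logo] += MAX_CHOICES + 1 - rank
    return scores
```

For example, a ballot listing logos a, b in that order gives a 5 points and b 4 points; `tally(...).most_common()` then orders the logos by total.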


A big thanks to everyone who submitted possible logos -- it's great to see
so many good options.

Re: Question about Query Phrase Slop (qs) in dismax

2008-11-23 Thread anuvenk

Could somebody please help clear up this doubt? What more could I do with the
dismax handler so that documents in which 'word1', 'word2', 'word3', etc. of a
search phrase are not within 5 words of one another do not come up in the
results?


anuvenk wrote:
> 
> From the Solr wiki, it sounded like if qs is set to 5, for example, and
> the search term is 'child custody', only docs with 'child' and 'custody'
> within 5 words of one another would be returned in the results. Is this
> correct? If so, it doesn't seem to be working for me. I see docs with
> 'child' and 'custody' more than 5 words apart (excluding stop
> words), which results in a bad user experience as those docs are not very
> relevant. What more could I do to improve the quality of the results?
> 

-- 
View this message in context: 
http://www.nabble.com/Question-about-Query-Phrase-Slop-%28qs%29-in-dismax-tp20643003p20648109.html



Using Solr for indexing emails

2008-11-23 Thread Timo Sirainen
Hi,

A while ago I implemented searching emails with Solr for my IMAP server
(www.dovecot.org). Seems to work ok, but now I'm having a bit of trouble
trying to figure out how to implement searching from multiple mailboxes
efficiently. Would be great if someone had suggestions how to do things
better.

The main problem is that before doing the search, I first have to check
if there are any unindexed messages and then add them to Solr. This is
done using a query like:

 - fl=uid
 - rows=1
 - sort=uid desc
 - q=uidv: box: user:

So it returns the highest IMAP UID field (which is an always-ascending
integer) for the given mailbox (you can ignore the uidvalidity). I can
then add all messages with higher UIDs to Solr before doing the actual
search.
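
A rough sketch of that per-mailbox check follows. The field names uid, uidv, box, and user are the ones listed above; the function names, the quoting of the values, and the sample values in the usage note are assumptions for illustration, and how the three query terms combine depends on the schema's default operator.

```python
from urllib.parse import urlencode


def highest_uid_params(uidvalidity, mailbox, user):
    """Parameters asking Solr for the single highest indexed UID of a mailbox."""
    return {
        "fl": "uid",         # only the uid field is needed
        "rows": 1,           # one document is enough
        "sort": "uid desc",  # highest UID first
        # Quoting the values is an assumption; adjust to the field types.
        "q": f'uidv:{uidvalidity} box:"{mailbox}" user:"{user}"',
    }


def uids_to_index(mailbox_uids, highest_indexed=0):
    """Return the UIDs above the highest one already in Solr, ascending."""
    # IMAP UIDs are positive, so 0 stands for "nothing indexed yet".
    return [uid for uid in sorted(mailbox_uids) if uid > highest_indexed]
```

For instance, `urlencode(highest_uid_params(123, "INBOX", "timo"))` yields the query string for one mailbox, and `uids_to_index([5, 3, 9, 7], 5)` leaves UIDs 7 and 9 to be added before searching.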

When searching multiple mailboxes the above query would have to be sent
to every mailbox separately. That really doesn't seem like the best
solution, especially when there are a lot of mailboxes. But I don't
think Solr has a way to return "highest uid field for each
box:"?

Is that above query even efficient for a single mailbox? I did consider
using separate documents for storing the highest UID for each mailbox,
but that causes annoying desynchronization possibilities. Especially
because currently I can just keep sending documents to Solr without
locking and let it drop duplicates automatically (should be rare). With
per-mailbox highest-uid documents I can't really see a way to do this
without locking or allowing duplicate fields to be added and later some
garbage collection deleting all but the one highest value (annoyingly
complex).

I could of course also keep track of what's indexed on Dovecot's side,
but that could also lead to desynchronization issues and I'd like to
avoid them.

I guess the ideal solution would be if it was somehow possible to create
a SQL-like trigger that updates the per-mailbox highest-uid document
whenever adding a new document with a higher UID value.

