Re: 2Gb process on 32 bits

2007-11-12 Thread James liu
If you use Tomcat, it defaults to port 8080 (plus a few other default ports).

So just run another Tomcat that uses 8181 and other ports (as I remember, you
have to change three ports per Tomcat instance).

I used to have four Tomcats on one server.

On Nov 9, 2007 7:39 AM, Isart Montane <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I'm experiencing some trouble when I'm trying to launch Solr with more
> than 1.6GB of heap. My server is an FC5 box with 8GB RAM, but when I
> start Solr like this
>
> java -Xmx2000m -jar start.jar
>
> I get the following errors:
>
> Error occurred during initialization of VM
> Could not reserve enough space for object heap
> Could not create the Java virtual machine.
>
> I've tried to start a virtual machine like this
>
> java -Xmx2000m -version
>
> but I get the same errors.
>
> I've read there's a kernel limitation of 2GB per process on 32-bit
> architectures, and I just want to know if anybody knows an alternative to
> getting a new 64-bit server.
>
> Thanks
> Isart
>



-- 
regards
jl
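
A quick way to confirm how much heap a JVM will actually grant is to ask the
runtime itself -- a minimal sketch (class name is arbitrary):

    public class MaxHeap {
        public static void main(String[] args) {
            // maxMemory() reports the -Xmx ceiling the running VM reserved
            long mb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
            System.out.println("max heap: " + mb + " MB");
        }
    }

Launched as "java -Xmx2000m MaxHeap", this fails on most 32-bit Linux JVMs
somewhere around 1.6-2 GB, because the VM needs one contiguous block of
address space for the heap; a 64-bit JVM (or several smaller instances) is
the usual way around it.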


Re: Does SOLR support multiple instances within the same web application?

2007-11-12 Thread James liu
If I understand correctly, you just do it like this (I use PHP):

$data1 = file_get_contents($instance1Url);  // query the first Solr instance
$data2 = file_get_contents($instance2Url);  // query the second Solr instance

That is, you just have multiple Solr instances and fetch the data from each one.


On Nov 12, 2007 11:15 PM, Dilip.TS <[EMAIL PROTECTED]> wrote:

> Hello,
>
>  Does SOLR support multiple instances within the same web application? If
> so how is this achieved?
>
>  Thanks in advance.
>
> Regards,
> Dilip TS
>
>


-- 
regards
jl


Re: Faceting over limited result set

2007-11-12 Thread Chris Hostetter

: It's not really a performance-related issue, the primary goal is to use the
: facet information to determine the most relevant product category related to
: the particular search being performed.

ah ... ok, i understand now.  the order does matter, you want the "top N" 
documents sorted by some criteria (either score, or maybe popularity i 
would imagine) and then you want to pick the categories based on that.

i had to build this for CNET back before solr went open source, but yes - 
i did it using a custom subclass of dismax similar to what i 
described before.

one thing to watch out for is that you probably want to use a consistent 
sort independent of the user's sort -- if the user re-sorts by price it 
can be disconcerting for them if that changes the navigation links.


-Hoss



Re: DISTINCT ON functionality in Solr?

2007-11-12 Thread Pieter Berkel
Currently this functionality is not available in Solr out-of-the-box,
however there is a patch implementing Field Collapsing
(http://issues.apache.org/jira/browse/SOLR-236) which might be similar to what
you are trying to achieve.

Piete



On 13/11/2007, Jörg Kiegeland <[EMAIL PROTECTED]> wrote:
>
> Is there a way to define a query in that way that a search result
> contains only one representative of every set of documents which are
> equal on a given field (it is not important which representative
> document), i.e. to have the DISTINCT ON concept from relational
> databases in Solr?
>
> If this cannot be done with the search API of Lucene, maybe one can use
> Solr server side hooks or filters to achieve this? How?
>
> The reason I do not want to do this filtering manually is that
> I want to have as many matches as possible with respect to my defined
> result limit for the query (and filtering the search result on the client
> side may push me far below this limit).
>
> Thanks..
>


Re: Faceting over limited result set

2007-11-12 Thread Pieter Berkel
On 13/11/2007, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
>
> can you elaborate on your use case ... the only time i've ever seen people
> ask about something like this it was because true facet counts were too
> expensive to compute, so they were doing "sampling" of the first N
> results.
>
> In Solr, Sampling like this would likely be just as expensive as getting
> the full count.


It's not really a performance-related issue, the primary goal is to use the
facet information to determine the most relevant product category related to
the particular search being performed.

Generally the facets returned by simple, generic queries are fine for this
purpose (e.g. a search for "nokia" will correctly return "Mobile / Cell
Phone" as the most frequent facet), however facet data for more specific
searches are not as clear-cut (e.g. "samsung tv" where TVs will appear at
the top of the search results, but will also match other "samsung" products
like mobile phones and mp3 players - obviously I could tweak the 'mm' parameter
to fix this particular case, but it wouldn't really solve my problem).

The theory is that facet information generated from the first 'x' (let's say
100) matches to a query (ordered by score / relevance) will be more accurate
(for the above purpose) than facets obtained over the entire result set.  So
ideally, it would be useful to be able to constrain the size of the DocSet
somehow (as you mention below).


matching occurs in increasing order of docid, so even if there was a hook
> to say "stop matching after N docs" those N wouldn't be a good
> representative sample, they would be biased towards "older" documents
> (based on when they were indexed, not on any particular date field)
>
> if what you are interested in is stats on the first N docs according to a
> specific sort (score or otherwise) then you could write a custom request
> handler that executed a search with a limit of N, got the DocList,
> iterated over it to build a DocSet, and then used that DocSet to do
> faceting ... but that would probably take even longer than just using the
> full DocSet matching the entire query.



I was hoping to avoid having to write a custom request handler but your
suggestion above sounds like it would do the trick.  I'm also debating
whether to extract my own facet info from a result set on the client side,
but this would be even slower.

Thanks for your suggestions so far,
Piete


Re: Associating pronouns instances to proper nouns?

2007-11-12 Thread David Neubert
All

I have found (from using the Admin/Analysis page) that if I were to append 
unique initials (that didn't match any other word or acronym) to each pronoun 
(e.g. I-WCN, she-WCN, my-WCN, etc.) then the default parsing and tokenization 
for the text field in SOLR might actually do the trick -- it parses down to I, 
wcn, IWCN, i, idgn -- all at the same word position -- so that is perfect.  I 
haven't exhaustively tested all capitalization nuances, but am not too worried 
about that.

If I want to do an exhaustive search for person WCN, I just have to enter 
his/her initials and then can get all references, including pronouns?

Anybody see any holes in this?  (sounds alarmingly easy so far)?

Dave

- Original Message 
From: David Neubert <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, November 12, 2007 3:04:20 PM
Subject: Re: Associating pronouns instances to proper nouns?


Attempting to answer my own question, which I should probably just try, 
assuming I can doctor the indexed text -- I suppose I could do something like 
change all instances of I, he, etc. that refer to one person to IJBA, HEJBA, 
HIMJBA (making sure they would never equal a normal word) -- then use the 
synonym feature to link IJBA, HEJBA, HIMJBA, Joe Book Author, J.B.Author 
(although, even if this were a good approach, I don't know if you can link 
synonyms for phrases as opposed to a single word). And of course this would 
require a correlative translation mechanism at display time to render I, he, 
him, instead of the indexed acronym.
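
For reference, the Solr-side hookup such an approach would lean on is the
synonym filter; a minimal sketch (file name and entries are illustrative):

    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>

with a synonyms.txt line along the lines of:

    IJBA, HEJBA, HIMJBA, Joe Book Author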

- Original Message 
From: David Neubert <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, November 12, 2007 2:54:11 PM
Subject: Associating pronouns instances to proper nouns?


All,

I am working with very exact text and search over permanent documents (books).  
It would be great to associate pronouns like he, she, him, her, I, my, etc. 
with the actual author or person the pronoun refers to.  I can see how I could 
get pretty darn close with the synonym feature in Lucene.  Unfortunately 
though, as I understand it, this would associate all instances of I, he, she, 
etc. instead of particular instances.

I have come up with a crude mechanism, adding the initials for the referred 
person, immediately after the pronoun ... him{DGN}, but this of course 
complicates word counts and potential phrase lookups, etc. (which I could 
probably live with and work around).

But after understanding how easy it is to add synonyms for any particular 
word in a document, is there any standard practical way to add synonyms to a 
particular word instance within a document?  That would really do the trick?

Dave






Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread Chris Hostetter

: > - problem is, I am one week into both technologies (though have years in
: > the search space) -- wish I could
: > go to Hong Kong -- any discounts available anywhere :)
: 
: Unfortunately the OS Summit has been canceled.

Or rescheduled to 2008 ... depending on whether you are a half-empty / 
half-full kind of person.

And let's not forget atlanta ... starting today and all...

http://us.apachecon.com/us2007/



-Hoss



Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread Yonik Seeley
On Nov 12, 2007 2:20 PM, David Neubert <[EMAIL PROTECTED]> wrote:
> Erik - thanks, I am considering this approach, versus explicit redundant
> indexing -- and am also considering Lucene -

There's not a well defined solution in either IMO.

> - problem is, I am one week into both technologies (though have years in the 
> search space) -- wish I could
> go to Hong Kong -- any discounts available anywhere :)

Unfortunately the OS Summit has been canceled.

-Yonik


Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread David Neubert
Erik - thanks, I am considering this approach, versus explicit redundant 
indexing -- and am also considering Lucene -- problem is, I am one week into 
both technologies (though have years in the search space) -- wish I could go to 
Hong Kong -- any discounts available anywhere :)

Dave

- Original Message 
From: Erick Erickson <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, November 12, 2007 2:11:14 PM
Subject: Re: Redundant indexing * 4 only solution (for par/sen and case 
sensitivity)

DISCLAIMER: This is from a Lucene-centric viewpoint. That said, this may be
useful

For your line number, page number etc perspective, it is possible to index
special guaranteed-to-not-match tokens then use the termdocs/termenum
data, along with SpanQueries to figure this out at search time. For
instance, coincident with the last term in each line, index the token "$".
Coincident with the last token of every paragraph index the token "#". If
you get the offsets of the matching terms, you can quite quickly simply
count the number of line and paragraph tokens using TermDocs/TermEnums and
correlate hits to lines and paragraphs. The trick is to index your special
tokens with an increment of 0 (see SynonymAnalyzer in Lucene In Action for
more on this).

Another possibility is to add a special field with each document with the
offsets of each end-of-sentence and end-of-paragraph offsets (stored, not
indexed). Again, "given the offsets", you can read in this field and figure
out what line/paragraph your hits are in.

How suitable either of these is depends on a lot of characteristics of your
particular problem space. I'm not sure either of them is suitable for very
high volume applications.

Also, I'm approaching this from an in-the-guts-of-lucene perspective, so
don't even *think* of asking me how to really make this work in SOLR.

Best
Erick

On Nov 11, 2007 12:44 AM, David Neubert <[EMAIL PROTECTED]> wrote:

> Ryan (and others who need something to put them to sleep :) )
>
> Wow -- the light-bulb finally went off -- the Analyzer admin page is very
> cool -- I just was not at all thinking the SOLR/Lucene way.
>
> I need to rethink my whole approach now that I understand (from reviewing
> the schema.xml closer and playing with the Analyser) how compatible index
> and query policies can be applied automatically on a field by field basis by
> SOLR at both index and query time.
>
> I still may have a stumper here, but I need to give it some thought, and
> may return again with another question:
>
> The problem is that my text is book text (fairly large) that looks very
> much like one would expect:
> 
> 
> ...
> ...
> ..
> 
> 
> The search results need to return exact sentences or paragraphs with their
> exact page:line numbers (which is available in the embedded markup in the
> text).
>
> There were previous responses by others, suggesting I look into payloads,
> but I did not fully understand that -- I may have to re-read those e-mails
> now that I am getting a clearer picture of SOLR/Lucene.
>
> However, the reason I resorted to indexing each paragraph as a single
> document, and then redundantly indexing each sentence as a single document,
> is because I was planning on pre-parsing the text myself (outside of SOLR)
> -- and feeding separate <doc> elements to the <add> request because in that
> way I could produce the page:line reference in the pre-parsing (again
> outside of SOLR) and feed it in as an explicit field in the <doc> elements
> of the <add> requests.  Therefore at query time, I will have the exact
> page:line corresponding to the start of the paragraph or sentence.
>
> But I am beginning to suspect, I was planning to do a lot of work that
> SOLR can do for me.
>
> I will continue to study this and respond when I am a bit clearer, but the
> closer I could get to just submitting the books a chapter at a time -- and
> letting SOLR do the work, the better (cause I have all the books in well
> formed xml at chapter levels).  However, I don't see yet how I could get
> par/sen granular search result hits, along with their exact page:line
> coordinates unless I approach it by explicitly indexing the pars and sens as
> single documents, not chapters hits, and also return the entire text of the
> sen or par, and highlight the keywords within (for the search result hit).
>  Once a search result hit is selected, it would then act as expected and
> position into the chapter, at the selected reference, highlight again the
> key words, but this time in the context of an entire chapter (the whole
> document to the user's mind).
>
> Even with my new understanding you (and others) have given me, which I can
> use to certainly improve my approach -- it still seems to me that because
> multi-valued fields concatenate text -- even if you use the
> positionIncrementGap feature to prohibit unwanted phrase matches, how do you
> produce a well defined search result hit, bounded by the exact sen or par,
> unless you index them as single documents?
>
> Should I still read up on the payload discussion?
>
> Dave

Re: Associating pronouns instances to proper nouns?

2007-11-12 Thread David Neubert
Attempting to answer my own question, which I should probably just try, 
assuming I can doctor the indexed text -- I suppose I could do something like 
change all instances of I, he, etc. that refer to one person to IJBA, HEJBA, 
HIMJBA (making sure they would never equal a normal word) -- then use the 
synonym feature to link IJBA, HEJBA, HIMJBA, Joe Book Author, J.B.Author 
(although, even if this were a good approach, I don't know if you can link 
synonyms for phrases as opposed to a single word). And of course this would 
require a correlative translation mechanism at display time to render I, he, 
him, instead of the indexed acronym.

- Original Message 
From: David Neubert <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, November 12, 2007 2:54:11 PM
Subject: Associating pronouns instances to proper nouns?


All,

I am working with very exact text and search over permanent documents (books).  
It would be great to associate pronouns like he, she, him, her, I, my, etc. 
with the actual author or person the pronoun refers to.  I can see how I could 
get pretty darn close with the synonym feature in Lucene.  Unfortunately 
though, as I understand it, this would associate all instances of I, he, she, 
etc. instead of particular instances.

I have come up with a crude mechanism, adding the initials for the referred 
person, immediately after the pronoun ... him{DGN}, but this of course 
complicates word counts and potential phrase lookups, etc. (which I could 
probably live with and work around).

But after understanding how easy it is to add synonyms for any particular 
word in a document, is there any standard practical way to add synonyms to a 
particular word instance within a document?  That would really do the trick?

Dave






Re: Phrase-based (vs. Word-Based) Proximity Search

2007-11-12 Thread Ken Krugler

Hi Chris,


I gather that the standard Solr query parser uses the same syntax for
proximity searches as Lucene, and that Lucene syntax is described at

http://lucene.apache.org/java/docs/queryparsersyntax.html#Proximity%20Searches

This syntax lets me look for terms that are within x words of each
other. Their example is that

  "jakarta apache"~10

will find documents where "jakarta" and "apache" occur within 10 words
of one another.

What I would like to do is find documents where *phrases*, not just
terms, are within x words of each other. I want to be able to say
things like

  Find the documents where the phrases "apache jakarta" and "sun
  microsystems" occur within ten words of one another.


[snip]

I'd thought that span queries would allow you to do this type of 
thing, but they're not supported (currently) by the standard query 
parser.


E.g. check out the SpanNearQuery support in (recent) Lucene releases:

http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/search/spans/SpanNearQuery.html

I would recommend re-posting this on the Lucene user list.

-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"If you can't find it, you can't fix it"


Associating pronouns instances to proper nouns?

2007-11-12 Thread David Neubert
All,

I am working with very exact text and search over permanent documents (books).  
It would be great to associate pronouns like he, she, him, her, I, my, etc. 
with the actual author or person the pronoun refers to.  I can see how I could 
get pretty darn close with the synonym feature in Lucene.  Unfortunately 
though, as I understand it, this would associate all instances of I, he, she, 
etc. instead of particular instances.

I have come up with a crude mechanism, adding the initials for the referred 
person, immediately after the pronoun ... him{DGN}, but this of course 
complicates word counts and potential phrase lookups, etc. (which I could 
probably live with and work around).

But after understanding how easy it is to add synonyms for any particular word 
in a document, is there any standard practical way to add synonyms to a 
particular word instance within a document?  That would really do the trick?

Dave






RE: Solr + autocomplete

2007-11-12 Thread Park, Michael
Will I need to use Solr 1.3 with the EdgeNGramFilterFactory in order to
get the autosuggest feature?

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 12, 2007 1:05 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr + autocomplete


: "Error loading class 'solr.EdgeNGramFilterFactory'".  For some reason

EdgeNGramFilterFactory didn't exist when Solr 1.2 was released, but the 
EdgeNGramTokenizerFactory did.  (the javadocs that come with each release
list all of the various factories in that release)


-Hoss



Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread David Neubert
Erik,

Probably because of my newness to SOLR/Lucene, I see now what you/Yonik meant 
by "case" field, but I am not clear about your wording "per-book setting 
attached at index time" - would you mind ellaborating on that, so I am clear?

Dave

- Original Message 
From: Erik Hatcher <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Sunday, November 11, 2007 5:21:45 AM
Subject: Re: Redundant indexing * 4 only solution (for par/sen and case 
sensitivity)


Solr query syntax is documented here: 

What Yonik is referring to is creating your own "case" field with the  
per-book setting attached at index time.

Erik


On Nov 11, 2007, at 12:55 AM, David Neubert wrote:

> Yonik (or anyone else)
>
> Do you know where on-line documentation on the +case: syntax is  
> located?  I can't seem to find it.
>
> Dave
>
> - Original Message 
> From: Yonik Seeley <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Saturday, November 10, 2007 4:56:40 PM
> Subject: Re: Redundant indexing * 4 only solution (for par/sen and  
> case sensitivity)
>
>
> On Nov 10, 2007 4:24 PM, David Neubert <[EMAIL PROTECTED]> wrote:
>> So if I am hitting multiple fields (in the same search request) that
>> invoke different Analyzers -- am I at a dead end, and have to resort to
>> consecutive multiple queries instead
>
> Solr handles that for you automatically.
>
>> The app that I am replacing (and trying to enhance) has the ability
>> to search multiple books at once
>> with sen/par and case sensitivity settings individually selectable
>> per book
>
> You could easily select case sensitivity or not *per query* across all
> books.
> You should step back and see what the requirements actually are (i.e.
> the reasons why one needs to be able to select case
> sensitive/insensitive on a book level... it doesn't make sense to me
> at first blush).
>
> It could be done on a per-book level in solr with a more complex query
> structure though...
>
> (+case:sensitive +(normal relevancy query on the case sensitive fields
> goes here)) OR (+case:insensitive +(normal relevancy query on the case
> insensitive fields goes here))
>
> -Yonik







Re: Redundant indexing * 4 only solution (for par/sen and case sensitivity)

2007-11-12 Thread Erick Erickson
DISCLAIMER: This is from a Lucene-centric viewpoint. That said, this may be
useful

For your line number, page number etc perspective, it is possible to index
special guaranteed-to-not-match tokens then use the termdocs/termenum
data, along with SpanQueries to figure this out at search time. For
instance, coincident with the last term in each line, index the token "$".
Coincident with the last token of every paragraph index the token "#". If
you get the offsets of the matching terms, you can quite quickly simply
count the number of line and paragraph tokens using TermDocs/TermEnums and
correlate hits to lines and paragraphs. The trick is to index your special
tokens with an increment of 0 (see SynonymAnalyzer in Lucene In Action for
more on this).
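
(For illustration, a minimal sketch of such a zero-increment marker filter
against the Lucene 2.x Token API -- the end-of-line test is a placeholder
you would supply:)

    import java.io.IOException;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    class LineMarkerFilter extends TokenFilter {
        private Token pending;  // marker waiting to be emitted

        LineMarkerFilter(TokenStream in) { super(in); }

        public Token next() throws IOException {
            if (pending != null) { Token m = pending; pending = null; return m; }
            Token t = input.next();
            if (t != null && endsLine(t)) {
                // stack "$" on the same position as the line's last token
                pending = new Token("$", t.startOffset(), t.endOffset());
                pending.setPositionIncrement(0);
            }
            return t;
        }

        private boolean endsLine(Token t) { return false; /* placeholder */ }
    }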


Another possibility is to add a special field with each document with the
offsets of each end-of-sentence and end-of-paragraph offsets (stored, not
indexed). Again, "given the offsets", you can read in this field and figure
out what line/paragraph your hits are in.

How suitable either of these is depends on a lot of characteristics of your
particular problem space. I'm not sure either of them is suitable for very
high volume applications.

Also, I'm approaching this from an in-the-guts-of-lucene perspective, so
don't even *think* of asking me how to really make this work in SOLR.

Best
Erick

On Nov 11, 2007 12:44 AM, David Neubert <[EMAIL PROTECTED]> wrote:

> Ryan (and others who need something to put them to sleep :) )
>
> Wow -- the light-bulb finally went off -- the Analyzer admin page is very
> cool -- I just was not at all thinking the SOLR/Lucene way.
>
> I need to rethink my whole approach now that I understand (from reviewing
> the schema.xml closer and playing with the Analyser) how compatible index
> and query policies can be applied automatically on a field by field basis by
> SOLR at both index and query time.
>
> I still may have a stumper here, but I need to give it some thought, and
> may return again with another question:
>
> The problem is that my text is book text (fairly large) that looks very
> much like one would expect:
> 
> 
> ...
> ...
> ..
> 
> 
> The search results need to return exact sentences or paragraphs with their
> exact page:line numbers (which is available in the embedded markup in the
> text).
>
> There were previous responses by others, suggesting I look into payloads,
> but I did not fully understand that -- I may have to re-read those e-mails
> now that I am getting a clearer picture of SOLR/Lucene.
>
> However, the reason I resorted to indexing each paragraph as a single
> document, and then redundantly indexing each sentence as a single document,
> is because I was planning on pre-parsing the text myself (outside of SOLR)
> -- and feeding separate <doc> elements to the <add> request because in that
> way I could produce the page:line reference in the pre-parsing (again outside
> of SOLR) and feed it in as an explicit field in the <doc> elements of the <add>
> requests.  Therefore at query time, I will have the exact page:line
> corresponding to the start of the paragraph or sentence.
>
> But I am beginning to suspect, I was planning to do a lot of work that
> SOLR can do for me.
>
> I will continue to study this and respond when I am a bit clearer, but the
> closer I could get to just submitting the books a chapter at a time -- and
> letting SOLR do the work, the better (cause I have all the books in well
> formed xml at chapter levels).  However, I don't  see yet how I could get
> par/sen granular search result hits, along with their exact page:line
> coordinates unless I approach it by explicitly indexing the pars and sens as
> single documents, not chapters hits, and also return the entire text of the
> sen or par, and highlight the keywords within (for the search result hit).
>  Once a search result hit is selected, it would then act as expected and
> position into the chapter, at the selected reference, highlight again the
> key words, but this time in the context of an entire chapter (the whole
> document to the user's mind).
>
> Even with my new understanding you (and others) have given me, which I can
> use to certainly improve my approach -- it still seems to me that because
> multi-valued fields concatenate text -- even if you use the
> positionIncrementGap feature to prohibit unwanted phrase matches, how do you
> produce a well defined search result hit, bounded by the exact sen or par,
> unless you index them as single documents?
>
> Should I still read up on the payload discussion?
>
> Dave
>
>
>
>
> - Original Message 
> From: Ryan McKinley <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Saturday, November 10, 2007 5:00:43 PM
> Subject: Re: Redundant indexing * 4 only solution (for par/sen and case
> sensitivity)
>
>
> David Neubert wrote:
> > Ryan,
> >
> > Thanks for your response.  I infer from your response that you can
> > have a different analyzer for each field
>
> yes!  each field 

Phrase-based (vs. Word-Based) Proximity Search

2007-11-12 Thread Chris Harris
I gather that the standard Solr query parser uses the same syntax for
proximity searches as Lucene, and that Lucene syntax is described at

  http://lucene.apache.org/java/docs/queryparsersyntax.html#Proximity%20Searches

This syntax lets me look for terms that are within x words of each
other. Their example is that

  "jakarta apache"~10

will find documents where "jakarta" and "apache" occur within 10 words
of one another.

What I would like to do is find documents where *phrases*, not just
terms, are within x words of each other. I want to be able to say
things like

  Find the documents where the phrases "apache jakarta" and "sun
  microsystems" occur within ten words of one another.

If I gave such a search, I would *not* want it to count as a match if,
for instance, "apache" appeared near "microsystems" but "apache"
wasn't followed immediately by "jakarta", or "microsystems" wasn't
preceded immediately by "sun". I would also not want it to match if
"apache jakarta" appeared, but "sun microsystems" did not appear.

Is there any way to do such a search currently? I suppose it might work to say

  "apache jakarta sun microsystems"~10 +"apache jakarta" +"sun microsystems"

but that seems like an unfortunate hack. In any case it's not really
something I can expect my users to be able to type in by themselves.
In our current query language (which I'm hoping to wean our users off
of), they can type

  "apache jakarta"  "sun microsystems"

which I believe is more intuitive.

Any ideas?

Chris


DISTINCT ON functionality in Solr?

2007-11-12 Thread Jörg Kiegeland
Is there a way to define a query in that way that a search result 
contains only one representative of every set of documents which are 
equal on a given field (it is not important which representative 
document), i.e. to have the DISTINCT ON concept from relational 
databases in Solr?


If this cannot be done with the search API of Lucene, maybe one can use 
Solr server side hooks or filters to achieve this? How?


The reason I do not want to do this filtering manually is that 
I want to have as many matches as possible with respect to my defined 
result limit for the query (and filtering the search result on the client 
side may push me far below this limit).


Thanks..


RE: Solr + autocomplete

2007-11-12 Thread Chris Hostetter

: "Error loading class 'solr.EdgeNGramFilterFactory'".  For some reason

EdgeNGramFilterFactory didn't exist when Solr 1.2 was released, but the 
EdgeNGramTokenizerFactory did.  (the javadocs that come with each release 
list all of the various factories in that release)


-Hoss



RE: Solr + autocomplete

2007-11-12 Thread Park, Michael
Thanks Ryan,

This looks like the way to go.  However, when I set up my schema I get,
"Error loading class 'solr.EdgeNGramFilterFactory'".  For some reason
the class is not found.  I tried the stable 1.2 build and even tried the
nightly build.  I'm using "solr.EdgeNGramFilterFactory".

Any suggestions?

Thanks,
Mike

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Monday, October 15, 2007 4:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr + autocomplete

> 
> I would imagine there is a library to set up an autocomplete search with
> Solr.  Does anyone have any suggestions?  Scriptaculous has a JavaScript
> autocomplete library.  However, the server must return an unordered
> list.
> 

Solr does not provide an autocomplete UI, but it can return JSON that a 
JS library can use to populate an autocomplete.

Depending on your index size / query speed, you may be fine with a
standard faceting prefix filter.  If the index is large, you may want to
index using the EdgeNGramFilterFactory.

Check the last comment in:
https://issues.apache.org/jira/browse/SOLR-357

ryan
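
For concreteness, an index-side edge n-gram field type might be declared
along these lines (type name and gram sizes are illustrative):

    <fieldType name="autocomplete" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

The query side deliberately skips the n-gram filter, so the prefix the user
has typed matches the indexed grams directly.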




Re: Multiple indexes

2007-11-12 Thread Jae Joo
I have built the master Solr instance and indexed some files. Once I run
snapshooter, it complains with the error below.  - snapshooter -d data/index (in
the solr/bin directory)
Did I miss something?

++ date '+%Y/%m/%d %H:%M:%S'
+ echo 2007/11/12 12:38:40 taking snapshot
/solr/master/solr/data/index/snapshot.20071112123840
+ [[ -n '' ]]
+ mv /solr/master/solr/data/index/temp-snapshot.20071112123840 /solr/master/solr/data/index/snapshot.20071112123840
mv: cannot access /solr/master/solr/data/index/temp-snapshot.20071112123840
Jae

On Nov 12, 2007 9:09 AM, Ryan McKinley <[EMAIL PROTECTED]> wrote:

>
> just use the standard collection distribution stuff.  That is what it is
> made for! http://wiki.apache.org/solr/CollectionDistribution
>
> Alternatively, open up two indexes using the same config/dir -- do your
> indexing on one and the searching on the other.  when indexing is done
> (or finishes a big chunk) send <commit/> to the 'searching' one and it
> will see the new stuff.
>
> ryan
>
>
>
> Jae Joo wrote:
> > Here is my situation.
> >
> > I have 6 millions articles indexed and adding about 10k articles
> everyday.
> > If I maintain only one index, whenever the daily feeding is running, it
> > consumes the heap area and causes FGC.
> > I am thinking the way to have multiple indexes - one is for ongoing
> querying
> > service and one is for update. Once update is done, switch the index by
> > automatically and/or my application.
> >
> > Thanks,
> >
> > Jae joo
> >
> >
> > On Nov 12, 2007 8:48 AM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> >
> >> The advantages of a multi-core setup are configuration flexibility and
> >> dynamically changing available options (without a full restart).
> >>
> >> For high-performance production solr servers, I don't think there is
> >> much reason for it.  You may want to split the two indexes on to two
> >> machines.  You may want to run each index in a separate JVM (so if one
> >> crashes, the other does not)
> >>
> >> Maintaining 2 indexes is pretty easy, if that was a larger number or you
> >> need to create indexes for each user in a system then it would be worth
> >> investigating the multi-core setup (it is still in development)
> >>
> >> ryan
> >>
> >>
> >> Pierre-Yves LANDRON wrote:
> >>> Hello,
> >>>
> >>> Until now, i've used two instance of solr, one for each of my
> >>> collections ; it works fine, but i wonder
> >>> if there is an advantage to use multiple indexes in one instance over
> >>> several instances with one index each ?
> >>> Note that the two indexes have different schema.xml.
> >>>
> >>> Thanks.
> >>> PL
> >>>
>  Date: Thu, 8 Nov 2007 18:05:43 -0500
>  From: [EMAIL PROTECTED]
>  To: solr-user@lucene.apache.org
>  Subject: Multiple indexes
> 
>  Hi,
> 
>  I am looking for the way to utilize the multiple indexes for a single
>  solr instance.
>  I saw that there is the patch 215 available and would like to ask someone
>  who knows how to use multiple indexes.
> 
>  Thanks,
> 
>  Jae Joo
> >>
> >
>
>


Re: leading wildcards

2007-11-12 Thread Michael Kimsal
Vote for that issue and perhaps it'll gain some more traction.  A former
colleague of mine was the one who contributed the patch in SOLR-218 and it
would be nice to have that configuration option 'standard' (if off by
default) in the next SOLR release.


On Nov 12, 2007 11:18 AM, Traut <[EMAIL PROTECTED]> wrote:

> Seems like there is no way to enable leading wildcard queries except
> code editing and files repacking. :(
>
> On 11/12/07, Bill Au <[EMAIL PROTECTED]> wrote:
> > The related bug is still open:
> >
> > http://issues.apache.org/jira/browse/SOLR-218
> >
> > Bill
> >
> > On Nov 12, 2007 10:25 AM, Traut <[EMAIL PROTECTED]> wrote:
> > > Hi
> > >  I found the thread about enabling leading wildcards in
> > > Solr as additional option in config file. I've got nightly Solr build
> > > and I can't find any options connected with leading wildcards in
> > > config files.
> > >
> > >  How can I enable leading wildcard queries in Solr? Thank you
> > >
> > >
> > > --
> > > Best regards,
> > > Traut
> > >
> >
>
>
> --
> Best regards,
> Traut
>



-- 
Michael Kimsal
http://webdevradio.com


Re: leading wildcards

2007-11-12 Thread Traut
Seems like there is no way to enable leading wildcard queries except
code editing and files repacking. :(

On 11/12/07, Bill Au <[EMAIL PROTECTED]> wrote:
> The related bug is still open:
>
> http://issues.apache.org/jira/browse/SOLR-218
>
> Bill
>
> On Nov 12, 2007 10:25 AM, Traut <[EMAIL PROTECTED]> wrote:
> > Hi
> >  I found the thread about enabling leading wildcards in
> > Solr as additional option in config file. I've got nightly Solr build
> > and I can't find any options connected with leading wildcards in
> > config files.
> >
> >  How can I enable leading wildcard queries in Solr? Thank you
> >
> >
> > --
> > Best regards,
> > Traut
> >
>


-- 
Best regards,
Traut
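
For the record, the code change the thread alludes to boils down to a single
switch on Lucene's query parser -- a sketch (field name and analyzer are
assumptions):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.Query;

    public class LeadingWildcard {
        public static void main(String[] args) throws Exception {
            QueryParser qp = new QueryParser("text", new StandardAnalyzer());
            // off by default: a leading * or ? forces a scan over every term
            qp.setAllowLeadingWildcard(true);
            Query q = qp.parse("*ildcard");
            System.out.println(q);
        }
    }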


Re: Faceting over limited result set

2007-11-12 Thread Chris Hostetter
: I'm trying to obtain faceting information based on the first 'x' (let's say
: 100-500) results matching a given (dismax) query.  The actual documents
: matching the query are not important in this case, so intuitively the

can you elaborate on your use case ... the only time i've ever seen people 
ask about something like this it was because true facet counts were too 
expensive to compute, so they were doing "sampling" of the first N 
results.

In Solr, Sampling like this would likely be just as expensive as getting 
the full count.

: Unfortunately I can't find any easy way to limit the number of documents
: matched (and returned in the set).  It might be possible to achieve the

matching occurs in increasing order of docid, so even if there was a hook 
to say "stop matching after N docs" those N wouldn't be a good 
representative sample, they would be biased towards "older" documents 
(based on when they were indexed, not on any particular date field)

if what you are interested in is stats on the first N docs according to a 
specific sort (score or otherwise) then you could write a custom request 
handler that executed a search with a limit of N, got the DocList, 
iterated over it to build a DocSet, and then used that DocSet to do 
faceting ... but that would probably take even longer than just using the 
full DocSet matching the entire query.

but again: what is your use case?  the underlying question really baffles 
me.


-Hoss
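
A rough sketch of the DocList-to-DocSet step described above, against the
Solr search API of the era (the req and query variables come from the
handler context, and exact getDocList overloads vary by version):

    // inside a custom request handler: sample the top 100 docs by score
    SolrIndexSearcher searcher = req.getSearcher();
    DocList top = searcher.getDocList(query, (Query) null, null, 0, 100);

    int[] ids = new int[top.size()];
    DocIterator it = top.iterator();
    for (int i = 0; it.hasNext(); i++) {
        ids[i] = it.nextDoc();  // internal Lucene doc ids
    }
    DocSet sample = new HashDocSet(ids, 0, ids.length);
    // hand 'sample' to the faceting code in place of the full match set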



Re: solr workflow ?

2007-11-12 Thread Venkatraman S
Highly unfortunate!

On Nov 12, 2007 9:07 PM, Traut <[EMAIL PROTECTED]> wrote:

> rtfm :)
> http://lucene.apache.org/solr/tutorial.html
>
> On Nov 12, 2007 4:33 PM, Dwarak R <[EMAIL PROTECTED]> wrote:
> > Hi Guys
> >
> > How do we add Word documents / PDF / text / etc. documents in Solr?  How
> > is the content of the files stored or indexed?  Are these documents
> > stored as XML in the SOLR filesystem?
> >
> > Regards
> >
> > Dwarak R
> >
> >
>
>
>
> --
> Best regards,
> Traut
>





Re: Does SOLR support multiple instances within the same web application?

2007-11-12 Thread Ryan McKinley

Dilip.TS wrote:
> Hello,
>
>   Does SOLR support multiple instances within the same web application? If
> so how is this achieved?



If you want multiple indices, you can run multiple web-apps.

If you need multiple indices in the same web-app, check SOLR-350 -- it 
is still in development, and make sure you *really* need it before going 
that route.


ryan


Re: leading wildcards

2007-11-12 Thread Bill Au
The related bug is still open:

http://issues.apache.org/jira/browse/SOLR-218

Bill

On Nov 12, 2007 10:25 AM, Traut <[EMAIL PROTECTED]> wrote:
> Hi
>  I found the thread about enabling leading wildcards in
> Solr as additional option in config file. I've got nightly Solr build
> and I can't find any options connected with leading wildcards in
> config files.
>
>  How can I enable leading wildcard queries in Solr? Thank you
>
>
> --
> Best regards,
> Traut
>


Re: solr workflow ?

2007-11-12 Thread Traut
rtfm :)
http://lucene.apache.org/solr/tutorial.html

On Nov 12, 2007 4:33 PM, Dwarak R <[EMAIL PROTECTED]> wrote:
> Hi Guys
>
> How do we add Word documents / PDF / text / etc. documents in Solr?  How is 
> the content of the files stored or indexed?  Are these documents stored 
> as XML in the SOLR filesystem?
>
> Regards
>
> Dwarak R
>
>



-- 
Best regards,
Traut


leading wildcards

2007-11-12 Thread Traut
Hi
 I found the thread about enabling leading wildcards in
Solr as additional option in config file. I've got nightly Solr build
and I can't find any options connected with leading wildcards in
config files.

 How can I enable leading wildcard queries in Solr? Thank you


-- 
Best regards,
Traut


Does SOLR support multiple instances within the same web application?

2007-11-12 Thread Dilip.TS
Hello,

  Does SOLR support multiple instances within the same web application? If
so how is this achieved?

  Thanks in advance.

Regards,
Dilip TS



Re: no segments* file found

2007-11-12 Thread Yonik Seeley
On Nov 12, 2007 3:46 AM, SDIS M. Beauchamp <[EMAIL PROTECTED]> wrote:
> If I don't optimize, I've got a "too many open files" error at about 450K
> files and a 3 GB index

You may need to increase the number of filedescriptors in your system.
If you're using Linux, see this:
http://www.cs.uwaterloo.ca/~brecht/servers/openfiles.html
Check the system wide limit and the per-process limit.

-Yonik
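
For reference, the two limits that page covers can be inspected like this
(Linux, bash assumed):

    ulimit -n                    # per-process open-file cap for this shell
    cat /proc/sys/fs/file-max    # system-wide cap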


RE: Best way to create multiple indexes

2007-11-12 Thread Rishabh Joshi

Ryan,

We currently have 8-9 million documents to index and this number will grow in 
the future. Also, we will never have a query that searches across groups, 
but we will have queries that search across sub-groups for sure.
Now, keeping this in mind, we were thinking we could have multiple indexes at 
the 'group' level at least.
Also, can multiple indexes be created dynamically? For example: in my 
application, if I create a 'logical group', then an index should be created for 
that group.

Rishabh

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED]
Sent: Monday, November 12, 2007 7:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Best way to create multiple indexes

For starters, do you need to be able to search across groups or
sub-groups (in one query?)

If so, then you have to stick everything in one index.

You can add a field to each document saying what 'group' or 'sub-group'
it is in and then limit it at query time

  q="kittens +group:A"

The advantage to splitting it into multiple indexes is that you could
put each index on independent hardware.  Depending on your queries and
index size that may make a big difference.

ryan


Rishabh Joshi wrote:
> Hi,
>
> I have a requirement and was wondering if someone could help me in how to go 
> about it. We have to index about 8-9 million documents and their size can be 
> anywhere from a few KBs to a couple of MBs. These documents are categorized 
> into many 'groups' and 'sub-groups'. I wanted to know if we can create 
> multiple indexes based on 'groups' and then on 'sub-groups' in Solr? If yes, 
> then how do we go about it? I tried going through the section on 
> 'Collections' in the Solr Wiki, but could not make much use of it.

>
> Regards,
> Rishabh Joshi
>
>
>
>
>



solr workflow ?

2007-11-12 Thread Dwarak R
Hi Guys

How do we add Word documents / PDF / text / etc. documents in Solr?  How is the 
content of the files stored or indexed?  Are these documents stored as XML 
in the SOLR filesystem?

Regards

Dwarak R



Re: Best way to create multiple indexes

2007-11-12 Thread Dwarak R

Hi Guys

How do we add Word documents / PDF / text / etc. documents in Solr?  How is the 
content of the files stored or indexed?  Are the documents stored 
as XML in the filesystem?


Regards

Dwarak R
- Original Message - 
From: "Ryan McKinley" <[EMAIL PROTECTED]>

To: 
Sent: Monday, November 12, 2007 7:43 PM
Subject: Re: Best way to create multiple indexes


For starters, do you need to be able to search across groups or sub-groups 
(in one query?)


If so, then you have to stick everything in one index.

You can add a field to each document saying what 'group' or 'sub-group' it 
is in and then limit it at query time


 q="kittens +group:A"

The advantage to splitting it into multiple indexes is that you could put 
each index on independent hardware.  Depending on your queries and index 
size that may make a big difference.


ryan


Rishabh Joshi wrote:

Hi,

I have a requirement and was wondering if someone could help me in how to 
go about it. We have to index about 8-9 million documents and their size 
can be anywhere from a few KBs to a couple of MBs. These documents are 
categorized into many 'groups' and 'sub-groups'. I wanted to know if we 
can create multiple indexes based on 'groups' and then on 'sub-groups' in 
Solr? If yes, then how do we go about it? I tried going through the 
section on 'Collections' in the Solr Wiki, but could not make much use of 
it.


Regards,
Rishabh Joshi













Re: Best way to create multiple indexes

2007-11-12 Thread Ryan McKinley
For starters, do you need to be able to search across groups or 
sub-groups (in one query?)


If so, then you have to stick everything in one index.

You can add a field to each document saying what 'group' or 'sub-group' 
it is in and then limit it at query time


 q="kittens +group:A"

The advantage to splitting it into multiple indexes is that you could 
put each index on independent hardware.  Depending on your queries and 
index size that may make a big difference.


ryan
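
In URL form such a restricted query might look like (host, port and field
name assumed):

    http://localhost:8983/solr/select?q=kittens&fq=group:A

Putting the group restriction in fq rather than folding +group:A into q
lets Solr keep it in the filter cache, so repeated queries against the
same group stay cheap.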


Rishabh Joshi wrote:
> Hi,
>
> I have a requirement and was wondering if someone could help me in how to go
> about it. We have to index about 8-9 million documents and their size can be
> anywhere from a few KBs to a couple of MBs. These documents are categorized
> into many 'groups' and 'sub-groups'. I wanted to know if we can create multiple
> indexes based on 'groups' and then on 'sub-groups' in Solr? If yes, then how do
> we go about it? I tried going through the section on 'Collections' in the Solr
> Wiki, but could not make much use of it.
>
> Regards,
> Rishabh Joshi









Re: Multiple indexes

2007-11-12 Thread Ryan McKinley


just use the standard collection distribution stuff.  That is what it is 
made for! http://wiki.apache.org/solr/CollectionDistribution


Alternatively, open up two indexes using the same config/dir -- do your 
indexing on one and the searching on the other.  when indexing is done 
(or finishes a big chunk) send <commit/> to the 'searching' one and it 
will see the new stuff.


ryan



Jae Joo wrote:
> Here is my situation.
>
> I have 6 million articles indexed and am adding about 10k articles every day.
> If I maintain only one index, whenever the daily feeding is running, it
> consumes the heap area and causes FGC.
> I am thinking of a way to have multiple indexes - one for ongoing query
> service and one for updates. Once an update is done, switch the index
> automatically and/or via my application.
>
> Thanks,
>
> Jae joo
>
>
> On Nov 12, 2007 8:48 AM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
>
>> The advantages of a multi-core setup are configuration flexibility and
>> dynamically changing available options (without a full restart).
>>
>> For high-performance production solr servers, I don't think there is
>> much reason for it.  You may want to split the two indexes on to two
>> machines.  You may want to run each index in a separate JVM (so if one
>> crashes, the other does not)
>>
>> Maintaining 2 indexes is pretty easy, if that was a larger number or you
>> need to create indexes for each user in a system then it would be worth
>> investigating the multi-core setup (it is still in development)
>>
>> ryan
>>
>>
>> Pierre-Yves LANDRON wrote:
>>> Hello,
>>>
>>> Until now, i've used two instance of solr, one for each of my
>>> collections ; it works fine, but i wonder
>>> if there is an advantage to use multiple indexes in one instance over
>>> several instances with one index each ?
>>> Note that the two indexes have different schema.xml.
>>>
>>> Thanks.
>>> PL
>>>
>>>> Date: Thu, 8 Nov 2007 18:05:43 -0500
>>>> From: [EMAIL PROTECTED]
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Multiple indexes
>>>>
>>>> Hi,
>>>>
>>>> I am looking for the way to utilize the multiple indexes for a single
>>>> solr instance.
>>>> I saw that there is the patch 215 available and would like to ask someone
>>>> who knows how to use multiple indexes.
>>>>
>>>> Thanks,
>>>>
>>>> Jae Joo








Re: Multiple indexes

2007-11-12 Thread Jae Joo
Here is my situation.

I have 6 million articles indexed and am adding about 10k articles every day.
If I maintain only one index, whenever the daily feeding is running, it
consumes the heap area and causes FGC.
I am thinking of a way to have multiple indexes - one for ongoing query
service and one for updates. Once an update is done, switch the index
automatically and/or via my application.

Thanks,

Jae joo


On Nov 12, 2007 8:48 AM, Ryan McKinley <[EMAIL PROTECTED]> wrote:

> The advantages of a multi-core setup are configuration flexibility and
> dynamically changing available options (without a full restart).
>
> For high-performance production solr servers, I don't think there is
> much reason for it.  You may want to split the two indexes on to two
> machines.  You may want to run each index in a separate JVM (so if one
> crashes, the other does not)
>
> Maintaining 2 indexes is pretty easy, if that was a larger number or you
> need to create indexes for each user in a system then it would be worth
> investigating the multi-core setup (it is still in development)
>
> ryan
>
>
> Pierre-Yves LANDRON wrote:
> > Hello,
> >
> > Until now, i've used two instance of solr, one for each of my
> > collections ; it works fine, but i wonder
> > if there is an advantage to use multiple indexes in one instance over
> > several instances with one index each ?
> > Note that the two indexes have different schema.xml.
> >
> > Thanks.
> > PL
> >
> >> Date: Thu, 8 Nov 2007 18:05:43 -0500
> >> From: [EMAIL PROTECTED]
> >> To: solr-user@lucene.apache.org
> >> Subject: Multiple indexes
> >>
> >> Hi,
> >>
> >> I am looking for the way to utilize the multiple indexes for a single
> >> solr instance.
> >> I saw that there is the patch 215 available and would like to ask someone
> >> who knows how to use multiple indexes.
> >>
> >> Thanks,
> >>
> >> Jae Joo
> >
>
>


Query and heap Size

2007-11-12 Thread Jae Joo
In my system, the heap size (old generation) keeps growing under heavy
traffic.
I have adjusted the size of the young generation, but it does not help.

Does anyone have any recommendation regarding this issue? - Solr
configuration and/or web.xml ...etc...

Thanks,

Jae
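
For reference, the sort of knobs usually tried here (values are
illustrative -- measure before and after):

    java -Xms1024m -Xmx1024m -XX:NewRatio=2 -XX:+UseConcMarkSweepGC -jar start.jar

Pinning -Xms to -Xmx avoids heap resizing churn, NewRatio shifts space
between the young and old generations, and the concurrent collector trades
some throughput for shorter old-generation pauses.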


Re: solr range query

2007-11-12 Thread Yonik Seeley
On Nov 12, 2007 8:02 AM, Heba Farouk <[EMAIL PROTECTED]> wrote:
> I would like to use solr to return range searches on an integer
> field. If I write offset:[0 TO 10] in the url, it returns documents
> with offset values 0, 1, 10 only, but I want it to return the range
> 0, 1, 2, 3, 4, ..., 10. How can I do that with solr?

Use fieldType="sint" (sortable int... see the schema.xml), and reindex.

-Yonik
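
For reference, the relevant declarations look roughly as in the stock
example schema.xml (the field line's attributes are assumptions):

    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    <field name="offset" type="sint" indexed="true" stored="true"/>

A plain string-based int field compares lexically, which is why [0 TO 10]
matched only "0", "1" and "10"; sint encodes values so that lexical order
equals numeric order.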


Re: Multiple indexes

2007-11-12 Thread Ryan McKinley
The advantages of a multi-core setup are configuration flexibility and 
dynamically changing available options (without a full restart).


For high-performance production solr servers, I don't think there is 
much reason for it.  You may want to split the two indexes on to two 
machines.  You may want to run each index in a separate JVM (so if one 
crashes, the other does not)


Maintaining 2 indexes is pretty easy, if that was a larger number or you 
need to create indexes for each user in a system then it would be worth 
investigating the multi-core setup (it is still in development)


ryan


Pierre-Yves LANDRON wrote:
> Hello,
>
> Until now, i've used two instance of solr, one for each of my collections ;
> it works fine, but i wonder
> if there is an advantage to use multiple indexes in one instance over
> several instances with one index each ?
> Note that the two indexes have different schema.xml.
>
> Thanks.
> PL
>
>> Date: Thu, 8 Nov 2007 18:05:43 -0500
>> From: [EMAIL PROTECTED]
>> To: solr-user@lucene.apache.org
>> Subject: Multiple indexes
>>
>> Hi,
>>
>> I am looking for the way to utilize the multiple indexes for a single solr
>> instance.
>> I saw that there is the patch 215 available and would like to ask someone
>> who knows how to use multiple indexes.
>>
>> Thanks,
>>
>> Jae Joo






Re: I18N with SOLR?

2007-11-12 Thread Ed Summers
I'd say yes. Solr supports Unicode and ships with language-specific
analyzers, and allows you to provide your own custom analyzers if you
need them. This allows you to create different <fieldtype> definitions
for the languages you want to support. For example, here is an example
field type for French text which uses a French stopword list and
French stemming (something like this; exact filter classes and file
names will vary):

  <fieldtype name="text_french" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="french_stopwords.txt" ignoreCase="true"/>
      <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    </analyzer>
  </fieldtype>

Then you can create <field> definitions that allow you to
index and query your documents using the correct field type, e.g.:

  <field name="title_french" type="text_french" indexed="true" stored="true"/>
This means that when you index you need to know what language your
data is in so that you know what field names to use in your document
(e.g. title_french). And at search time you need to know what language
you are in so you know which fields to search.  Most user interfaces
are in a single language context so from the query perspective you'll
most likely know the language they want to search in. If you don't
know the language context in either case you could try to guess using
something like org.apache.nutch.analysis.lang.LanguageIdentifier.

I hope this helps. We used this technique (without the guessing) quite
effectively at the Library of Congress recently for a prototype
application that needed to provide search functionality in 7 different
languages.

//Ed

On Nov 12, 2007 1:56 AM, Dilip.TS <[EMAIL PROTECTED]> wrote:
> Hello,
>
>   Does SOLR support I18N (with multiple language support)?
>   Thanks in advance.
>
> Regards,
> Dilip TS
>
>


solr range query

2007-11-12 Thread Heba Farouk
Hello,

 

I would like to use solr to return range searches on an integer
field. If I write offset:[0 TO 10] in the url, it returns documents
with offset values 0, 1, 10 only, but I want it to return the range
0, 1, 2, 3, 4, ..., 10. How can I do that with solr?

 

Thanks in advance

 

 Best regards,

 

Heba Farouk

Software Engineer

Bibliotheca Alexandrina



RE: Multiple indexes

2007-11-12 Thread Pierre-Yves LANDRON

Hello,

Until now, i've used two instance of solr, one for each of my collections ; it 
works fine, but i wonder
if there is an advantage to use multiple indexes in one instance over several 
instances with one index each ?
Note that the two indexes have different schema.xml.

Thanks.
PL

> Date: Thu, 8 Nov 2007 18:05:43 -0500
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Multiple indexes
> 
> Hi,
> 
> I am looking for the way to utilize the multiple indexes for a single solr
> instance.
> I saw that there is the patch 215  available  and would like to ask someone
> who knows how to use multiple indexes.
> 
> Thanks,
> 
> Jae Joo


RE: no segments* file found

2007-11-12 Thread SDIS M. Beauchamp
No, I'm using a custom indexer, written in C#, which submits content using
POST requests.

I let Lucene manage the index on its own.

Florent BEAUCHAMP

-Original Message-
From: Venkatraman S [mailto:[EMAIL PROTECTED]]
Sent: Monday, 12 November 2007 10:19
To: solr-user@lucene.apache.org
Subject: Re: no segments* file found

Are you using embedded Solr?

I stumbled on a similar error:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg06085.html

-V

On Nov 12, 2007 2:16 PM, SDIS M. Beauchamp <[EMAIL PROTECTED]> wrote:

> I'm using Solr to index our file servers (480K files).
>
> If I don't optimize, I get a "too many open files" error at about 450K
> files and a 3 GB index.
>
> If I optimize, I get this stack trace during the commit of every
> subsequent update:
>
> java.io.FileNotFoundException: no segments* file 
> found in
> org.apache.lucene.store.FSDirectory@/root/trunk/example/solr/data/index:
> files: _7xr.tis _7xt.fdt _7o1.tii _7xq.tis _7xn.nrm _7ws.fdt _7xt.prx 
> _7xp.nrm _7ws.nrm _7xo.nrm _7ws.tis _7xs.fdt _7vc.fnm _7u6.tis 
> _7vx.fnm _7vx.frq _7xs.nrm _7xn.tis _7xq.frq _7xs.tis _7xq.prx 
> _7vx.fdx _7ur.tii _7ur.frq _7xq.fnm _7xr.nrm _7vc.fdt _7xt.frq 
> _7xp.fdx _7ws.prx _7xs.frq _7xo.prx _7xq.nrm _7vx.tii _7vx.prx 
> _7xq.tii _7xs.fnm _7xs.tii _7ws.tii _7xt.fdx _7vc.nrm _7vc.prx 
> _7vc.tis _7xq.fdt _7ur.prx _7xn.fdx _7xp.frq _7vx.nrm _7ur.fdt 
> _7xr.fnm _7ws.fdx _7u6.tii _7xr.tii _7vc.frq _7vx.tis _7xp.fdt 
> _7xr.frq _7ur.tis _7xp.prx _7xr.fdx _7xt.fnm _7xn.tii _7vc.fdx _7xo.fdt 
> _7u6.fnm _7xn.frq _7xp.tis _7o1.frq _7xn.prx _7ur.fdx _7ur.fnm 
> _7o1.fdx _7xs.fdx _7xn.fdt _7xt.tis _7xp.fnm _7xo.fnm _7xn.fnm 
> _7u6.prx _7xq.fdx _7xo.tii _7ws.fnm _7vc.tii _7o1.prx _7xr.fdt 
> _7o1.fdt _7ur.nrm _7ws.frq _7u6.nrm _7o1.nrm _7vx.fdt _7xt.tii 
> _7u6.fdx _7xo.frq _7u6.frq _7xs.prx _7xr.prx _7o1.tis _7xt.nrm _7xp.tii 
> _7xo.tis _7u6.fdt _7xo.fdx _7o1.fnm segments.gen
>    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:516)
>    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:243)
>    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:616)
>    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:410)
>    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:97)
>    at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:121)
>    at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:189)
>    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:267)
>    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
>    at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)
>    at org.apache.solr.handler.XmlUpdateRequestHandler.doLegacyUpdate(XmlUpdateRequestHandler.java:386)
>    at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:57)
>
>
> 
>
> If I restart Solr, I get a NullPointerException in DispatchFilter.
>
> Tested with Solr 1.2 and 1.3; the behaviour is the same.
>
> Regards
>
> Florent BEAUCHAMP
>



--


Re: Trim filter active for solr.StrField?

2007-11-12 Thread Jörg Kiegeland



what is your specific SolrQuery?

calling:
 query.setQuery( " stuff with spaces   " );

does not call trim(), but some other calls do.

My query looks e.g. like this:

(myField:"_T8sY05EAEdyU7fJs63mvdA" OR myField:"_T8sY0ZEAEdyU7fJs63mvdA" 
OR myField:"_T8sY0pEAEdyU7fJs63mvdA") AND NOT 
myField:"_T8sY1JEAEdyU7fJs63mvdA"



So I want to find all documents where the field "myField" contains any
of one set of UUIDs and none of another set.


The only other thing I do is set the result limit:

   solrQuery.setRows(resultLimit);

The strings that actually get truncated are in other fields of the
returned documents.


Any idea?
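
One thing worth checking in such cases is how the affected fields are
declared in schema.xml: a plain solr.StrField stores and returns its
value verbatim, with no analysis, so trailing whitespace that survives
indexing should come back intact. A sketch of a verbatim declaration
(names assumed from the query above):

<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
<field name="myField" type="string" indexed="true" stored="true"/>

If the declaration already looks like this, the trimming most likely
happens before Solr sees the value, e.g. in the client or in XML
whitespace handling during the update.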




Re: no segments* file found

2007-11-12 Thread Venkatraman S
Are you using embedded Solr?

I stumbled on a similar error:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg06085.html

-V

On Nov 12, 2007 2:16 PM, SDIS M. Beauchamp <[EMAIL PROTECTED]> wrote:

> I'm using Solr to index our file servers (480K files).
>
> If I don't optimize, I get a "too many open files" error at about 450K
> files and a 3 GB index.
>
> If I optimize, I get this stack trace during the commit of every
> subsequent update:
>
> java.io.FileNotFoundException: no segments* file
> found in
> org.apache.lucene.store.FSDirectory@/root/trunk/example/solr/data/index:
> files: _7xr.tis _7xt.fdt _7o1.tii _7xq.tis _7xn.nrm _7ws.fdt _7xt.prx
> _7xp.nrm _7ws.nrm _7xo.nrm _7ws.tis _7xs.fdt _7vc.fnm _7u6.tis _7vx.fnm
> _7vx.frq _7xs.nrm _7xn.tis _7xq.frq _7xs.tis _7xq.prx _7vx.fdx _7ur.tii
> _7ur.frq _7xq.fnm _7xr.nrm _7vc.fdt _7xt.frq _7xp.fdx _7ws.prx _7xs.frq
> _7xo.prx _7xq.nrm _7vx.tii _7vx.prx _7xq.tii _7xs.fnm _7xs.tii _7ws.tii
> _7xt.fdx _7vc.nrm _7vc.prx _7vc.tis _7xq.fdt _7ur.prx _7xn.fdx _7xp.frq
> _7vx.nrm _7ur.fdt _7xr.fnm _7ws.fdx _7u6.tii _7xr.tii _7vc.frq _7vx.tis
> _7xp.fdt _7xr.frq _7ur.tis _7xp.prx _7xr.fdx _7xt.fnm _7xn.tii _7vc.fdx
> _7xo.fdt _7u6.fnm _7xn.frq _7xp.tis _7o1.frq _7xn.prx _7ur.fdx _7ur.fnm
> _7o1.fdx _7xs.fdx _7xn.fdt _7xt.tis _7xp.fnm _7xo.fnm _7xn.fnm _7u6.prx
> _7xq.fdx _7xo.tii _7ws.fnm _7vc.tii _7o1.prx _7xr.fdt _7o1.fdt _7ur.nrm
> _7ws.frq _7u6.nrm _7o1.nrm _7vx.fdt _7xt.tii _7u6.fdx _7xo.frq _7u6.frq
> _7xs.prx _7xr.prx _7o1.tis _7xt.nrm _7xp.tii _7xo.tis _7u6.fdt _7xo.fdx
> _7o1.fnm segments.gen
>    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:516)
>    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:243)
>    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:616)
>    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:410)
>    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:97)
>    at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:121)
>    at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:189)
>    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:267)
>    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
>    at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)
>    at org.apache.solr.handler.XmlUpdateRequestHandler.doLegacyUpdate(XmlUpdateRequestHandler.java:386)
>    at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:57)
>
>
> 
>
> If I restart Solr, I get a NullPointerException in DispatchFilter.
>
> Tested with Solr 1.2 and 1.3; the behaviour is the same.
>
> Regards
>
> Florent BEAUCHAMP
>



--


no segments* file found

2007-11-12 Thread SDIS M. Beauchamp
I'm using Solr to index our file servers (480K files).

If I don't optimize, I get a "too many open files" error at about 450K files
and a 3 GB index.

If I optimize, I get this stack trace during the commit of every
subsequent update:
 
java.io.FileNotFoundException: no segments* file
found in
org.apache.lucene.store.FSDirectory@/root/trunk/example/solr/data/index:
files: _7xr.tis _7xt.fdt _7o1.tii _7xq.tis _7xn.nrm _7ws.fdt _7xt.prx
_7xp.nrm _7ws.nrm _7xo.nrm _7ws.tis _7xs.fdt _7vc.fnm _7u6.tis _7vx.fnm
_7vx.frq _7xs.nrm _7xn.tis _7xq.frq _7xs.tis _7xq.prx _7vx.fdx _7ur.tii
_7ur.frq _7xq.fnm _7xr.nrm _7vc.fdt _7xt.frq _7xp.fdx _7ws.prx _7xs.frq
_7xo.prx _7xq.nrm _7vx.tii _7vx.prx _7xq.tii _7xs.fnm _7xs.tii _7ws.tii
_7xt.fdx _7vc.nrm _7vc.prx _7vc.tis _7xq.fdt _7ur.prx _7xn.fdx _7xp.frq
_7vx.nrm _7ur.fdt _7xr.fnm _7ws.fdx _7u6.tii _7xr.tii _7vc.frq _7vx.tis
_7xp.fdt _7xr.frq _7ur.tis _7xp.prx _7xr.fdx _7xt.fnm _7xn.tii _7vc.fdx
_7xo.fdt _7u6.fnm _7xn.frq _7xp.tis _7o1.frq _7xn.prx _7ur.fdx _7ur.fnm
_7o1.fdx _7xs.fdx _7xn.fdt _7xt.tis _7xp.fnm _7xo.fnm _7xn.fnm _7u6.prx
_7xq.fdx _7xo.tii _7ws.fnm _7vc.tii _7o1.prx _7xr.fdt _7o1.fdt _7ur.nrm
_7ws.frq _7u6.nrm _7o1.nrm _7vx.fdt _7xt.tii _7u6.fdx _7xo.frq _7u6.frq
_7xs.prx _7xr.prx _7o1.tis _7xt.nrm _7xp.tii _7xo.tis _7u6.fdt _7xo.fdx
_7o1.fnm segments.gen
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:516)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:243)
    at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:616)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:410)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:97)
    at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:121)
    at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:189)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:267)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
    at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)
    at org.apache.solr.handler.XmlUpdateRequestHandler.doLegacyUpdate(XmlUpdateRequestHandler.java:386)
    at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:57)
   
 

 
If I restart Solr, I get a NullPointerException in DispatchFilter.

Tested with Solr 1.2 and 1.3; the behaviour is the same.
 
Regards
 
Florent BEAUCHAMP
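
The "too many open files" half of this is usually addressed by raising
the process file-descriptor limit (ulimit -n) and by letting Lucene use
the compound file format with a modest merge factor, so far fewer
segment files are held open. A sketch of the relevant solrconfig.xml
settings (values illustrative):

<indexDefaults>
  <useCompoundFile>true</useCompoundFile>
  <mergeFactor>10</mergeFactor>
</indexDefaults>

An occasional <optimize/> then collapses the segments, which also keeps
the handle count low between updates.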