Re: scheduling imports and heartbeats
On Tue, Nov 9, 2010 at 10:16 PM, Tri Nguyen tringuye...@yahoo.com wrote: Hi, Can I configure solr to schedule imports at a specified time (say once a day, once an hour, etc)? Also, does solr have some sort of heartbeat mechanism? Thanks, Tri

Tri,

If you use the DataImportHandler (DIH), you can set up a dataimport.properties file that can be configured to import on intervals.
http://wiki.apache.org/solr/DataImportHandler#dataimport.properties_example

As for heartbeat, you can use the ping handler (default is /admin/ping) to check the status of the servlet.

- Ken
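For reference, the dataimport.properties-based scheduler described on that wiki page was a contributed patch rather than something shipped with Solr 1.4, so the usual fallback is an external scheduler hitting the DIH and ping handlers over HTTP. A hypothetical crontab sketch (host, port and paths are placeholders):

```
# run a delta-import every hour, and a heartbeat check every 5 minutes
0 * * * *   curl -s "http://localhost:8983/solr/dataimport?command=delta-import"
*/5 * * * * curl -s "http://localhost:8983/solr/admin/ping"
```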
Output Search Result in ADD-XML-Format
Dear all, my use case is: creating an index using DIH where the sub-entity queries another Solr index for more fields. As there is a very convenient attribute useSolrAddSchema that would spare me listing all the fields I want to add from the other index, I'm looking for a way to get the search results in the ADD format directly.

Before starting on the XSLT file that would transform the regular Solr result into a Solr update XML, I just wanted to ask whether there already exists a solution for this. Maybe I missed some request handler that already returns the result in update format?

Thanks! Chantal
To cache or to not cache
Hi List,

in one of our application's use-case scenarios we create a response from different data sources. In clear words: we combine different responses from different data sources (SQL, another webservice and Solr) into one response. We would cache this information per request for a couple of minutes or hours outside of Solr, since the data to cache does not come only from Solr itself.

However, I am not sure whether it would make sense to disable Solr's internal cache mechanisms, or at least which cache mechanisms I can disable, because I am not sure what the impacts of each cache are in the long run.

A query is usually of type dismax and uses some function queries. We do not sort, but we may use some filter queries. Furthermore we retrieve just one of up to 10 (stored) fields from our index. Most of the time it will be the same field (95-98% of the requests).

I think using the filterCache makes sense, but what about documentCache and the others? Since I retrieve in 95-98% of all cases the same field from our stored documents, how can I boost retrieving that information?

Thank you!

--
View this message in context: http://lucene.472066.n3.nabble.com/To-cache-or-to-not-cache-tp1875289p1875289.html
Sent from the Solr - User mailing list archive at Nabble.com.
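For what it's worth, the caches in question are configured in solrconfig.xml, so experimenting is cheap. A hedged sketch (the sizes are illustrative placeholders, not recommendations); note also that requesting only the one needed stored field via fl=yourfield keeps responses small regardless of caching:

```xml
<!-- solrconfig.xml sketch; size values are illustrative only -->
<query>
  <!-- fq clauses land here; likely worth keeping for dismax + filter queries -->
  <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <!-- stored-field retrieval hits this cache, even when fl requests one field -->
  <documentCache class="solr.LRUCache" size="1024" initialSize="1024" autowarmCount="0"/>
  <!-- can be kept small if whole responses are cached outside Solr -->
  <queryResultCache class="solr.LRUCache" size="128" initialSize="128" autowarmCount="0"/>
</query>
```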
Re: How to Facet on a price range
On 11/9/2010 7:32 PM, Geert-Jan Brits wrote: when you drag the sliders, an update of how many results would match is immediately shown. I really like this. How did you do this? Is this out-of-the-box available with the suggested Facet_by_range patch?

Hi,

With the range facets you get the facet counts for every discrete step of the slider. These values are requested in the AJAX request whenever search criteria change, and then when someone uses the sliders we simply check the range that is selected and add the discrete values of that range to get the expected amount of results. So yes, it is available, but as Solr is just the search backend, the frontend stuff you'll have to write yourself.

Regards, gwk
Re: How to Facet on a price range
Ah I see: like you said, it's part of the facet range implementation. Frontend is already working, just need the 'update-on-slide' behavior.

Thanks, Geert-Jan

2010/11/10 gwk g...@eyefi.nl wrote: [quoting the explanation above: the facet counts for every discrete slider step are fetched in the AJAX request, and the selected range's discrete values are summed client-side]
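The arithmetic gwk describes is simple enough to sketch: fetch one facet count per discrete slider step up front, then sum the steps inside the selected range client-side, with no extra Solr request per slider move. The counts below are made-up example data:

```python
# Client-side sketch of summing discrete range-facet counts for a price slider.

def matches_in_range(step_counts, low, high):
    """step_counts maps a price step to its facet count; low/high inclusive."""
    return sum(count for step, count in step_counts.items() if low <= step <= high)

counts = {0: 5, 10: 12, 20: 7, 30: 3, 40: 1}  # price step -> number of results
print(matches_in_range(counts, 10, 30))       # 12 + 7 + 3
```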
Re: scheduling imports and heartbeats
i'm looking for another solution other than cron job. can i configure solr to schedule imports?

From: Ranveer Kumar ranveer.s...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tue, November 9, 2010 8:13:03 PM
Subject: Re: scheduling imports and heartbeats

You should use cron for that..

On 10 Nov 2010 08:47, Tri Nguyen tringuye...@yahoo.com wrote: Hi, Can I configure solr to schedule imports at a specified time (say once a day, once an hour, etc)? Also, does solr have some sort of heartbeat mechanism? Thanks, Tri
Re: Using Multiple Cores for Multiple Users
Hi,

If your index is supposed to handle only public information, i.e. public RSS feeds, then I don't see a need for multiple cores. I would probably try to handle this on the query side only. Imagine this scenario:

User A registers RSS-X and RSS-Y (the application starts pulling and indexing these feeds)
User B registers RSS-Z (the application starts pulling feed Z)
User C registers RSS-X and RSS-Z (the application does nothing, as these are already being indexed)

When searching, add a filter to each user's queries. Solr will handle MANY terms in such a filter, and it is not likely that a human user subscribes to more than, say, a few hundred feeds. So for user C, the query would look like:

.../solr/select?q=foo bar&fq=feedID:(RSS-X OR RSS-Z)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 10. nov. 2010, at 03.00, Adam Estrada wrote: Thanks a lot for all the tips, guys! I think that we may explore both options just to see what happens. I'm sure that scalability will be a huge mess with the core-per-user scenario. I like the idea of creating a user ID field and agree that it's probably the best approach. We'll see... I will be sure to let the list know what I find! Please don't stop posting your comments everyone ;-) My inquiring mind wants to know... Adam

On Tue, Nov 9, 2010 at 7:34 PM, Jonathan Rochkind rochk...@jhu.edu wrote: If storing in a single index (possibly sharded if you need it), you can simply include a solr field that specifies the user ID of the saved thing. On the client side, in your application, simply ensure that there is an fq parameter limiting to the current user, if you want to limit to the current user's stuff. Relevancy ranking should work just as if you had 'separate cores'; there is no relevancy issue. It IS true that when your index gets very large, commits will start taking longer, which can be a problem.
I don't mean commits will take longer just because there is more stuff to commit -- the larger the index, the longer an update to a single document will take to commit.

In general, I suspect that having dozens or hundreds (or thousands!) of cores is not going to scale well; it is not going to make good use of your cpu/ram/hd resources. Not really the intended use case of multiple cores. However, you are probably going to run into some issues with the single index approach too. In general, how to deal with multi-tenancy in Solr is an oft-asked question for which there doesn't seem to be any "just works and does everything for you without needing to think about it" solution in Solr, judging from past threads. I am not a Solr developer or expert.

From: Markus Jelsma [markus.jel...@openindex.io]
Sent: Tuesday, November 09, 2010 6:57 PM
To: solr-user@lucene.apache.org
Cc: Adam Estrada
Subject: Re: Using Multiple Cores for Multiple Users

Hi,

[Adam wrote:] All, I have a web application that requires the user to register and then login to gain access to the site. Pretty standard stuff... Now I would like to know what the best approach would be to implement a customized search experience for each user. Would this mean creating a separate core per user? I think that this is not possible without restarting Solr after each core is added to the multi-core xml file, right?

No, you can dynamically manage cores and parts of their configuration. Sometimes you must reindex after a change; the same is true for reloading cores. Check the wiki on this one [1].

[Adam wrote:] My use case is this... User A would like to index 5 RSS feeds and User B would like to index 5 completely different RSS feeds, and he is not interested at all in what User A is interested in. This means that they would have to be separate index cores, right?

If you view documents within an RSS feed as separate documents, you can assign a user ID to those documents, creating a multi-user index with RSS documents per user, or group, or whatever.

Having a core per user isn't a good idea if you have many users. It takes up additional memory and disk space, doesn't share caches etc. There is also more maintenance, and you need some support scripts to dynamically create new cores - Solr currently doesn't create a new core directory structure. But reindexing a very large index takes up a lot more time and resources, and relevancy might be an issue depending on the RSS feeds' contents.

[Adam wrote:] What is the best approach for this kind of thing?

I'd usually store the feeds in a single index and shard if it's too many for a single server with your specifications. Unless the demands are too specific.

[Adam wrote:] Thanks in advance, Adam

[1]: http://wiki.apache.org/solr/CoreAdmin

Cheers
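The query-side filtering suggested in this thread can be sketched client-side. A hedged example: the feedID field name and the subscription map are assumptions for illustration, not anything from a real schema:

```python
# Build a per-user Solr query URL whose fq restricts results to the
# feeds that user subscribed to (single shared index, no per-user core).
from urllib.parse import urlencode

subscriptions = {"userC": ["RSS-X", "RSS-Z"]}  # hypothetical registry

def user_query(user, q):
    fq = "feedID:(%s)" % " OR ".join(subscriptions[user])
    return "/solr/select?" + urlencode({"q": q, "fq": fq})

print(user_query("userC", "foo bar"))
```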
Re: facetting when using field collapsing
On 07.11.2010, at 20:13, Lukas Kahwe Smith wrote:

Hi,

I am pondering making use of field collapsing. I am currently indexing clauses (sections) inside UN documents:
http://resolutionfinder.org/search/unifiedResults?q=africa=t[22]=medicationdc=st=clause

Now since right now my data set is still fairly small, I am doing field collapsing in userland:
http://resolutionfinder.org/search/unifiedResults?q=africa=t[22]=medicationdc=st=document

However, while this works alright (not ideal, since I am fetching essentially the entire result set and not paged as for clauses), I still have no idea how to get the facet filters to display the right counts. So I am wondering if field collapsing in its current form supports faceting, since it's not mentioned on the wiki page:
http://wiki.apache.org/solr/FieldCollapsing

The above wiki page seems to be out of date. Reading the comments in https://issues.apache.org/jira/browse/SOLR-236 it seems like group should be replaced with collapse.

regards, Lukas Kahwe Smith m...@pooteeweet.org
RE: Output Search Result in ADD-XML-Format
I'm not sure, but SOLR-1499 might have what you want. https://issues.apache.org/jira/browse/SOLR-1499

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Chantal Ackermann [mailto:chantal.ackerm...@btelligent.de]
Sent: Wednesday, November 10, 2010 5:59 AM
To: solr-user@lucene.apache.org
Subject: Output Search Result in ADD-XML-Format

[original question quoted, as above]
Re: Next Word - Any Suggestions?
Hi Christopher,

I am working my way through trying to implement SpanQueries in Solr (svn trunk). From my lack of progress, I am skeptical that I can help much, but I would be happy to try. I imagine you have already found (either before your message, or after posting it) Grant's lucene, spanquery, and WindowTermVectorMapper overview:
http://www.lucidimagination.com/blog/2009/05/26/accessing-words-around-a-positional-match-in-lucene/

I'd be interested in hearing about your progress. Good luck

Sean

On 10/26/2010 08:26 AM, Christopher Ball wrote: Am about to implement a custom query that is sort of a mash-up of Facets, Highlighting, and SpanQuery - but thought I'd see if anyone has done anything similar. In simple words, I need to facet on the next word given a target word. For example, if my index only had the following 5 documents (comprised of a sentence each):

Doc 1 - The quick brown fox jumped over the fence.
Doc 2 - The sly fox skipped over the fence.
Doc 3 - The fat fox skipped his afternoon class.
Doc 4 - A brown duck and red fox, crashed the party.
Doc 5 - Charles Brown! Fox! Crashed my damn car.

The query should give the frequency of the distinct terms after the word fox:

skipped - 2
crashed - 2
jumped - 1

Long-term, do the opposite - frequency of the distinct terms before the word fox:

brown - 2
sly - 1
fat - 1
red - 1

My guess is that either the FastVectorHighlighter or SpanQuery would be a reasonable starting point. I was hoping to take advantage of Vectors as I am storing termVectors, termPositions, and termOffsets for the field in question. Grateful for any thoughts . . . reference implementations . . . words of encouragement . . . free beer - whatever you can offer.

Gracias, Christopher
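As a sanity check of the expected output, the "next word" facet over the five example documents can be simulated in plain Python; naive tokenization stands in for term vectors / SpanQuery here, so this is only a model of the desired result, not an implementation approach:

```python
# Count the distinct terms that directly follow a target word.
import re
from collections import Counter

docs = [
    "The quick brown fox jumped over the fence.",
    "The sly fox skipped over the fence.",
    "The fat fox skipped his afternoon class.",
    "A brown duck and red fox, crashed the party.",
    "Charles Brown! Fox! Crashed my damn car.",
]

def next_word_counts(docs, target):
    counts = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z]+", doc.lower())
        for a, b in zip(tokens, tokens[1:]):
            if a == target:
                counts[b] += 1
    return counts

print(next_word_counts(docs, "fox"))  # skipped: 2, crashed: 2, jumped: 1
```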
SpanQuery basics in Solr QueryComponent(?)
Hi all,

I seem to be lost in the new flex indexing api. In the older api I was able to extend QueryComponent with my custom component, parse a restricted-syntax user query into a SpanQuery, and then grab an IndexReader. From there I worked with the spanquery's spans. For a bit of reference, my old QueryComponent code looks something like:

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        SolrQueryRequest req = rb.req;
        SolrQueryResponse rsp = rb.rsp;
        SDRQParser qparser = (SDRQParser) rb.getQparser();
        SolrIndexSearcher.QueryCommand cmd = rb.getQueryCommand();
        // custom parser returns SpanQuery
        IndexReader reader = req.getSearcher().getReader();
        Spans spans = stq.getSpans(reader);
        // work with spans here...
    }

With the new (1.5?) api, I got the warning about wrapping IndexReader with SlowMultiReaderWrapper, so I changed my approach above to something like:

    SolrIndexReader fullReader = req.getSearcher().getReader();
    IndexReader reader = SlowMultiReaderWrapper.wrap(fullReader); // need help avoiding this...?

I then got an NPE on what seems to be EmptyTerms.toString(). For kicks, I noticed that EmptyTerms did not override its parent (TermSpans) toString() method, which seemed to be the cause of the problem. Overriding that fixed the NPE, and now I get results (so I will look at filing a bug report unless someone mentions otherwise).

Any hints on how I can/should 'properly' work with spans in solr? Also, are there any introductory documents to the MultiFields and sub-indexes stuff? Particularly how to implement MultiFields as a better approach to SlowMultiReaderWrapper (thanks for the warnings about performance). I cannot seem to find the relevant beginner material to avoid using the SMRW. The material I do find seems to require that you pass in a 'found' document, or perhaps walk through all subReaders?

And finally: should I be looking at some existing Solr code to guide me? I am having trouble finding the highlighter code which I believe uses spans (WeightedSpanTerm??). Is there already code to convert user queries to span queries?

Thanks, Sean
Re: Using Multiple Cores for Multiple Users
On Tue, Nov 9, 2010 at 6:00 PM, Adam Estrada estrada.adam.gro...@gmail.com wrote: Thanks a lot for all the tips, guys! I think that we may explore both options just to see what happens. I'm sure that scalability will be a huge mess with the core-per-user scenario. I like the idea of creating a user ID field and agree that it's probably the best approach. We'll see... I will be sure to let the list know what I find! Please don't stop posting your comments everyone ;-) My inquiring mind wants to know...

I think it is customary for me to mention the techniques mentioned in LotsOfCores for these kinds of questions. The patches are mostly useless at this point, but if you are looking for a per-user solution, you will need most of the tricks mentioned on the wiki page.

http://wiki.apache.org/solr/LotsOfCores

--
Regards, Shalin Shekhar Mangar.
Re: To cache or to not cache
Thank you Shalin.

Yes, both Solr and some other applications could possibly run on the same box. I hoped that not storing redundantly in Solr and somewhere else in RAM would not hurt Solr's performance very much.

Just to understand Solr's caching mechanism: my first query is red firefox - all caches were turned on. If I am searching now for red star, does this query make any usage of the cache, since both share the term red?

Kind regards
RE: Output Search Result in ADD-XML-Format
Thank you, James. I was looking for something like that (and I remember having stumbled over it in the past, now that you mention it).

I've created an XSLT file that transforms the regular result to an update XML document. Seeing that the SolrEntityProcessor is still in development, I will stick to the XSLT solution while we are still using 1.4, but I will add a note that with the new release we should try this SolrEntityProcessor.

(Reading through the JIRA issue I'm not sure whether I can simply get all fields from the other index and dump them into the index which is being built. With the XSLT + useSolrAddSchema solution this works just fine without the need to list all the fields. I should try that before the next Solr release to be able to give some feedback.)

Thanks! Chantal

On Wed, 2010-11-10 at 15:13 +0100, Dyer, James wrote: I'm not sure, but SOLR-1499 might have what you want. https://issues.apache.org/jira/browse/SOLR-1499 [remainder of quoted thread, as above]
Re: To cache or to not cache
Em wrote: My first query is red firefox - all caches were turned on. If I am searching now for red star, does this query make any usage of the cache, since both share the term red?

I don't believe it does, no.

I understand your question -- if you're caching things externally anyway, do you need caches in Solr, or is that just redundant? The answer is kind of complicated though -- maybe, maybe not. In some cases having too-small Solr caches will make your Solr performance really bad -- if you want to page through Solr results, for instance, the document cache is going to be important. In fact, if Solr can't hold enough for the _current page_ in the cache, that's going to mess up Solr even more; even returning a single request, Solr functions that want to look at the documents are (in some cases) going to keep retrieving them over and over again, instead of getting them from the cache -- even within a single Solr request-response.

I could be wrong about some of those details; this is me kind of hand-waving because I'm not an expert at this stuff. I know just enough to try not to be dangerous (ha), meaning that I am pretty sure that you can't issue a blanket "yeah, get rid of Solr caches" in your circumstance. There are probably some caches you can make (much) smaller, but it requires kind of complicated Solr-fu to understand which those are. You could certainly keep your caches fairly small, and see what happens, do some benchmarking.

Jonathan
Re: To cache or to not cache
On Wed, Nov 10, 2010 at 7:51 AM, Em mailformailingli...@yahoo.de wrote: Thank you Shalin. Yes, both Solr and some other applications could possibly run on the same box. I hoped that not storing redundantly in Solr and somewhere else in RAM would not hurt Solr's performance very much. Just to understand Solr's caching mechanism: my first query is red firefox - all caches were turned on. If I am searching now for red star, does this query make any usage of the cache, since both share the term red?

Well, we can assume that some documents will be common, so the documentCache will be hit. If you are using a sort on fields or function queries, the fieldCache built by Lucene (not configurable) will be used. If there are any common fq clauses, those will hit the filterCache. Apart from that, it is difficult to say unless we know the field types and the parsed query.

--
Regards, Shalin Shekhar Mangar.
Re: Highlighter - multiple instances of term being combined
Ahh, this reconfirms. The analyzers are properly pulling things apart. There are two instances of the query keyword with words between them. But from your last comment, it sounds like the system's not trying to do any sort of phrase highlighting, but is just hitting a weird edge case? I'm seeing this behavior somewhat commonly, so I thought for sure there must be some option that says "if two highlighted words are sufficiently close together, highlight them as a single phrase".

On Tue, Nov 9, 2010 at 7:11 PM, Lance Norskog goks...@gmail.com wrote: Have you looked at solr/admin/analysis.jsp? This is the 'Analysis' link off the main solr admin page. It will show you how text is broken up for both the indexing and query processes. You might get some insight about how these words are torn apart and assigned positions. Trying the different Analyzers and options might get you there. But to be frank - highlighting is a tough problem and has always had a lot of edge cases.

On Tue, Nov 9, 2010 at 6:08 PM, Sasank Mudunuri sas...@gmail.com wrote: I'm finding that if a keyword appears in a field multiple times very close together, it will get highlighted as a phrase even though there are other terms between the two instances. So this search:

http://localhost:8983/solr/select/?hl=true&hl.snippets=1&q=residue&hl.fragsize=0&mergeContiguous=false&indent=on&hl.usePhraseHighlighter=false&debugQuery=on&hl.fragmenter=gap&hl.highlightMultiTerm=false

Highlights as:

What does low-<em>residue mean? Like low-residue</em> diet?

Trying to get it to highlight as:

What does low-<em>residue</em> mean? Like low-<em>residue</em> diet?

I've tried playing with various combinations of mergeContiguous, highlightMultiTerm, and usePhraseHighlighter, but they all yield the same output. For reference, the field type uses a StandardTokenizerFactory and SynonymFilterFactory, StopFilterFactory, StandardFilterFactory and SnowballFilterFactory. I've confirmed that the intermediate words don't appear in either the synonym or the stop words list. I can post the full definition if helpful. Any pointers as to how to debug this would be greatly appreciated!

sasank

--
Lance Norskog goks...@gmail.com
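As far as I know there is no Solr 1.4 option that splits a merged highlight back into per-term highlights, so one workaround is to post-process the snippet client-side. A hedged sketch (naive string handling, assumes a single-term query; function name is made up):

```python
# Re-emit <em> highlights around just the query term when Solr returns
# one <em> span covering several words.
import re

def split_merged_highlight(snippet, term):
    def rehighlight(match):
        inner = match.group(1)
        # wrap each occurrence of the term inside the merged span
        return re.sub(r"(?i)(%s)" % re.escape(term), r"<em>\1</em>", inner)
    return re.sub(r"<em>(.*?)</em>", rehighlight, snippet, flags=re.DOTALL)

s = "What does low-<em>residue mean? Like low-residue</em> diet?"
print(split_merged_highlight(s, "residue"))
# What does low-<em>residue</em> mean? Like low-<em>residue</em> diet?
```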
Is there a way to create multiple doc using DIH and access the data pertaining to a particular doc name ?
Hi,

I have a peculiar situation where we are trying to use Solr for indexing multiple tables (there is no relation between these tables). We are trying to use the Solr index instead of the source tables, and hence we are trying to create the Solr index to mirror the source tables.

There are 3 tables which need to be indexed: table 1, table 2 and table 3. I am trying to index each table in a separate doc tag with a different doc tag name, and each table has some of the common field names. For example:

<document name="DataStoreElement">
  <entity name="DataStoreElement" query="...">
    <field column="DATA_STOR" name="DATA_STO"/>
  </entity>
</document>
<document name="DataStore">
  <entity name="DataStore" query="...">
    <field column="DATA_STOR" name="DATA_STO"/>
  </entity>
</document>

After indexing is complete, I am interested in searching the DATA_STO present under a particular document (not from both documents), something like:

/search?q=test AND documentname:DataStoreElement

Is it possible to do this using DIH in Solr? My current approach is to manipulate the source field names to unique names during indexing. One more approach would be to have a multi-core setup and index each table separately in a different core. Please let me know if there is any other suggestion for this issue.

Thanks, Barani
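As an alternative to renaming fields, a single DIH document can tag each table's rows with a discriminator field that queries then filter on. A hedged sketch (the tablename field, table names and SQL are placeholders, not a confirmed solution):

```xml
<!-- One DIH document, one entity per table; a literal column in each
     SQL query tags the rows with their source table. -->
<document>
  <entity name="DataStoreElement"
          query="SELECT 'DataStoreElement' AS tablename, DATA_STOR FROM TABLE1">
    <field column="DATA_STOR" name="DATA_STO"/>
    <field column="tablename" name="tablename"/>
  </entity>
  <entity name="DataStore"
          query="SELECT 'DataStore' AS tablename, DATA_STOR FROM TABLE2">
    <field column="DATA_STOR" name="DATA_STO"/>
    <field column="tablename" name="tablename"/>
  </entity>
</document>
```

A search limited to one table would then look like /solr/select?q=test&fq=tablename:DataStoreElement.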
Re: spell check vs terms component
Shalin / Ken,

Thanks a lot for your suggestions. I haven't tried the NGrams filter; I will try that too.

Thanks, Barani
Re: To cache or to not cache
Jonathan,

sounds like it makes sense. In this case I think it is more important to size the external cache very well, instead of Solr's. Even when 1/5th of the requests are redundant, an external cache could not answer the other 4/5ths, and so decreasing Solr's cache would slow down the whole application.

Since this is only a conceptual question, I really do not have any benchmark data. But if I get some, I will ask whether it is possible to publish them.

Regards
Re: AW: Some issues concerning SOLR 1.4.1
I know this post was a while ago, but it's the only one I've found which exactly matches what we're seeing with our solr application. We recently upgraded to 1.4.1 and all of the issues you have listed are happening to us. Did you find a solution? Thanks.
Solr optimize operation slows my MySQL serveur
Hello,

I've a Solr index with a few million entries, so an optimize usually takes several minutes (or at least tens of seconds). During the optimize process there is free RAM, and the load average stays under 1. But... the machine also hosts a MySQL server, and during the optimize process there are a lot of slow MySQL queries (4 to 10 seconds), without apparent reason.

I don't understand why Solr interferes with MySQL in my case. Do you have an idea?

Thank you! Godefroy
Re: To cache or to not cache
You know, on further reflection, I'd suggest you think (and ideally measure) hard about whether you even need this application-level solr-data-cache. Solr is a caching machine; it's kind of what Solr does, one of the main focuses of Solr. A query to Solr that hits the right caches comes back amazingly fast. With properly tuned Solr caches for your use, and sufficient RAM to hold them (possibly less than you think, Solr is pretty efficient), I'm not sure you're going to get any benefit at all from trying to write your own extra cache on top of Solr.

Em wrote: [quoted message above]
Re: To cache or to not cache
PS: There's also, I think, a way to turn on HTTP-level caching for Solr, which I believe is caching of entire responses that match an exact Solr query, filled without actually touching Solr at all. But I'm not sure about this, because I'm always trying to make sure this HTTP-level cache is turned off because it messes me up, rather than looking into the details of it.

In general, I doubt you are going to come up with any external caches that work better for Solr content than the caches in Solr itself, the product of hundreds of developer hours of work focused on Solr specifically.

Jonathan Rochkind wrote: [previous message quoted above]
Re: Solr optimize operation slows my MySQL serveur
On 2010-11-10 18:08, Skreo wrote: Hello, I've a Solr index with a few million entries, so an optimize usually takes several minutes ... I don't understand why Solr interferes with MySQL in my case. Do you have an idea? Thank you!

Memory and disk bandwidth spike when Solr optimizes an index.
Re: Solr optimize operation slows my MySQL serveur
I just made a test.

Before the optimize:

sk...@gedeon:~$ w
 18:55:43 up 22 days, 22:27, 4 users, load average: 0,07, 0,02, 0,00
USER  TTY  FROM  LOGIN@  IDLE  JCPU  PCPU  WHAT

# iostat 10
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0,82   0,00     0,35     0,04    0,00  98,79

Device:    tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda      15,10        4,80      249,60        48      2496
sda1      3,80        0,00       44,00         0       440
sda2      8,70        4,80      205,60        48      2056
sda3      0,00        0,00        0,00         0         0
sdc      14,90        3,20      249,60        32      2496
sdc1      3,80        0,00       44,00         0       440
sdc2      8,50        3,20      205,60        32      2056
sdc3      0,00        0,00        0,00         0         0
sdb       0,00        0,00        0,00         0         0
sdb1      0,00        0,00        0,00         0         0
sdd       0,00        0,00        0,00         0         0
sdd1      0,00        0,00        0,00         0         0
md0       0,00        0,00        0,00         0         0
md2      19,70        8,00      189,60        80      1896
md1       5,10        0,00       40,80         0       408
dm-0     17,10        8,00      189,60        80      1896
dm-1      0,00        0,00        0,00         0         0

During the optimize:

# w
 18:57:07 up 22 days, 22:29, 4 users, load average: 1,10, 0,25, 0,08
USER  TTY  FROM  LOGIN@  IDLE  JCPU  PCPU  WHAT

# iostat 10
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          12,73   0,00     0,57     0,04    0,00  86,66

Device:    tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda      13,50        4,00      318,40        40      3184
sda1      0,90        0,00       15,20         0       152
sda2      9,60        4,00      303,20        40      3032
sda3      0,00        0,00        0,00         0         0
sdc      13,40        3,20      318,40        32      3184
sdc1      0,90        0,00       15,20         0       152
sdc2      9,50        3,20      303,20        32      3032
sdc3      0,00        0,00        0,00         0         0
sdb       0,00        0,00        0,00         0         0
sdb1      0,00        0,00        0,00         0         0
sdd       0,00        0,00        0,00         0         0
sdd1      0,00        0,00        0,00         0         0
md0       0,00        0,00        0,00         0         0
md2      23,10        7,20      285,60        72      2856
md1       1,50        0,00       12,00         0       120
dm-0     19,50        7,20      280,80        72      2808
dm-1      0,60        0,00        4,80         0        48

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           6,89   0,00     2,05    10,14    0,00  80,92

Device:       tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
sda         65,00        0,00    58689,60         0    586896
sda1         1,00        0,00       16,00         0       160
sda2        62,60        0,00    58673,60         0    586736
sda3         0,00        0,00        0,00         0         0
sdc         59,60        0,80    53568,00         8    535680
sdc1         0,80        0,00       14,40         0       144
sdc2        57,40        0,80    53553,60         8    535536
sdc3         0,00        0,00        0,00         0         0
sdb          0,00        0,00        0,00         0         0
sdb1         0,00        0,00        0,00         0         0
sdd          0,00        0,00        0,00         0         0
sdd1         0,00        0,00        0,00         0         0
md0          0,00        0,00        0,00         0         0
md2      13822,90        0,80   110571,20         8   1105712
md1          1,70        0,00       13,60         0       136
dm-0         9,30        0,80       73,60         8       736
dm-1     13812,20        0,00   110497,60         0   1104976

avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           2,23   0,00     1,40    28,11    0,00  68,26

Device:      tps  Blk_read/s  Blk_wrtn/s  Blk_read
sda       114,10        4,80   106738,40        48
Best practice for emailing this list?
How do people email this list without getting spam filter problems?
Search with accent
Hi all, Does somebody know how I can configure my Solr to make searches with and without accents? For example: pereque and perequê. When I do it I need the same result, but it's not working. tks --
Re: Solr optimize operation slows my MySQL serveur
On 11/10/2010 11:00 AM, Skreo wrote: I just made a test : [snip] avg-cpu: %user %nice %system %iowait %steal %idle 2,23 0,00 1,40 28,11 0,00 68,26 Your iowait percentage for that 10 second interval was 28%, which is pretty high. Solr has to make a complete copy of the index, which means a lot of disk I/O. Optimizing an index involves more than just copying it, though - Solr is processing every document in the old index and writing it into the new one. This is even more load on the CPU. Databases like MySQL are also resource intensive, especially in the I/O department. Unless you have enough RAM to cache your MySQL databases as well as your entire Solr index, you're always going to run into this problem. It's strongly recommended that you put Solr on dedicated hardware. If you asked about this on the MySQL list, I imagine they'd make the same recommendation regarding their software. Shawn
Re: scheduling imports and heartbeats
: References: 4cd8fb5a.9040...@srce.hr 001701cb7fe2$58abc660$0a0353...@com : aanlkti=zuypu4d5q3znmob8vst8zezxh9p+cbsalz...@mail.gmail.com : 4cd962b0.2090...@srce.hr : aanlktimvmn7rvjks8cqdkeidyt7dr5qzs5ji4xpdu...@mail.gmail.com : 4cd9ae1e.8080...@srce.hr : aanlktinje_hqrfqf8uu_1k5dd9ejauc7wigmbeh13...@mail.gmail.com : Subject: scheduling imports and heartbeats : In-Reply-To: aanlktinje_hqrfqf8uu_1k5dd9ejauc7wigmbeh13...@mail.gmail.com http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an existing message, instead start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing list archives particularly difficult. See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking -Hoss
Re: Best practice for emailing this list?
Hi robo, try to send eMail in plain-text format. This often helps a lot! Regards -- View this message in context: http://lucene.472066.n3.nabble.com/Best-practice-for-emailing-this-list-tp1877693p1877792.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Best practice for emailing this list?
I tried that as well but the original email I was trying to send about replication and load balancing was still being marked as spam (5.8 is above threshold). That is when I thought I would try a very simple email such as this one. Is there a list of keywords to avoid? On Wed, Nov 10, 2010 at 10:26 AM, Em mailformailingli...@yahoo.de wrote: Hi robo, try to send eMail in plain-text format. This often helps a lot! Regards -- View this message in context: http://lucene.472066.n3.nabble.com/Best-practice-for-emailing-this-list-tp1877693p1877792.html Sent from the Solr - User mailing list archive at Nabble.com.
Custom Request Handler
I was reading in the Solr Wiki about creating request handlers - http://wiki.apache.org/solr/SolrRequestHandler - and saw that there are two different ways to create a handler:

1. Define as <requestHandler name="/baz" class="my.package.AnotherCustomRequestHandler"> and call via http://localhost:8983/baz/?..
2. Define as <requestHandler name="baz" class="my.package.AnotherCustomRequestHandler"> and call via http://localhost:8983/select/?qt=baz...

So I was wondering, is one way preferred over the other? Thanks, Paige Cook
Re: Best practice for emailing this list?
On Wed, Nov 10, 2010 at 1:11 PM, robo - robom...@gmail.com wrote: How do people email this list without getting spam filter problems? Depends on which side of the spam filter you're referring to. I've found that adding a rule to Gmail that says "Never send to spam" keeps these emails out of my spam filter. As for when I send emails, I make sure that I send my emails as plain text to avoid getting bounce backs. - Ken
Re: Sorting and filtering on fluctuating multi-currency price data?
: ExternalFileField can only be used for boosting. It is not a : first-class field. correct, but it can be used in a FunctionQuery, which means it can be filtered on (using frange) and (on trunk) it can be sorted on, which are the two needs the OP asked about... : : Another approach would be to use ExternalFileField and keep the price data, : : normalized to USD, outside of the index. Every time the currency rates : : changed, we would calculate new normalized prices for every document in the : : index. -Hoss
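As an aside (not from the thread itself), a frange filter over a function query - which is what makes an ExternalFileField-backed value filterable - looks something like this; the field name and bounds here are made up for illustration:

```
fq={!frange l=0 u=100}field(price_usd)
```

Any document whose function value falls between l and u passes the filter, so the value never needs to be a first-class indexed field.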
Re: Best practice for emailing this list?
No matter how much I limit my other email it will not get through the Solr mailing spam filter. This has to be the most frustrating mailing list I have ever tried to work with. All I need are some answers on replication and load balancing but I can't even get it to the list. On Wed, Nov 10, 2010 at 10:17 AM, Ken Stanley doh...@gmail.com wrote: On Wed, Nov 10, 2010 at 1:11 PM, robo - robom...@gmail.com wrote: How do people email this list without getting spam filter problems? Depends on which side of the spam filter that you're referring to. I've found that to keep these emails from entering my spam filter is to add a rule to Gmail that says Never send to spam. As for when I send emails, I make sure that I send my emails as plain text to avoid getting bounce backs. - Ken
Re: Search with accent
I don't understand, when the user searches for perequê you want the results for perequê and pereque? If that's the case, any field type with ISOLatin1AccentFilterFactory should work. The accent should be removed at index time and at query time (make sure the filter is being applied in both cases). Tomás De: Claudio Devecchi cdevec...@gmail.com Para: Lista Solr solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 15:16:24 Asunto: Search with accent Hi all, Somebody knows how can I config my solr to make searches with and without accents? for example: pereque and perequê When I do it I need the same result, but its not working. tks --
Chinese characters - a little OT
Sorry, OT but it's driving me nuts. I've indexed a document with Chinese characters in its title. When I perform the search (that returns JSON) I get back the title and, using Javascript, place it into a variable that ultimately ends up as a dropdown of titles to choose from. The problem is the title contains the literal unicode representation of the Chinese characters (&#20013; for example). Here's the javascript:

var optionObj=document.createElement('option');
menuItem=titleArray[1].title;
menuVal=titleArray[1].url;
if((menuItem != "") && (menuItem != " ") && (menuItem != null)) {
  optionObj.appendChild(document.createTextNode(menuItem));
  optionObj.setAttribute('id', optId + optCnt);
  optionObj.setAttribute('target', '_blank');
  optionObj.setAttribute('value', menuVal);
  optCnt++;
  selectObj.appendChild(optionObj);
}

My hunch is I should utf-8 encode the title and then try and display the result but it's not working. I still am seeing the unicode characters. Does anyone see what I could be doing wrong? TIA - Tod
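As an aside (not part of Tod's code): if the JSON response really contains the literal entity text, a small hypothetical helper like this would turn "&#20013;" back into the character before it goes into createTextNode. The root-cause fix is usually to make sure the response is served and parsed as UTF-8 so entities never appear in the first place.

```javascript
// Hypothetical helper: decode numeric HTML entities like "&#20013;" into
// their actual characters. createTextNode does NOT decode entities - it
// inserts the string literally - so decoding must happen first.
function decodeNumericEntities(s) {
  return s.replace(/&#(\d+);/g, function (match, code) {
    return String.fromCharCode(parseInt(code, 10));
  });
}
```

For example, decodeNumericEntities("&#20013;") returns "中".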
Re: Best practice for emailing this list?
Mmmm maybe it's your mail address? :P Weird, I didn't have any problem with it using gmail... Send in plain text, avoid links... maybe that could work... If you want, send me the mail and I will forward it to the list, just to test! On Wed, Nov 10, 2010 at 3:59 PM, robo - robom...@gmail.com wrote: No matter how much I limit my other email it will not get through the Solr mailing spam filter. This has to be the most frustrating mailing list I have ever tried to work with. All I need are some answers on replication and load balancing but I can't even get it to the list. On Wed, Nov 10, 2010 at 10:17 AM, Ken Stanley doh...@gmail.com wrote: On Wed, Nov 10, 2010 at 1:11 PM, robo - robom...@gmail.com wrote: How do people email this list without getting spam filter problems? Depends on which side of the spam filter that you're referring to. I've found that to keep these emails from entering my spam filter is to add a rule to Gmail that says Never send to spam. As for when I send emails, I make sure that I send my emails as plain text to avoid getting bounce backs. - Ken -- __ Ezequiel. Http://www.ironicnet.com
Adding new field after data is already indexed
Hi, I had a few questions regarding Solr. Say my schema file looks like

<field name="folder_id" type="long" indexed="true" stored="true"/>
<field name="indexed" type="boolean" indexed="true" stored="true"/>

and I index data on the basis of these fields. Now, in case I need to add a new field, is there a way I can add the field without corrupting the previous data? Is there any feature which adds a new field with a default value to the existing records? 2) Is there any security mechanism/authorization check to restrict URLs like /admin and /update to only a few users. -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-new-field-after-data-is-already-indexed-tp1862575p1862575.html Sent from the Solr - User mailing list archive at Nabble.com.
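On the default-value question (not answered in this archive): schema.xml does support a default attribute on a field, but it only takes effect for documents indexed after the change - documents already in the index are untouched until they are reindexed. A sketch, with a hypothetical field name:

```xml
<!-- New documents that omit this field get the default value; -->
<!-- already-indexed documents are unaffected until reindexed. -->
<field name="is_archived" type="boolean" indexed="true" stored="true" default="false"/>
```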
Dynamic creating of cores in solr
Hi, I'm not sure this is the right mail to write to, hopefully you can help or direct me to the right person. I'm using Solr - one master with 17 slaves in the server - and using SolrJ as the Java client. Currently there's only one core in all of them (master and slaves) - only the cpaCore. I thought about using multi-core Solr, but I have some problems with that. I don't know in advance which cores I'd need - when my Java program runs, I call for documents to be indexed to a certain URL, which contains the core name, and I might create a URL based on a core that is not yet created. For example:

Calling to index - http://localhost:8080/cpaCore - existing core, everything as usual
Calling to index - http://localhost:8080/newCore - server realizes there's no core newCore, creates it and indexes to it. After that - also creates the new core in the slaves
Calling to index - http://localhost:8080/newCore - existing core, everything as usual

What I'd like the server side to do is realize by itself whether the core exists or not, and if not - create it. One other restriction - I can't change anything on the client side - calls to the server can only be the ones it makes now, for index and search, and it cannot make calls for core creation via the CoreAdminHandler. All I can do is something in the server itself. What can I do to get it done? Write some RequestHandler? RequestProcessor? Any other option? Thanks, nizan
Re: Best practice for emailing this list?
Tried to forward the mail of robomon but had the same error: Delivery to the following recipient failed permanently: solr-user@lucene.apache.org Technical details of permanent failure: Google tried to deliver your message, but it was rejected by the recipient domain. We recommend contacting the other email provider for further information about the cause of this error. The error that the other server returned was: 552 552 spam score (5.8) exceeded threshold (state 18). - Original message - On Wed, Nov 10, 2010 at 4:12 PM, Ezequiel Calderara ezech...@gmail.comwrote: Mmmm maybe its your mail address? :P Weird, i didn't have any problem with it using gmail... Send in plain text, avoid links or links... maybe that could work... If you want, send me the mail and i will forward it to the list, just to test! On Wed, Nov 10, 2010 at 3:59 PM, robo - robom...@gmail.com wrote: No matter how much I limit my other email it will not get through the Solr mailing spam filter. This has to be the most frustrating mailing list I have ever tried to work with. All I need are some answers on replication and load balancing but I can't even get it to the list. On Wed, Nov 10, 2010 at 10:17 AM, Ken Stanley doh...@gmail.com wrote: On Wed, Nov 10, 2010 at 1:11 PM, robo - robom...@gmail.com wrote: How do people email this list without getting spam filter problems? Depends on which side of the spam filter that you're referring to. I've found that to keep these emails from entering my spam filter is to add a rule to Gmail that says Never send to spam. As for when I send emails, I make sure that I send my emails as plain text to avoid getting bounce backs. - Ken -- __ Ezequiel. Http://www.ironicnet.com http://www.ironicnet.com/ -- __ Ezequiel. Http://www.ironicnet.com
Re: Best practice for emailing this list?
Thanks for all your help Ezequiel. I cannot see anything in my email that would make this get marked as spam. Anybody have any ideas on how to get this fixed so I can email my questions? robo On Wed, Nov 10, 2010 at 11:36 AM, Ezequiel Calderara ezech...@gmail.com wrote: Tried to forward the mail of robomon but had the same error: Delivery to the following recipient failed permanently: solr-u...@lucene.apache.org Technical details of permanent failure: Google tried to deliver your message, but it was rejected by the recipient domain. We recommend contacting the other email provider for further information about the cause of this error. The error that the other server returned was: 552 552 spam score (5.8) exceeded threshold (state 18). - Original message - On Wed, Nov 10, 2010 at 4:12 PM, Ezequiel Calderara ezech...@gmail.comwrote: Mmmm maybe its your mail address? :P Weird, i didn't have any problem with it using gmail... Send in plain text, avoid links or links... maybe that could work... If you want, send me the mail and i will forward it to the list, just to test! On Wed, Nov 10, 2010 at 3:59 PM, robo - robom...@gmail.com wrote: No matter how much I limit my other email it will not get through the Solr mailing spam filter. This has to be the most frustrating mailing list I have ever tried to work with. All I need are some answers on replication and load balancing but I can't even get it to the list. On Wed, Nov 10, 2010 at 10:17 AM, Ken Stanley doh...@gmail.com wrote: On Wed, Nov 10, 2010 at 1:11 PM, robo - robom...@gmail.com wrote: How do people email this list without getting spam filter problems? Depends on which side of the spam filter that you're referring to. I've found that to keep these emails from entering my spam filter is to add a rule to Gmail that says Never send to spam. As for when I send emails, I make sure that I send my emails as plain text to avoid getting bounce backs. - Ken -- __ Ezequiel. 
Http://www.ironicnet.com http://www.ironicnet.com/ -- __ Ezequiel. Http://www.ironicnet.com
Re: scheduling imports and heartbeats
Thanks for the tip Ken. I tried that but don't see the importing happening when I check up on the status. Below is what's in my dataimport.properties.

#Wed Nov 10 11:36:28 PST 2010
metadataObject.last_index_time=2010-09-20 11\:12\:47
interval=1
port=8080
server=localhost
params=/select?qt\=/dataimport&command\=full-import&clean\=true&commit\=true
webapp=solr
id.last_index_time=2010-11-10 11\:36\:27
syncEnabled=1
last_index_time=2010-11-10 11\:36\:27

From: Ken Stanley doh...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, November 10, 2010 4:41:17 AM Subject: Re: scheduling imports and heartbeats On Tue, Nov 9, 2010 at 10:16 PM, Tri Nguyen tringuye...@yahoo.com wrote: Hi, Can I configure solr to schedule imports at a specified time (say once a day, once an hour, etc)? Also, does solr have some sort of heartbeat mechanism? Thanks, Tri Tri, If you use the DataImportHandler (DIH), you can set up a dataimport.properties file that can be configured to import on intervals. http://wiki.apache.org/solr/DataImportHandler#dataimport.properties_example As for heartbeat, you can use the ping handler (default is /admin/ping) to check the status of the servlet. - Ken
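As an aside (not suggested in the thread): a common alternative to the dataimport.properties scheduler is to trigger the DIH full-import from cron over HTTP. A sketch - the host, port, and handler path here are assumptions matching the properties above, not verified against this setup:

```shell
# Build the DIH full-import URL and echo it; a cron entry would curl it.
# Host, port, and handler path are assumptions, not from the thread.
BASE="http://localhost:8080/solr"
URL="${BASE}/dataimport?command=full-import&clean=true&commit=true"
echo "$URL"
# In crontab, e.g. hourly:
#   0 * * * * curl -s "http://localhost:8080/solr/dataimport?command=full-import&clean=true&commit=true" >/dev/null
```

This sidesteps the scheduler entirely, which also makes it easier to see (from cron logs) whether the import was ever triggered.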
Re: Default file locking on trunk
: There is now a data/index with a write lock file in it. I have not : attempted to read the index, let alone add something to it. : I start solr again, and it cannot open the index because of the write lock. Lance, i can't reproduce using trunk r1033664 on Linux w/ext4 -- what OS & filesystem are you using? If you load http://localhost:8983/solr/admin/stats.jsp what does it list for the "reader" and "readerDir" in the "searcher" entry? : Why is there a write lock file when I have not tried to index anything? No idea ... i don't get any write locks until i actually attempt to index something. -Hoss
Re: Core status uptime and startTime
: As far as I know, in the core admin page you can find when was the last time : an index had a modification and was committed checking the lastModified. : But... what do startTime and uptime mean? : Thanks in advance startTime should be when the core was created (ie: when it started) uptime is now-startTime (in ms). -Hoss
RE: Dynamic creating of cores in solr
We also use SolrJ, and have a dynamically created Core capability - where we don't know in advance what the Cores will be that we require. We almost always do a complete index build, and if there's a previous instance of that index, it needs to be available during a complete index build, so we have two cores per index, and switch them as required at the end of an indexing run. Here's a summary of how we do it (we're in an early prototype / implementation right now - this isn't production quality code - as you can tell from our voluminous javadocs on the methods...)

1) Identify if the core exists, and if not, create it:

/**
 * This method instantiates two SolrServer objects, solr and indexCore. It requires that
 * indexName be set before calling.
 */
private void initSolrServer() throws IOException {
    String baseUrl = "http://localhost:8983/solr/";
    solr = new CommonsHttpSolrServer(baseUrl);

    String indexCoreName = indexName + SolrConstants.SUFFIX_INDEX; // SUFFIX_INDEX = "_INDEX"
    String indexCoreUrl = baseUrl + indexCoreName;

    // Here we create two cores for the indexName, if they don't already exist - the live core used
    // for searching and a second core used for indexing. After indexing, the two will be switched so the
    // just-indexed core will become the live core. The way that core swapping works, the live core will always
    // be named [indexName] and the indexing core will always be named [indexName]_INDEX, but the
    // dataDir of each core will alternate between [indexName]_1 and [indexName]_2.
    createCoreIfNeeded(indexName, indexName + "_1", solr);
    createCoreIfNeeded(indexCoreName, indexName + "_2", solr);

    indexCore = new CommonsHttpSolrServer(indexCoreUrl);
}

/**
 * Create a core if it does not already exist. Returns true if a new core was created, false otherwise.
 */
private boolean createCoreIfNeeded(String coreName, String dataDir, SolrServer server) throws IOException {
    boolean coreExists = true;
    try {
        // SolrJ provides no direct method to check if a core exists, but getStatus will
        // return an empty list for any core that doesn't.
        CoreAdminResponse statusResponse = CoreAdminRequest.getStatus(coreName, server);
        coreExists = statusResponse.getCoreStatus(coreName).size() > 0;
        if (!coreExists) {
            // Create the core
            LOG.info("Creating Solr core: " + coreName);
            CoreAdminRequest.Create create = new CoreAdminRequest.Create();
            create.setCoreName(coreName);
            create.setInstanceDir(".");
            create.setDataDir(dataDir);
            create.process(server);
        }
    } catch (SolrServerException e) {
        e.printStackTrace();
    }
    return !coreExists;
}

2) Do the index, clearing it first if it's a complete rebuild:

[snip]
if (fullIndex) {
    try {
        indexCore.deleteByQuery("*:*");
    } catch (SolrServerException e) {
        e.printStackTrace(); //To change body of catch statement use File | Settings | File Templates.
    }
}
[snip]

various logic, then (we submit batches of 100):

[snip]
List<SolrInputDocument> docList = b.getSolrInputDocumentList();
UpdateResponse rsp;
try {
    rsp = indexCore.add(docList);
    rsp = indexCore.commit();
} catch (IOException e) {
    LOG.warn("Error committing documents", e);
} catch (SolrServerException e) {
    LOG.warn("Error committing documents", e);
}
[snip]

3) optimize, then swap cores:

private void optimizeCore() {
    try {
        indexCore.optimize();
    } catch (SolrServerException e) {
        LOG.warn("Error while optimizing core", e);
    } catch (IOException e) {
        LOG.warn("Error while optimizing core", e);
    }
}

private void swapCores() {
    String liveCore = indexName;
    String indexCore = indexName + SolrConstants.SUFFIX_INDEX; // SUFFIX_INDEX = "_INDEX"
    LOG.info("Swapping Solr cores: " + indexCore + ", " + liveCore);
    CoreAdminRequest request = new CoreAdminRequest();
    request.setAction(CoreAdminAction.SWAP);
    request.setCoreName(indexCore);
    request.setOtherCoreName(liveCore);
    try {
        request.process(solr);
    } catch (SolrServerException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Re: Search with accent
Tomas, Let me try to explain better. For example: I have 10 documents, where 7 have the word pereque (without accent) and 3 have the word perequê (with accent). When I do a search for pereque, Solr is returning just 7, and when I do a search for perequê Solr is returning 3. But for me, these words are the same, and when I do some search for perequê or pereque, it should show me 10 results. About the ISOLatin filter you mentioned, do you know how I can enable it? tks, Claudio On Wed, Nov 10, 2010 at 5:00 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: I don't understand, when the user search for perequê you want the results for perequê and pereque? If thats the case, any field type with ISOLatin1AccentFilterFactory should work. The accent should be removed at index time and at query time (Make sure the filter is being applied on both cases). Tomás De: Claudio Devecchi cdevec...@gmail.com Para: Lista Solr solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 15:16:24 Asunto: Search with accent Hi all, Somebody knows how can I config my solr to make searches with and without accents? for example: pereque and perequê When I do it I need the same result, but its not working. tks -- -- Claudio Devecchi flickr.com/cdevecchi
Re: Deploying WAR from trunk, exception
: I built the trunk and deploy the war, but cannot access the admin URL : anymore. : : Error loading class : 'org.apache.solr.highlight.MultiColoredScoreOrderFragmentsBuilder : : This class seems to be missing? You appear to be using an old copy of the example config that references a class that was never released. (it was added by SOLR-1268, but replaced with something else in SOLR-2030) -Hoss
Re: Search with accent
have you tried using a TokenFilter which removes accents both at indexing and searching time? If you index terms without accents and search the same way you should be able to find all documents as you require. On 10 November 2010 20:08, Claudio Devecchi cdevec...@gmail.com wrote: Tomas, Let me try to explain better. For example. - I have 10 documents, where 7 have the word pereque (without accent) and 3 have the word perequê (with accent) When I do a search pereque, solr is returning just 7, and when I do a search perequê solr is returning 3. But for me, these words are the same, and when I do some search for perequê or pereque, it should show me 10 results. About the ISOLatin you told, do you know how can I enable it? tks, Claudio On Wed, Nov 10, 2010 at 5:00 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: I don't understand, when the user search for perequê you want the results for perequê and pereque? If thats the case, any field type with ISOLatin1AccentFilterFactory should work. The accent should be removed at index time and at query time (Make sure the filter is being applied on both cases). Tomás De: Claudio Devecchi cdevec...@gmail.com Para: Lista Solr solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 15:16:24 Asunto: Search with accent Hi all, Somebody knows how can I config my solr to make searches with and without accents? for example: pereque and perequê When I do it I need the same result, but its not working. tks -- -- Claudio Devecchi flickr.com/cdevecchi
Re: Search with accent
It looks like ISOLatin1AccentFilter is deprecated on Solr 1.4.1. If you are on that version, you should use the ASCIIFoldingFilter instead. Like with any other filter, to use it, you have to add the filter factory to the analysis chain of the field type you are using: <filter class="solr.ASCIIFoldingFilterFactory"/> Make sure you add it to both the query and index analysis chains, otherwise you'll have strange results. You'll have to perform a full reindex. Tomás De: Claudio Devecchi cdevec...@gmail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:08:06 Asunto: Re: Search with accent Tomas, Let me try to explain better. For example. - I have 10 documents, where 7 have the word pereque (without accent) and 3 have the word perequê (with accent) When I do a search pereque, solr is returning just 7, and when I do a search perequê solr is returning 3. But for me, these words are the same, and when I do some search for perequê or pereque, it should show me 10 results. About the ISOLatin you told, do you know how can I enable it? tks, Claudio On Wed, Nov 10, 2010 at 5:00 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: I don't understand, when the user search for perequê you want the results for perequê and pereque? If thats the case, any field type with ISOLatin1AccentFilterFactory should work. The accent should be removed at index time and at query time (Make sure the filter is being applied on both cases). Tomás De: Claudio Devecchi cdevec...@gmail.com Para: Lista Solr solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 15:16:24 Asunto: Search with accent Hi all, Somebody knows how can I config my solr to make searches with and without accents? for example: pereque and perequê When I do it I need the same result, but its not working. tks -- -- Claudio Devecchi flickr.com/cdevecchi
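For anyone wondering what the full analysis chain might look like, here is a sketch of a schema.xml field type; the type name and tokenizer choice are assumptions, not prescribed by the thread:

```xml
<!-- Accent-insensitive text type: fold accents at both index and query time -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

With a type like this, "pereque" and "perequê" both reduce to the same token at index and query time, so either search should match all 10 documents once the index is rebuilt.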
Re: Dynamic creating of cores in solr
You could use the actual built-in Solr replication feature to accomplish that same function -- complete re-index to a 'master', and then when finished, trigger replication to the 'slave', with the 'slave' being the live index that actually serves your applications. I am curious if there was any reason you chose to roll your own solution using SolrJ and dynamic creation of cores, instead of simply using the replication feature. Were there any downsides of using the replication feature for this purpose that you ameliorated through your solution? Jonathan

Bob Sandiford wrote: We also use SolrJ, and have a dynamically created Core capability - where we don't know in advance what the Cores will be that we require. We almost always do a complete index build, and if there's a previous instance of that index, it needs to be available during a complete index build, so we have two cores per index, and switch them as required at the end of an indexing run. [snip]
Re: Search with accent
have you tried using a TokenFilter which removes accents both at indexing and searching time? If you index terms without accents and search the same way you should be able to find all documents as you require. On 10 November 2010 20:25, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: It looks like ISOLatin1AccentFilter is deprecated on Solr 1.4.1, If you are on that version, you should use the ASCIIFoldingFilter instead. [snip]
RE: Dynamic creating of cores in solr
Why not use replication? Call it inexperience... We're really early into working with and fully understanding Solr and the best way to approach various issues. I did mention that this was a prototype and non-production code, so I'm covered, though :) We'll take a look at the replication feature... Thanks! Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: Jonathan Rochkind [mailto:rochk...@jhu.edu] Sent: Wednesday, November 10, 2010 3:26 PM To: solr-user@lucene.apache.org Subject: Re: Dynamic creating of cores in solr You could use the actual built-in Solr replication feature to accomplish that same function -- complete re-index to a 'master', and then when finished, trigger replication to the 'slave', with the 'slave' being the live index that actually serves your applications. I am curious if there was any reason you chose to roll your own solution using SolrJ and dynamic creation of cores, instead of simply using the replication feature. Were there any downsides of using the replication feature for this purpose that you ameliorated through your solution? Jonathan Bob Sandiford wrote: We also use SolrJ, and have a dynamically created Core capability - where we don't know in advance what the Cores will be that we require. We almost always do a complete index build, and if there's a previous instance of that index, it needs to be available during a complete index build, so we have two cores per index, and switch them as required at the end of an indexing run. Here's a summary of how we do it (we're in an early prototype / implementation right now - this isn't production quality code - as you can tell from our voluminous javadocs on the methods...) 1) Identify if the core exists, and if not, create it: /** * This method instantiates two SolrServer objects, solr and indexCore. It requires that * indexName be set before calling. 
*/ private void initSolrServer() throws IOException { String baseUrl = "http://localhost:8983/solr/"; solr = new CommonsHttpSolrServer(baseUrl); String indexCoreName = indexName + SolrConstants.SUFFIX_INDEX; // SUFFIX_INDEX = "_INDEX" String indexCoreUrl = baseUrl + indexCoreName; // Here we create two cores for the indexName, if they don't already exist - the live core used // for searching and a second core used for indexing. After indexing, the two will be switched so the // just-indexed core will become the live core. The way that core swapping works, the live core will always // be named [indexName] and the indexing core will always be named [indexName]_INDEX, but the // dataDir of each core will alternate between [indexName]_1 and [indexName]_2. createCoreIfNeeded(indexName, indexName + "_1", solr); createCoreIfNeeded(indexCoreName, indexName + "_2", solr); indexCore = new CommonsHttpSolrServer(indexCoreUrl); } /** * Create a core if it does not already exist. Returns true if a new core was created, false otherwise. */ private boolean createCoreIfNeeded(String coreName, String dataDir, SolrServer server) throws IOException { boolean coreExists = true; try { // SolrJ provides no direct method to check if a core exists, but getStatus will // return an empty list for any core that doesn't. 
CoreAdminResponse statusResponse = CoreAdminRequest.getStatus(coreName, server); coreExists = statusResponse.getCoreStatus(coreName).size() > 0; if (!coreExists) { // Create the core LOG.info("Creating Solr core: " + coreName); CoreAdminRequest.Create create = new CoreAdminRequest.Create(); create.setCoreName(coreName); create.setInstanceDir("."); create.setDataDir(dataDir); create.process(server); } } catch (SolrServerException e) { e.printStackTrace(); } return !coreExists; } 2) Do the index, clearing it first if it's a complete rebuild: [snip] if (fullIndex) { try { indexCore.deleteByQuery("*:*"); } catch (SolrServerException e) { e.printStackTrace(); //To change body of catch statement use File | Settings | File Templates. } } [snip] various logic, then (we submit batches of 100): [snip] List<SolrInputDocument> docList = b.getSolrInputDocumentList(); UpdateResponse rsp; try {
Re: Search with accent
Hi Tomas, Do you have some example to put in schema.xml? How can I use this filter class? Tks On Wed, Nov 10, 2010 at 6:25 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: It looks like ISOLatin1AccentFilter is deprecated on Solr 1.4.1. If you are on that version, you should use the ASCIIFoldingFilter instead. Like with any other filter, to use it, you have to add the filter factory to the analysis chain of the field type you are using: <filter class="solr.ASCIIFoldingFilterFactory"/> Make sure you add it to both the query and index analysis chains, otherwise you'll have strange results. You'll have to perform a full reindex. Tomás De: Claudio Devecchi cdevec...@gmail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:08:06 Asunto: Re: Search with accent Tomas, Let me try to explain better. For example. - I have 10 documents, where 7 have the word pereque (without accent) and 3 have the word perequê (with accent) When I do a search pereque, solr is returning just 7, and when I do a search perequê solr is returning 3. But for me, these words are the same, and when I do some search for perequê or pereque, it should show me 10 results. About the ISOLatin you told, do you know how can I enable it? tks, Claudio On Wed, Nov 10, 2010 at 5:00 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: I don't understand, when the user searches for perequê you want the results for perequê and pereque? If that's the case, any field type with ISOLatin1AccentFilterFactory should work. The accent should be removed at index time and at query time (make sure the filter is being applied in both cases). Tomás De: Claudio Devecchi cdevec...@gmail.com Para: Lista Solr solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 15:16:24 Asunto: Search with accent Hi all, Somebody knows how can I config my solr to make searches with and without accents? for example: pereque and perequê When I do it I need the same result, but it's not working. 
tks -- -- Claudio Devecchi flickr.com/cdevecchi -- Claudio Devecchi flickr.com/cdevecchi
Re: Search with accent
That's what the ASCIIFoldingFilter does, it removes the accents; that's why you have to add it to the query analysis chain and to the index analysis chain, to search the same way you index. You can see how it works from the Analysis page on Solr Admin. De: Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:27:24 Asunto: Re: Search with accent have you tried using a TokenFilter which removes accents both at indexing and searching time? If you index terms without accents and search the same way, you should be able to find all documents as you require. On 10 November 2010 20:25, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: It looks like ISOLatin1AccentFilter is deprecated on Solr 1.4.1. If you are on that version, you should use the ASCIIFoldingFilter instead. Like with any other filter, to use it, you have to add the filter factory to the analysis chain of the field type you are using: <filter class="solr.ASCIIFoldingFilterFactory"/> Make sure you add it to both the query and index analysis chains, otherwise you'll have strange results. You'll have to perform a full reindex. Tomás De: Claudio Devecchi cdevec...@gmail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:08:06 Asunto: Re: Search with accent Tomas, Let me try to explain better. For example. - I have 10 documents, where 7 have the word pereque (without accent) and 3 have the word perequê (with accent) When I do a search pereque, solr is returning just 7, and when I do a search perequê solr is returning 3. But for me, these words are the same, and when I do some search for perequê or pereque, it should show me 10 results. About the ISOLatin you told, do you know how can I enable it? tks, Claudio On Wed, Nov 10, 2010 at 5:00 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: I don't understand, when the user searches for perequê you want the results for perequê and pereque? 
If that's the case, any field type with ISOLatin1AccentFilterFactory should work. The accent should be removed at index time and at query time (make sure the filter is being applied in both cases). Tomás De: Claudio Devecchi cdevec...@gmail.com Para: Lista Solr solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 15:16:24 Asunto: Search with accent Hi all, Somebody knows how can I config my solr to make searches with and without accents? for example: pereque and perequê When I do it I need the same result, but it's not working. tks -- -- Claudio Devecchi flickr.com/cdevecchi
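What the folding filter does can be illustrated outside Solr with plain Java: once combining accent marks are stripped, "perequê" and "pereque" compare equal, which is exactly why applying the filter on both the index and query chains returns all 10 documents. This is only an illustrative sketch using java.text.Normalizer, not Solr's actual implementation (ASCIIFoldingFilter handles many more character mappings):

```java
import java.text.Normalizer;

public class FoldSketch {
    // Decompose accented characters (NFD), then drop the combining marks.
    // Illustrative only; Solr's ASCIIFoldingFilter covers far more cases.
    public static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("perequê")); // prints "pereque"
        System.out.println(fold("perequê").equals(fold("pereque"))); // prints "true"
    }
}
```

Because both sides of the comparison go through the same fold, a query for either spelling matches documents indexed with either spelling; this is the "same analysis at index and query time" rule the thread keeps repeating.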
Re: Search with accent
Ok tks, I'm new with solr; my doubt is how I can enable these features, or whether this feature is already working by default. Is this something to config in my schema.xml? Tks!! On Wed, Nov 10, 2010 at 6:40 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: That's what the ASCIIFoldingFilter does, it removes the accents; that's why you have to add it to the query analysis chain and to the index analysis chain, to search the same way you index. You can see how it works from the Analysis page on Solr Admin. De: Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:27:24 Asunto: Re: Search with accent have you tried using a TokenFilter which removes accents both at indexing and searching time? If you index terms without accents and search the same way, you should be able to find all documents as you require. On 10 November 2010 20:25, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: It looks like ISOLatin1AccentFilter is deprecated on Solr 1.4.1. If you are on that version, you should use the ASCIIFoldingFilter instead. Like with any other filter, to use it, you have to add the filter factory to the analysis chain of the field type you are using: <filter class="solr.ASCIIFoldingFilterFactory"/> Make sure you add it to both the query and index analysis chains, otherwise you'll have strange results. You'll have to perform a full reindex. Tomás De: Claudio Devecchi cdevec...@gmail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:08:06 Asunto: Re: Search with accent Tomas, Let me try to explain better. For example. - I have 10 documents, where 7 have the word pereque (without accent) and 3 have the word perequê (with accent) When I do a search pereque, solr is returning just 7, and when I do a search perequê solr is returning 3. But for me, these words are the same, and when I do some search for perequê or pereque, it should show me 10 results. 
About the ISOLatin you told, do you know how can I enable it? tks, Claudio On Wed, Nov 10, 2010 at 5:00 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: I don't understand, when the user search for perequê you want the results for perequê and pereque? If thats the case, any field type with ISOLatin1AccentFilterFactory should work. The accent should be removed at index time and at query time (Make sure the filter is being applied on both cases). Tomás De: Claudio Devecchi cdevec...@gmail.com Para: Lista Solr solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 15:16:24 Asunto: Search with accent Hi all, Somebody knows how can I config my solr to make searches with and without accents? for example: pereque and perequê When I do it I need the same result, but its not working. tks -- -- Claudio Devecchi flickr.com/cdevecchi -- Claudio Devecchi flickr.com/cdevecchi
Re: Search with accent
You have to modify the field type you are using in your schema.xml file. This is the text field type of the Solr 1.4.1 example with this filter added:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

De: Claudio Devecchi cdevec...@gmail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:44:01 Asunto: Re: Search with accent Ok tks, I'm new with solr; my doubt is how I can enable these features, or whether this feature is already working by default. Is this something to config in my schema.xml? Tks!! On Wed, Nov 10, 2010 at 6:40 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: That's what the ASCIIFoldingFilter does, it removes the accents; that's why you have to add it to the query analysis chain and to the index analysis chain, to search the same way you index. 
You can see how it works from the Analysis page on Solr Admin. De: Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:27:24 Asunto: Re: Search with accent have you tried using a TokenFilter which removes accents both at indexing and searching time? If you index terms without accents and search the same way, you should be able to find all documents as you require. On 10 November 2010 20:25, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: It looks like ISOLatin1AccentFilter is deprecated on Solr 1.4.1. If you are on that version, you should use the ASCIIFoldingFilter instead. Like with any other filter, to use it, you have to add the filter factory to the analysis chain of the field type you are using: <filter class="solr.ASCIIFoldingFilterFactory"/> Make sure you add it to both the query and index analysis chains, otherwise you'll have strange results. You'll have to perform a full reindex. Tomás De: Claudio Devecchi cdevec...@gmail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:08:06 Asunto: Re: Search with accent Tomas, Let me try to explain better. For example. - I have 10 documents, where 7 have the word pereque (without accent) and 3 have the word perequê (with accent) When I do a search pereque, solr is returning just 7, and when I do a search perequê solr is returning 3. But for me, these words are the same, and when I do some search for perequê or pereque, it should show me 10 results. About the ISOLatin you told, do you know how can I enable it? tks, Claudio On Wed, Nov 10, 2010 at 5:00 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: I don't understand, when the user searches for perequê you want the results for perequê and pereque? If that's the case, any field type with ISOLatin1AccentFilterFactory should work. The accent should be removed at index time and at query time (make sure the filter is being applied in both cases). 
Tomás De: Claudio Devecchi cdevec...@gmail.com Para: Lista Solr solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 15:16:24 Asunto: Search with accent Hi all, Somebody knows how can I config my solr to make searches with and without accents? for example: pereque and perequê When I do it I need the same result, but its not working. tks -- --
Re: Best practice for emailing this list?
In the header is a line saying what rules your message matched. That'll let you know what about your message was causing your mails to be rejected. Upayavira On Wed, 10 Nov 2010 11:42 -0800, robo - robom...@gmail.com wrote: Thanks for all your help Ezequiel. I cannot see anything in my email that would make this get marked as spam. Anybody have any ideas on how to get this fixed so I can email my questions? robo On Wed, Nov 10, 2010 at 11:36 AM, Ezequiel Calderara ezech...@gmail.com wrote: Tried to forward the mail of robomon but had the same error: Delivery to the following recipient failed permanently: solr-u...@lucene.apache.org Technical details of permanent failure: Google tried to deliver your message, but it was rejected by the recipient domain. We recommend contacting the other email provider for further information about the cause of this error. The error that the other server returned was: 552 552 spam score (5.8) exceeded threshold (state 18). - Original message - On Wed, Nov 10, 2010 at 4:12 PM, Ezequiel Calderara ezech...@gmail.comwrote: Mmmm maybe its your mail address? :P Weird, i didn't have any problem with it using gmail... Send in plain text, avoid links or links... maybe that could work... If you want, send me the mail and i will forward it to the list, just to test! On Wed, Nov 10, 2010 at 3:59 PM, robo - robom...@gmail.com wrote: No matter how much I limit my other email it will not get through the Solr mailing spam filter. This has to be the most frustrating mailing list I have ever tried to work with. All I need are some answers on replication and load balancing but I can't even get it to the list. On Wed, Nov 10, 2010 at 10:17 AM, Ken Stanley doh...@gmail.com wrote: On Wed, Nov 10, 2010 at 1:11 PM, robo - robom...@gmail.com wrote: How do people email this list without getting spam filter problems? Depends on which side of the spam filter that you're referring to. 
I've found that to keep these emails from entering my spam filter is to add a rule to Gmail that says Never send to spam. As for when I send emails, I make sure that I send my emails as plain text to avoid getting bounce backs. - Ken -- __ Ezequiel. Http://www.ironicnet.com http://www.ironicnet.com/ -- __ Ezequiel. Http://www.ironicnet.com
Re: Search with accent
thx so much tomas, I'll test now. On Wed, Nov 10, 2010 at 6:47 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: You have to modify the field type you are using in your schema.xml file. This is the text field type of the Solr 1.4.1 example with this filter added:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

De: Claudio Devecchi cdevec...@gmail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:44:01 Asunto: Re: Search with accent Ok tks, I'm new with solr; my doubt is how I can enable these features, or whether this feature is already working by default. Is this something to config in my schema.xml? Tks!! 
On Wed, Nov 10, 2010 at 6:40 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: That's what the ASCIIFoldingFilter does, it removes the accents; that's why you have to add it to the query analysis chain and to the index analysis chain, to search the same way you index. You can see how it works from the Analysis page on Solr Admin. De: Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:27:24 Asunto: Re: Search with accent have you tried using a TokenFilter which removes accents both at indexing and searching time? If you index terms without accents and search the same way, you should be able to find all documents as you require. On 10 November 2010 20:25, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: It looks like ISOLatin1AccentFilter is deprecated on Solr 1.4.1. If you are on that version, you should use the ASCIIFoldingFilter instead. Like with any other filter, to use it, you have to add the filter factory to the analysis chain of the field type you are using: <filter class="solr.ASCIIFoldingFilterFactory"/> Make sure you add it to both the query and index analysis chains, otherwise you'll have strange results. You'll have to perform a full reindex. Tomás De: Claudio Devecchi cdevec...@gmail.com Para: solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 17:08:06 Asunto: Re: Search with accent Tomas, Let me try to explain better. For example. - I have 10 documents, where 7 have the word pereque (without accent) and 3 have the word perequê (with accent) When I do a search pereque, solr is returning just 7, and when I do a search perequê solr is returning 3. But for me, these words are the same, and when I do some search for perequê or pereque, it should show me 10 results. About the ISOLatin you told, do you know how can I enable it? 
tks, Claudio On Wed, Nov 10, 2010 at 5:00 PM, Tomas Fernandez Lobbe tomasflo...@yahoo.com.ar wrote: I don't understand, when the user search for perequê you want the results for perequê and pereque? If thats the case, any field type with ISOLatin1AccentFilterFactory should work. The accent should be removed at index time and at query time (Make sure the filter is being applied on both cases). Tomás De: Claudio Devecchi cdevec...@gmail.com Para: Lista Solr solr-user@lucene.apache.org Enviado: miércoles, 10 de noviembre, 2010 15:16:24 Asunto: Search with accent
RE: Adding new field after data is already indexed
1) Just put the new field in the schema and stop/start solr. Documents in the index will not have the field until you reindex them, but it won't hurt anything. 2) Turning off their handlers in solrconfig is all I think that takes. -Original Message- From: gauravshetti [mailto:gaurav.she...@tcs.com] Sent: Monday, November 08, 2010 5:21 AM To: solr-user@lucene.apache.org Subject: Adding new field after data is already indexed Hi, I had a few questions regarding Solr. Say my schema file looks like <field name="folder_id" type="long" indexed="true" stored="true"/> <field name="indexed" type="boolean" indexed="true" stored="true"/> and I index data on the basis of these fields. Now, in case I need to add a new field, is there a way I can add the field without corrupting the previous data? Is there any feature which adds a new field with a default value to the existing records? 2) Is there any security mechanism/authorization check to restrict URLs like /admin and /update to only a few users? -- View this message in context: http://lucene.472066.n3.nabble.com/Adding-new-field-after-data-is-already-indexed-tp1862575p1862575.html Sent from the Solr - User mailing list archive at Nabble.com.
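To illustrate the first answer, a hypothetical schema addition (the field name and default below are made up, not from the original thread): the `default` attribute supplies a value for documents indexed *after* the change, while documents already in the index simply lack the field until they are reindexed.

```xml
<!-- hypothetical new field; "status" and its default value are illustrative -->
<field name="status" type="string" indexed="true" stored="true" default="unknown"/>
```

Queries that filter on the new field won't match the older documents until a reindex, which is usually harmless but worth remembering when faceting or sorting on it.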
Facet showing MORE results than expected when its selected?
A facet shows the amount of results that match that facet, e.g. New York (433). So when the facet is clicked, you'd expect that amount of results (433). However, I have a facet Hotel en Restaurant (321) that, when clicked, shows 370 results! :s 1st query: http://localhost:8983/solr/db/select/?indent=on&facet=true&q=*:*&start=0&rows=25&fl=id,title,themes&facet.field=themes_raw&facet.mincount=1 This is (part of) the resultset of my first query:

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="themes_raw">
      <int name="Hotel en Restaurant">321</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>

Now when I click the facet Hotel en Restaurant, it fires my second query: http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:Hotel en Restaurant&q=*:*&start=0&rows=25&fl=id,title,themes&facet.field=themes_raw&facet.mincount=1 I would expect 321, however I get 370! schema.xml:

<field name="themes" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="themes_raw" type="string" indexed="true" stored="true" multiValued="true"/>
<copyField source="themes" dest="themes_raw"/>

-- View this message in context: http://lucene.472066.n3.nabble.com/Facet-showing-MORE-results-than-expected-when-its-selected-tp1878828p1878828.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Facet showing MORE results than expected when its selected?
Shouldn't the second query have the clause: fq=themes_raw:Hotel en Restaurant instead of: fq=themes:Hotel en Restaurant Otherwise you're mixing apples (themes_raw) and oranges (themes). (Notice how I cleverly extended the restaurant theme to be food related :)) Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com -Original Message- From: PeterKerk [mailto:vettepa...@hotmail.com] Sent: Wednesday, November 10, 2010 4:34 PM To: solr-user@lucene.apache.org Subject: Facet showing MORE results than expected when its selected? A facet shows the amount of results that match that facet, e.g. New York (433). So when the facet is clicked, you'd expect that amount of results (433). However, I have a facet Hotel en Restaurant (321) that, when clicked, shows 370 results! :s 1st query: http://localhost:8983/solr/db/select/?indent=on&facet=true&q=*:*&start=0&rows=25&fl=id,title,themes&facet.field=themes_raw&facet.mincount=1 This is (part of) the resultset of my first query:

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="themes_raw">
      <int name="Hotel en Restaurant">321</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>

Now when I click the facet Hotel en Restaurant, it fires my second query: http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:Hotel en Restaurant&q=*:*&start=0&rows=25&fl=id,title,themes&facet.field=themes_raw&facet.mincount=1 I would expect 321, however I get 370! schema.xml:

<field name="themes" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="themes_raw" type="string" indexed="true" stored="true" multiValued="true"/>
<copyField source="themes" dest="themes_raw"/>

-- View this message in context: http://lucene.472066.n3.nabble.com/Facet-showing-MORE-results-than-expected-when-its-selected-tp1878828p1878828.html Sent from the Solr - User mailing list archive at Nabble.com.
Crawling with nutch and mapping fields to solr
Hi I'm fairly new to solr but I have it configured, along with nutch, as per this tutorial http://ubuntuforums.org/showthread.php?p=9596257. Nutch is crawling and injecting documents into solr as expected, however, I want to break the data down further so what ends up in solr is a bit more granular. Can anyone explain in simple terms how I might go about parsing the data I get from nutch and mapping it to custom fields? Ideally I'd like to be able to pull out meta-data from the source HTML and map it to specific fields in solr. I hope I'm in the right place to ask this question. Any help would be much appreciated. Jean-Luc
RE: Facet showing MORE results than expected when its selected?
LOL, very clever indeed ;) The thing is: when I select the amount of records matching the theme 'Hotel en Restaurant' in my db, I end up with 321 records. So that is correct. I don't know where the 370 is coming from. Now when I change the query to this: fq=themes_raw:Hotel en Restaurant I end up with 110 records... (another number even :s) What I did notice is that this only happens on multi-word facets, Hotel en Restaurant being a 3-word facet. The facets work correctly on a facet named Cafe, so I suspect it has something to do with the tokenization. As you can see, I'm using text and string. For completeness I'm posting the definition of those in my schema.xml as well:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

-- View this message in context: http://lucene.472066.n3.nabble.com/Facet-showing-MORE-results-than-expected-when-its-selected-tp1878828p1879163.html Sent from the Solr - User mailing list archive at Nabble.com.
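One likely culprit here (a hedged guess, but consistent with the symptoms): in an unquoted multi-word filter query like fq=themes_raw:Hotel en Restaurant, only the first term is bound to the field; "en" and "Restaurant" are searched against the default field, so the count differs from the facet count. Quoting the value keeps the whole phrase on the field. A small sketch of building the quoted, URL-encoded parameter; the field and value come from the post, but the helper itself is hypothetical:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FacetFqSketch {
    // Quote a multi-word facet value so the whole phrase stays bound to the
    // field, then URL-encode it for use in a Solr request URL.
    public static String fqParam(String field, String value) {
        try {
            String fq = field + ":\"" + value + "\"";
            return "fq=" + URLEncoder.encode(fq, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        // prints fq=themes_raw%3A%22Hotel+en+Restaurant%22
        System.out.println(fqParam("themes_raw", "Hotel en Restaurant"));
    }
}
```

With the phrase quoted, fq=themes_raw:"Hotel en Restaurant" on the untokenized string field should line up with the facet count, since faceting on themes_raw counts whole stored values.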
Re: Facet showing MORE results than expected when its selected?
I've had that sort of thing happen from 'corrupting' my index, by changing my schema.xml without re-indexing. If you change field types or other things in schema.xml, you need to reindex all your data. (You can add brand new fields or types without having to re-index, but most other changes will require a re-index.) Could that be it? PeterKerk wrote: LOL, very clever indeed ;) The thing is: when I select the amount of records matching the theme 'Hotel en Restaurant' in my db, I end up with 321 records. So that is correct. I don't know where the 370 is coming from. Now when I change the query to this: fq=themes_raw:Hotel en Restaurant I end up with 110 records... (another number even :s) What I did notice is that this only happens on multi-word facets, Hotel en Restaurant being a 3-word facet. The facets work correctly on a facet named Cafe, so I suspect it has something to do with the tokenization. As you can see, I'm using text and string. For completeness I'm posting the definition of those in my schema.xml as well:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
Re: Facet showing MORE results than expected when its selected?
Nope, I restarted my server to reload schema.xml, and did a reindex, as I've done a thousand times before, but still the same behaviour :(
Re: Facet showing MORE results than expected when its selected?
Another option: assuming themes_raw is of type 'string' (couldn't get that nugget of info for 100%), it could be that you're seeing a difference in the number of results between the 110 for the fq on themes_raw and the 321 from your db, because fieldtype 'string' (thus themes_raw) is case-sensitive while (depending on your db setup) querying your db is case-insensitive, which could explain the larger number of hits for your db as well. Cheers, Geert-Jan

2010/11/10 Jonathan Rochkind rochk...@jhu.edu
[Jonathan's earlier reply and PeterKerk's schema definitions quoted in full; snipped]
Re: Facet showing MORE results than expected when its selected?
Nope, that's not possible either, since the theme name is stored in the database in table [themes] only once, and other locations refer to it using the link table [location_themes]. Simple DB scheme using a link table:

[themes]: id, name
[location_themes]: locationid, themeid
[locations]: id, name, etc.

PS. I posted the definitions of the fields and tokenizers above if you want to have a look at them :)
Re: Facet showing MORE results than expected when its selected?
I was playing around with the example Solr app, and I get different results when I specify something like fq=manu_exact:"ASUS Computer Inc." and fq=manu_exact:ASUS Computer Inc. The latter gives many more matches, which looks kinda familiar. Silly season stuff, I know, but thought I'd mention it... Best Erick

On Wed, Nov 10, 2010 at 5:32 PM, Geert-Jan Brits gbr...@gmail.com wrote:
[Geert-Jan's reply and the earlier thread quoted in full; snipped]
Re: Facet showing MORE results than expected when its selected?
O wow, the quotes did the trick...thanks! :)
Re: Facet showing MORE results than expected when its selected?
Good call. Alternately, for facet limiting, you may find it simpler, easier (and very, very slightly more efficient for Solr) to use either the 'raw' or 'field' query parsers, which don't do the pre-tokenization that the standard query parser does; that pre-tokenization is what is making the quotes required.

fq={!field f=solr_field}My Multi-Word Value
fq={!raw f=solr_field}Multi-Word Value

(Still URL-escape, though; not shown above for clarity.) This way you also won't have to worry about one of your values accidentally including a literal double quote, or something like that. For a non-tokenized String field like we're talking about here, !field and !raw, I think, will be effectively identical. http://wiki.apache.org/solr/SolrQuerySyntax#Other_built-in_useful_query_parsers Jonathan

PeterKerk wrote: O wow, the quotes did the trick...thanks! :)
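As a side note on the URL-escaping Jonathan mentions: here is a small Python sketch (illustrative only; the field name themes_raw is taken from earlier in this thread) of what the escaped fq parameter ends up looking like:

```python
from urllib.parse import urlencode

# Build the local-params filter query; {!field} treats the whole value
# as a single term, so no quoting inside the value is needed.
params = {
    "q": "*:*",
    "fq": "{!field f=themes_raw}Hotel en Restaurant",
}
query_string = urlencode(params)
print(query_string)
# q=%2A%3A%2A&fq=%7B%21field+f%3Dthemes_raw%7DHotel+en+Restaurant
```

The `{`, `!`, and `=` characters all need escaping in the URL, which is easy to get wrong when building the request by hand.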
Re: Solr optimize operation slows my MySQL serveur
Thank you for your answers. Isn't it possible to tune Solr to use less disk bandwidth (involving a longer optimization)? I moved Solr to the unused HDD, and the problem is solved! Fortunately I have this separate disk...
Concatenate multiple tokens into one
Hi, I've created the following filter chain in a field type; the idea is to use it for autocompletion purposes:

<tokenizer class="solr.WhitespaceTokenizerFactory"/> <!-- create tokens separated by whitespace -->
<filter class="solr.LowerCaseFilterFactory"/> <!-- lowercase everything -->
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/> <!-- throw out stopwords -->
<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/> <!-- throw out everything except a-z -->
<!-- actually, here i would like to join multiple tokens together again, to provide one token for the EdgeNGramFilterFactory -->
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/> <!-- create edgeNGram tokens for autocomplete matches -->

With that kind of filter chain, the EdgeNGramFilterFactory will receive multiple tokens for input strings containing whitespace. This leads to the following results:

Input query: George Cloo
Matches:
- George Harrison
- John Clooridge
- George Smith
- George Clooney
- etc.

However, only George Clooney should match in the autocompletion use case. Therefore, I'd like to add a filter before the EdgeNGramFilterFactory which concatenates all the tokens generated by the WhitespaceTokenizerFactory. Are there filters that can do such a thing? If not, are there examples of how to implement a custom TokenFilter? thanks! -robert
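To make the mismatch concrete, here is a small Python simulation (illustrative only, not Solr code) of the difference between edge-ngramming each whitespace token separately and edge-ngramming one concatenated token:

```python
def edge_ngrams(token, min_size=1, max_size=25):
    # all prefixes of the token, as a front EdgeNGramFilterFactory would emit
    return {token[:i] for i in range(min_size, min(len(token), max_size) + 1)}

def per_token_grams(name):
    # whitespace-tokenize, lowercase, then n-gram each token separately
    grams = set()
    for tok in name.lower().split():
        grams |= edge_ngrams(tok)
    return grams

def concat_grams(name):
    # normalize to a single token first, then n-gram the whole thing
    return edge_ngrams("".join(name.lower().split()))

names = ["George Clooney", "George Harrison", "John Clooridge", "George Smith"]
query = "George Cloo"

# Per-token grams with an OR-style query: any query token matching any gram
loose = [n for n in names if any(q in per_token_grams(n) for q in query.lower().split())]

# Single concatenated token: the whole whitespace-stripped query must be one gram
strict = [n for n in names if "".join(query.lower().split()) in concat_grams(n)]

print(loose)   # all four names match
print(strict)  # only George Clooney
```

"george" is a gram of George Harrison and "cloo" is a gram of John Clooridge, so any disjunctive query over per-token grams pulls them all in; the concatenated token avoids that.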
RE: Concatenate multiple tokens into one
Are you sure you really want to throw out stopwords for your use case? I don't think autocompletion will work how you want if you do. And if you don't... then why use the WhitespaceTokenizer and then try to jam the tokens back together? Why not just NOT tokenize in the first place? Use the KeywordTokenizer, which really should be called the NonTokenizingTokenizer, because it doesn't tokenize at all; it just creates one token from the entire input string. Then lowercase, remove whitespace (or not), do whatever else you want to do to your single token to normalize it, and then edge-ngram it. If you include whitespace in the token, then when making your queries for auto-complete, be sure to use a query parser that doesn't do pre-tokenization; the 'field' query parser should work well for this. Jonathan

From: Robert Gründler [rob...@dubture.com]
Sent: Wednesday, November 10, 2010 6:39 PM
To: solr-user@lucene.apache.org
Subject: Concatenate multiple tokens into one
[original question quoted in full; snipped]
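A minimal Python sketch of the single-token normalization Jonathan describes (in Solr this would be KeywordTokenizerFactory plus LowerCaseFilterFactory and a PatternReplaceFilterFactory in schema.xml; this just illustrates the effect on the token):

```python
import re

def normalize_keyword(text):
    # KeywordTokenizer-style: treat the whole input as one token,
    # then lowercase and strip everything that isn't a-z
    return re.sub(r"[^a-z]", "", text.lower())

print(normalize_keyword("The Beastie Boys"))    # thebeastieboys
print(normalize_keyword("George Clooney"))      # georgeclooney
```

Note that stopwords like "the" survive this, since removing them would require tokenizing first; that is exactly the tension Jonathan points out.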
Re: Concatenate multiple tokens into one
On Nov 11, 2010, at 1:12 AM, Jonathan Rochkind wrote:
Are you sure you really want to throw out stopwords for your use case? I don't think autocompletion will work how you want if you do.

In our case I think it makes sense. The content is targeting the electronic music / DJ scene, so we have a lot of words like "DJ" or "featuring" which make sense to throw out of the query. Also, searches for "the beastie boys" and "beastie boys" should both return a match in the autocompletion.

And if you don't... then why use the WhitespaceTokenizer and then try to jam the tokens back together? Why not just NOT tokenize in the first place. Use the KeywordTokenizer, which really should be called the NonTokenizingTokenizer, because it doesn't tokenize at all, it just creates one token from the entire input string.

I started out with the KeywordTokenizer, which worked well except for the stopword problem. For now, I've come up with a quick-and-dirty custom ConcatFilter which does what I'm after:

public class ConcatFilter extends TokenFilter {
    private TokenStream tstream;

    protected ConcatFilter(TokenStream input) {
        super(input);
        this.tstream = input;
    }

    @Override
    public Token next() throws IOException {
        Token token = new Token();
        StringBuilder builder = new StringBuilder();
        TermAttribute termAttribute = (TermAttribute) tstream.getAttribute(TermAttribute.class);
        TypeAttribute typeAttribute = (TypeAttribute) tstream.getAttribute(TypeAttribute.class);
        boolean incremented = false;
        while (tstream.incrementToken()) {
            // only concatenate tokens the upstream chain typed as "word"
            if (typeAttribute.type().equals("word")) {
                builder.append(termAttribute.term());
            }
            incremented = true;
        }
        token.setTermBuffer(builder.toString());
        if (incremented) {
            return token;
        }
        return null;
    }
}

I'm not sure if this is a safe way to do this, as I'm not familiar with the whole Solr/Lucene implementation after all. best -robert
Replication with slaves and load balancing questions
(forwarded on behalf of robo ... trying to figure out an odd spam-blocking issue he's having)
-- Forwarded message --
Good day, We are new to Solr and trying to set up an HA configuration in the cloud. We have set up a Solr Master server which does all the indexing. We have 2 Solr Slaves that use the built-in Replication Handler (not the scripts). We are thinking of load balancing the 2 Solr Slaves but have a question about writes to the slaves. I believe the only operations that might try to write to the indexes are the deletes from the web servers. Will the Slaves pass the delete query on to the Master to handle, or will they try to do the delete on the Slave? If the delete is done on the Slave, will the master and other Slave get updated through the replication? Thanks, robo
Re: Replication with slaves and load balancing questions
On Wed, Nov 10, 2010 at 8:29 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
(forwarded on behalf of robo ... trying to figure out an odd spam-blocking issue he's having)
-- Forwarded message --
Good day, We are new to Solr and trying to set up an HA configuration in the cloud. We have set up a Solr Master server which does all the indexing. We have 2 Solr Slaves that use the built-in Replication Handler (not the scripts). We are thinking of load balancing the 2 Solr Slaves but have a question about writes to the slaves. I believe the only operations that might try to write to the indexes are the deletes from the web servers. Will the Slaves pass the delete query on to the Master to handle, or will they try to do the delete on the Slave?

No, those deletes will only be done on the slave on which the delete request is made, but it gets worse.

If the delete is done on the Slave, will the master and other Slave get updated through the replication?

No. In fact, if the slave's index is changed, it will be deemed out-of-sync from the master, and full index replication from the master will happen. The best way is to delete from the master and commit, if your deletes are very infrequent. If deletes are very frequent, then you must batch them before committing. If you cannot show the deleted data to the user for subsequent searches, you can try to use filter queries to filter out such documents from searches made by the user who deleted the document. -- Regards, Shalin Shekhar Mangar.
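Shalin's filter-query workaround can be sketched like this (Python, illustrative only; it assumes the uniqueKey field is literally named "id" and that the application tracks the recently deleted ids itself):

```python
def exclusion_fq(deleted_ids):
    # Build a Solr filter query that hides the given document ids from
    # results until the master's delete+commit has replicated out.
    if not deleted_ids:
        return None
    clause = " OR ".join(str(i) for i in sorted(deleted_ids))
    return "-id:(" + clause + ")"

print(exclusion_fq({42, 7}))  # -id:(7 OR 42)
print(exclusion_fq(set()))    # None
```

The resulting string would be sent as an additional fq parameter on the user's searches, so the slaves' indexes are never written to directly.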
Re: Dynamic creating of cores in solr
On Nov 10, 2010, at 12:30pm, Bob Sandiford wrote:
Why not use replication? Call it inexperience... We're really early into working with and fully understanding Solr and the best way to approach various issues. I did mention that this was a prototype and non-production code, so I'm covered, though :) We'll take a look at the replication feature...

Replication doesn't replicate the top-level solr.xml file that defines the available cores, so if dynamic cores is a requirement then your custom code isn't wasted :) -- Ken

-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Wednesday, November 10, 2010 3:26 PM
To: solr-user@lucene.apache.org
Subject: Re: Dynamic creating of cores in solr

You could use the actual built-in Solr replication feature to accomplish that same function -- complete re-index to a 'master', and then, when finished, trigger replication to the 'slave', with the 'slave' being the live index that actually serves your applications. I am curious whether there was any reason you chose to roll your own solution using SolrJ and dynamic creation of cores, instead of simply using the replication feature. Were there any downsides of using the replication feature for this purpose that you ameliorated through your solution? Jonathan

Bob Sandiford wrote: We also use SolrJ, and have a dynamically created Core capability - where we don't know in advance what the Cores will be that we require. We almost always do a complete index build, and if there's a previous instance of that index, it needs to be available during a complete index build, so we have two cores per index, and switch them as required at the end of an indexing run. Here's a summary of how we do it (we're in an early prototype / implementation right now - this isn't production-quality code - as you can tell from our voluminous javadocs on the methods...)

1) Identify if the core exists, and if not, create it:

/**
 * This method instantiates two SolrServer objects, solr and indexCore.
 * It requires that indexName be set before calling.
 */
private void initSolrServer() throws IOException {
    String baseUrl = "http://localhost:8983/solr/";
    solr = new CommonsHttpSolrServer(baseUrl);
    String indexCoreName = indexName + SolrConstants.SUFFIX_INDEX; // SUFFIX_INDEX = "_INDEX"
    String indexCoreUrl = baseUrl + indexCoreName;
    // Here we create two cores for the indexName, if they don't already exist - the live core used
    // for searching and a second core used for indexing. After indexing, the two will be switched so the
    // just-indexed core will become the live core. The way that core swapping works, the live core will always
    // be named [indexName] and the indexing core will always be named [indexName]_INDEX, but the
    // dataDir of each core will alternate between [indexName]_1 and [indexName]_2.
    createCoreIfNeeded(indexName, indexName + "_1", solr);
    createCoreIfNeeded(indexCoreName, indexName + "_2", solr);
    indexCore = new CommonsHttpSolrServer(indexCoreUrl);
}

/**
 * Create a core if it does not already exist. Returns true if a new core was created, false otherwise.
 */
private boolean createCoreIfNeeded(String coreName, String dataDir, SolrServer server) throws IOException {
    boolean coreExists = true;
    try {
        // SolrJ provides no direct method to check if a core exists, but getStatus will
        // return an empty list for any core that doesn't.
        CoreAdminResponse statusResponse = CoreAdminRequest.getStatus(coreName, server);
        coreExists = statusResponse.getCoreStatus(coreName).size() > 0;
        if (!coreExists) {
            // Create the core
            LOG.info("Creating Solr core: " + coreName);
            CoreAdminRequest.Create create = new CoreAdminRequest.Create();
            create.setCoreName(coreName);
            create.setInstanceDir(".");
            create.setDataDir(dataDir);
            create.process(server);
        }
    } catch (SolrServerException e) {
        e.printStackTrace();
    }
    return !coreExists;
}

2) Do the index, clearing it first if it's a complete rebuild:

[snip]
if (fullIndex) {
    try {
        indexCore.deleteByQuery("*:*");
    } catch (SolrServerException e) {
        e.printStackTrace(); // To change body of catch statement use File | Settings | File Templates.
    }
}
[snip]

various logic, then (we submit batches of 100):

[snip]
List<SolrInputDocument> docList = b.getSolrInputDocumentList();
UpdateResponse rsp;
try {
    rsp = indexCore.add(docList);
    rsp = indexCore.commit();
Re: Adding new field after data is already indexed
But if I use this field to do sorting, an error occurs and an ArrayIndexOutOfBoundsException is thrown.

On Thursday, November 11, 2010, Robert Petersen rober...@buy.com wrote:
1) Just put the new field in the schema and stop/start Solr. Documents in the index will not have the field until you reindex them, but it won't hurt anything.
2) Just turn off their handlers in solrconfig is all I think that takes.

-Original Message-
From: gauravshetti [mailto:gaurav.she...@tcs.com]
Sent: Monday, November 08, 2010 5:21 AM
To: solr-user@lucene.apache.org
Subject: Adding new field after data is already indexed

Hi, I had a few questions regarding Solr. Say my schema file looks like

<field name="folder_id" type="long" indexed="true" stored="true"/>
<field name="indexed" type="boolean" indexed="true" stored="true"/>

and I index data on the basis of these fields. Now, in case I need to add a new field, is there a way I can add the field without corrupting the previous data? Is there any feature which adds a new field with a default value to the existing records?

2) Is there any security mechanism/authorization check to restrict URLs like /admin and /update to only a few users?

-- Best Regards. Jerry. Li | 李宗杰