external jar as processor

2012-02-07 Thread nagarjuna
Hi everybody, I have the following entities. I added the jar file into the
WEB-INF/lib folder, but I don't know how to specify the field names in
schema.xml. Please help.

<entity processor="com.xxx.solr.handler.dataimport.FeedbackProcessor"
url="http://test.xxx.com" appKey="qto9gjtI68pi7JRxVZ8Z"
lastUpdate="${dataimporter.last_index_time}" />
<entity processor="com.xxx.solr.handler.dataimport.AnswersProcessor"
url="http://abcs.xxx.com" pageSize="500"
lastUpdate="${dataimporter.last_index_time}" />

--
View this message in context: 
http://lucene.472066.n3.nabble.com/external-jar-as-processor-tp3721915p3721915.html
Sent from the Solr - User mailing list archive at Nabble.com.


more sql-like commands for solr

2012-02-07 Thread Li Li
hi all,
we have used Solr to provide search services in many products. I found
that for each product we have to write some configuration and query
expressions.
Our users are not used to this. They are familiar with SQL, and they may
describe their needs like this: I want a query that can search books whose
title contains java, and I will group these books by publishing year and order
by matching score and freshness; the weight of score is 2 and the weight of
freshness is 1.
Maybe they would be happy if they could use SQL-like statements to convey
their needs:
select * from books where title contains java group by pub_year order
by score^2, freshness^1
They may also like to insert or delete documents, e.g. delete from
books where title contains java and pub_year between 2011 and 2012.
We could define a language similar to SQL and translate it to a Solr
query string such as .../select/?q=+title:java^2 +pub_year:2011
This would be roughly the equivalent of Apache Hive for Hadoop.
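A minimal sketch of such a translation layer (the grammar here is a toy, and the core/field names and URL shape are only illustrative, not a real Solr client):

```python
import re

def sql_like_to_solr(stmt):
    """Translate a tiny SQL-like statement into a Solr select URL.

    Supports only: select * from <core> where <field> contains <word>
    This is a toy grammar, not real SQL.
    """
    m = re.match(r"select \* from (\w+) where (\w+) contains (\w+)", stmt)
    if not m:
        raise ValueError("unsupported statement: %s" % stmt)
    core, field, word = m.groups()
    return "/solr/%s/select?q=%s:%s" % (core, field, word)

print(sql_like_to_solr("select * from books where title contains java"))
# /solr/books/select?q=title:java
```

A real implementation would need a proper parser (group by, order by, weights), but the basic shape is a pure string-to-string translation, much like Hive compiles SQL-ish statements to MapReduce jobs.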


Re: Parallel indexing in Solr

2012-02-07 Thread Sami Siren
On Mon, Feb 6, 2012 at 5:55 PM, Per Steffensen st...@designware.dk wrote:
 Sami Siren skrev:

 On Mon, Feb 6, 2012 at 2:53 PM, Per Steffensen st...@designware.dk
 wrote:




 Actually right now, I am trying to find out what my bottleneck is. The
 setup
 is more complex, than I would bother you with, but basically I have
 servers
 with 80-90% IO-wait and only 5-10% real CPU usage. It might not be a
 Solr-related problem, I am investigating different things, but just
 wanted
 to know a little more about how Jetty/Solr works in order to make a
 qualified guess.



 What kind of/how many discs do you have for your shards? ..also what
 kind of server are you experimenting with?


 Grrr, that's where I have a little fight with operations. For now they gave
 me one (fairly big) machine with XenServer. I create my machines as Xen
 VMs on top of that. One of the things I don't like about this (besides that
 I don't trust Xen to do its virtualization right, or at least not provide me
 with correct readings on IO) is that disk space is assigned from an iSCSI-
 connected SAN that they all share (including the line out there). But for
 now it actually doesn't look like a disk IO problem. It looks like
 network bottlenecks (but to some extent they also all share the network) among
 all the components in our setup - our client plus the Lily stack (HDFS, HBase,
 ZK, Lily Server, Solr etc). Well, it is complex, but anyways ...


You could try to isolate the bottleneck by testing the indexing speed
from the local machine hosting Solr. Also tools like iostat or sar
might give you more details about the disk side.

--
 Sami Siren


Re: Which Tokeniser (and/or filter)

2012-02-07 Thread Robert Brown
I'm still finding matches across newlines

index...

i am fluent
german racing

search...

fluent german 

Any suggestions?  I've currently got this in wdftypes.txt for
WordDelimiterFilterFactory:


\u000A = ALPHANUM
\u000B = ALPHANUM
\u000C = ALPHANUM
\u000D = ALPHANUM
# \u000D\u000A = ALPHA
\u0085 = ALPHANUM
\u2028 = ALPHANUM
\u2029 = ALPHANUM

\u2424 = ALPHANUM





---

IntelCompute
Web Design  Local Online Marketing

http://www.intelcompute.com

On Mon, 6 Feb 2012 04:10:18 -0800 (PST), Ahmet Arslan
iori...@yahoo.com wrote:
 My fear is what will then happen with
 highlighting if I use re-mapping?
 
 What do you mean by re-mapping?



Re: Which Tokeniser (and/or filter)

2012-02-07 Thread Ahmet Arslan
 I'm still finding matches across
 newlines
 
 index...
 
 i am fluent
 german racing
 
 search...
 
 fluent german 
 
 Any suggestions?  

You can use a multiValued field for this. Split your document on new
lines at the client side.

<arr>i am fluent</arr>
<arr>german racing</arr>

positionIncrementGap="100" will prevent the query "fluent german" from matching.

Or, maybe you can inject artificial tokens via

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceCharFilterFactory

Your document then becomes: i am fluent NEWLINE german racing
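A sketch of what that charFilter could look like in schema.xml (the NEWLINE replacement token is an arbitrary marker chosen here, not a Solr built-in, and the tokenizer choice is only an example):

```xml
<analyzer type="index">
  <!-- replace literal newlines with an artificial token before tokenizing -->
  <charFilter class="solr.PatternReplaceCharFilterFactory"
              pattern="\n" replacement=" NEWLINE "/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
```

A phrase query for "fluent german" then fails to match because the NEWLINE token sits between the two words in the index.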


Typical Cache Values

2012-02-07 Thread Pranav Prakash
Based on the hit ratios of my caches, they seem to be pretty low. Here they
are. What are the typical values for your production setup? What are some of
the things that can be done to improve the ratios?

queryResultCache

lookups : 3234602
hits : 496
hitratio : 0.00
inserts : 3234239
evictions : 3230143
size : 4096
warmupTime : 8886
cumulative_lookups : 3465734
cumulative_hits : 526
cumulative_hitratio : 0.00
cumulative_inserts : 3465208
cumulative_evictions : 3457151


documentCache

lookups : 17647360
hits : 11935609
hitratio : 0.67
inserts : 5711851
evictions : 5707755
size : 4096
warmupTime : 0
cumulative_lookups : 19009142
cumulative_hits : 12813630
cumulative_hitratio : 0.67
cumulative_inserts : 6195512
cumulative_evictions : 6187460


fieldValueCache

lookups : 0
hits : 0
hitratio : 0.00
inserts : 0
evictions : 0
size : 0
warmupTime : 0
cumulative_lookups : 0
cumulative_hits : 0
cumulative_hitratio : 0.00
cumulative_inserts : 0
cumulative_evictions : 0


filterCache

lookups : 30059278
hits : 28813869
hitratio : 0.95
inserts : 1245744
evictions : 1245232
size : 512
warmupTime : 28005
cumulative_lookups : 32155745
cumulative_hits : 30845811
cumulative_hitratio : 0.95
cumulative_inserts : 1309934
cumulative_evictions : 1309245




*Pranav Prakash*

temet nosce

Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com |
Google http://www.google.com/profiles/pranny


Re: Parallel indexing in Solr

2012-02-07 Thread Per Steffensen



You could try to isolate the bottleneck by testing the indexing speed
from the local machine hosting Solr. Also tools like iostat or sar
might give you more details about the disk side.
  
Yes, I am doing different stuff to isolate the bottleneck. I'm also profiling the
JVM. And I am using iostat, top and sar already. Thanks.
This question was originally just to get an early indication of whether
or not Jetty was at all designed for parallel production-like
processing. Now I believe it is, until I prove that it does not live up
to my requirements.

Thanks!

--
 Sami Siren

  




Re: Symbols in synonyms

2012-02-07 Thread Erick Erickson
You're probably looking at a custom tokenizer and/or filter chain here. Or
at least creatively combining the ones that exist. The admin/analysis
page will be your friend.

Even if you define these as synonyms, the rest of the analysis chain may
break them up so you really have to look at the effects of the entire
analysis chain. I'd start with a really simple one (not the stock ones) and
build up. Especially beware of WordDelimiterFilterFactory for instance

Best
Erick

On Mon, Feb 6, 2012 at 4:39 AM, Robert Brown r...@intelcompute.com wrote:
 is it good practice, common, or even possible to put symbols in my list of
 synonyms?

 I'm having trouble indexing and searching for AE, with it being split on
 the ".".

 we already convert .net to dotnet, but don't want to store every combination
 of 2 letters, AE, ME, etc.




 --

 IntelCompute
 Web Design  Local Online Marketing

 http://www.intelcompute.com



Re: Phonetic search and matching

2012-02-07 Thread Erick Erickson
What happens if you do NOT inject? Setting  inject=false
stores only the phonetic reduction, not the original text. In that
case your false match on 13 would go away

Not sure what that means for the rest of your app though.

Best
Erick

On Mon, Feb 6, 2012 at 5:44 AM, Dirk Högemann
dirk.hoegem...@googlemail.com wrote:
 Hi,

 I have a question on phonetic search and matching in solr.
 In our application all the content of an article is written to a full-text
 search field, which provides stemming and a phonetic filter (cologne
 phonetic for german).
 This is the relevant part of the configuration for the index analyzer
 (search is analogous):

        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
 generateWordParts="1" generateNumberParts="1" catenateWords="0"
 catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="German2"
 />
        <filter class="solr.PhoneticFilterFactory"
 encoder="ColognePhonetic" inject="true"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory" />

 Unfortunately this results sometimes in strange, but also explainable,
 matches.
 For example:

 Content field indexes the following string: "Donnerstag von 13 bis 17 Uhr."

 This results in a match if we search for "puf", as the result of the
 phonetic filter for this is 13.
 (As a consequence, the 13 is then also highlighted.)

 Does anyone have an idea how to handle this in a reasonable way, so that a
 search for "puf" does not match 13 in the content?

 Thanks in advance!

 Dirk


Re: Improving performance for SOLR geo queries?

2012-02-07 Thread Erick Erickson
So the obvious question is what is your
performance like without the distance filters?

Without that knowledge, we have no clue whether
the modifications you've made had any hope of
speeding up your response times

As for the docs, any improvements you'd like to
contribute would be happily received

Best
Erick

2012/2/6 Matthias Käppler matth...@qype.com:
 Hi,

 we need to perform fast geo lookups on an index of ~13M places, and
 we're running into performance problems here with SOLR. We haven't done
 a lot of query optimization / SOLR tuning up until now, so there's
 probably a lot of things we're missing. I was wondering if you could
 give me some feedback on the way we do things, whether they make
 sense, and especially why a supposed optimization we implemented
 recently seems to have no effect, when we actually thought it would
 help a lot.

 What we do is this: our API is built on a Rails stack and talks to
 SOLR via a Ruby wrapper. We have a few filters that almost always
 apply, which we put in filter queries. Filter cache hit rate is
 excellent, about 97%, and cache size caps at 10k filters (max size is
 32k, but it never seems to reach that many, probably because we
 replicate / delta update every few minutes). Still, geo queries are
 slow, about 250-500msec on average. We send them with cache=false, so
 as to not flood the fq cache and cause undesirable evictions.

 Now our idea was this: while the actual geo queries are poorly
 cacheable, we could clearly identify geographical regions which are
 more often queried than others (naturally, since we're a user driven
 service). Therefore, we dynamically partition Earth into a static grid
 of overlapping boxes, where the grid size (the distance of the nodes)
 depends on the maximum allowed search radius. That way, for every user
 query, we would always be able to identify a single bounding box that
 covers it. This larger bounding box (200km edge length) we would send
 to SOLR as a cached filter query, along with the actual user query
 which would still be sent uncached. Ex:

 User asks for places in 10km around 49.14839,8.5691, then what we will
 send to SOLR is something like this:

 fq={!bbox cache=false d=10 sfield=location_ll pt=49.14839,8.5691}
 fq={!bbox cache=true d=100.0 sfield=location_ll
 pt=49.4684836290799,8.31165802979391} -- this one we derive
 automatically

 That way SOLR would intersect the two filters and return the same
 results as when only looking at the smaller bounding box, but keep the
 larger box in cache and speed up subsequent geo queries in the same
 regions. Or so we thought; unfortunately this approach did not help
 query execution times get better, at all.
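The grid-partitioning step described above can be sketched like this (the edge length, field name, and km-per-degree factor mirror the numbers in the message, but the code itself is illustrative, not Qype's actual implementation):

```python
import math

def snap_to_grid(lat, lon, edge_km=200.0):
    """Snap a query point to the center of a fixed grid cell, so that
    nearby queries share one cacheable bounding-box filter.  The
    km-per-degree factor is a crude approximation, fine for cache keying."""
    step = edge_km / 111.0  # ~111 km per degree of latitude
    glat = (math.floor(lat / step) + 0.5) * step
    glon = (math.floor(lon / step) + 0.5) * step
    return round(glat, 6), round(glon, 6)

def bbox_filters(lat, lon, radius_km):
    """Build the uncached user filter plus the cacheable grid filter."""
    glat, glon = snap_to_grid(lat, lon)
    return [
        "{!bbox cache=false d=%g sfield=location_ll pt=%s,%s}" % (radius_km, lat, lon),
        "{!bbox cache=true d=100.0 sfield=location_ll pt=%s,%s}" % (glat, glon),
    ]
```

Any two query points falling in the same 200 km cell produce a byte-identical cached fq string, which is what makes the filterCache entry reusable.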

 Question is: why does it not help? Shouldn't it be faster to search on
 a cached bbox with only a few hundred thousand places? Is it a good
 idea to make these kinds of optimizations in the app layer (we do this
 as part of resolving the SOLR query in Ruby), and does it make sense
 at all? We're not sure what kind of optimizations SOLR already does in
 its query planner. The documentation is (sorry) miserable, and
 debugQuery yields no insight into which optimizations are performed.
 So this has been a hit and miss game for us, which is very ineffective
 considering that it takes considerable time to build these kinds of
 optimizations in the app layer.

 Would be glad to hear your opinions / experience around this.

 Thanks!

 --
 Matthias Käppler
 Lead Developer API  Mobile

 Qype GmbH
 Großer Burstah 50-52
 20457 Hamburg
 Telephone: +49 (0)40 - 219 019 2 - 160
 Skype: m_kaeppler
 Email: matth...@qype.com

 Managing Director: Ian Brotherston
 Amtsgericht Hamburg
 HRB 95913

 This e-mail and its attachments may contain confidential and/or
 privileged information. If you are not the intended recipient (or have
 received this e-mail in error) please notify the sender immediately
 and destroy this e-mail and its attachments. Any unauthorized copying,
 disclosure or distribution of this e-mail and  its attachments is
 strictly forbidden. This notice also applies to future messages.


Re: Realtime profile data

2012-02-07 Thread Erick Erickson
You have several options:
1 if you can go to trunk (bleeding edge, I admit), you can
 get into the near real time (NRT) stuff.
2 You could maintain essentially a post-filter step where
  your app maintains a list of deleted messages and
 removes them from the response. This will cause
 some of your counts (e.g. facets, grouping) to be slightly
 off
3 Train your users to expect whatever latency you've
  built into the system (i.e. indexing, commit and replication)

Best
Erick

On Mon, Feb 6, 2012 at 10:42 AM, Pawel Rog pawelro...@gmail.com wrote:
 Hello. I have some problem which i'd like to solve using solr. I have
 user profile which has some kind of messages in it. User can filter
 messages, sort them etc. The problem is with delete operation. If user
 click on message to delete it it's very hard to update index of solr
 in real time. When user deletes message, it will be still visible.
 Have you idea how to solve problem with removing data?
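A client-side post-filter of this kind could be sketched as follows; the document structure and field name are made up for illustration:

```python
def post_filter(solr_docs, deleted_ids):
    """Drop documents the user has already deleted but that are still
    in the index awaiting the next commit/replication cycle."""
    deleted = set(deleted_ids)
    return [d for d in solr_docs if d["id"] not in deleted]

docs = [{"id": "m1"}, {"id": "m2"}, {"id": "m3"}]
print(post_filter(docs, ["m2"]))  # [{'id': 'm1'}, {'id': 'm3'}]
```

Note that facet and grouping counts computed by Solr will still include the filtered documents, which is the "slightly off" caveat.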


Re: Commit call - ReadTimeoutException - usage scenario for big update requests and the ioexception case

2012-02-07 Thread Erick Erickson
Right, I suspect you're hitting merges. How often are you
committing? In other words, why are you committing explicitly?
It's often better to use commitWithin on the add command
and just let Solr do its work without explicitly committing.

Going forward, this is fixed in trunk by the DocumentsWriterPerThread
improvements.
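For reference, commitWithin is expressed in milliseconds on the add command itself; a sketch of the XML update message (document contents are just placeholders):

```xml
<!-- ask Solr to commit this add within 10 seconds,
     instead of sending an explicit commit afterwards -->
<add commitWithin="10000">
  <doc>
    <field name="id">doc-1</field>
  </doc>
</add>
```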

Best
Erick

On Mon, Feb 6, 2012 at 11:09 AM, Torsten Krah
tk...@fachschaft.imn.htwk-leipzig.de wrote:
 Hi,

 I wonder if it is possible to commit data to Solr without having to
 catch SocketReadTimeout exceptions.

 I am calling commit(false, false) using a streaming server instance -
 but I still have to wait > 30 seconds and catch the timeout from the HTTP
 method.
 It does not matter if it's 30 or 60; it will fail depending on how long it
 takes until the update request is processed - or can I tweak things here?

 So what's the way to go here? Any other option, or must I catch those
 exceptions and go on like I do now?
 The operation itself does finish successfully - later on, when it's done -
 on the server side, and all the stuff is committed and searchable.


 regards

 Torsten


Re: Typical Cache Values

2012-02-07 Thread Erick Erickson
See below...

On Tue, Feb 7, 2012 at 8:21 AM, Pranav Prakash pra...@gmail.com wrote:
 Based on the hit ratios of my caches, they seem to be pretty low. Here they
 are. What are the typical values for your production setup? What are some of
 the things that can be done to improve the ratios?

 queryResultCache

 lookups : 3234602
 hits : 496
 hitratio : 0.00
 inserts : 3234239
 evictions : 3230143
 size : 4096
 warmupTime : 8886
 cumulative_lookups : 3465734
 cumulative_hits : 526
 cumulative_hitratio : 0.00
 cumulative_inserts : 3465208
 cumulative_evictions : 3457151



This is not unusual, but there's also not much reason to give this much
memory in your case. This is the cache that is hit when a user pages
through a result set. Your numbers would seem to indicate one of two things:
1) your window is smaller than 2 pages, see solrconfig.xml,
queryResultWindowSize
or
2) your users are rarely going to the next page.

Either way, this cache isn't doing you much good, but then it's also not using
much in the way of resources.
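For reference, the window size lives in solrconfig.xml; the value below is just an example:

```xml
<!-- cache this many result rows per query,
     so paging to the next page is a queryResultCache hit -->
<queryResultWindowSize>20</queryResultWindowSize>
```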

 documentCache

 lookups : 17647360
 hits : 11935609
 hitratio : 0.67
 inserts : 5711851
 evictions : 5707755
 size : 4096
 warmupTime : 0
 cumulative_lookups : 19009142
 cumulative_hits : 12813630
 cumulative_hitratio : 0.67
 cumulative_inserts : 6195512
 cumulative_evictions : 6187460


Again, this is actually quite reasonable. This cache
is used to hold document data, and often doesn't have
a great hit ratio. It is necessary though; it saves quite
a few disk seeks when servicing a single query.


 fieldValueCache

 lookups : 0
 hits : 0
 hitratio : 0.00
 inserts : 0
 evictions : 0
 size : 0
 warmupTime : 0
 cumulative_lookups : 0
 cumulative_hits : 0
 cumulative_hitratio : 0.00
 cumulative_inserts : 0
 cumulative_evictions : 0


Not doing much in the way of faceting, are you?


 filterCache

 lookups : 30059278
 hits : 28813869
 hitratio : 0.95
 inserts : 1245744
 evictions : 1245232
 size : 512
 warmupTime : 28005
 cumulative_lookups : 32155745
 cumulative_hits : 30845811
 cumulative_hitratio : 0.95
 cumulative_inserts : 1309934
 cumulative_evictions : 1309245



Not a bad hit ratio here; this is where
fq filters are stored. One caution here:
it is better to break out your filter
queries where possible into small chunks.
Rather than write fq=field1:val1 AND field2:val2,
it's better to write fq=field1:val1&fq=field2:val2.
Think of this cache as a map with the query
as the key. If you write the fq the first way above,
subsequent fqs for either half won't use the cache.

Best
Erick
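The cache-key behaviour Erick describes can be simulated with a plain map; this is a toy model for intuition, not Solr's actual implementation:

```python
filter_cache = {}  # maps the literal fq string to a (simulated) DocSet

def cached_filter(fq):
    """Each distinct fq string is its own cache entry."""
    if fq not in filter_cache:
        filter_cache[fq] = "docset(%s)" % fq  # stand-in for a real DocSet
    return filter_cache[fq]

# combined filter: one entry, unusable by queries on field1 alone
cached_filter("field1:val1 AND field2:val2")
# split filters: two entries, each reusable independently
cached_filter("field1:val1")
cached_filter("field2:val2")
cached_filter("field1:val1")  # cache hit, no new entry
print(len(filter_cache))  # 3
```

The combined fq never helps a later query that filters on only one of the two fields, which is why splitting pays off.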



 *Pranav Prakash*

 temet nosce

 Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com |
 Google http://www.google.com/profiles/pranny


Display of highlighted search result should start with the beginning of the sentence that contains the search string.

2012-02-07 Thread Shyam Bhaskaran
Hi,

We are using Solr 4.0 along with FVH, and there is an issue we are facing with
highlighting.
For our requirement, we want the highlighted search result to start at the
beginning of the sentence, and we need help to get this done.

As of now this is not happening, and the highlighted term comes up first in
most scenarios.

I have tried using the boundaryScanner parameter but am still not getting the
desired result.

Below is the configuration we are using.

   <boundaryScanner name="simple" class="solr.highlight.SimpleBoundaryScanner"
default="true">
 <lst name="defaults">
   <str name="hl.bs.maxScan">10</str>
   <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
 </lst>
   </boundaryScanner>

I need help getting the display of the highlighted search result to
start at the beginning of the sentence that contains the search string.

-Shyam


Re: Display of highlighted search result should start with the beginning of the sentence that contains the search string.

2012-02-07 Thread Koji Sekiguchi

(12/02/08 0:50), Shyam Bhaskaran wrote:

Hi,

We are using Solr 4.0 along with FVH and there is an issue we are facing while 
highlighting.
For our requirement we want the highlighted search result should start with the 
beginning of the sentence and needed help to get this done.

As of now this is not happening and the highlighted output comes up first in 
most scenarios.

I have tried using the parameter boundaryScanner but still not getting the 
desired required result.

Below is the configuration we are using.

<boundaryScanner name="simple" class="solr.highlight.SimpleBoundaryScanner"
default="true">
  <lst name="defaults">
    <str name="hl.bs.maxScan">10</str>
    <str name="hl.bs.chars">.,!?&#9;&#10;&#13;</str>
  </lst>
</boundaryScanner>

I need help in getting the display of highlighted search result and it should 
start with the beginning of the sentence that contains the search string.


Please provide more detailed info, e.g. the field data that you indexed and the
undesirable snippet
you currently got.

And have you tried BreakIteratorBoundaryScanner with hl.bs.type=SENTENCE?

koji
--
http://www.rondhuit.com/en/


RE: Display of highlighted search result should start with the beginning of the sentence that contains the search string.

2012-02-07 Thread Shyam Bhaskaran
Hi Koji,

I have tried using hl.bs.type=SENTENCE and still see no improvement.

We are storing PDF extracted content in the field which has termVectors enabled.

Example: the field contains the following data extracted from a PDF:

User-defined resolution functions. The synthesis tool only supports the
resolution functions for std_logic and std_logic_vector.

Slices with range indices that do not evaluate to constants 

When I search for the term "std_logic", the following highlighted snippet is
displayed:

functions for <em>std_logic</em> and std_logic_vector. * Slices with range
indices that do not evaluate to constants


As you can see, the highlighted term does not start at the beginning of the
sentence. Why is this, and how can I achieve this?


-Shyam 


Re: Realtime profile data

2012-02-07 Thread Pawel Rog
Thank you. I'll try NRT and some post-filter :)


On Tue, Feb 7, 2012 at 3:09 PM, Erick Erickson erickerick...@gmail.com wrote:
 You have several options:
 1) If you can go to trunk (bleeding edge, I admit), you can
     get into the near real time (NRT) stuff.
 2) You could maintain essentially a post-filter step where
      your app maintains a list of deleted messages and
     removes them from the response. This will cause
     some of your counts (e.g. facets, grouping) to be slightly
     off.
 3) Train your users to expect whatever latency you've
      built into the system (i.e. indexing, commit and replication).

 Best
 Erick

 On Mon, Feb 6, 2012 at 10:42 AM, Pawel Rog pawelro...@gmail.com wrote:
 Hello. I have a problem which I'd like to solve using Solr. I have a
 user profile which has some kind of messages in it. Users can filter
 messages, sort them etc. The problem is with the delete operation. If a user
 clicks on a message to delete it, it's very hard to update the Solr index
 in real time. When a user deletes a message, it will still be visible.
 Do you have an idea how to solve the problem of removing data?


Re: Which Tokeniser (and/or filter)

2012-02-07 Thread Robert Brown
This all seems a bit too much work for such a real-world scenario?


---

IntelCompute
Web Design  Local Online Marketing

http://www.intelcompute.com


On Tue, 7 Feb 2012 05:11:01 -0800 (PST), Ahmet Arslan
iori...@yahoo.com wrote:
 I'm still finding matches across
 newlines

 index...

 i am fluent
 german racing

 search...

 fluent german

 Any suggestions? 
 
 You can use a multiValued field for this. Split your document
 according to new line at client side.
 
  <arr>i am fluent</arr>
  <arr>german racing</arr>
  
  positionIncrementGap="100" will prevent query "fluent german" to match.
 
 Or, may be you can inject artificial tokens via 
 
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceCharFilterFactory
 
 Your document becomes : i am fluent NEWLINE german racing



Re: solrcore.properties

2012-02-07 Thread darul

Walter Underwood wrote
 
 Looking at SOLR-1335 and the wiki, I'm not quite sure of the final
 behavior for this.
 
 These properties are per-core, and not visible in other cores, right?
 
 

yes it is.


Walter Underwood wrote
 
 
 Are variables substituted in solr.xml, so I can swap in different
 properties files for dev, test, and prod? Like this:
 
  <core name="mary" properties="conf/solrcore-${env:dev}.properties"/>
 
 If that does not work, what are the best practices for managing
 dev/test/prod configs for Solr?
 
 

As you can see here http://wiki.apache.org/solr/CoreAdmin, I am not sure you
can set a property file to be loaded per core with this variable syntax.
Can someone confirm?

What we have done here is a Maven project, with some variable properties in
the .properties or .xml Solr configuration files. Then, while building the
project, we use a Maven profile to generate the dev/prod... distribution.

Hope it helps,

Jul


--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrcore-properties-tp3720446p3723212.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Commit call - ReadTimeoutException - usage scenario for big update requests and the ioexception case

2012-02-07 Thread Torsten Krah

On 07.02.2012 15:12, Erick Erickson wrote:

Right, I suspect you're hitting merges.


Guess so.

How often are you

committing?


One time, after all work is done.

In other words, why are you committing explicitly?

It's often better to use commitWithin on the add command
and just let Solr do its work without explicitly committing.


Tika extracts my docs and I fetch the results (memory, disk) - 
externally.
If all went OK as expected, I take those docs and add them to my Solr 
server instance.
After I am done with the adds + deletes I do a commit. One commit for all 
those docs - adding and deleting.
If something goes wrong before or between adding, updating or deleting 
docs, I call rollback and all is like before (I am doing the update 
from one source only, so I can be sure that no one can call commit in 
between).


CommitWithin would break my ability to roll things back; that is why I 
want to explicitly call commit here.




Going forward, this is fixed in trunk by the DocumentWriterPerThread
improvements.


Will this be backported to the upcoming 3.6?



Best
Erick

On Mon, Feb 6, 2012 at 11:09 AM, Torsten Krah
tk...@fachschaft.imn.htwk-leipzig.de  wrote:

Hi,

I wonder if it is possible to commit data to Solr without having to
catch SocketReadTimeout exceptions.

I am calling commit(false, false) using a streaming server instance -
but I still have to wait > 30 seconds and catch the timeout from the HTTP
method.
It does not matter if it's 30 or 60; it will fail depending on how long it
takes until the update request is processed - or can I tweak things here?

So what's the way to go here? Any other option, or must I catch those
exceptions and go on like I do now?
The operation itself does finish successfully - later on, when it's done -
on the server side, and all the stuff is committed and searchable.


regards

Torsten







Re: Phonetic search and matching

2012-02-07 Thread Dirk Högemann
Thanks Erick.
In the first place we thought of removing numbers with a pattern filter.
Setting inject to false will have the same effect.
If we want to be able to search for numbers in the content, this solution
will not work, but another field without phonetic filtering, and searching in
both fields, would be OK, right?

Dirk
On 07.02.2012 14:01, Erick Erickson erickerick...@gmail.com wrote:

 What happens if you do NOT inject? Setting  inject=false
 stores only the phonetic reduction, not the original text. In that
 case your false match on 13 would go away

 Not sure what that means for the rest of your app though.

 Best
 Erick

 On Mon, Feb 6, 2012 at 5:44 AM, Dirk Högemann
 dirk.hoegem...@googlemail.com wrote:
  Hi,
 
  I have a question on phonetic search and matching in solr.
  In our application all the content of an article is written to a
 full-text
  search field, which provides stemming and a phonetic filter (cologne
  phonetic for german).
  This is the relevant part of the configuration for the index analyzer
  (search is analogous):
 
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory"
   generateWordParts="1" generateNumberParts="1" catenateWords="0"
   catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory"
  language="German2"
   />
  <filter class="solr.PhoneticFilterFactory"
   encoder="ColognePhonetic" inject="true"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
 
  Unfortunately this results sometimes in strange, but also explainable,
  matches.
  For example:
 
  Content field indexes the following String: Donnerstag von 13 bis 17 Uhr.
 
  This results in a match, if we search for puf  as the result of the
  phonetic filter for this is 13.
  (As a consequence the 13 is then also highlighted)
 
  Does anyone has an idea how to handle this in a reasonable way that a
  search for puf does not match 13 in the content?
 
  Thanks in advance!
 
  Dirk
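The two-field idea mentioned in the follow-up (one field without phonetic filtering, searching both) could look like this in schema.xml; the field and type names here are made up for illustration:

```xml
<!-- phonetic matching on one field, literal matching on a second -->
<field name="content_phonetic" type="text_phonetic" indexed="true" stored="false"/>
<field name="content_literal"  type="text_general"  indexed="true" stored="false"/>
<copyField source="content" dest="content_phonetic"/>
<copyField source="content" dest="content_literal"/>
<!-- then search both, e.g. with dismax: qf="content_literal content_phonetic" -->
```

Numbers would still match literally via content_literal, while content_phonetic (with inject="false") would no longer produce the false "puf"/13 match.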



Missing search result...

2012-02-07 Thread Tim Hibbs
Hi, all...

I have a small problem retrieving the full set of query responses I need
and would appreciate any help.

I have a query string as follows:

+((Title:"sales") (+Title:sales) (TOC:"sales") (+TOC:sales)
(Keywords:"sales") (+Keywords:sales) (text:"sales") (+text:sales)
(sales)) +(RepType:"WRO Revenue Services") +(ContentType:SOP
ContentType:"Key Concept") -(Topics:Backup)

The query is intended to be:

MUST have at least one of:
- exact phrase in field Title
- all of the phrase words in field Title
- exact phrase in field TOC
- all of the phrase words in field TOC
- exact phrase in field Keywords
- all of the phrase words in field Keywords
- exact phrase in field text
- all of the phrase words in field text
- any of the phrase words in field text

MUST have WRO Revenue Services in field RepType
MUST have at least one of:
- SOP in field ContentType
- Key Concept in field ContentType
MUST NOT have Backup in field Topics

It's almost working, but it misses a couple of items that contain a
single occurrence of the word "sale" in an indexed field. The indexed
field containing that single occurrence is named UrlContent.

schema.xml

UrlContent is defined as:
<field name="UrlContent" type="text" indexed="true" stored="false"
required="false" omitNorms="false"/>

Copyfields are as follows:
<copyField source="Title" dest="text"/>
<copyField source="Keywords" dest="text"/>
<copyField source="TOC" dest="text"/>
<copyField source="Overview" dest="text"/>
<copyField source="UrlContent" dest="text"/>

Thanks,
Tim Hibbs


RE: Multi word synonyms

2012-02-07 Thread Zac Smith
I suppose I could translate every user query to include the term with quotes.

e.g. if someone searches for stock syrup I send a query like:
q="stock syrup" OR stock syrup

Seems like a bit of a hack though; is there a better way of doing this?

Zac
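That translation is simple to do client-side; a sketch of the quoting scheme proposed above (the function name is made up):

```python
def phrase_or_terms(user_query):
    """Send both the exact phrase and the individual terms, so a
    KeywordTokenizer-based synonym field can match the whole phrase
    while the normal analyzed field matches the separate words."""
    if " " not in user_query:
        return user_query  # single term, nothing to wrap
    return '"%s" OR %s' % (user_query, user_query)

print(phrase_or_terms("stock syrup"))  # "stock syrup" OR stock syrup
```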

-Original Message-
From: Zac Smith 
Sent: Sunday, February 05, 2012 7:28 PM
To: solr-user@lucene.apache.org
Subject: RE: Multi word synonyms

Thanks for the response. This almost worked. I created a new field using the 
KeywordTokenizerFactory as you suggested. The only problem was that searches 
only found documents when quotes were used.
E.g.
synonyms.txt set up like this:
simple syrup,sugar syrup,stock syrup

I indexed a document with the value 'simple syrup'. Searches only found the 
document when using quotes:
e.g.
"simple syrup" or "stock syrup" matched
simple syrup (no quotes) did not match

Here is the field I created:
<fieldType name="synonym_searcher" class="solr.TextField"
positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-ISOLatin1Accent.txt" />
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.SynonymFilterFactory"
synonyms="synonyms.txt" ignoreCase="true" expand="true"
tokenizerFactory="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-ISOLatin1Accent.txt" />
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

Any ideas? Also, I am using dismax and solr 3.5.0.

Thanks
Zac

-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Sunday, February 05, 2012 5:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Multi word synonyms

Your query analyzer will tokenize "simple sirup" into "simple" and "sirup"
and won't match "simple syrup" in the synonyms.txt.

So you have to change the query analyzer to KeywordTokenizerFactory as well.

It might be an idea to make a field for synonyms only with this tokenizer and
another field to search on, and use dismax. Never tried this though.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-word-synonyms-tp3716292p3717215.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: Which Tokeniser (and/or filter)

2012-02-07 Thread Erick Erickson
Well, this is a common approach. Someone has to split up the
input as sentences (whatever they are). Putting them in multi-valued
fields is trivial.

Then you confine things to within sentences, then you start searching
phrases with a slop less than your incrementGap...

Best
Erick

On Tue, Feb 7, 2012 at 12:27 PM, Robert Brown r...@intelcompute.com wrote:
 This all seems a bit too much work for such a real-world scenario?


 ---

 IntelCompute
 Web Design  Local Online Marketing

 http://www.intelcompute.com


 On Tue, 7 Feb 2012 05:11:01 -0800 (PST), Ahmet Arslan
 iori...@yahoo.com wrote:
 I'm still finding matches across
 newlines

 index...

 i am fluent
 german racing

 search...

 fluent german

 Any suggestions?

 You can use a multiValued field for this. Split your document
 on newlines at the client side.

 <arr>i am fluent</arr>
 <arr>german racing</arr>

 positionIncrementGap="100" will prevent the query "fluent german" from matching.

 Or, maybe you can inject artificial tokens via

 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceCharFilterFactory

 Your document becomes: "i am fluent NEWLINE german racing"



Re: Phonetic search and matching

2012-02-07 Thread Erick Erickson
Yes, you could do that. I guess numbers will give you trouble
under all circumstances.

You may be able to do something like search against your non-
phonetic field with higher boosts to preferentially do those
matches.

Best
Erick

On Tue, Feb 7, 2012 at 2:30 PM, Dirk Högemann
dirk.hoegem...@googlemail.com wrote:
 Thanks Erick.
 In the first place we thought of removing numbers with a pattern filter.
 Setting inject to false will have the same effect.
 If we want to be able to search for numbers in the content, this solution
 will not work, but another field without phonetic filtering and searching in
 both fields would be ok, right?

 Dirk
 Am 07.02.2012 14:01 schrieb Erick Erickson erickerick...@gmail.com:

 What happens if you do NOT inject? Setting  inject=false
 stores only the phonetic reduction, not the original text. In that
 case your false match on 13 would go away

 Not sure what that means for the rest of your app though.

 Best
 Erick

 On Mon, Feb 6, 2012 at 5:44 AM, Dirk Högemann
 dirk.hoegem...@googlemail.com wrote:
  Hi,
 
  I have a question on phonetic search and matching in solr.
  In our application all the content of an article is written to a
 full-text
  search field, which provides stemming and a phonetic filter (cologne
  phonetic for german).
  This is the relevant part of the configuration for the index analyzer
  (search is analogous):
 
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.WordDelimiterFilterFactory"
   generateWordParts="1" generateNumberParts="1" catenateWords="0"
   catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.SnowballPorterFilterFactory"
  language="German2"/>
          <filter class="solr.PhoneticFilterFactory"
   encoder="ColognePhonetic" inject="true"/>
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
 
  Unfortunately this results sometimes in strange, but also explainable,
  matches.
  For example:
 
  Content field indexes the following String: Donnerstag von 13 bis 17 Uhr.
 
  This results in a match if we search for "puf", as the result of the
  phonetic filter for this is "13".
  (As a consequence, the "13" is then also highlighted.)
 
  Does anyone have an idea how to handle this in a reasonable way, so that a
  search for "puf" does not match "13" in the content?
 
  Thanks in advance!
 
  Dirk



RE: Multi word synonyms

2012-02-07 Thread O. Klein
Isn't that what autoGeneratePhraseQueries=true is for?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-word-synonyms-tp3716292p3723886.html
Sent from the Solr - User mailing list archive at Nabble.com.


I want to specify multiple facet prefixes per field

2012-02-07 Thread Yuhao
I simulated a hierarchical faceting browsing scheme using facet.prefix.  
However, it seems there can only be one facet.prefix per field.  For OR 
queries, the browsing scheme requires multiple facet prefixes.  For example:

fq=facet1:term1 OR facet1:term2 OR facet1:term3

Something like the above is very powerful.  For the hierarchical browsing, at 
this point what I want is to show the child terms (one level down) of term1, 
term2 and term3 (but not term4, term5 or term6).  Now, if I add a facet.prefix, 
say f.facet1.facet.prefix=term1, it would give me all the child terms of 
term1, but I also want the children of term2 and term3.

So what I want is to be able to do something like this: 
f.facet1.facet.prefix=term1 OR term2 OR term3. 

Is there a way to accomplish what I'm looking for?
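Solr accepts only one facet.prefix per field per request, so one client-side workaround (a sketch, assuming per-request overhead is acceptable) is to issue one faceting request per parent term, each with its own facet.prefix, and merge the returned term lists:

```python
def prefix_facet_params(field, parents, rows=0):
    """Build one Solr request parameter list per parent prefix; the caller
    sends each request and merges the facet counts from the responses."""
    base = [("q", "*:*"), ("rows", str(rows)),
            ("facet", "true"), ("facet.field", field)]
    return [base + [(f"f.{field}.facet.prefix", p)] for p in parents]
```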


RE: Multi word synonyms

2012-02-07 Thread Zac Smith
It doesn't seem to do it for me. My field type is:
<fieldType name="synonym_searcher" class="solr.TextField"
    positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
        synonyms="synonyms.txt" ignoreCase="true" expand="true"
        tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

I am using edismax and solr 3.5 and multi word values can only be matched when 
using quotes.

-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Tuesday, February 07, 2012 12:49 PM
To: solr-user@lucene.apache.org
Subject: RE: Multi word synonyms

Isn't that what autoGeneratePhraseQueries=true is for?





Re: Which Tokeniser (and/or filter)

2012-02-07 Thread Erik Hatcher
A custom tokenizer/tokenfilter could set the position increment when a newline 
comes through as well. 

   Erik

On Feb 7, 2012, at 15:28, Erick Erickson erickerick...@gmail.com wrote:

 Well, this is a common approach. Someone has to split up the
 input as sentences (whatever they are). Putting them in multi-valued
 fields is trivial.
 
 Then you confine things to within sentences, then you start searching
 phrases with a slop less than your incrementGap...
 
 Best
 Erick
 
 On Tue, Feb 7, 2012 at 12:27 PM, Robert Brown r...@intelcompute.com wrote:
 This all seems a bit too much work for such a real-world scenario?
 
 
 ---
 
 IntelCompute
 Web Design  Local Online Marketing
 
 http://www.intelcompute.com
 
 
 On Tue, 7 Feb 2012 05:11:01 -0800 (PST), Ahmet Arslan
 iori...@yahoo.com wrote:
 I'm still finding matches across
 newlines
 
 index...
 
 i am fluent
 german racing
 
 search...
 
 fluent german
 
 Any suggestions?
 
  You can use a multiValued field for this. Split your document
  on newlines at the client side.
 
  <arr>i am fluent</arr>
  <arr>german racing</arr>
  
  positionIncrementGap="100" will prevent the query "fluent german" from matching.
 
  Or, maybe you can inject artificial tokens via
 
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceCharFilterFactory
 
  Your document becomes: "i am fluent NEWLINE german racing"
 


RE: Multi word synonyms

2012-02-07 Thread O. Klein
Well, if you want both multi word and single words I guess you will have to
create another field :) Or make queries like you suggested.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-word-synonyms-tp3716292p3724009.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Multi word synonyms

2012-02-07 Thread Zac Smith
Are you able to explain how I would create another field to fit my scenario?

-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Tuesday, February 07, 2012 1:28 PM
To: solr-user@lucene.apache.org
Subject: RE: Multi word synonyms

Well, if you want both multi word and single words I guess you will have to 
create another field :) Or make queries like you suggested.





URI Encoding with Solr and Weblogic

2012-02-07 Thread Elisabeth Adler

Hi,

I try to get Solr 3.3.0 to process Arabic search requests using its 
admin interface. I have successfully managed to set it up on Tomcat 
using the URIEncoding attribute but fail miserably on WebLogic 10.


Invoking the URL http://localhost:7012/solr/select/?q=?  returns the 
XML below:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="q">تÙ?Ù?ئة</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>

The search term is just gibberish. Running the query through Luke or 
Tomcat returns the expected result and renders the search term correctly.


I have tried to change the URI encoding and JVM default encoding by 
setting the following start up arguments in WebLogic: 
-Dfile.encoding=UTF-8 -Dweblogic.http.URIDecodeEncoding=UTF-8. I can see 
them being set through Solr's admin interface. They don't have any 
impact though.


I am running out of ideas on how to get this working. Any thoughts and 
pointers are much appreciated.


Thanks,
Elisabeth



Re: Display of highlighted search result should start with the beginning of the sentence that contains the search string.

2012-02-07 Thread Koji Sekiguchi

(12/02/08 1:54), Shyam Bhaskaran wrote:

Hi Koji,

I have tried using hl.bs.type=SENTENCE and still no improvement.

We are storing PDF extracted content in the field which has termVectors enabled.

Example the field contains the following data extracted from PDF

User-defined resolution functions. The synthesis tool only supports the
resolution functions for std_logic and std_logic_vector.

Slices with range indices that do not evaluate to constants 

When I search for the term std_logic - following is the highlighted snippet 
displayed

functions for <em>std_logic</em> and std_logic_vector. * Slices with range indices 
that do not evaluate to constants


As you can see, the highlighted snippet does not start from the beginning of 
the sentence. Why is this, and how can I achieve that?


Hi Shyam,

Can you try to set hl.bs.chars=".!?" and hl.bs.maxScan=100 or a larger number?
SimpleBoundaryScanner will scan the stored data back and forth from the
highlighted terms until it meets those settings.

http://wiki.apache.org/solr/HighlightingParameters#hl.bs.maxScan
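The backward scan described above works roughly like this (a simplified re-implementation for illustration only, not Lucene's actual SimpleBoundaryScanner code):

```python
def find_fragment_start(text, offset, boundary_chars=".!?", max_scan=100):
    """Scan backwards from the highlight offset for a boundary character;
    if one is found within max_scan characters, start just after it,
    otherwise keep the original offset (which can cut a word in half)."""
    lo = max(0, offset - max_scan)
    for i in range(offset - 1, lo - 1, -1):
        if text[i] in boundary_chars:
            return i + 1
    return offset
```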

koji
--
http://www.rondhuit.com/en/


Re: Which Tokeniser (and/or filter)

2012-02-07 Thread Chris Hostetter

: This all seems a bit too much work for such a real-world scenario?

You haven't really told us what your scenario is.

You said you want to split tokens on whitespace, full-stop (aka: 
period) and comma only, but then in response to some suggestions you added 
comments about other things that you never mentioned previously...

1) evidently you don't want the . in foo.net to cause a split in tokens?
2) evidently you not only want token splits on newlines, but also 
position gaps to prevent phrases matching across newlines.

...these are kind of important details that affect the suggestions people 
might give you.

can you please provide some concrete examples of the types of data you 
have, the types of queries you want them to match, and the types of 
queries you *don't* want to match?


-Hoss


struggling with solr.WordDelimiterFilterFactory and periods . or dots

2012-02-07 Thread geeky2
hello all,

i am struggling with getting solr.WordDelimiterFilterFactory to behave as
indicated in the solr book (Smiley) on page 54.

the example in the books reads like this:


Here is an example exercising all options:
WiFi-802.11b to Wi, Fi, WiFi, 802, 11, 80211, b, WiFi80211b


essentially - i have the same requirement with embedded periods, and need to
return a successful search on a field even if the user does NOT enter the
period.

i have a field, itemNo, that can contain periods.

example content in the itemNo field:

B12.0123

when the user searches on this field, they need to be able to enter an
itemNo without the period, and still find the item.

example:

user enters: B120123 and a document is returned with B12.0123.


unfortunately, the search will NOT return the appropriate document if the
user enters B120123.

however - the search does work if the user enters B12 0123 (a space in place
of the period).
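Before looking at the config, a crude approximation of the subwords the filter should emit for B12.0123 (illustrative Python only, not the actual WordDelimiterFilterFactory logic; the real filter also assigns token positions, and the Analysis page in the admin UI shows the true output):

```python
import re

def word_delimiter_approx(token):
    """Very rough sketch of generateWordParts/generateNumberParts plus
    catenateNumbers/catenateAll for a token like B12.0123."""
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)   # split on punctuation
    tokens = list(parts)                             # word and number parts
    digits = "".join(p for p in parts if p.isdigit())
    if digits and digits not in tokens:
        tokens.append(digits)                        # ~catenateNumbers
    all_cat = "".join(parts)
    if all_cat not in tokens:
        tokens.append(all_cat)                       # ~catenateAll
    return tokens
```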

can someone help me understand what is missing from my configuration?


this is snipped from my schema.xml file


  <fields>
 ...
    <field name="itemNo" type="text" indexed="true" stored="true"/>
 ...
  </fields>




<fieldType name="text" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
        protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
        protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/struggling-with-solr-WordDelimiterFilterFactory-and-periods-or-dots-tp3724822p3724822.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Display of highlighted search result should start with the beginning of the sentence that contains the search string.

2012-02-07 Thread Koji Sekiguchi
It seems like a bug to me.
Can you open a ticket? Thank you.

Koji Sekiguchi from iPhone

On 2012/02/08, at 13:32, Shyam Bhaskaran shyam.bhaska...@synopsys.com wrote:

 Hi Koji,
 
 Thanks for the response. When I use hl.bs.chars=".!?" and hl.bs.maxScan=200, I
 see improvements; below is the highlighted value:
 
 The synthesis tool only supports the resolution functions for
 <em>std_logic</em> and std_logic_vector.
 
 
 But in other cases I also see that some of the words break in between as 
 shown below
 
 Original text:  How Are Clock Gating Checks Inferred
 
 When searching for the term clock, the highlighted text is displayed as shown
 below:
 
 w Are <em>Clock</em> Gating Checks Inferred
 
 As you can see only w is displayed from the word How.
 
 This issue goes away when I use hl.bs.chars=".!? &#9;&#10;&#13;" but it creates
 an issue of highlighting not starting from the beginning of the sentence.
 
 Is there a way whereby I can have highlighting work in all cases?
 
 
 -Shyam
 


RE: Display of highlighted search result should start with the beginning of the sentence that contains the search string.

2012-02-07 Thread Shyam Bhaskaran
Hi Koji,

Thanks for the response. When I use hl.bs.chars=".!?" and hl.bs.maxScan=200, I
see improvements; below is the highlighted value:

The synthesis tool only supports the resolution functions for
<em>std_logic</em> and std_logic_vector.


But in other cases I also see that some of the words break in between as shown 
below

Original text:  How Are Clock Gating Checks Inferred

When searching for the term clock, the highlighted text is displayed as shown
below:

w Are <em>Clock</em> Gating Checks Inferred

As you can see only w is displayed from the word How.

This issue goes away when I use hl.bs.chars=".!? &#9;&#10;&#13;" but it creates
an issue of highlighting not starting from the beginning of the sentence.

Is there a way whereby I can have highlighting work in all cases?


-Shyam



How to use nested query in fq?

2012-02-07 Thread Yandong Yao
Hi Guys,

I am using Solr 3.5, and would like to use a fq like
'getField(getDoc(uuid:workspace_${workspaceId})),  isPublic):true?

- workspace_${workspaceId}:  workspaceId is indexed field.
- getDoc(uuid:concat(workspace_, workspaceId):  return the document whose
uuid is workspace_${workspaceId}
- getField(getDoc(uuid:workspace_${workspaceId})),  isPublic):  return
the matched document's isPublic field

The use case is that I have workspace objects and workspace contains many
sub-objects, such as work files, comments, datasets and so on. And
workspace has a 'isPublic' field. If this field is true, then all
registered user could access this workspace and all its sub-objects.
Otherwise, only workspace member could access this workspace and its
sub-objects.

So I want to use fq to determine whether the document in question belongs to
a public workspace or not. Is that possible?

If not, how to implement similar feature like this? implement a
ValueSourcePlugin? any guidance or example on this?

Or is there any better solutions?


It is possible to add an 'isPublic' field to all sub-objects, but that makes
index updates more complex, so I am trying to find a better solution.
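Since Solr 3.5 has no join support, one common alternative is a two-step client-side query: first fetch the ids of public workspaces, then filter the sub-object query on them. A sketch of the filter-building half (the field name workspaceId is taken from the mail; the HTTP call itself is omitted):

```python
def public_workspace_filter(public_workspace_ids):
    """Build an fq restricting sub-objects to public workspaces,
    e.g. workspaceId:(ws1 OR ws2)."""
    if not public_workspace_ids:
        return "-*:*"  # match nothing if no workspace is public
    clause = " OR ".join(public_workspace_ids)
    return f"workspaceId:({clause})"
```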

Thanks very much in advance!

Regards,
Yandong


Re: is there any practice to load index into RAM to accelerate solr performance?

2012-02-07 Thread Lance Norskog
Experience has shown that it is much faster to run Solr with a small
amount of memory and let the rest of the ram be used by the operating
system disk cache. That is, the OS is very good at keeping the right
disk blocks in memory, much better than Solr.

How much RAM is in the server and how much RAM does the JVM get? How
big are the documents, and how large is the term index for your
searches? How many documents do you get with each search? And, do you
use filter queries- these are very powerful at limiting searches.

2012/2/7 James ljatreey...@163.com:
 Is there any practice of loading the index into RAM to accelerate Solr
 performance? The overall document count is about 100 million, and the search
 time is around 100 ms. I am seeking some way to reduce Solr's response time.
 I see there is some practice of using SSD disks, but SSDs also cost a lot. I
 just want to know whether there is a way to load the index files into RAM,
 keep the RAM index and the disk index synchronized, and then search on the
 RAM index.



-- 
Lance Norskog
goks...@gmail.com


Re: Typical Cache Values

2012-02-07 Thread Pranav Prakash

 This is not unusual, but there's also not much reason to give this much
 memory in your case. This is the cache that is hit when a user pages
 through result set. Your numbers would seem to indicate one of two things:
 1) your window is smaller than 2 pages, see solrconfig.xml,
queryResultWindowSize
 or
 2) your users are rarely going to the next page.

 this cache isn't doing you much good, but then it's also not using that
 much in the way of resources.



True it is. Although the queryResultWindowSize is 30, I will be reducing it
to 4 or so. And yes, we have observed that most people don't go beyond
the first page.
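For reference, the setting lives in solrconfig.xml; with pages of, say, 10 results, a window of 20 caches two pages per query entry (the value here is illustrative):

```xml
<!-- solrconfig.xml: number of document ids cached per queryResultCache entry -->
<queryResultWindowSize>20</queryResultWindowSize>
```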



  documentCache
 
  lookups : 17647360
  hits : 11935609
  hitratio : 0.67
  inserts : 5711851
  evictions : 5707755
  size : 4096
  warmupTime : 0
  cumulative_lookups : 19009142
  cumulative_hits : 12813630
  cumulative_hitratio : 0.67
  cumulative_inserts : 6195512
  cumulative_evictions : 6187460
 

 Again, this is actually quite reasonable. This cache
 is used to hold document data, and often doesn't have
  a great hit ratio. It is necessary though; it saves quite
  a few disk seeks when servicing a single query.

 
  fieldValueCache
 
  lookups : 0
  hits : 0
  hitratio : 0.00
  inserts : 0
  evictions : 0
  size : 0
  warmupTime : 0
  cumulative_lookups : 0
  cumulative_hits : 0
  cumulative_hitratio : 0.00
  cumulative_inserts : 0
  cumulative_evictions : 0
 

 Not doing much in the way of faceting, are you?


No. We don't facet results


 
  filterCache
 
  lookups : 30059278
  hits : 28813869
  hitratio : 0.95
  inserts : 1245744
  evictions : 1245232
  size : 512
  warmupTime : 28005
  cumulative_lookups : 32155745
  cumulative_hits : 30845811
  cumulative_hitratio : 0.95
  cumulative_inserts : 1309934
  cumulative_evictions : 1309245
 
 

 Not a bad hit ratio here, this is where
 fq filters are stored. One caution here;
 it is better to break out your filter
 queries where possible into small chunks.
 Rather than write fq=field1:val1 AND field2:val2,
  it's better to write fq=field1:val1&fq=field2:val2
 Think of this cache as a map with the query
 as the key. If you write the fq the first way above,
 subsequent fqs for either half won't use the cache.


That was great advice. We do use the former approach, but going forward we
will stick to the latter one.
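The difference between the two fq styles can be seen by building the request parameters both ways (a plain urllib sketch; the field names are placeholders):

```python
from urllib.parse import urlencode

# One combined fq: cached as a single filterCache entry for the whole AND,
# so neither half can be reused by other queries.
combined = urlencode([("q", "*:*"), ("fq", "field1:val1 AND field2:val2")])

# Separate fq params: each clause gets its own reusable filterCache entry.
separate = urlencode([("q", "*:*"), ("fq", "field1:val1"), ("fq", "field2:val2")])
```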

Thanks,

Pranav


Re: Chinese Phonetic search

2012-02-07 Thread Li Li
you can convert Chinese words to pinyin and use n-grams to search for
phonetically similar words.
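A minimal sketch of the n-gram half of that idea (the pinyin conversion itself would come from a romanization library; here the romanized strings are hardcoded for illustration):

```python
def ngrams(s: str, n: int = 2) -> list:
    """Character n-grams used for fuzzy, phonetic-ish matching."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]

# "beijing" stands in for the pinyin of a Chinese word; two pinyin strings
# that share many bigrams are considered phonetically close.
query_grams = set(ngrams("beijing"))
doc_grams = set(ngrams("beijin"))
overlap = len(query_grams & doc_grams) / len(query_grams | doc_grams)
```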

On Wed, Feb 8, 2012 at 11:10 AM, Floyd Wu floyd...@gmail.com wrote:

 Hi there,

 Has anyone here ever implemented phonetic search, especially with
 Chinese (traditional/simplified), using Solr or Lucene?

 Please share some thoughts or point me to a possible solution (hint me with
 search keywords).

 I've searched and read a lot of related articles but have had no luck.

 Many thanks.

 Floyd