RE: Is Solr right for my business situation?
Thanks for the responses, people.

@Grant
1. Can you show me some direction on that -- loading data from an incoming stream? Do I need some third-party tools, or do I need to build something myself?
4. I am basically attempting to build a very fast search interface for the existing data. The volume I mentioned is more of a static one (the data is already there). The SQL statements I mentioned are daily updates coming in. The good thing is that the history is not kept, so the overall volume is not growing, but I need to apply the update statements. One workaround I had in mind (though not so great for performance) is to apply the updates to a copy of the RDBMS, and then feed the RDBMS extract to Solr. Sounds like overkill, but I don't have another idea right now. Perhaps business discussions would yield something.

@All - Some more questions, guys.
1. I have about 3-5 tables. Designing schema.xml for a single table looks OK, but what's the direction for handling multiple table structures? Would it be one big XML, wherein those three tables (assuming it's three) would show up as three different tag-trees, nullable? My source provides me a single flat file per table (tab-delimited).
2. Further, loading into Solr could use some performance tuning. Any tips? Best practices?
3. Also, is there a way to specify an XSLT on the server side and make it the default, i.e., whenever a response is returned, that XSLT is applied to the response automatically?
4. And the last question for the day :) -- there was one post saying that the spatial support is really basic in Solr and is going to be improved in the next versions. Can you people help me get a definitive yes or no on spatial support? In its current form, does it work or not? I would store lat and long, and would need to make them searchable.

Looks like I'm close to my solution. :)
--raghav

-----Original Message-----
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Tuesday, September 28, 2010 1:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Is Solr right for my business situation?

Inline.

On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote:

When do you need to deploy? As I understand it, the spatial search in Solr is being rewritten and is slated for Solr 4.0, the release after next.

It will be in 3.x, the next release.

The existing spatial search has some serious problems and is deprecated. Right now, I think the only way to get spatial search in Solr is to deploy a nightly snapshot from the active development on trunk. If you are deploying a year from now, that might change. There is no support for SQL-like statements or for joins. The best practice for Solr is to think of your data as a single table, essentially creating a view from your database. The rows become Solr documents, the columns become Solr fields.

There are now group-by capabilities in trunk as well, which may or may not help.

wunder

On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote:

I am sure these kinds of questions keep coming to you guys, but I want to raise the same question in a different context... my own business situation. I am very, very new to Solr, and though I have tried to read through the documentation, I am nowhere near completing the whole read. The need is like this - we have a huge RDBMS database/table. A single table perhaps houses 100+ million rows. Though Oracle is doing a fine job of handling the insertion and updating of data, the querying is where our main concerns lie.
Since we have spatial data, the index building takes hours and hours for such tables. That's when we thought of moving away from a standard RDBMS and trying something different and fast. My last week has been spent on a journey reading through Bigtable to Hadoop to HBase, to Hive, and then I finally landed on Solr. As far as I am in my tests, it looks pretty good, but I still have a few unanswered questions. Trying this group for them :) (I am sure I could find some answers if I read/googled more on the topic, but now I'm being lazy and feel that asking the people who are already using it, or perhaps developing it, is a better bet.)

1. Can I get my Solr instance to load data (fresh data for indexing) from a stream (imagine an MQ kind of queue, or similar)?

Yes, with a little bit of work.

2. Can I host my Solr instance to use HBase as the database/file system (read HDFS)?

Probably, but I doubt it will be fast. Local disk is usually the best. 100+ M rows is large but not unreasonable.

3. Are there any reports available anywhere (as in benchmarks) for a Solr instance's performance?

You can probably search the web for these. I've personally seen several installs w/ 1B+ docs and subsecond search and faceting, and heard of others. You might look at the stuff the HathiTrust has put up.

4. are there any APIs available which might help me
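For reference on question 1, the queue consumer itself has to be custom-built, but each message it pulls off the stream can be turned into Solr's XML update format and POSTed to the /update handler. A minimal sketch (the field names and values are illustrative only):

<add>
  <doc>
    <field name="id">12345</field>
    <field name="title">Example document taken off the queue</field>
  </doc>
</add>

followed periodically by a <commit/> message so the newly added documents become searchable.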
RE: Is Solr right for my business situation?
Staging the data in a non-Solr store sounds like a potentially reasonable idea to me. You might want to consider a NoSQL store of some kind, like MongoDB perhaps, instead of an RDBMS.

The way to think about Solr is not as a store or a database -- it's an index for serving your application. That's also the way to think about how to get your multiple tables in there -- denormalize, denormalize, denormalize. You need to think about what you actually need to search over, and build your index to serve that efficiently, rather than thinking about normalization or data modelling the way we are used to with RDBMSs. It's a different way of thinking.

A Solr index basically gives you one collection of documents. But the documents can all have different fields -- so you _could_ (but probably don't want to) essentially put all your tables in there with unique fields. They're all in the same index, they're all just documents, but some have a table1_title and table1_author, and others have no data in those fields but a table2_productName and a table2_price. Then if you want to query on just one type of thing, you just query on those fields. Except... you don't get any joins. Which is why you probably don't want to do that after all; it probably won't serve your needs.

Figuring out the right way to model your data in Solr can be tricky, and it is sometimes hard to do exactly what you want. Solr isn't an RDBMS, and in some ways isn't as powerful as an RDBMS -- in the sense of being as flexible with what kinds of queries you can run on any given data. What it does is give you very fast access to inverted index lookups and set combinations and faceting that would be very hard to do efficiently in an RDBMS. It is a trade-off. But there's not really a general answer to "how do I take these dozen RDBMS tables and store them in Solr the best way?" -- it depends on what kinds of searching you need to support and the nature of your data.
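As a sketch of the table1/table2 example above, the per-table fields might be declared in schema.xml like this (the types chosen here are illustrative):

<field name="table1_title" type="text" indexed="true" stored="true"/>
<field name="table1_author" type="text" indexed="true" stored="true"/>
<field name="table2_productName" type="text" indexed="true" stored="true"/>
<field name="table2_price" type="float" indexed="true" stored="true"/>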
Re: Re: The search response time is too loong
I guess you are correct. We used the default Solr cache configuration. I will change the cache configuration. BTW, I want to deploy several shards from the existing 8G index file, such as 4G per shard. Is there any tool to generate two shards from one 8G index file?

From: kenf_nc ken.fos...@realestate.com
Reply-To: solr-user@lucene.apache.org
To: solr-user@lucene.apache.org
Subject: Re: Re: The search response time is too loong
Date: Mon, 27 Sep 2010 05:37:25 -0700 (PDT)

Mem usage is over 400M -- do you mean Tomcat mem size? If you don't give your caches enough room to grow, you will choke the performance. You should adjust your Tomcat settings to let the cache grow to at least 1GB, or better, 2GB. You may also want to look into warming the cache (http://wiki.apache.org/solr/SolrCaching) to make the first-time call a little faster. For comparison, I also have about 8GB in my index but only 2.8 million documents. My search query times on a smaller box than you specify are 6533 milliseconds on an unwarmed (newly rebooted) instance.
What's the difference between TokenizerFactory, Tokenizer, Analyzer?
Could someone help me to understand the differences between TokenizerFactory, Tokenizer, and Analyzer? Specifically, I'm interested in implementing auto-complete for tags that could contain both English and Chinese. I read this article: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/. In the article, KeywordTokenizerFactory is used as the tokenizer. I thought I'd try replacing that with CJKTokenizer. Two questions:

1) KeywordTokenizerFactory seems to be a tokenizer factory, while CJKTokenizer seems to be just a tokenizer. Are they the same type of thing at all? Could I just replace <tokenizer class="solr.KeywordTokenizerFactory"/> with <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer"/>?

2) I'm also interested in trying out SmartChineseAnalyzer (http://lucene.apache.org/java/2_9_0/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/SmartChineseAnalyzer.html). However, SmartChineseAnalyzer doesn't offer a separate tokenizer. It's just an analyzer, and that's it. How do I use it in Solr?

Thanks.
Andy
Re: Is Solr right for our project?
Interesting. So what you are saying, though, is that at the moment it is NOT there?

On Mon, Sep 27, 2010 at 9:06 PM, Jan Høydahl / Cominvent jan@cominvent.com wrote:

Solr will match this in version 3.1, which is the next major release. Read this page for feature descriptions: http://wiki.apache.org/solr/SolrCloud. Coming to a trunk near you - see https://issues.apache.org/jira/browse/SOLR-1873

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 27. sep. 2010, at 17.44, Mike Thomsen wrote:

(I apologize in advance if I missed something in your documentation, but I've read through the wiki on the subject of distributed searches and didn't find anything conclusive.)

We are currently evaluating Solr and Autonomy. Solr is attractive due to its open-source background, following, and price. Autonomy is expensive, but we know for a fact that it can handle our distributed search requirements perfectly. What we need to know is whether Solr has capabilities that match or roughly approximate Autonomy's Distributed Search Handler. What it does is act as a front-end for all of Autonomy's IDOL search servers (which correspond in this scenario to Solr shards). It is configured to know what is on each shard and which servers hold each shard, and it intelligently farms out queries based on that configuration. There is no need to specify which IDOL servers to hit while querying; the DiSH just knows where to go. Additionally, I believe in cases where an index piece is mirrored, it also monitors server health and falls back intelligently on other backup instances of a shard/index piece based on that.

I'd appreciate it if someone could give me a frank explanation of where Solr stands in this area.

Thanks,
Mike
Limitations of prohibited clauses in sub-expressions - pure negative query
I can't find the answer, but is this problem solved in Solr 1.4.1? Thx for your answers.
Is a multi-threaded searcher a feasible idea to speed up search?
Hi all,

I want to speed up search time for my application. In a query, the time is largely spent reading posting lists (I/O with the .frq files) and calculating scores and collecting results (CPU, with a priority queue). The I/O is hard to optimize, or is already partly optimized by NIO. So I want to use multiple threads to utilize the CPU. Of course, it may decrease QPS, but the response time will also decrease -- that's what I want, because CPU is easily obtained compared to a faster hard disk.

I read the search code roughly and found it's not an easy task to modify the search process. So I want to use another, easier method. One is to use Solr distributed search and dispatch documents to many shards, but due to the network overhead and the global idf problem, it seems not a good method for me. Another one is to modify the index structure and evenly dispatch the .frq files. E.g., for term1 -> doc1, doc2, doc3, doc4, doc5 in _1.frq, I create 2 indexes with term1 -> doc1, doc3, doc5 and term1 -> doc2, doc4. When searching, I create 2 threads with 2 priority queues to collect the top N docs and then merge their results.

Is the 2nd idea feasible? Or does anyone have a related idea? Thanks.
Re: Is a multi-threaded searcher a feasible idea to speed up search?
This is an excellent idea! And desperately needed. It's high time Lucene could take advantage of concurrency when running a single query. Machines have tons of cores these days! (My dev box has 24!)

Note that one simple way to do this is to use ParallelMultiSearcher: it uses one thread per segment in your index. But note that [perversely] this means if your index is optimized you get no concurrency gain! So you have to create your index w/ a carefully picked maxMergeDocs/MB to ensure you can use concurrency.

I don't like having concurrency tied to index structure. So a better approach would be to have each thread pull its own Scorer for the same query, but then each one does a .advance to its chunk of the index and iterates from there. Then merge the PQs in the end, just like MultiSearcher.

Mike
Re: Is a multi-threaded searcher a feasible idea to speed up search?
Yes, there is a MultiSearcher in Lucene, but the idf across the 2 indexes is not global. Maybe I can modify it, and also the index, like:

term1 df=5 doc1 doc3 doc5
term1 df=5 doc2 doc4
Re: Search Interface
Hi,

You could try to use the Velocity framework to build GUIs in a quick and efficient manner. Solr comes with a Velocity handler already integrated, which could be the best solution in your case: http://wiki.apache.org/solr/VelocityResponseWriter

Also take these hints on the same topic: http://www.lucidimagination.com/blog/2009/11/04/solritas-solr-1-4s-hidden-gem/

There is also a webinar about rapid prototyping with Solr: http://www.slideshare.net/erikhatcher/rapid-prototyping-with-solr-4312681

Hope this helps,
Antonio

On 28/09/2010 4.35, Claudio Devecchi wrote:

Hi everybody, I'm implementing my first Solr engine for conceptual tests. I'm crawling my intranet wiki to make some searches; the engine is working fine already, but I need some interface for my searches. Does somebody know where I can find a sample search interface to customize? Tks
Re: Limitations of prohibited clauses in sub-expressions - pure negative query
Please explain what you want to *do*; your message is so terse it makes it really hard to figure out what you're asking. A couple of example queries would help a lot.

Best,
Erick

On Tue, Sep 28, 2010 at 5:53 AM, Patrick Sauts patrick.via...@gmail.com wrote:

I can't find the answer, but is this problem solved in Solr 1.4.1? Thx for your answers.
RE: Limitations of prohibited clauses in sub-expressions - pure negative query
Maybe the SOLR-80 JIRA issue? As written in the Solr 1.4 book, a pure negative query doesn't work correctly; you have to add 'AND *:*'.

thx

From: Patrick Sauts [mailto:patrick.via...@gmail.com]
Sent: Tuesday, September 28, 2010 11:53
To: 'solr-user@lucene.apache.org'
Subject: Limitations of prohibited clauses in sub-expressions - pure negative query

I can't find the answer, but is this problem solved in Solr 1.4.1? Thx for your answers.
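To illustrate the limitation being referenced: a prohibited clause that stands alone inside a sub-expression, such as (field names are illustrative):

title:car AND (-color:red)

matches nothing, while the rewritten form with the match-all query added behaves as expected:

title:car AND (*:* -color:red)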
Re: Is Solr right for our project?
Yes, in the latest released version (1.4.1) there is a shards= parameter, but the client needs to fill it, i.e., the client needs to know which servers are indexers, searchers, shard masters, and shard replicas. The SolrCloud stuff is still not committed and is only available as a patch right now. However, we encourage you to do a test install based on trunk + SOLR-1873 and give it a try. But we cannot guarantee that the APIs will not change in the released version (hopefully 3.1 sometime this year).

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 28. sep. 2010, at 10.44, Mike Thomsen wrote:

Interesting. So what you are saying, though, is that at the moment it is NOT there?
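For concreteness, the client-filled shards parameter in 1.4.1 is a comma-separated list of host/core locations appended to the query, along these lines (host names here are illustrative):

http://frontend:8983/solr/select?q=foo&shards=shard1.example.com:8983/solr,shard2.example.com:8983/solr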
RE: Need help with spellcheck city name
You might want to look at SOLR-2010. This patch works with the collation feature, having it test the collations it returns to ensure they'll return hits. So if a user types "san jos", it will know that the combination "san jose" is in the index and "san ojos" is not.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Savannah Beckett [mailto:savannah_becket...@yahoo.com]
Sent: Monday, September 27, 2010 7:45 PM
To: solr-user@lucene.apache.org
Cc: erickerick...@gmail.com
Subject: Re: Need help with spellcheck city name

No, I checked, there is a city called Swan in Iowa. So it is getting it from the city index, and so is Clark. But why does it favor Swan over San? Spellcheck gets weird after I treat the city name as one token. If I do it the old way, it lets San go and corrects Jos as Ojos instead of Jose, because Ojos is ranked #1 and Jose is in the middle. Any more suggestions? Ranking by frequency first, then score, doesn't work either.

From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Mon, September 27, 2010 5:24:25 PM
Subject: Re: Need help with spellcheck city name

Hmmm, did you rebuild your spelling index after the config changes? And it really looks like somehow you're getting results from a field other than city. Are you also sure that your cityname field is of type autocomplete1? Shooting in the dark here, but these results are so weird that I suspect it's something fundamental.

Best,
Erick

On Mon, Sep 27, 2010 at 8:05 PM, Savannah Beckett savannah_becket...@yahoo.com wrote:

No, it doesn't work; I got a weird result. I set my city name field to be parsed as a single token as follows:

<fieldType name="autocomplete1" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I got the following result for spellcheck:

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="san">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">3</int>
      <arr name="suggestion">
        <str>swan</str>
      </arr>
    </lst>
    <lst name="clar">
      <int name="numFound">1</int>
      <int name="startOffset">4</int>
      <int name="endOffset">8</int>
      <arr name="suggestion">
        <str>clark</str>
      </arr>
    </lst>
  </lst>
</lst>

From: Tom Hill solr-l...@worldware.com
To: solr-user@lucene.apache.org
Sent: Mon, September 27, 2010 3:52:48 PM
Subject: Re: Need help with spellcheck city name

Maybe process the city name as a single token?

On Mon, Sep 27, 2010 at 3:25 PM, Savannah Beckett savannah_becket...@yahoo.com wrote:

Hi, I have city name as a text field, and I want to do spellcheck on it. I use the settings in http://wiki.apache.org/solr/SpellCheckComponent. If I set up city name as a text field and spellcheck "San Jos" (for "San Jose"), I get the suggestion "ojos" for "Jos". I checked the extended results and found that "Jose" is in the middle of all 10 suggestions in terms of score and frequency. I then set city name as a string field and spellchecked again; I got "Van" for "San" and "Ross" for "Jos", which is weird because "San" is correct. How do you set up the spellchecker to spellcheck city names? A city name can have multiple words. Thanks.
Re: What's the difference between TokenizerFactory, Tokenizer, Analyzer?
1) KeywordTokenizerFactory seems to be a tokenizer factory while CJKTokenizer seems to be just a tokenizer. Are they the same type of thing at all? Could I just replace <tokenizer class="solr.KeywordTokenizerFactory"/> with <tokenizer class="org.apache.lucene.analysis.cjk.CJKTokenizer"/>?

You should use org.apache.solr.analysis.CJKTokenizerFactory instead.

2) I'm also interested in trying out SmartChineseAnalyzer (http://lucene.apache.org/java/2_9_0/api/contrib-smartcn/org/apache/lucene/analysis/cn/smart/SmartChineseAnalyzer.html). However, SmartChineseAnalyzer doesn't offer a separate tokenizer. It's just an analyzer, and that's it. How do I use it in Solr?

You can use a Lucene analyzer directly in Solr:

<fieldType name="chineese_text" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer"/>
</fieldType>
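For the first question, the factory is referenced the same way as any other tokenizer factory. A minimal sketch of a field type using it (the fieldType name is illustrative):

<fieldType name="text_cjk" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>

Here solr.CJKTokenizerFactory is shorthand that Solr resolves to org.apache.solr.analysis.CJKTokenizerFactory.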
Conditional Function Queries
Hi,

Has anyone written any conditional functions yet for use in function queries? I see the use for a function that can run different sub-functions depending on the value of a field. Say you have three documents:

A: title=Sports car, color=red
B: title=Boring car, color=green
C: title=Big car, color=black

Now we have a requirement to boost red cars over green, and green cars over black. The only way I have found to do this today is (ab)using the map() function. DisMax syntax:

q=car&bf=sum(map(query($qr),0,0,0,100.0),map(query($qg),0,0,0,50.0))&qr=color:red&qg=color:green

But I suspect this is expensive in terms of two sub-queries being applied and scored. An elegant way to achieve the same would be through a new native if() or case() function:

q=car&bf=if(color==red; 100; if(color==green; 50; 0))

OR

q=car&bf=case(color, red:100, green:sum(30,20))

What do you think?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Dismax Request handler and Solrconfig.xml
Hi,

I am using Solr 1.4.1 with Nutch to index some of our intranet content. In solrconfig.xml, the default request handler is set to "standard". I am planning to change that to use dismax as the request handler, but when I set default="true" for dismax, Solr does not return any results -- I get results only when I comment out <str name="defType">dismax</str>.

This works:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="fl">*</str>
    <str name="qf">title^20.0 pagedescription^15.0</str>
    <str name="version">2.1</str>
  </lst>
</requestHandler>

DOES NOT WORK:

<requestHandler name="dismax" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>

THIS WORKS:

<requestHandler name="dismax" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <!-- <str name="defType">dismax</str> -->
    <str name="echoParams">explicit</str>

Please let me know what I am doing wrong here.

Sai Thumuluri
Sr. Member - Application Staff
IT Intranet Knowledge Mgmt. Systems
614 560-8041 (Desk)
614 327-7200 (Mobile)
Re: Dismax Request handler and Solrconfig.xml
Are you removing the standard default requestHandler when you do this? Or are you specifying two requestHandlers with default="true"?

-L
RE: Dismax Request handler and Solrconfig.xml
I removed default="true" from the standard request handler.

-----Original Message-----
From: Luke Crouch [mailto:lcro...@geek.net]
Sent: Tuesday, September 28, 2010 12:50 PM
To: solr-user@lucene.apache.org
Subject: Re: Dismax Request handler and Solrconfig.xml

Are you removing the standard default requestHandler when you do this? Or are you specifying two requestHandlers with default="true"?

-L
Re: Conditional Function Queries
On Tue, Sep 28, 2010 at 11:33 AM, Jan Høydahl / Cominvent jan@cominvent.com wrote:

Has anyone written any conditional functions yet for use in function queries?

Nope - but it makes sense and has been on my list of things to do for a long time.

-Y
http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
SolrException: Bad Request
Hi,

I'm getting a rather strange exception after a long web server idle period (Tomcat 7.0.2). If I immediately rerun the same request, no errors occur. What may be the problem? All server settings are defaults.

Exception:

my stack trace ...
at sun.reflect.GeneratedMethodAccessor101.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(AbstractInvoker.java:173)
at org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:89)
at org.apache.cxf.jaxws.JAXWSMethodInvoker.invoke(JAXWSMethodInvoker.java:60)
at org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:75)
at org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:58)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at org.apache.cxf.workqueue.SynchronousExecutor.execute(SynchronousExecutor.java:37)
at org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:106)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:243)
at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:110)
at org.apache.cxf.transport.servlet.ServletDestination.invoke(ServletDestination.java:98)
at org.apache.cxf.transport.servlet.ServletController.invokeDestination(ServletController.java:423)
at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:178)
at org.apache.cxf.transport.servlet.AbstractCXFServlet.invoke(AbstractCXFServlet.java:142)
at org.apache.cxf.transport.servlet.AbstractHTTPServlet.handleRequest(AbstractHTTPServlet.java:179)
at org.apache.cxf.transport.servlet.AbstractHTTPServlet.doPost(AbstractHTTPServlet.java:103)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:641)
at org.apache.cxf.transport.servlet.AbstractHTTPServlet.service(AbstractHTTPServlet.java:159)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:243)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:201)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:163)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:108)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:556)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:401)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:242)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:267)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:245)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:260)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Caused by:
org.apache.solr.client.solrj.SolrServerException: Error executing query
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
at com.gramit.services.searching.SearchingService.search(SearchingService.java:186)
... 57 more
Caused by: org.apache.solr.common.SolrException: Bad Request
Bad Request
request: http://127.0.0.1/solr/select?q=кофе&fq=lat:[55.16728264288879 TO 56.437558186276114] AND lng:[36.47475305185914 TO 38.735977228049315]&spellcheck=true&spellcheck.count=1&spellcheck.collate=true&spellcheck.q=кофе&start=0&rows=10&sort=dist(2,lat,lng,55.8076049,37.5869184) asc&facet=true&facet.limit=5&facet.mincount=1&facet.field=marketplaceCfg_id&facet.field=productCfg_id&stats=true&stats.field=price&wt=javabin&version=1
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
... 59 more

Thanks.

--
Pavel Minchenkov
Using separate Analyzers for querying and indexing.
Hello,

I am migrating from a pure Lucene application to Solr. For legacy reasons I must support a somewhat obscure query feature: lowercase words in the query should match lowercase or uppercase in the index, while uppercase words in the query should only match uppercase words in the index.

To do this with Lucene, we created a custom Analyzer and a custom TokenFilter. During indexing, the custom TokenFilter duplicates uppercase tokens as lowercase ones and sets their offsets to make them appear in the same position as the uppercase token, i.e., you get two tokens for every uppercase token. Then at query time a normal (case-sensitive) analyzer is used, so that lowercase tokens will match either upper or lower, while uppercase tokens will only match uppercase.

I have looked through the documentation and I see how to specify the Analyzer in the schema.xml file that is used for indexing, but I don't know how to specify that a different Analyzer (the case-sensitive one) should be used for queries. Is this possible?

Thanks,
James
Re: Using separate Analyzers for querying and indexing.
Yeah. You can specify two analyzers in the same fieldType:

<fieldType name="..." class="...">
  <analyzer type="index">
    ...
  </analyzer>
  <analyzer type="query">
    ...
  </analyzer>
</fieldType>

-L
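Filled in for the use case described above, it might look like this; the tokenizer choice and com.example.CaseDuplicatingFilterFactory, a factory wrapping the custom case-duplicating TokenFilter, are hypothetical placeholders:

<fieldType name="text_casepreserving" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- hypothetical factory for the custom filter that emits a lowercase
         copy of each uppercase token at the same position -->
    <filter class="com.example.CaseDuplicatingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>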
Re: Re:The search response time is too loong
Copy the index. Delete half of the documents. Optimize.
Copy the index. Delete the other half of the documents. Optimize.

--
Lance Norskog
goks...@gmail.com
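Assuming the documents can be partitioned on an indexed field such as a numeric id (the field name and ranges below are illustrative), the "delete half" steps can be done with delete-by-query messages POSTed to each copy's /update handler, each followed by <commit/> and <optimize/>:

<delete><query>id:[* TO 4999999]</query></delete>

on the first copy, and

<delete><query>id:[5000000 TO *]</query></delete>

on the second.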
Re: Search Interface
There is already a simple Velocity app. Just hit http://localhost:8983/solr/browse. You can configure some handy parameters in solrconfig.xml to make walkable facets.

--
Lance Norskog
goks...@gmail.com
Best way to check Solr index for completeness
Hello,

What would be the best way to check a Solr index against the original system (database) to make sure the index is up to date? I can use Solr fields like id and timestamp to check against the appropriate fields in the database. Our index currently contains over 2 million documents across several cores. Pulling all documents from the Solr index via search (1000 docs at a time) is very slow. Is there a better way to do it?

Thanks,
Dmitriy
Re: Best way to check Solr index for completeness
Is there a 1:1 ratio of db records to Solr documents? If so, couldn't you simply select the most recently updated record from the db and check to make sure the corresponding Solr doc has the same timestamp?

-L
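A related cheap first-level check is to compare counts rather than documents: a rows=0 query returns numFound without fetching anything, which can be compared against a COUNT(*) on the database side (the core name here is illustrative):

http://localhost:8983/solr/core0/select?q=*:*&rows=0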
Re: Concurrent access to EmbeddedSolrServer
We learned it the hard way; wish I had read this before: http://wiki.apache.org/solr/EmbeddedSolr

It is not thread-safe. You start seeing ConcurrentModificationExceptions within as few as 100 samples when you load it with more than 1 concurrent user (I have tested it using JMeter).

best,
Reuben

On 12/9/2009 12:47 PM, Jon Poulton wrote:

Hi there, I'm about to start implementing some code which will access a Solr instance concurrently via a ThreadPool. I've been looking at the solrj API docs (particularly http://lucene.apache.org/solr/api/index.html?org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.html) and I just want to make sure what I have in mind makes sense. The Javadoc is a bit sparse, so I thought I'd ask a couple of questions here.

1) I'm assuming that EmbeddedSolrServer can be accessed concurrently by several threads at once for add, delete and query operations (on the SolrServer parent interface). Is that right? I don't have to enforce single-threaded access?
2) What happens if multiple threads simultaneously call commit?
3) What happens if multiple threads simultaneously call optimize?
4) Both commit and optimize have optional parameters called "waitFlush" and "waitSearcher". These are undocumented in the Javadoc. What do they signify?

Thanks in advance for any help.

Cheers,
Jon
Re: Best way to check Solr index for completeness
That will certainly work for the most recent updates, but I need to compare the entire index.

Dmitriy
Re: is EmbeddedSolrServer thread safe?
No, it is not the same for EmbeddedSolrServer. We learned it the hard way; I guess you would have also learned it by now.

At the SolrJ wiki page (http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer):

"CommonsHttpSolrServer is thread-safe and if you are using the following constructor, you *MUST* re-use the same instance for all requests."

... But is it the same for EmbeddedSolrServer?

Best regards,
Jean-François

--
Reuben Christie
-^-
°v°
/(_)\
^ ^
RE: Dismax Request handler and Solrconfig.xml
Can I please get some help here? I am on a tight timeline to get this done - any ideas/suggestions would be greatly appreciated.

-----Original Message-----
From: Thumuluri, Sai [mailto:sai.thumul...@verizonwireless.com]
Sent: Tuesday, September 28, 2010 12:15 PM
To: solr-user@lucene.apache.org
Subject: Dismax Request handler and Solrconfig.xml
Importance: High
Re: Dismax Request handler and Solrconfig.xml
What you have is exactly what I have on 1.4.0:

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>

And it has worked fine. We copied our solrconfig.xml from the examples and changed it for our purposes. You might compare your solrconfig.xml to some of the examples.

-L
Re: Dismax Request handler and Solrconfig.xml
I notice we don't have the default="true"; instead we manually specify qt=dismax in our queries. HTH.

-L
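That is, the handler is selected per request by name via the qt parameter, e.g. (host and query terms are illustrative):

http://localhost:8983/solr/select?qt=dismax&q=intranet+content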
Solr Deduplication and Field Collpasing
All,

I have set up Nutch to submit the crawl results to a Solr index. I have some duplicates in the documents generated by the Nutch crawl. There is a field 'digest' that Nutch generates that is the same for those documents that are duplicates. While setting up the dedupe processor in the Solr config file, I have used this 'digest' field in the following way (see below for config details). Since my index has documents other than the ones generated by Nutch, I cannot use overwriteDupes=true, because for non-Nutch-generated documents the digest field will not be populated, and I found that Solr deletes every one of those documents that does not have the digest field populated. Probably because they all will have the same 'sig' field value, generated from an 'empty' digest field, forcing Solr to delete everything?

In any case, given the scenario, I thought I would set overwriteDupes=false and use field collapsing based on the digest or sig field, but I could not get field collapsing to work. Based on the wiki documentation, I was adding the query string group=true&group.field=sig or group=true&group.field=digest to my overall query in the admin console, and I still got the duplicate documents in the results. Is there anything special I need to do to get field collapsing working? I am running Solr 1.4.

All this is because Nutch thinks that http://mysite.mydomain.com/index.html and http://mysite/index.html (the URL *is* the unique id for the Nutch document; the difference is only in the alias, and for an internal site both are valid) are different documents, depending on how the link is set up. This is the reason for me to try deduplication. I cannot submit the SolrDedup command from Nutch because non-Nutch-generated documents do not have the digest field populated, and I read on the mailing lists that this will cause the SolrDedup initiated from Nutch to fail. This forced me to try deduplication on the Solr side.

Thanks so much in advance for your help. Here is my configuration:

SolrConfig.xml:

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">sig</str>
    <bool name="overwriteDupes">false</bool>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
    <str name="fields">digest</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">dedupe</str>
  </lst>
</requestHandler>

Schema.xml:

<field name="sig" type="string" stored="true" indexed="true" multiValued="true"/>

Thanks so much for your help.
Re: Using separate Analyzers for querying and indexing.
Excellent, exactly what I needed.

Thanks,
James
Re: Conditional Function Queries
Ok, I created the issues:

IF function: SOLR-2136
AND, OR, NOT: SOLR-2137

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
multiple local indexes
In our application, we need to be able to search across multiple local indexes. We need this not so much for performance reasons, but because of the particular needs of our project. The indexes, while sharing the same schema, can be very different in terms of size and distribution of documents. By that I mean that some indexes may have a lot more documents about some topic while others will have more documents about other topics. We want to be able to add documents to the individual indexes as well. I can provide more detail about our project if necessary.

Thus, the distributed search feature, with shards in different cores, seems to be an obvious solution except for the limitation of distributed idf. First, I want to make sure my understanding of the distributed idf limitation is correct: if your documents are spread across your shards evenly, then the distribution of terms across the individual shards can be assumed to be even enough not to matter. If, as in our case, the shards are not very uniform, then this limitation is magnified. Even though simplistic, do I have the basic idea?

We have hacked together something that allows us to read from multiple indexes, but it isn't really a long-term solution. It's just sort of shoe-horned in there. Here are some notes from the programmer who worked on this:

Two custom files: EgranaryIndexReaderFactory.java and EgranaryIndexReader.java

EgranaryIndexReader.java
No real work is done here. This class extends lucene.index.MultiReader and overrides the directory() and getVersion() methods inherited from IndexReader. These methods don't make sense for a MultiReader, as they only return a single value. However, Solr expects Readers to have these methods. directory() was overridden to return a call to directory() on the first reader in the subreader list. The same was done for getVersion(). This hack makes any use of these methods by Solr somewhat pointless.

EgranaryIndexReaderFactory.java
Overrides the newReader(Directory indexDir, boolean readOnly) method. The expected behavior of this method is to construct a Reader from the index at indexDir. However, this method ignores indexDir and reads a list of indexDirs from the solrconfig.xml file. These indices are used to create a list of lucene.index.IndexReader instances. This list is then used to create the EgranaryIndexReader.

So the second question is: does anybody have other ideas about how we might solve this problem? Is distributed search still our best bet?

Thanks for your thoughts!
Brent
RE: Solr Deduplication and Field Collapsing
You could create a custom update processor that adds a digest field for newly added documents that do not have the digest field themselves. This way, the documents that are not added by Nutch get a proper non-empty digest field, so the deduplication processor won't create the same empty hash and overwrite those. Or you could extend org.apache.solr.update.processor.SignatureUpdateProcessorFactory so it skips documents with an empty digest field. I'd think the latter would be the quickest route, but correct me if I'm wrong.

Cheers,

-Original message-
From: Nemani, Raj raj.nem...@turner.com
Sent: Tue 28-09-2010 23:28
To: solr-user@lucene.apache.org
Subject: Solr Deduplication and Field Collapsing

All, I have set up Nutch to submit the crawl results to a Solr index. I have some duplicates in the documents generated by the Nutch crawl. There is a field 'digest' that Nutch generates that is the same for those documents that are duplicates. While setting up the dedupe processor in the Solr config file, I used this 'digest' field in the following way (see below for config details). Since my index has documents other than the ones generated by Nutch, I cannot use overwriteDupes=true, because for non-Nutch-generated documents the digest field will not be populated, and I found that Solr deletes every one of those documents that do not have the digest field populated. Probably because they all have the same 'sig' field value, generated from an 'empty' digest field, forcing Solr to delete everything?

In any case, given that scenario, I thought I would set overwriteDupes=false and use field collapsing based on the digest or sig field, but I could not get field collapsing to work. Based on the wiki documentation I was adding the query string group=true&group.field=sig (or group=true&group.field=digest) to my overall query in the admin console, and I still got the duplicate documents in the results. Is there anything special I need to do to get field collapsing working? I am running Solr 1.4.

All this is because Nutch thinks that http://mysite.mydomain.com/index.html and http://mysite/index.html are different documents depending on how the link is set up (the url *is* the unique id for the Nutch document; the difference is only in the alias, and for an internal site both are valid). This is the reason for me to try deduplication. I cannot submit the SolrDedup command from Nutch because non-Nutch-generated documents do not have the digest field populated, and I read on the mailing lists that this will cause the SolrDedup initiated from Nutch to fail. This forced me to try deduplication on the Solr side. Thanks so much in advance for your help.

Here is my configuration:

SolrConfig.xml:

  <updateRequestProcessorChain name="dedupe">
    <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <str name="signatureField">sig</str>
      <bool name="overwriteDupes">false</bool>
      <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
      <str name="fields">digest</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
    <lst name="defaults">
      <str name="update.processor">dedupe</str>
    </lst>
  </requestHandler>

Schema.xml:

  <field name="sig" type="string" stored="true" indexed="true" multiValued="true" />

Thanks so much for your help
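A minimal sketch of the first suggestion -- an update processor that fills in an empty digest before the signature is computed. The class name, package, and the id field name are assumptions for illustration, written against the Solr 1.4-era processor API:

  import java.io.IOException;

  import org.apache.solr.common.SolrInputDocument;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.request.SolrQueryResponse;
  import org.apache.solr.update.AddUpdateCommand;
  import org.apache.solr.update.processor.UpdateRequestProcessor;
  import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

  public class EnsureDigestProcessorFactory extends UpdateRequestProcessorFactory {
    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                              SolrQueryResponse rsp,
                                              UpdateRequestProcessor next) {
      return new UpdateRequestProcessor(next) {
        @Override
        public void processAdd(AddUpdateCommand cmd) throws IOException {
          SolrInputDocument doc = cmd.getSolrInputDocument();
          Object digest = doc.getFieldValue("digest");
          if (digest == null || "".equals(digest)) {
            // Non-Nutch document: borrow the unique id so the signature is
            // never computed from an empty digest.
            doc.removeField("digest"); // no-op when the field is absent
            doc.addField("digest", doc.getFieldValue("id"));
          }
          super.processAdd(cmd);
        }
      };
    }
  }

Wired into the dedupe chain ahead of the SignatureUpdateProcessorFactory, every document then reaches the signature step with a non-empty digest.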
RE: multiple local indexes
Honestly, I think just putting everything in the same index is your best bet. Are you sure the particular needs of your project can't be served by one combined index? You can certainly still query on just a portion of the index when needed using fq -- you can even create a request handler (or multiple request handlers) with invariants or appends to force all queries through that request handler to have a fixed fq.

Jonathan
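For concreteness, the invariant-fq handler Jonathan describes would look roughly like this in solrconfig.xml (the handler name and filter field are made up for illustration):

  <requestHandler name="/topicA" class="solr.SearchHandler">
    <lst name="invariants">
      <!-- every query through this handler is silently restricted to one slice of the index -->
      <str name="fq">collection:topicA</str>
    </lst>
  </requestHandler>

Clients query /topicA as usual and cannot override the fq, since invariants win over request parameters.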
Re: Solr Deduplication and Field Collapsing
I have the digest field already in the schema because the index is shared between Nutch docs and others. I do not know if the second approach is the quickest in my case. I can set the digest value to something unique for non-Nutch documents easily (I have an id field that I can use to populate the digest field during indexing of new non-Nutch documents; I have a custom tool that does the indexing of these docs). But I have more than 3 million documents in the index already, and I don't want to start over with new indexing if I don't have to. Is there a way I can update the digest field with the value from the corresponding id field using Solr?

Thanks
Raj
Re: Re: The search response time is too loong
Thx. I will let you know the latest status.

From: Lance Norskog goks...@gmail.com
To: solr-user@lucene.apache.org, newsam new...@zju.edu.cn
Subject: Re: Re: The search response time is too loong
Date: Tue, 28 Sep 2010 13:34:53 -0700

Copy the index. Delete half of the documents. Optimize. Copy the index. Delete the other half of the documents. Optimize.

2010/9/28 newsam:

I guess you are correct. We used the default SOLR cache configuration. I will change the cache configuration. BTW, I want to deploy several shards from the existing 8G index file, such as 4G per shard. Is there any tool to generate two shards from one 8G index file?

From: kenf_nc
Subject: Re: Re: The search response time is too loong
Date: Mon, 27 Sep 2010 05:37:25 -0700 (PDT)

"mem usage is over 400M" -- do you mean Tomcat mem size? If you don't give your caches enough room to grow, you will choke performance. You should adjust your Tomcat settings to let the cache grow to at least 1GB, or better, 2GB. You may also want to look into warming the cache (http://wiki.apache.org/solr/SolrCaching) to make the first call a little faster. For comparison, I also have about 8GB in my index but only 2.8 million documents. My search query times on a smaller box than you specify are 6533 milliseconds on an unwarmed (newly rebooted) instance.

--
Lance Norskog
goks...@gmail.com
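A hedged SolrJ sketch of the copy-and-delete split Lance outlines, assuming the 8G index has already been copied into two fresh Solr homes and that the uniqueKey is a string field named id; the midpoint value m and the shard URLs are hypothetical:

  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class SplitIndexInTwo {
    public static void main(String[] args) throws Exception {
      // Each URL points at a copy of the original 8G index.
      SolrServer shard1 = new CommonsHttpSolrServer("http://shard1:8983/solr");
      SolrServer shard2 = new CommonsHttpSolrServer("http://shard2:8983/solr");

      // Partition on the uniqueKey: shard1 keeps ids below m, shard2 keeps the rest.
      // The complement is written as "*:* -range" so the boundary id survives on exactly one shard.
      shard1.deleteByQuery("id:[m TO *]");
      shard2.deleteByQuery("*:* -id:[m TO *]");

      shard1.commit();
      shard2.commit();

      // Lance's final step: optimize each half to reclaim the deleted space.
      shard1.optimize();
      shard2.optimize();
    }
  }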
Re: Best way to check Solr index for completeness
Have you looked at Solr's TermsComponent? Assuming you have a unique key, I think you could use TermsComponent to walk that field for comparison against your database, rather than fetching all the documents.

HTH
Erick

On Tue, Sep 28, 2010 at 5:11 PM, dshvadskiy dshvads...@gmail.com wrote:

That will certainly work for the most recent updates, but I need to compare the entire index.
Dmitriy

Luke Crouch wrote:

Is there a 1:1 ratio of db records to Solr documents? If so, couldn't you simply select the most recently updated record from the db and check that the corresponding Solr doc has the same timestamp?
-L

On Tue, Sep 28, 2010 at 3:48 PM, Dmitriy Shvadskiy dshvads...@gmail.com wrote:

Hello, what would be the best way to check a Solr index against the original system (database) to make sure the index is up to date? I can use Solr fields like id and timestamp to check against the appropriate fields in the database. Our index currently contains over 2 million documents across several cores. Pulling all documents from the Solr index via search (1000 docs at a time) is very slow. Is there a better way to do it?

Thanks,
Dmitriy
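For the record, a TermsComponent walk over the uniqueKey would look something like this, one page per request (assuming a /terms handler with the TermsComponent is registered and the key field is named id; parameter details may vary by version):

  http://localhost:8983/solr/terms?terms=true&terms.fl=id&terms.sort=index&terms.limit=1000&terms.lower=&terms.lower.incl=false

Feed the last term of each page back in as terms.lower for the next request, and diff the resulting stream of ids against a similarly sorted id list from the database.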
Re: How to tell whether a plugin is loaded?
: then in method createParser() add the following:
:
: req.getCore().getInfoRegistry().put(getName(), this);

That doesn't seem like a good idea -- createParser will be called every time a string needs to be parsed, so you're overwriting the same entry in the infoRegistry over and over again. I would just put that logic in your init() method (make sure to put the QParserPlugin in the registry, not the individual QParser instances).

: I wonder though whether it'd be useful if Solr QParserPlugin did
: implement SolrInfoMBean by default already...

I agree ... I think that was an oversight when QParser was added. There's an open issue for it, but no one has had a chance to get around to it yet...

https://issues.apache.org/jira/browse/SOLR-1428

-Hoss

--
http://lucenerevolution.org/ ... October 7-8, Boston
http://bit.ly/stump-hoss ... Stump The Chump!
Why is query performance so different for different queries?
Hi guys, I have posted a thread, "The search response time is too loong". The SOLR searcher instance is deployed with Tomcat 5.5.21. The index file is 8.2G and the doc count is 6,110,745. The DELL server has an Intel(R) Xeon(TM) CPU (4 cores) at 3.00GHZ and 6G RAM.

In the SOLR back-end, query=key:* costs almost 60s, while query=*:* only needs 500ms. Another case is query=product_name_title:*, which costs 7s. I am confused about the query performance. Do you have any suggestions?

BTW, the cache settings are as follows:

filterCache: 256, 256, 0
queryResultCache: 1024, 512, 128
documentCache: 16384, 4096, n/a

Thanks.
Solr with example Jetty and score problem
Hi there, I have a problem. When I issue a query to a single instance, Solr responds with XML like the following; as you can see, the score is a normal float element (<float name="score">...</float>):

===
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">23</int>
    <lst name="params">
      <str name="fl">_l_title,score</str>
      <str name="start">0</str>
      <str name="q">_l_unique_key:12</str>
      <str name="hl.fl">*</str>
      <str name="hl">true</str>
      <str name="rows">999</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0" maxScore="1.9808292">
    <doc>
      <float name="score">1.9808292</float>
      <str name="_l_title">GTest</str>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="12">
      <arr name="_l_unique_key">
        <str><em>12</em></str>
      </arr>
    </lst>
  </lst>
</response>
===

But when I issue the query with shards (two instances), the response XML looks like the following; as you can see, the score has been transferred into an arr element of the doc:

===
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">64</int>
    <lst name="params">
      <str name="shards">localhost:8983/solr/core0,172.16.6.35:8983/solr</str>
      <str name="fl">_l_title,score</str>
      <str name="start">0</str>
      <str name="q">_l_unique_key:12</str>
      <str name="hl.fl">*</str>
      <str name="hl">true</str>
      <str name="rows">999</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0" maxScore="1.9808292">
    <doc>
      <str name="_l_title">Gtest</str>
      <arr name="score">
        <float name="score">1.9808292</float>
      </arr>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="12">
      <arr name="_l_unique_key">
        <str><em>12</em></str>
      </arr>
    </lst>
  </lst>
</response>
===

My schema.xml is like the following:

  <field name="_l_unique_key" type="string" indexed="true" stored="true" required="true" omitNorms="true"/>
  <field name="_l_read_permission" type="string" indexed="true" stored="true" omitNorms="true" multiValued="true"/>
  <field name="_l_title" type="text" indexed="true" stored="true" omitNorms="false" termVectors="true" termPositions="true" termOffsets="true"/>
  <field name="_l_summary" type="text" indexed="true" stored="true" omitNorms="false" termVectors="true" termPositions="true" termOffsets="true"/>
  <field name="_l_body" type="text" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true" omitNorms="false"/>
  <dynamicField name="*" type="text" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true" omitNorms="false"/>
  </fields>

  <uniqueKey>_l_unique_key</uniqueKey>
  <defaultSearchField>_l_body</defaultSearchField>

I don't really know what happened. Is it a problem with my schema, or is this the behavior of Solr? Please help with this.
Re: multiple local indexes
Thanks for your comments, Jonathan. Here is some information that gives a brief overview of the eGranary Platform and outlines why we need a way to bring multiple indexes into one searchable collection:

http://www.widernet.org/egranary/info/multipleIndexes

Thanks,
Brent