Re:The search response time is too loong

2010-09-27 Thread newsam
We used SOLR 1.4. All queries were executed in the SOLR back-end. I guess that 
I/O operations consume too much of the time.

From: newsam new...@zju.edu.cn
Reply-To: solr-user@lucene.apache.org, newsam new...@zju.edu.cn
To: solr-user@lucene.apache.org
Subject: Re:The search response time is too loong
Date: Mon, 27 Sep 2010 16:05:49 +0800

I have set up a SOLR searcher instance with Tomcat 5.5.21. However, the 
response time is too long. Here is my scenario:
1. The index file is 8.2G. The doc num is 6110745.
2. DELL Server: Intel(R) Xeon(TM) CPU (4 cores) 3.00GHZ, 6G Mem.

I used Key:* to query all records by localhost:8080. The response time is 
68703 milliseconds. The CPU load is 50% and memory usage is over 400M.

Any comments are welcomed.


 

RE: spellcheck on multiple fields?

2010-09-27 Thread Markus Jelsma
You can use copyField to copy multiple fields into the field you use for spell 
checking. Don't forget to set that field to multiValued. 
 
-Original message-
From: Savannah Beckett savannah_becket...@yahoo.com
Sent: Mon 27-09-2010 10:08
To: solr-user@lucene.apache.org; 
Subject: spellcheck on multiple fields?

Is it possible to do spellcheck on multiple fields in my solr index?  If so, 
how?  The following setup works for only one field:
    
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="classname">solr.IndexBasedSpellChecker</str>
  <str name="field">myfield</str>
  <str name="spellcheckIndexDir">./spellchecker1</str>
  <str name="accuracy">0.5</str>
  <str name="buildOnCommit">true</str>
</lst>


Thanks.


      

Re: TokenFilter that removes payload ?

2010-09-27 Thread Teruhiko Kurosaka
Robert & Erick,
I appreciate your suggestions, but we use Type for another purpose.
Also, the product is out and we can't change the design so easily.

So it seems the conclusion is that there is no such TokenFilter.
I'll write one.

Thanks.

On Sep 27, 2010, at 1:00 PM, Robert Muir wrote:

 On Sun, Sep 26, 2010 at 11:49 PM, Teruhiko Kurosaka k...@basistech.com wrote:
 
 
 As I understand it, payloads go to the Lucene index.
 In most cases, the part-of-speech tags are not used if
 retrieved by the search applications.  So they shouldn't
 go to the index.  So I'd like to know if there is an
 existing TokenFilter that does this.  Otherwise, I'd like
 to write one.
 
 
 I agree with Erick, I think a better approach would be to put the part of
 speech tags into another attribute.
 
 For example, you can put them in TypeAttribute, which is not stored in the
 index by default.
 Then, if the user wants to store them in the index, they just add
 TypeAsPayloadTokenFilterFactory, which copies the type into the payload...
 but otherwise they would not be stored.
 
 -- 
 Robert Muir
 rcm...@gmail.com


T. Kuro Kurosaka, 415-227-9600x122, 617-386-7122(direct)





Re: Solr UIMA integration

2010-09-27 Thread maheshkumar

Hi Tommaso,

All UIMA dependencies (uima-core, AlchemyAPIAnnotator, OpenCalaisAnnotator,
Tagger, WhitespaceTokenizer) are 2.3.1-SNAPSHOT. All were checked out from svn:

AlchemyAPIAnnotator:
http://svn.apache.org/repos/asf/uima/sandbox/trunk/AlchemyAPIAnnotator
OpenCalaisAnnotator:
http://svn.apache.org/repos/asf/uima/sandbox/trunk/OpenCalaisAnnotator
Tagger: http://svn.apache.org/repos/asf/uima/sandbox/trunk/Tagger
WhitespaceTokenizer:
http://svn.apache.org/repos/asf/uima/sandbox/trunk/WhitespaceTokenizer

solr-uima: http://solr-uima.googlecode.com/svn/trunk/solr-uima

I am using the latest Solr version checked out from svn; I guess it is
later than 1.4.1.

Tommaso, is it possible for you to upload all the dependency jars to
http://code.google.com/p/solr-uima/downloads/list?

Thanks
Mahesh




-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-UIMA-integration-tp1528253p1587660.html
Sent from the Solr - User mailing list archive at Nabble.com.


Multi-lingual auto-complete?

2010-09-27 Thread Andy
I want to provide auto-complete to users when they're inputting tags. The 
auto-complete tag suggestions would be based on tags that are already in the 
system.

Multiple tags are separated by commas. A single tag could contain multiple 
words such as Apple computer.

One issue is that a tag could be in multiple languages, including both 
languages (e.g. English, French) that use whitespace as word separator and 
languages that don't (e.g. CJK)

An example of such a multi-lingual tag is Apple 电脑.

If a user types "apple", I'd like the autocomplete suggestions to include both 
"Apple computer" (i.e. matches are case insensitive) and "green apple" (i.e. 
matches aren't restricted to prefixes). And a user typing "电脑" should match 
"Apple 电脑".
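
The matching behaviour asked for here can be sketched outside Solr. This is only a rough illustration of the per-word edge n-gram idea, not how any Solr tokenizer is implemented: the function names and the sample tag list are made up for the example, and it assumes tags are split on whitespace.

```python
def edge_ngrams(token, min_len=1):
    """Return every prefix of a token, e.g. 'app' -> ['a', 'ap', 'app']."""
    return [token[:i] for i in range(min_len, len(token) + 1)]


def index_tag(tag):
    """Lowercase the tag, split it on whitespace, and collect all word prefixes."""
    grams = set()
    for word in tag.lower().split():
        grams.update(edge_ngrams(word))
    return grams


def suggest(typed, tags):
    """Return the tags in which the typed text is a prefix of any word."""
    needle = typed.strip().lower()
    return [tag for tag in tags if needle in index_tag(tag)]


# Hypothetical tag list, for illustration only.
TAGS = ["Apple computer", "green apple", "Apple 电脑", "banana"]

print(suggest("apple", TAGS))
print(suggest("电脑", TAGS))
```

Because prefixes are built per word, "apple" matches in the middle of "green apple", and lowercasing makes the match case insensitive. A CJK tag written without any whitespace would not be split by this sketch, so it would need something like character bigrams instead.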

Is it possible to do that? I read the article:
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

In that article KeywordTokenizerFactory is used. If I changed it to CJKTokenizer, 
would that work? 

With an input of "Apple 电脑", what would CJKTokenizer produce?

- is it Apple, 电, 脑 ?
or
- is it A, p, p, l, e, 电, 脑 ?

Any help would be greatly appreciated.

Andy





Re: Concurrent DB updates and delta import misses few records

2010-09-27 Thread Shawn Heisey
 You could get it from Solr, yes.  That didn't even occur to me because 
when I was designing my scripts, I didn't yet have a fully integrated 
Solr index. :)  With hindsight, I still wouldn't get it from Solr.  I 
would lose some flexibility and ease of administration.


It's certainly possible to store all build-related tracking information 
in the database.  The build system for our old search product did it 
that way.  I decided to go with simple text files in an NFS-mounted 
directory for the rewrite.  It's easier for me to administer, just ssh 
to a server and examine or modify simple one-line text files.  On the 
script side, the files get read into a Perl hash.  With the old system, 
I found it cumbersome to go through the database interfaces.  The only 
thing that's still in the database is the delete table, because it is 
populated by triggers on the metadata table.





On 9/23/2010 12:48 AM, Shashikant Kore wrote:

Thanks for the pointer, Shawn.  It, definitely, is useful.

I am wondering if you could retrieve minDid from the solr rather than
storing it externally. Max id from Solr index and max id from DB should
define the lower and upper thresholds, respectively, of the delta range. Am
I missing something?
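
The delta range described above can be sketched as follows. This is a minimal illustration, not the scripts discussed in this thread: the column name "did" and the function names are hypothetical, and it assumes ids only ever increase and that rows become visible in id order, which concurrent DB updates may not guarantee.

```python
def delta_range(max_id_in_solr, max_id_in_db):
    """Return (exclusive lower, inclusive upper) id bounds still to index, or None."""
    if max_id_in_db <= max_id_in_solr:
        return None  # nothing new in the database
    return (max_id_in_solr, max_id_in_db)


def delta_query(bounds, table="metadata", id_column="did"):
    """Build the SQL that selects only the rows inside the delta range."""
    lower, upper = bounds
    return (
        f"SELECT * FROM {table} "
        f"WHERE {id_column} > {lower} AND {id_column} <= {upper}"
    )


bounds = delta_range(max_id_in_solr=100, max_id_in_db=250)
if bounds is not None:
    print(delta_query(bounds))
```

Fixing the upper bound before the import starts means that rows inserted while the import runs are left for the next delta rather than being half-covered by this one.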




Re: Re:The search response time is too loong

2010-09-27 Thread kenf_nc

mem usage is over 400M, do you mean Tomcat mem size? If you don't give your
caches enough room to grow you will choke the performance. You should
adjust your Tomcat settings to let the cache grow to at least 1GB, or better,
2GB. You may also want to look into warming the cache
(http://wiki.apache.org/solr/SolrCaching) to make the first call a little
faster. 

For comparison, I also have about 8GB in my index but only 2.8 million
documents. My search query times on a smaller box than you specify are 6533
milliseconds on an unwarmed (newly rebooted) instance. 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Re-The-search-response-time-is-too-loong-tp1587395p1588554.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Re:The search response time is too loong

2010-09-27 Thread Timothy Potter
Also, how many rows are you requesting at one time? I've seen cases where
the query time is blazing fast and the response writing is terribly slow
because of too many documents being sent in the response.
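
A small sketch of keeping the response small by asking for one page at a time. It assumes Solr is mounted at /solr on the localhost:8080 instance mentioned earlier in the thread (the path is a guess) and only builds the URL; q, rows, start and fl are the standard Solr query parameters.

```python
from urllib.parse import urlencode


def select_url(base, query, rows=10, start=0, fields=None):
    """Build a Solr select URL that returns one page of results."""
    params = {"q": query, "rows": rows, "start": start}
    if fields:
        # Only return the listed fields, which shrinks the response further.
        params["fl"] = ",".join(fields)
    return f"{base}/select?{urlencode(params)}"


# Assumed base URL; adjust to the actual webapp path.
print(select_url("http://localhost:8080/solr", "*:*", rows=10, fields=["id"]))
```

Comparing the reported QTime with the wall-clock time of the request is a quick way to tell whether the time goes into searching or into writing the response.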

On Mon, Sep 27, 2010 at 6:37 AM, kenf_nc ken.fos...@realestate.com wrote:


 mem usage is over 400M, do you mean Tomcat mem size? If you don't give your
 cache sizes enough room to grow you will choke the performance. You should
 adjust your Tomcat settings to let the cache grow to at least 1GB or better
 would be 2GB. You may also want to look into
 http://wiki.apache.org/solr/SolrCaching warming the cache  to make the first
 time call a little faster.

 For comparison, I also have about 8GB in my index but only 2.8 million
 documents. My search query times on a smaller box than you specify are 6533
 milliseconds on an unwarmed (newly rebooted) instance.



urgent SOLR query server request hangs

2010-09-27 Thread Bharat Jain
Hi,
   We are running into issues with SOLR queries. Our Solr queries just hang.
We are using SOLR 1.3 and below is the stack trace from a thread dump. We are
clueless about what can be causing this issue. We are in the midst of
firefighting with our customer and any help is appreciated. Thanks, Bharat

TP-Processor113 daemon prio=3 tid=0x071c3400 nid=0x134 runnable [0xfd7ed72a..0xfd7ed72a3920]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
    - locked 0xfd7f26c1caf0 (a java.io.BufferedInputStream)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:687)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1064)
    - locked 0xfd7f2a260c50 (a sun.net.www.protocol.http.HttpURLConnection)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:373)
    at com.xxx..search.solr.SolrSearchServiceImpl.query(SolrSearchServiceImpl.java:271)
    at com.xxx..search.Searchable.query(Searchable.java:460)
    at com.xxx..search.JobReqSearchObject.query(JobReqSearchObject.java:903)




Thanks
Bharat Jain


Is Solr right for our project?

2010-09-27 Thread Mike Thomsen
(I apologize in advance if I missed something in your documentation,
but I've read through the Wiki on the subject of distributed searches
and didn't find anything conclusive)

We are currently evaluating Solr and Autonomy. Solr is attractive due
to its open source background, following and price. Autonomy is
expensive, but we know for a fact that it can handle our distributed
search requirements perfectly.

What we need to know is if Solr has capabilities that match or roughly
approximate Autonomy's Distributed Search Handler. What it does is
act as a front-end for all of Autonomy's IDOL search servers (which
correspond in this scenario to Solr shards). It is configured to know
what is on each shard, which servers hold each shard and intelligently
farms out queries based on that configuration. There is no need to
specify which IDOL servers to hit while querying; the DiSH just knows
where to go. Additionally, I believe in cases where an index piece is
mirrored, it also monitors server health and falls back intelligently
on other backup instances of a shard/index piece based on that.

I'd appreciate it if someone can give me a frank explanation of where
Solr stands in this area.

Thanks,

Mike


Re: Is Solr right for my business situation ?

2010-09-27 Thread Walter Underwood
When do you need to deploy?

As I understand it, the spatial search in Solr is being rewritten and is slated 
for Solr 4.0, the release after next.

The existing spatial search has some serious problems and is deprecated.

Right now, I think the only way to get spatial search in Solr is to deploy a 
nightly snapshot from the active development on trunk. If you are deploying a 
year from now, that might change.

There is not any support for SQL-like statements or for joins. The best 
practice for Solr is to think of your data as a single table, essentially 
creating a view from your database. The rows become Solr documents, the columns 
become Solr fields.
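
The "single table" view can be sketched like this. It is only an illustration of the mapping described above: the column names and sample rows are invented, and how the documents are then sent to Solr is left out.

```python
def row_to_document(columns, row):
    """Pair column names with one row's values, dropping NULL columns."""
    return {name: value for name, value in zip(columns, row) if value is not None}


# Hypothetical result of a flattened (already joined) database view.
columns = ["id", "name", "lat", "lon"]
rows = [
    (1, "Depot A", 40.7, -74.0),
    (2, "Depot B", None, None),
]

docs = [row_to_document(columns, r) for r in rows]
for doc in docs:
    print(doc)
```

Any join has to happen on the database side, in the view or query that produces the rows, because each resulting document stands on its own in the index.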

wunder

On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote:

 I am sure these kind of questions keep coming to you guys, but I want to 
 raise the same question in a different context...my own business situation.
 I am very very new to solr and though I have tried to read through the 
 documentation, I have nowhere near completing the whole read.
 
 The need is like this - 
 
 We have a huge rdbms database/table. A single table perhaps houses 100+ 
 million rows. Though oracle is doing a fine job of handling the insertion and 
 updation of data, the querying is where our main concerns lie.  Since we have 
 spatial data, the index building takes hours and hours for such tables.
 
 That's when we thought of moving away from standard rdbms and thought of 
 trying something different and fast. 
 My last week has been spent in a journey reading through bigtable to hadoop 
 to hbase, to hive and then finally landed on solr. As far as I am in my 
 tests, it looks pretty good, but I have a few unanswered questions still. 
 Trying this group for them  :)  (I am sure I can find some answers if I 
 read/google more on the topic, but now I m being lazy and feel asking the 
 people who are already using it/or perhaps developing it is a better bet).
 
 1. Can I get my solr instance to load data (fresh data for indexing) from a 
 stream (imagine a mq kind of queue, or similar) ?
 2. Can I host my solr instance to use hbase as the database/file system (read 
 HDFS) ?
 3. are there somewhere any reports available (as in benchmarks ) for a solr 
 instance's performance ? 
 4. are there any APIs available which might help me apply ANSI sql kind of 
 statements to my solr data ? 
 
 It would be great if people could help share their experience in the area... 
 if it's too much trouble writing all of it, perhaps url would be easier... I 
 welcome all kinds of help here... any advice/suggestions are good ...
 
 Looking forward to your viewpoints..
 
 --raghav..






RE: bi-grams for common terms - any analyzers do that?

2010-09-27 Thread Burton-West, Tom
Hi Jonathan,

 I'm afraid I'm having trouble understanding   if the analyzer returns more 
 than one position back from a queryparser token

I'm not sure if the queryparser forms a phrase query without explicit phrase 
quotes is a problem for me, I had no idea it happened until now, never 
noticed, and still don't really understand in what circumstances it happens.

The problem I had was that, for a Boolean query "l'art AND historie", the 
WordDelimiterFilter tokenized "l'art" as two tokens: "l" at position 1 and 
"art" at position 2.  So the queryparser decided this means a phrase query for 
"l" followed immediately by "art".  See
http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance 
for details.  

This would happen whenever any token filter splits a token into more than one 
token, for example a filter that splits "foo-bar" into "foo" and "bar".  The 
exception is SynonymFilter or something like it.  In the case of 
SynonymFilter, it's not really a case of splitting one token into multiple 
tokens; given one token of input, it outputs all the synonyms of the term.  
However, all the tokens have the same position attribute (see 
http://www.lucidimagination.com/search/document/CDRG_ch05_5.6.19?q=synonym%20filter).

So, for example, for the string "the small thing", if you had a synonym list for 
"small":
small=tiny,teeny

input:
position|1   |2    |3
token   |the |small|thing

would output:

position|1   |2    |2    |2    |3
token   |the |small|tiny |teeny|thing

In this case, when the queryParser gets back "small", "tiny" and "teeny", they 
have the same position, so they are not turned into a phrase query.

For "l'art":

input:
position|1
token   |l'art

output:
position|1|2
token   |l|art

In this case there are two tokens with different positions, so it treats them as 
a phrase query.
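
The rule described in this message can be written down as a toy function. This is a simplification of the behaviour explained above, not the actual Lucene/Solr query parser code; the function name and the (text, position) tuples are invented for the illustration.

```python
def interpret(tokens):
    """Classify what the analyzer returned for one query-parser chunk.

    tokens is a list of (text, position) pairs.
    """
    texts = [text for text, _ in tokens]
    positions = {position for _, position in tokens}
    if len(tokens) == 1:
        return ("term", texts)
    if len(positions) == 1:
        # Several tokens stacked at one position, as SynonymFilter produces.
        return ("synonyms", texts)
    # Tokens at different positions, as WordDelimiterFilter produces.
    return ("phrase", texts)


print(interpret([("small", 2), ("tiny", 2), ("teeny", 2)]))
print(interpret([("l", 1), ("art", 2)]))
```

The first call models the synonym case, where all tokens share position 2; the second models "l'art" being split into two positions.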

Tom Burton-West


RE: bi-grams for common terms - any analyzers do that?

2010-09-27 Thread Burton-West, Tom
Hi Yonik,

If the new autoGeneratePhraseQueries is off, position doesn't matter, and 
the query will 
be treated as index OR reader.

Just wanted to make sure, in Solr does autoGeneratePhraseQueries = off treat 
the query with the *default* query operator as set in SolrConfig rather than 
necessarily using the Boolean OR operator?

i.e.  if <solrQueryParser defaultOperator="AND"/>
 and autoGeneratePhraseQueries = off 

then IndexReader -> index reader -> index AND reader

Tom




Re: The search response time is too loong

2010-09-27 Thread Simon Willnauer
2010/9/27 newsam new...@zju.edu.cn:
 I have setup a SOLR searcher instance with Tomcat 5.5.21. However, the 
 response time is too long. Here is my scenario:
 1. The index file is 8.2G. The doc num is 6110745.
 2. DELL Server: Intel(R) Xeon(TM) CPU (4 cores) 3.00GHZ, 6G Mem.

 I used Key:* to query all records by localhost:8080. The response time is 
 68703 milliseconds. The cpu load is 50% and mem useage is over 400M.

If you wanna get all records, use q=*:* instead of Key:*. That should
give you faster results - way faster :)

Why are you actually requesting all results, and how many of them are
you fetching? Maybe it would be a good idea to explain your use case /
problem first.

simon


 Any comments are welcomed.





Question Related to sorting on Date

2010-09-27 Thread Ahson Iqbal
hi all

I have a question about sorting on a date field. I have a date field that is 
indexed as a string and looks like 5/2/2008 4:33:30 PM. I want to sort on this 
field by date only; the time does not matter. Any suggestions on how I could 
ignore the time part of this field and just sort on the date?


  

Re: Is Solr right for my business situation ?

2010-09-27 Thread Grant Ingersoll
Inline.

On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote:

 When do you need to deploy?
 
 As I understand it, the spatial search in Solr is being rewritten and is 
 slated for Solr 4.0, the release after next.

It will be in 3.x, the next release

 
 The existing spatial search has some serious problems and is deprecated.
 
 Right now, I think the only way to get spatial search in Solr is to deploy a 
 nightly snapshot from the active development on trunk. If you are deploying a 
 year from now, that might change.
 
 There is not any support for SQL-like statements or for joins. The best 
 practice for Solr is to think of your data as a single table, essentially 
 creating a view from your database. The rows become Solr documents, the 
 columns become Solr fields.

There are now group-by capabilities in trunk as well, which may or may not help.

 
 wunder
 
 On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote:
 
 I am sure these kind of questions keep coming to you guys, but I want to 
 raise the same question in a different context...my own business situation.
 I am very very new to solr and though I have tried to read through the 
 documentation, I have nowhere near completing the whole read.
 
 The need is like this - 
 
 We have a huge rdbms database/table. A single table perhaps houses 100+ 
 million rows. Though oracle is doing a fine job of handling the insertion 
 and updation of data, the querying is where our main concerns lie.  Since we 
 have spatial data, the index building takes hours and hours for such tables.
 
 That's when we thought of moving away from standard rdbms and thought of 
 trying something different and fast. 
 My last week has been spent in a journey reading through bigtable to hadoop 
 to hbase, to hive and then finally landed on solr. As far as I am in my 
 tests, it looks pretty good, but I have a few unanswered questions still. 
 Trying this group for them  :)  (I am sure I can find some answers if I 
 read/google more on the topic, but now I m being lazy and feel asking the 
 people who are already using it/or perhaps developing it is a better bet).
 
 1. Can I get my solr instance to load data (fresh data for indexing) from a 
 stream (imagine a mq kind of queue, or similar) ?

Yes, with a little bit of work.

 2. Can I host my solr instance to use hbase as the database/file system 
 (read HDFS) ?

Probably, but I doubt it will be fast.  Local disk is usually the best.  100+ M 
rows is large but not unreasonable.

 3. are there somewhere any reports available (as in benchmarks ) for a solr 
 instance's performance ? 

You can probably search the web for these.  I've personally seen several 
installs w/ 1B+ docs and subsecond search and faceting and heard of others.  
You might look at the stuff the Hathi trust has put up.  

 4. are there any APIs available which might help me apply ANSI sql kind of 
 statements to my solr data ? 

No.  Question back:  what kinds of things are you trying to do?

 
 It would be great if people could help share their experience in the area... 
 if it's too much trouble writing all of it, perhaps url would be easier... I 
 welcome all kinds of help here... any advice/suggestions are good ...
 
 Looking forward to your viewpoints..
 
 --raghav..

--
Grant Ingersoll
http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8



Re: Is Solr right for my business situation ?

2010-09-27 Thread Jonathan Rochkind

Grant Ingersoll wrote:


There is now group-by capabilities in trunk as well, which may or may not help.
  
Really, the field collapsing stuff has been committed to trunk finally? 
Or are you talking about something else?


If it's the field collapsing stuff, and it's been committed to trunk, 
does that mean it'll be in the 3.0 release?


Jonathan

  


Re: Is Solr right for my business situation ?

2010-09-27 Thread Ravi Julapalli
Hi Jonathan,

Field collapsing is available in 1.4 by applying the patch at 
https://issues.apache.org/jira/browse/SOLR-236

-Ravi





From: Jonathan Rochkind rochk...@jhu.edu
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Mon, September 27, 2010 9:18:20 PM
Subject: Re: Is Solr right for my business situation ?

Grant Ingersoll wrote:
 
 There is now group-by capabilities in trunk as well, which may or may not 
help.
  
Really, the field collapsing stuff has been committed to trunk finally? Or are 
you talking about something else?

If it's the field collapsing stuff, and it's been committed to trunk, does that 
mean it'll be in the 3.0 release?

Jonathan

  


  

Re: Is Solr right for my business situation ?

2010-09-27 Thread Jonathan Rochkind
Right, I know. I was curious about its current closeness to being in the 
main distro, not a patch.  Among other things, when those who know 
better decide it goes in the core distro, that makes me more comfortable 
that they've decided it works acceptably, and also makes me more 
comfortable that it will continue to be supported in _future_ versions 
without someone having to prepare a new patch.


Ravi Julapalli wrote:

Hi Jonathan,

Field collpasing is available in 1.4 by applying patch 
https://issues.apache.org/jira/browse/SOLR-236


-Ravi





From: Jonathan Rochkind rochk...@jhu.edu
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Mon, September 27, 2010 9:18:20 PM
Subject: Re: Is Solr right for my business situation ?

Grant Ingersoll wrote:
  
There is now group-by capabilities in trunk as well, which may or may not 


help.
  
 

Really, the field collapsing stuff has been committed to trunk finally? Or are 
you talking about something else?


If it's the field collapsing stuff, and it's been committed to trunk, does that 
mean it'll be in the 3.0 release?


Jonathan

  
 




  
  


Re: Is Solr right for my business situation ?

2010-09-27 Thread PeterKerk

@Walter Underwood:

Walter Underwood wrote:
 
 Right now, I think the only way to get spatial search in Solr is to deploy
 a nightly snapshot from the active development on trunk.
 

Could you give me the link to this trunk, I need it very much!

Thanks!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-Solr-right-for-our-project-tp1589927p1592330.html
Sent from the Solr - User mailing list archive at Nabble.com.


resources for relevancy score tuning

2010-09-27 Thread Luke Crouch
Can someone share some good resources (books, articles, links, etc.) for
tuning relevancy scores with multiple factors? I'm playing with different
fields and boosts in my 'qf', 'pf', and 'bf' defaults but I feel like I'm
shooting in the dark. http://wiki.apache.org/solr/SolrRelevancyCookbook has
a couple of individual tips, but I need some help devising a good
combination of boosts across multiple fields for scoring.

E.g., I want to tweak scoring derived from a primary identifier field, a
name field, a description field, a rating field, and a number of downloads
field. But it seems when I adjust any single factor, it affects too many
others.

Thanks,
-L


Re: Is Solr right for my business situation ?

2010-09-27 Thread Dennis Gearon
Wow, that is a relief!

I was going to have to look at ElasticSearch instead.


Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php





Re: Is Solr right for my business situation ?

2010-09-27 Thread PeterKerk

Ah, totally looked over that news: spatial search in 3.x! :-D :-D

Any idea already when this will be released? 

Awesome to hear that it has been moved forward! :)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-Solr-right-for-our-project-tp1589927p1592448.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Question Related to sorting on Date

2010-09-27 Thread Peter Sturge
Hi Ahson,

You'll really want to store an additional date field (make it a
TrieDateField type) that has only the date, and in the reverse order
from how you've shown it. You can still keep the one you've got, just
use it only for 'human viewing' rather than sorting.
Something like:
20080205  if your example is 5 Feb, or 20080502 for May 2nd.

This way, the parsing is most efficient, you won't have to do any
tricky parsing at sort time, and, when your index gets large, your
sorted searches will remain fast.
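
A minimal sketch of producing the date-only value at index time. It assumes the stored string always has the shape shown in the question; the function names are made up, and the day_first flag covers the ambiguity noted above. Solr date fields take ISO 8601 UTC timestamps, so a midnight timestamp is built alongside the compact yyyymmdd form.

```python
from datetime import datetime


def parse_display_date(raw, day_first=False):
    """Parse strings like '5/2/2008 4:33:30 PM' into a datetime."""
    fmt = "%d/%m/%Y %I:%M:%S %p" if day_first else "%m/%d/%Y %I:%M:%S %p"
    return datetime.strptime(raw, fmt)


def sortable_date(raw, day_first=False):
    """Date only, year first, so plain string order equals date order."""
    return parse_display_date(raw, day_first).strftime("%Y%m%d")


def solr_date(raw, day_first=False):
    """Date only as an ISO 8601 timestamp at midnight, for a date field."""
    return parse_display_date(raw, day_first).strftime("%Y-%m-%dT00:00:00Z")


print(sortable_date("5/2/2008 4:33:30 PM"))
print(sortable_date("5/2/2008 4:33:30 PM", day_first=True))
print(solr_date("5/2/2008 4:33:30 PM"))
```

Dropping the time before indexing means every document from the same day gets an identical sort value, which is the behaviour asked for.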




On Mon, Sep 27, 2010 at 7:45 PM, Ahson Iqbal mianah...@yahoo.com wrote:
 hi all

 I have a question related to sorting of date field i have Date field  that is
 indexed like a string and look like 5/2/2008 4:33:30 PM i want  to do 
 sorting
 on this field on the basis of date, time does not  matters. any suggestion 
 how i
 could ignore the time part from this field  and just sort on the date?





DIH XML Entity Help (Newbie)

2010-09-27 Thread audev

I am trying to configure the data-config.xml using the XPathEntityProcessor
to index nested xml entities such as the following:
<study>
  <intervention>
    <intervention_type>Drug</intervention_type>
    <intervention_name>fentanyl sublingual spray</intervention_name>
  </intervention>
  <intervention>
    <intervention_type>Other</intervention_type>
    <intervention_name>questionnaire administration</intervention_name>
  </intervention>
</study>

The data-config.xml looks like this:
<entity name="intervention" url="${studiesdir.fileAbsolutePath}"
        processor="XPathEntityProcessor" forEach="/clinical_study/intervention/">
  <field column="intervention_type_t" multiValued="true"
         xpath="/clinical_study/intervention/intervention_type" />
  <field column="intervention_name_t" multiValued="true"
         xpath="/clinical_study/intervention/intervention_name" />
</entity>

but it only indexes the first occurrence of  intervention_type_t and
intervention_name_t and they are placed as children of root entity instead
of being children of intervention.
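
To make the grouping I am after concrete, here is a small sketch outside of Solr that walks the same XML and emits one record per intervention. It says nothing about how DIH itself behaves; it only shows the target shape. The function name interventions is made up for this example, and the root element is assumed to be clinical_study (as in the xpath expressions) rather than study.

```python
import xml.etree.ElementTree as ET

# Sample document; the root name clinical_study is an assumption taken from
# the xpath expressions in the config, not from the abbreviated sample above.
SAMPLE = """<clinical_study>
  <intervention>
    <intervention_type>Drug</intervention_type>
    <intervention_name>fentanyl sublingual spray</intervention_name>
  </intervention>
  <intervention>
    <intervention_type>Other</intervention_type>
    <intervention_name>questionnaire administration</intervention_name>
  </intervention>
</clinical_study>"""


def interventions(xml_text):
    """Return one dict per <intervention>, keeping type and name paired."""
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.findall("intervention"):
        rows.append({
            "intervention_type_t": node.findtext("intervention_type"),
            "intervention_name_t": node.findtext("intervention_name"),
        })
    return rows


if __name__ == "__main__":
    for row in interventions(SAMPLE):
        print(row)
```

The desired result is two rows, each with its own type and name, rather than only the first occurrence.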

I would appreciate your help!

Thanks in advance,

Aurelia
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-XML-Entity-Help-Newbie-tp1592723p1592723.html


Need help with spellcheck city name

2010-09-27 Thread Savannah Beckett
Hi,
I have city name as a text field, and I want to run spellcheck on it. I use the
settings from http://wiki.apache.org/solr/SpellCheckComponent

If I set up city name as a text field and spellcheck "San Jos" (meaning "San
Jose"), I get "ojos" as the suggestion for "Jos". I checked the extended results
and found that "Jose" is in the middle of all 10 suggestions in terms of score
and frequency. I then set city name as a string field and ran spellcheck again;
I got "Van" for "San" and "Ross" for "Jos", which is weird because "San" is
correct.

How do you set up the spellchecker for city names? A city name can have
multiple words.
Thanks.


  

Re: Need help with spellcheck city name

2010-09-27 Thread Tom Hill
Maybe process the city name as a single token?






Re: Need help with spellcheck city name

2010-09-27 Thread Savannah Beckett
No, it doesn't work; I got a weird result. I set my city name field to be parsed
as a single token, as follows:

<fieldType name="autocomplete1" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I got the following result for spellcheck:

<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="san">
      <int name="numFound">1</int>
      <int name="startOffset">0</int>
      <int name="endOffset">3</int>
      <arr name="suggestion">
        <str>swan</str>
      </arr>
    </lst>
    <lst name="clar">
      <int name="numFound">1</int>
      <int name="startOffset">4</int>
      <int name="endOffset">8</int>
      <arr name="suggestion">
        <str>clark</str>
      </arr>
    </lst>
  </lst>
</lst>

 










  

Re: Need help with spellcheck city name

2010-09-27 Thread Erick Erickson
Hmmm, did you rebuild your spelling index after the config changes?

And it really looks like somehow you're getting results from a field other
than city. Are you also sure that your cityname field is of type
autocomplete1?

Shooting in the dark here, but these results are so weird that I suspect
it's something fundamental.
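
On the rebuild question: one way to force it is to send a request with spellcheck.build=true. A minimal sketch of building such a URL follows. The host, port and handler path are placeholders for whatever your install uses, and the helper name spellcheck_rebuild_url is made up for this example; spellcheck and spellcheck.build are the request parameters described on the SpellCheckComponent wiki page.

```python
from urllib.parse import urlencode


def spellcheck_rebuild_url(base_url: str, query: str) -> str:
    """Build a request URL that asks Solr to rebuild the spelling index.

    base_url is a placeholder: point it at the request handler that has
    the spellcheck component attached in your solrconfig.xml.
    """
    params = {
        "q": query,
        "spellcheck": "true",        # turn the component on for this request
        "spellcheck.build": "true",  # rebuild the spelling index first
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical local install; adjust host, port and handler as needed.
    print(spellcheck_rebuild_url("http://localhost:8983/solr/spell", "san jos"))
```

Only send spellcheck.build=true once after changing the config or reindexing, not on every query, since the rebuild can be slow.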

Best
Erick








Re: Need help with spellcheck city name

2010-09-27 Thread Savannah Beckett
No, I checked: there is a city called Swan in Iowa, so the suggestion is coming
from the city index, and so is Clark. But why does it favor Swan over San?
Spellcheck gets weird once I treat the city name as one token. If I do it the
old way, it lets San through and corrects Jos to Ojos instead of Jose, because
Ojos is ranked #1 and Jose is in the middle. Any more suggestions? Ranking by
frequency first and then by score doesn't work either.


 










  

Re: Is Solr right for our project?

2010-09-27 Thread Jan Høydahl / Cominvent
Solr will match this in version 3.1 which is the next major release.
Read this page: http://wiki.apache.org/solr/SolrCloud for feature descriptions
Coming to a trunk near you - see https://issues.apache.org/jira/browse/SOLR-1873

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 27. sep. 2010, at 17.44, Mike Thomsen wrote:

 (I apologize in advance if I missed something in your documentation,
 but I've read through the Wiki on the subject of distributed searches
 and didn't find anything conclusive)
 
 We are currently evaluating Solr and Autonomy. Solr is attractive due
 to its open source background, following and price. Autonomy is
 expensive, but we know for a fact that it can handle our distributed
 search requirements perfectly.
 
 What we need to know is if Solr has capabilities that match or roughly
 approximate Autonomy's Distributed Search Handler. What it does is
 act as a front-end for all of Autonomy's IDOL search servers (which
 correspond in this scenario to Solr shards). It is configured to know
 what is on each shard, which servers hold each shard and intelligently
 farms out queries based on that configuration. There is no need to
 specify which IDOL servers to hit while querying; the DiSH just knows
 where to go. Additionally, I believe in cases where an index piece is
 mirrored, it also monitors server health and falls back intelligently
 on other backup instances of a shard/index piece based on that.
 
 I'd appreciate it if someone can give me a frank explanation of where
 Solr stands in this area.
 
 Thanks,
 
 Mike



Re: FieldType for storing date

2010-09-27 Thread Chris Hostetter

: I was wondering what would be the best FieldType for storing date with a 
: millisecond precision that would allow me to sort and run range queries 
: against this field. We would like to achieve the best query performance, 
: minimal heap - fieldcache - requirements, good indexing throughput and 
: minimal index size in that order.

if you don't need sortMissingLast or sortMissingFirst then TrieDateField 
should be exactly what you are looking for.

: We could probably use TrieLongField, however, as we understand, this 
: doubles the heap requirements for fieldcache. Was wondering if there is 
: a clever way of achieving this without adding to the heap.

TrieDateField uses the long[] FieldCache, I'm not sure what you mean by 
doubles the heap requirements ... unless you are comparing to int ?

In that case: using TrieIntField seems like what you want?

(but if you are comparing to DateField, the FieldCache for TrieDateField 
is going to be a lot smaller)
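
To put rough numbers on the long[] versus int[] comparison, here is a back-of-the-envelope sketch. It assumes the cache is one primitive slot per document and ignores any JVM or array overhead; the document count is hypothetical and fieldcache_bytes is just a name made up for this example.

```python
LONG_BYTES = 8  # one long per document
INT_BYTES = 4   # one int per document


def fieldcache_bytes(num_docs: int, bytes_per_value: int) -> int:
    """Rough size of a primitive-array cache with one slot per document."""
    return num_docs * bytes_per_value


if __name__ == "__main__":
    docs = 10_000_000  # hypothetical index size
    for label, width in (("long[]", LONG_BYTES), ("int[]", INT_BYTES)):
        size = fieldcache_bytes(docs, width)
        print(f"{label}: {size / 2**20:.1f} MiB")
```

For ten million documents that works out to roughly 76 MiB for long[] against 38 MiB for int[], which is presumably where the "doubles" figure comes from.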


-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!



Search Interface

2010-09-27 Thread Claudio Devecchi
Hi everybody,

I'm implementing my first Solr engine for conceptual tests. I'm crawling my
intranet wiki to run some searches. The engine is already working fine, but
I need an interface for running my searches.
Does anybody know where I can find a search interface that I can
customize?

Tks
-- 
Claudio Devecchi
flickr.com/cdevecchi


Re: Solr UIMA integration

2010-09-27 Thread Tommaso Teofili
Hi Maheshkumar,
I attached a patch for inclusion of this project as a Solr contrib module
[1]. There you can find the patch to apply to the Solr trunk along with the
needed jars (attached as a zip archive).
I think that your issue could be related to the fact that the GC project's
dependency is on Solr 1.4.1, not on trunk, so the patch should fix it.
Hope this helps,
Tommaso

[1] : https://issues.apache.org/jira/browse/SOLR-2129

2010/9/27 maheshkumar maheshkuma...@gmail.com


 Hi Tommaso,

 All UIMA dependencies (uima-core, AlchemyAPIAnnotator, OpenCalaisAnnotator,
 Tagger, WhitespaceTokenizer) are 2.3.1-SNAPSHOT. All are checked out from svn:

 AlchemyAPIAnnotator:
 http://svn.apache.org/repos/asf/uima/sandbox/trunk/AlchemyAPIAnnotator
 OpenCalaisAnnotator:
 http://svn.apache.org/repos/asf/uima/sandbox/trunk/OpenCalaisAnnotator
 Tagger: http://svn.apache.org/repos/asf/uima/sandbox/trunk/Tagger
 WhitespaceTokenizer:
 http://svn.apache.org/repos/asf/uima/sandbox/trunk/WhitespaceTokenizer

 solr-uima: http://solr-uima.googlecode.com/svn/trunk/solr-uima

 I am using the latest Solr version checked out from svn; I guess it is
 newer than 1.4.1.

 Tommaso, is it possible for you to upload all the dependency jars at
 http://code.google.com/p/solr-uima/downloads/list?

 Thanks
 Mahesh




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-UIMA-integration-tp1528253p1587660.html