Re: SOLR 1.4.1 - Issue with recognition of solr.solr.home system property

2010-07-17 Thread Tracy Flynn
That's a little telling 

INFO: Opening new SolrCore at /Users/johndoe/example1/solr/, 
dataDir=./solr/data/

Since I'm running with ~/example2 as the current working directory, that 
would explain it. The schema etc. is found in ~/example1/solr/conf, but the data 
is being managed in ~/example2/solr/data.

Is this a bug, or is there another setting I need?
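
(For what it's worth: the stock example solrconfig.xml in 1.4.1 sets the data 
directory to ./solr/data by default, which resolves against the current working 
directory rather than against solr.solr.home. Assuming that default is still in 
place, a sketch of a workaround is to pass the data dir explicitly:

java -Dsolr.solr.home=/Users/johndoe/example1/solr \
     -Dsolr.data.dir=/Users/johndoe/example1/solr/data \
     -jar start.jar

or to edit the <dataDir> element in solrconfig.xml to an absolute path.)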



On Jul 17, 2010, at 8:55 PM, Mark Miller wrote:

> What is your data dir set to? It should say in the start up logging. 
> 
> - Mark
> 
> http://www.lucidimagination.com (mobile)
> 
> On Jul 17, 2010, at 8:40 PM, Tracy Flynn  wrote:
> 
>> One more piece of information. I notice that it does look for the schema in 
>> ~/solr_example1/solr/conf. A fatal error is generated if 
>> ~/solr_example1/solr/conf  is removed. So, it appears to be localized to the 
>> writing of the index files.
>> 
>> 



Re: SOLR 1.4.1 - Issue with recognition of solr.solr.home system property

2010-07-17 Thread Mark Miller
What is your data dir set to? It should say in the start up logging. 

- Mark

http://www.lucidimagination.com (mobile)

On Jul 17, 2010, at 8:40 PM, Tracy Flynn  wrote:

> One more piece of information. I notice that it does look for the schema in 
> ~/solr_example1/solr/conf. A fatal error is generated if 
> ~/solr_example1/solr/conf  is removed. So, it appears to be localized to the 
> writing of the index files.
> 
> 


Re: SOLR 1.4.1 - Issue with recognition of solr.solr.home system property

2010-07-17 Thread Tracy Flynn
One more piece of information. I notice that it does look for the schema in 
~/solr_example1/solr/conf. A fatal error is generated if 
~/solr_example1/solr/conf  is removed. So, it appears to be localized to the 
writing of the index files.




SOLR 1.4.1 - Issue with recognition of solr.solr.home system property

2010-07-17 Thread Tracy Flynn
There appears to be a problem with the recognition of the 'solr.solr.home' 
property in SOLR 1.4.1 - or else I have a basic misunderstanding of how 
'solr.solr.home' is intended to work.

Conduct the following experiment.

Take the standard SOLR 1.4.1 distribution.

Suppose the home directory is /Users/johndoe.

1) Make two copies of the contents of the ./example subdirectory to some 
location, say ~/solr_example1 and ~/solr_example2.

2) Delete the subdirectories   ~/solr_example1/solr/data/index, 
~/solr_example2/solr/data/index

3) In ~/solr_example2, start up the SOLR server using something similar to:

java -Dsolr.solr.home=/Users/johndoe/example1/solr -jar start.jar

4) In ~/solr_example2/exampledocs, run './post.sh *.xml'

5) Examine the directories ~/example1/solr/data/index and 
~/example2/solr/data/index

Based on the setting of 'solr.solr.home' I would have expected the indexes to 
be created in ~/example1/solr/data/index. They are in fact created in 
~/example2/solr/data/index.

Do I misunderstand the usage of 'solr.solr.home' and how to set it, or is there 
a real issue here?

Any help and insight would be appreciated.

Tracy



Re: HTTP ERROR: 500 - java.lang.ArrayIndexOutOfBoundsException

2010-07-17 Thread Lance Norskog
You cannot sort on tokenized text fields, only on string, number, and date fields.
The ArrayIndexOutOfBoundsException happens when there are more terms to
sort on than documents (I think?).
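
A common workaround (a sketch, assuming 'first' is the tokenized field being
sorted on; 'first_sort' is a made-up name) is to copy the field into an
untokenized string field and sort on that instead:

<field name="first"      type="text"   indexed="true" stored="true"/>
<field name="first_sort" type="string" indexed="true" stored="false"/>
<copyField source="first" dest="first_sort"/>

...and then query with &sort=first_sort+desc after re-indexing.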

On Sat, Jul 17, 2010 at 3:11 PM, Koji Sekiguchi  wrote:
> (10/07/18 4:51), Girish wrote:
>>
>>  Hi Lance,
>>
>> Thanks for the reply!
>>
>> I checked the settings and I don't think it has multivalue setting. Here
>> is
>> the current field configuration:
>>
>> [field definitions stripped by the mail archive -- only the fragments
>> required=true and termVectors=true survive]
>>
>>
>
> Tokenized field is one of multiValued type fields since
> multiple tokens (values) are generated in that field by
> tokenizer.
>
> Koji
>
> --
> http://www.rondhuit.com/en/
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: Spellcheck help

2010-07-17 Thread Lance Norskog
Spellchecking can also take a dictionary as its database. Is it
possible to create a dictionary of the terms you want suggested?
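
For reference, a sketch of what a file-based dictionary might look like in
solrconfig.xml (spellings.txt is a hypothetical file with one term per line;
the parameter names follow the SpellCheckComponent wiki):

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">file</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">./spellcheckerFile</str>
  </lst>
</searchComponent>

You would then query with spellcheck.dictionary=file (and build it once with
spellcheck.build=true).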

On Sat, Jul 17, 2010 at 10:40 AM,   wrote:
> Can anybody help me with this? :(
>
> -Original Message- From: Marc Ghorayeb
> Sent: Thursday, July 08, 2010 9:46 AM
> To: solr-user@lucene.apache.org
> Subject: Spellcheck help
>
>
> Hello,I've been trying to get rid of a bug when using the spellcheck but so
> far with no success :(When searching for a word that starts with a number,
> for example "3dsmax", i get the results that i want, BUT the spellcheck says
> it is not correctly spelled AND the collation gives me "33dsmax". Further
> investigation shows that the spellcheck is actually only checking "dsmax"
> which it considers does not exist and gives me "3dsmax" for better results,
> but since i have spellcheck.collate = true, the collation that i show is
> "33dsmax" with the first 3 being the one discarded by the spellchecker...
> Otherwise, the spellcheck works correctly for normal words... any ideas?
> :(My spellcheck field is fairly classic, whitespace tokenizer, with
> lowercase filter...Any help would be greatly appreciated :)Thanks,Marc
>



-- 
Lance Norskog
goks...@gmail.com


Re: HTTP ERROR: 500 - java.lang.ArrayIndexOutOfBoundsException

2010-07-17 Thread Koji Sekiguchi

(10/07/18 4:51), Girish wrote:

  Hi Lance,

Thanks for the reply!

I checked the settings and I don't think it has multivalue setting. Here is
the current field configuration:

[field definitions stripped by the mail archive]

A tokenized field is effectively a multiValued field, since
multiple tokens (values) are generated in that field by the
tokenizer.

Koji

--
http://www.rondhuit.com/en/



Re: HTTP ERROR: 500 - java.lang.ArrayIndexOutOfBoundsException

2010-07-17 Thread Girish
 Hi Lance,

Thanks for the reply!

I checked the settings and I don't think it has a multiValued setting. Here is
the current field configuration:

[field definitions stripped by the mail archive]

> Lance Norskog wrote:
>
> This can happen when there are multiple values in a field. Is 'first'
> a multi-valued field?
>
> Sorting only works on single-valued fields. After all, if there are
> multiple values, it can only sort on one field and there is no way to
> decide which one. So, make sure that 'field' has multiValued='false'
> in the field declaration. If this is the problem, you will have to fix
> your data and re-index.
>
> Is 'field' an analyzed text field? Then sorting definitely will not work.
>
> On Fri, Jul 16, 2010 at 6:54 PM, Girish Pandit  
>  wrote:
>
>
>  Hi,
>
> As soon as I add "sort=first+desc" parameter to the select clause, it throws
> ArrayIndexOutOfBound exception. Please suggest if I am missing anything.
> http://localhost:8983/solr/select?q=girish&start=0&indent=on&wt=json&sort=first+desc
>
> I have close to 1 million records indexed.
>
> Thanks
> Girish
>
>
>
>
>
>
>
>


-- 

Girish Pandit
610-517-5888
http://www.jiyasoft.com
http://www.photographypleasure.com/
http://www.photographypleasure.com/girish/


Re: How to speed up solr search speed

2010-07-17 Thread Shawn Heisey
 I don't know of a way to tell Solr to load all the indexes into 
memory, but if you were to simply read all the files at the OS level, 
that would do it. Under a unix OS, "cat * > /dev/null" would work. Under 
Windows, I can't think of a way to do it off the top of my head, but if 
you had Cygwin installed, you could use the Unix method. That's not 
really necessary to do, however. Just the act of running queries against 
the index will load the relevant bits into the disk cache, making 
subsequent queries go to RAM instead of disk.
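
For example, a rough shell sketch (the paths are hypothetical) that warms the OS
disk cache for every core's index:

for d in /path/to/solr/core*/data/index ; do
    cat "$d"/* > /dev/null
done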


With 10 cores at 1.5GB each, your total index is a little bigger than 
one of my static indexes. Performance might be reasonable with 8GB of 
total RAM, if the machine is running Linux/Unix and doing nothing but 
Solr, but would be better with 12-16GB. It would be important to set up 
the Solr caches properly. Here's mine:

[cache configuration stripped by the mail archive]
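
For reference, a typical Solr 1.4 cache block looks roughly like the following
(the sizes here are illustrative, not the actual values from the stripped config):

<filterCache      class="solr.FastLRUCache" size="4096"  initialSize="1024" autowarmCount="256"/>
<queryResultCache class="solr.LRUCache"     size="4096"  initialSize="1024" autowarmCount="256"/>
<documentCache    class="solr.LRUCache"     size="16384" initialSize="4096" autowarmCount="0"/>
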
The status page is a CGI script that I wrote which queries a couple of 
Solr pages on all my VMs. It's heavily tied into the central 
configuration used by my Solr build system, so it's not directly usable 
by the masses.


Thanks,
Shawn


On 7/17/2010 10:36 AM, marship wrote:

Hi. Shawn.
My indexes are smaller than yours. I only store "id" + "type" in indexes so each 
"core" index is about 1 - 1.5GB on disk.
I don't have so many servers/VPS as you have. In my option, my problem is not 
CPU. If possible, I prefer to add more memory to fit indexes in my server. At 
least at memory is cheaper. And I saw lots of my CPU time are wasted because no 
program can fullly use it.

Is there a way to tell solr to load all indexes into memory? like memory 
directory in lucene. That would be breezing fast
Btw, how do you get that status page?




RE: Getting facets count on multiple fields by doing a "Group By"

2010-07-17 Thread Jonathan Rochkind
> I needed to get counts based GRPID clubbed with GRPNAME not different sets


Perhaps using facet.query to write your own "sub queries" that will collect 
whatever you want?
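
For example (a sketch using the field names from the question; the values would
need URL-encoding in a real request):

...&facet=true
   &facet.query=GrpId:1 AND Grpname:A
   &facet.query=GrpId:2 AND Grpname:A
   &facet.query=GrpId:3 AND Grpname:A
   &facet.query=GrpId:4 AND Grpname:B

Each facet.query comes back with its own count, which gives the combined
GrpId/Grpname counts in one response.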





Re: Spellcheck help

2010-07-17 Thread dekay999

Can anybody help me with this? :(

-Original Message- 
From: Marc Ghorayeb

Sent: Thursday, July 08, 2010 9:46 AM
To: solr-user@lucene.apache.org
Subject: Spellcheck help


Hello,

I've been trying to get rid of a bug when using the spellcheck but so far with 
no success :( When searching for a word that starts with a number, for example 
"3dsmax", I get the results that I want, BUT the spellcheck says it is not 
correctly spelled AND the collation gives me "33dsmax". Further investigation 
shows that the spellcheck is actually only checking "dsmax", which it considers 
does not exist, and gives me "3dsmax" for better results; but since I have 
spellcheck.collate = true, the collation that I show is "33dsmax", with the 
first 3 being the one discarded by the spellchecker... Otherwise, the spellcheck 
works correctly for normal words... any ideas? :(

My spellcheck field is fairly classic: whitespace tokenizer, with lowercase 
filter... Any help would be greatly appreciated :)

Thanks,
Marc




Re: Getting facets count on multiple fields by doing a "Group By"

2010-07-17 Thread Rajinimaski

@Hemanth, I understand the functionality of "fl" and also of "facet"...
In my example I mentioned that I need a faceted COUNT that
effectively merges both fields.

If I am still not clear, then below is my solr search result's
console view, along with what I am actually looking for:

numfound:4 


field <"GrpId"=1/>
field <"Grpname"=A/>


field <"GrpId"=2/>
field <"Grpname"=A/>


field <"GrpId"=3/>
field <"Grpname"=A/>


field <"GrpId"=4/>
field <"Grpname"=B/>






grpid-1<1>
grpid-2<1>
grpid-3<1>
grpid-4<1>

Grpname-A<3>
Grpname-B<1>




But what I need is:

grpid-1,Grpname-A<1>
grpid-2,Grpname-A<1>
grpid-3,Grpname-A<1>
grpid-4,Grpname-B<1>

I need to get the counts based on GRPID combined with GRPNAME, not as separate sets.

Please let me know of any solution for this...


Regards,
Rajani Maski











On Fri, Jul 16, 2010 at 4:32 PM, hemant.verma [via Lucene] <
ml-node+972194-1943694059-326...@n3.nabble.com
> wrote:

> You may be confused what facet is for.
> As your example shows, there is no need for you to use facets.
> Simply do the query and use &fl=Name, ID to get the desired results.
>
> --
> View message @
> http://lucene.472066.n3.nabble.com/Getting-facets-count-on-multiple-fields-by-doing-a-Group-By-tp972105p972194.html
> To unsubscribe from Getting facets count on multiple fields by doing a
> "Group By", click here< (link removed) =>.
>
>
>

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Getting-facets-count-on-multiple-fields-by-doing-a-Group-By-tp972105p975133.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re:Re: How to speed up solr search speed

2010-07-17 Thread marship
Hi. Shawn.
My indexes are smaller than yours. I only store "id" + "type" in the indexes, so 
each "core" index is about 1 - 1.5GB on disk.
I don't have as many servers/VPS as you have. In my opinion, my problem is not 
CPU. If possible, I'd prefer to add more memory to fit the indexes on my server; at 
least memory is cheaper. And I see a lot of my CPU time being wasted because no 
program can fully use it.

Is there a way to tell solr to load all the indexes into memory, like the memory 
directory in lucene? That would be blazing fast.
Btw, how do you get that status page?

Thanks.
Regards.
Scott



At 2010-07-17 23:38:58, "Shawn Heisey" wrote:
>  On 7/17/2010 3:28 AM, marship wrote:
>> Hi. Peter and All.
>> I merged my indexes today. Now each index stores 10M document. Now I only 
>> have 10 solr cores.
>> And I used
>>
>> java -Xmx1g -jar -server start.jar
>> to start the jetty server.
>
>How big are the indexes on each of those cores? You can easily get this 
>info from a URL like this (assuming the bundled Jetty and its standard 
>port):
>
>http://hostname:8983/solr/corename/admin/replication/index.jsp
>
>If your server only has 4GB of RAM, low memory is almost guaranteed to 
>be the true problem. With low ram levels, the disk cache is nearly 
>useless, and high disk I/O is the symptom.
>
>My system runs as virtual machines. I've got six static indexes each a 
>little over 12GB in size (7 million rows) and an incremental index that 
>gets to about 700MB (300,000 rows). I've only got one active index core 
>per virtual machine, except when doing a full reindex, which is rare. 
>Each static VM is allocated 2 CPUs and 9GB of memory, each incremental 
>has 2 CPUs and 3GB of memory. As I'm not using VMware, the memory is not 
>oversubscribed. There is a slight oversubscription of CPUs, but I've 
>never seen a CPU load problem. I've got dedicated VMs for load balancing 
>and for the brokers.
>
>With a max heap of 1.5GB, that leaves over 7GB of RAM to act as disk 
>cache for a 12GB index. My statistics show that each of my two broker 
>cores has 185000 queries under its belt, with an average query time of 
>about 185 milliseconds. If I had enough memory to fit the entire 12GB 
>index into RAM, I'm sure my query times would be MUCH smaller.
>
>Here's a screenshot of the status page that aggregates my Solr statistics:
>
>http://www.flickr.com/photos/52107...@n05/4801491979/sizes/l/
>


Re: How to speed up solr search speed

2010-07-17 Thread Shawn Heisey

 On 7/17/2010 3:28 AM, marship wrote:

Hi. Peter and All.
I merged my indexes today. Now each index stores 10M document. Now I only have 
10 solr cores.
And I used

java -Xmx1g -jar -server start.jar
to start the jetty server.


How big are the indexes on each of those cores? You can easily get this 
info from a URL like this (assuming the bundled Jetty and its standard 
port):


http://hostname:8983/solr/corename/admin/replication/index.jsp

If your server only has 4GB of RAM, low memory is almost guaranteed to 
be the true problem. With low ram levels, the disk cache is nearly 
useless, and high disk I/O is the symptom.


My system runs as virtual machines. I've got six static indexes each a 
little over 12GB in size (7 million rows) and an incremental index that 
gets to about 700MB (300,000 rows). I've only got one active index core 
per virtual machine, except when doing a full reindex, which is rare. 
Each static VM is allocated 2 CPUs and 9GB of memory, each incremental 
has 2 CPUs and 3GB of memory. As I'm not using VMware, the memory is not 
oversubscribed. There is a slight oversubscription of CPUs, but I've 
never seen a CPU load problem. I've got dedicated VMs for load balancing 
and for the brokers.


With a max heap of 1.5GB, that leaves over 7GB of RAM to act as disk 
cache for a 12GB index. My statistics show that each of my two broker 
cores has 185000 queries under its belt, with an average query time of 
about 185 milliseconds. If I had enough memory to fit the entire 12GB 
index into RAM, I'm sure my query times would be MUCH smaller.


Here's a screenshot of the status page that aggregates my Solr statistics:

http://www.flickr.com/photos/52107...@n05/4801491979/sizes/l/



RE: Get only partial match results

2010-07-17 Thread Jonathan Rochkind
> 1) While doing a dismax query, I specify the query in double quotes for
> exact match. This works fine but I don't get any partial matches in search
> result.

Rather than specifying your query in quotes for 'exact' matches, I was suggesting 
configuring the analyzers differently for your fields "core1_title_exact" and 
"core1_title_partial" -- oops, except I don't think I meant analyzers, I mean 
different class types in solr. 

But again, it depends on what you mean by 'exact' -- do you mean it must match 
the whole string start to finish?  If so, if you make the *_exact fields in 
schema.xml use a "string" solr.StrField instead of a "text" solr.TextField, 
then queries will only match in those fields if they are _exact_, covering the 
whole indexed string start to finish, all punctuation and spaces etc exactly 
the same. (You could use some analyzers to, say, lowercase, remove punctuation, 
and normalize whitespace to make it a _bit_ more forgiving.) No need for 
quoting the query, it'll only match if it's exact. 
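
A sketch of such a field type (the names are made up; this uses
KeywordTokenizerFactory so the whole value stays a single token while still
allowing lowercasing and trimming):

<fieldType name="exactish" class="solr.TextField" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>
<field name="core1_title_exact" type="exactish" indexed="true" stored="true"/>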

Oops, except I just realized this isn't necessarily true, sorry, because of the 
way the dismax query parser will deal with whitespace in the query. Hmm. 
If what you mean by 'exact' is just a phrase search, then you don't need the 
separate *_exact fields in the first place; you can just use the dismax 'ps' param 
with the right boost. 
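
A sketch of that (assuming the phrase boosts go in the dismax 'pf' parameter,
with 'ps' controlling the allowed slop; the boost values are illustrative):

q=Ryder Cup&defType=dismax
  &qf=core1_title_partial^4 core1_content_partial^3 core2_title_partial^2 core2_content_partial^1
  &pf=core1_title_exact^80 core1_content_exact^70 core2_title_exact^60 core2_content_exact^50
  &ps=0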

Hmm, I think for the first case where 'exact' really does mean 'exact' (not 
phrase), you might be able to combine the _exact field configured as a 
solr.StrField, with the 'ps' technique, only mention the _exact fields in the 
dismax 'ps', not the dismax 'qf'.  

I'm not completely sure any of this will work, just giving you some ideas of 
how I'd try approaching it if it were me. 

> If the frequency of search term is more in "core2_content_exact" field,
> eventhough the search term is present atleast once in the field
> "core1_content_exact" I get "core2_content_exact" as my first search result
> item.

I'm surprised this is true with such gigantic boosts, but I'm not sure what to 
do about it, sorry. Although I guess the boosts I suggested aren't that 
different from each other; they are just all multiplied by 1000, which won't 
make them so different from each other. You could try making the boosts even 
more ridiculously higher at each stage than the last, maybe powers of 10: ^1, 
^10, ^100, ^1000, ^10000.  

Jonathan


Re:Re: Re: How to speed up solr search speed

2010-07-17 Thread marship
Hi. Geert-Jan.
   Thanks for replying.
   I know solr has a querycache and that it improves the search speed from the 
second time on. But when I talk about search speed, I don't mean the speed of the 
cache. When a user searches on our site, I don't want the first search to cost 10s 
and all following ones 0s; that is unacceptable. So I want the first search to be as 
fast as it can be, and all my timings only count the first search.
   For fq, yes, I need it. We have 5 different types; for a general search the user 
doesn't need to specify which type to search over, but sometimes he needs to search 
over e.g. type:product, and that's when I use "fq", and I believe I understand it 
correctly. Before I reached today's speed I was always testing against simple 
searches like "design", and since even the simple search speed was not acceptable, I 
didn't care how "fq" performed. Today, as the simple search speed is acceptable, I 
moved on to check "fq", and it sometimes is much slower than the simple search 
(slower meaning it takes more than 2s, maybe 10s).

>The only thing that helps you here would be a big solr querycache, depending
>on how often queries are repeated.
I don't agree. I don't really care about the speed of the cache, as I know it is 
always super fast. What I want is for solr to consume as much memory as it can to 
pre-load the lucene index (maybe 50% or even 100%), so that when it has to run a 
keyword for the first time, it is fast. (I haven't got an answer to this question 
yet.)

Thanks.
Regards.




At 2010-07-17 19:30:26, "Geert-Jan Brits" wrote:
>>My query string is always simple like "design", "principle of design",
>"tom"
>>EG:
>>URL:
>http://localhost:7550/solr/select/?q=design&version=2.2&start=0&rows=10&indent=on
>
>IMO, indeed with these types of simple searches caching (and thus RAM usage)
>can not be fully exploited, i.e: there isn't really anything to cache (no
>sort-ordering, faceting (Lucene fieldcache), no documentsets,faceting (Solr
>filtercache))
>
>The only thing that helps you here would be a big solr querycache, depending
>on how often queries are repeated.
>Just execute the same query twice, the second time you should see a fast
>response (say < 20ms) that's the querycache (and thus RAM)  working for
>you.
>
>>Now the issue I found is search with "fq" argument looks slow down the
>search.
>
>This doesn't align with your previous statement that you only use search
>with a q-param (e.g:
>http://localhost:7550/solr/select/?q=design&version=2.2&start=0&rows=10&indent=on
>)
>For your own sake, explain what you're trying to do, otherwise we really are
>guessing in the dark.
>
>Anyway the FQ-param let's you cache (using the Solr-filtercache)  individual
>documentsets that can be used to efficiently to intersect your resultset.
>Also the first time, caches should be warmed (i.e: the fq-query should be
>exectuted and results saved to cache, since there isn't anything there yet)
>. Only on the second time would you start seeing improvements.
>
>For instance:
>http://localhost:7550/solr/select/?q=design&fq=doctype:pdf&version=2.2&start=0&rows=10&indent=on
>
>would
>only show documents containing "design" when the doctype=pdf (Again this is
>just an example here where I'm just assuming that you have defined a field
>'doctype')
>since the nr of values of documenttype would be pretty low and would be used
>independently of other queries, this would be an excellent candidate for the
>FQ-param.
>
>http://wiki.apache.org/solr/CommonQueryParameters#fq
>
>This was a longer reply than I wanted to. Really think about your use-cases
>first, then present some real examples of what you want to achieve and then
>we can help you in a more useful manner.
>
>Cheers,
>Geert-Jan
>
>2010/7/17 marship 
>
>> Hi. Peter and All.
>> I merged my indexes today. Now each index stores 10M document. Now I only
>> have 10 solr cores.
>> And I used
>>
>> java -Xmx1g -jar -server start.jar
>> to start the jetty server.
>>
>> At first I deployed them all on one search. The search speed is about 3s.
>> Then I noticed from cmd output when search start, 4 of 10's QTime only cost
>> about 10ms-500ms. The left 5 cost more, up to 2-3s. Then I put 6 on web
>> server, 4 on another(DB, high load most time). Then the search speed goes
>> down to about 1s most time.
>> Now most search takes about 1s. That's great.
>>
>> I watched the jetty output on cmd windows on web server, now when each
>> search start, I saw 2 of 6 costs 60ms-80ms. The another 4 cost 170ms -
>> 700ms.  I do believe the bottleneck is still the hard disk. But at least,
>> the search speed at the moment is acceptable. Maybe i should try memdisk to
>> see if that help.
>>
>>
>> And for -Xmx1g, actually I onl

jetty logging

2010-07-17 Thread Lukas Kahwe Smith
Hi,

I am following:
http://wiki.apache.org/solr/LoggingInDefaultJettySetup

All works fine except defining the logging properties files from jetty.xml
Does this approach work for anyone else?
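
One thing that might be worth trying, assuming the java.util.logging setup
described on that wiki page: pass the properties file on the command line instead
of through jetty.xml (etc/logging.properties here is just whatever path you use):

java -Djava.util.logging.config.file=etc/logging.properties -jar start.jar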

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





Re: Get only partial match results

2010-07-17 Thread Balaji.A

Thanks Jonathan. I appreciate your reply.

Though I got a few ideas for implementing my requirement, I am stuck on a
few issues. It would be helpful if you could guide me in resolving them.

As you suggested I configured single core with different fields.

For example the core contains the following fields:

core1_title_exact (type : text_ws)
core1_title_partial (type : text)
core1_content_exact (type : text_ws)
core1_content_partial (type : text)
core2_title_exact (type : text_ws)
core2_title_partial (type: text)
core2_content_exact (type : text_ws)
core2_content_partial (type: text)


Problems
***
1) While doing a dismax query, I specify the query in double quotes for
exact match. This works fine but I don't get any partial matches in search
result.

My query:
q="Ryder Cup"&qf=core1_title_exact^8000 core1_content_exact^7000
core2_title_exact^6000 core2_content_exact^5000 core1_title_partial^4000
core1_content_partial^3000 core2_title_partial^2000
core2_content_partial^1000

2) If the frequency of the search term is higher in the "core2_content_exact" field,
then even though the search term is present at least once in the field
"core1_content_exact", I get the "core2_content_exact" match as my first search
result item. 

For example, assume my search term is "Ryder Cup". If the occurrence of
Ryder Cup in the core1_content_exact field is 1 and the occurrence of the same text
in core2_content_exact is about 15, the search query returns the
core2_content_exact match as the first result.

Is it something to do with term frequency? How do I fix this problem? The
core1_content_exact field should be my topmost priority even with a match of
at least one search term.


Thanks,
Balaji
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Get-only-partial-match-results-tp963212p974850.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Re: How to speed up solr search speed

2010-07-17 Thread Geert-Jan Brits
>My query string is always simple like "design", "principle of design",
"tom"
>EG:
>URL:
http://localhost:7550/solr/select/?q=design&version=2.2&start=0&rows=10&indent=on

IMO, indeed, with these types of simple searches caching (and thus RAM usage)
cannot be fully exploited, i.e. there isn't really anything to cache: no
sort ordering or faceting (Lucene fieldcache), no document sets from filters (Solr
filtercache).

The only thing that would help you here is a big solr querycache, depending
on how often queries are repeated.
Just execute the same query twice; the second time you should see a fast
response (say < 20ms). That's the querycache (and thus RAM) working for
you.

>Now the issue I found is search with "fq" argument looks slow down the
search.

This doesn't align with your previous statement that you only use search
with a q-param (e.g:
http://localhost:7550/solr/select/?q=design&version=2.2&start=0&rows=10&indent=on
)
For your own sake, explain what you're trying to do, otherwise we really are
guessing in the dark.

Anyway, the fq param lets you cache (using the Solr filtercache) individual
document sets that can be used to efficiently intersect your result set.
Also, the first time, caches have to be warmed (i.e. the fq query has to be
executed and its results saved to the cache, since there isn't anything there
yet); only from the second time on would you start seeing improvements.

For instance:
http://localhost:7550/solr/select/?q=design&fq=doctype:pdf&version=2.2&start=0&rows=10&indent=on

would only show documents containing "design" when doctype=pdf (again, this is
just an example; I'm assuming that you have defined a field 'doctype').
Since the number of values of doctype would be pretty low and it would be used
independently of other queries, this would be an excellent candidate for the
fq param.

http://wiki.apache.org/solr/CommonQueryParameters#fq
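
If you want those fq entries warmed before users hit them, a sketch of a
firstSearcher/newSearcher listener in solrconfig.xml (the doctype:pdf filter here
is the same hypothetical example as above):

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">design</str>
      <str name="fq">doctype:pdf</str>
    </lst>
  </arr>
</listener>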

This was a longer reply than I wanted to. Really think about your use-cases
first, then present some real examples of what you want to achieve and then
we can help you in a more useful manner.

Cheers,
Geert-Jan

2010/7/17 marship 

> Hi. Peter and All.
> I merged my indexes today. Now each index stores 10M document. Now I only
> have 10 solr cores.
> And I used
>
> java -Xmx1g -jar -server start.jar
> to start the jetty server.
>
> At first I deployed them all on one search. The search speed is about 3s.
> Then I noticed from cmd output when search start, 4 of 10's QTime only cost
> about 10ms-500ms. The left 5 cost more, up to 2-3s. Then I put 6 on web
> server, 4 on another(DB, high load most time). Then the search speed goes
> down to about 1s most time.
> Now most search takes about 1s. That's great.
>
> I watched the jetty output on cmd windows on web server, now when each
> search start, I saw 2 of 6 costs 60ms-80ms. The another 4 cost 170ms -
> 700ms.  I do believe the bottleneck is still the hard disk. But at least,
> the search speed at the moment is acceptable. Maybe i should try memdisk to
> see if that help.
>
>
> And for -Xmx1g, actually I only see jetty consume about 150M memory,
> consider now the index is 10x bigger. I don't think that works. I googled
> -Xmx is go enlarge the heap size. Not sure can that help search.  I still
> have 3.5G memory free on server.
>
> Now the issue I found is search with "fq" argument looks slow down the
> search.
>
> Thanks All for your help and suggestions.
> Thanks.
> Regards.
> Scott
>
>
> At 2010-07-17 03:36:19, "Peter Karich" wrote:
> >> > Each solr(jetty) instance on consume 40M-60M memory.
> >
> >> java -Xmx1024M -jar start.jar
> >
> >That's a good suggestion!
> >Please, double check that you are using the -server version of the jvm
> >and the latest 1.6.0_20 or so.
> >
> >Additionally you can start jvisualvm (shipped with the jdk) and hook
> >into jetty/tomcat easily to see the current CPU and memory load.
> >
> >> But I have 70 solr cores
> >
> >if you ask me: I would reduce them to 10-15 or even less and increase
> >the RAM.
> >try out tomcat too
> >
> >> solr distributed search's speed is decided by the slowest one.
> >
> >so, try to reduce the cores
> >
> >Regards,
> >Peter.
> >
> >> you mentioned that you have a lot of mem free, but your jetty containers
> >> are only using between 40 and 60 MB of memory.
> >>
> >> probably stating the obvious, but have you increased the -Xmx param like
> for
> >> instance:
> >> java -Xmx1024M -jar start.jar
> >>
> >> that way you're configuring the container to use a maximum of 1024 MB
> ram
> >> instead of the standard which is much lower (I'm not sure what exactly
> but
> >> it could well be 64MB for non -server, aligning with what you're seeing)
> >>
> >> Geert-Jan
> >>
> >> 2010/7/16 marship 
> >>
> >>
> >>> Hi Tom Burton-West.
> >>>
> >>>  Sorry loo

Re:Re: How to speed up solr search speed

2010-07-17 Thread marship
Hi. Peter and All.
I merged my indexes today. Now each index stores 10M document. Now I only have 
10 solr cores. 
And I used 

java -Xmx1g -jar -server start.jar
to start the jetty server.

At first I deployed them all on one server. The search speed was about 3s. Then 
I noticed from the cmd output, when a search started, that 4 of the 10 cores' QTime 
only cost about 10ms-500ms; the other 5 cost more, up to 2-3s. Then I put 6 on the 
web server and 4 on another (the DB server, high load most of the time). The search 
speed then went down to about 1s most of the time.
Now most searches take about 1s. That's great. 

I watched the jetty output in the cmd windows on the web server; now when each 
search starts, I see 2 of the 6 cost 60ms-80ms and the other 4 cost 170ms - 700ms. 
I do believe the bottleneck is still the hard disk. But at least the search speed 
at the moment is acceptable. Maybe I should try MemDisk to see if that helps.


And for -Xmx1g: I actually only see jetty consuming about 150M of memory, even 
though the index is now 10x bigger, so I don't think that setting is doing anything. 
I googled it: -Xmx enlarges the heap size. I'm not sure whether that can help 
search. I still have 3.5G of memory free on the server. 

Now the issue I found is that searching with the "fq" argument seems to slow down 
the search.

Thanks All for your help and suggestions.
Thanks.
Regards.
Scott


At 2010-07-17 03:36:19, "Peter Karich" wrote:
>> > Each solr(jetty) instance on consume 40M-60M memory.
>
>> java -Xmx1024M -jar start.jar
>
>That's a good suggestion!
>Please, double check that you are using the -server version of the jvm
>and the latest 1.6.0_20 or so.
>
>Additionally you can start jvisualvm (shipped with the jdk) and hook
>into jetty/tomcat easily to see the current CPU and memory load.
>
>> But I have 70 solr cores
>
>if you ask me: I would reduce them to 10-15 or even less and increase
>the RAM.
>try out tomcat too
>
>> solr distributed search's speed is decided by the slowest one. 
>
>so, try to reduce the cores
>
>Regards,
>Peter.
>
>> you mentioned that you have a lot of mem free, but your jetty containers
>> are only using between 40 and 60 MB of memory.
>>
>> probably stating the obvious, but have you increased the -Xmx param like for
>> instance:
>> java -Xmx1024M -jar start.jar
>>
>> that way you're configuring the container to use a maximum of 1024 MB ram
>> instead of the standard which is much lower (I'm not sure what exactly but
>> it could well be 64MB for non -server, aligning with what you're seeing)
>>
>> Geert-Jan
>>
>> 2010/7/16 marship 
>>
>>   
>>> Hi Tom Burton-West.
>>>
>>>  Sorry looks my email ISP filtered out your replies. I checked web version
>>> of mailing list and saw your reply.
>>>
>>>  My query string is always simple like "design", "principle of design",
>>> "tom"
>>>
>>>
>>>
>>> EG:
>>>
>>> URL:
>>> http://localhost:7550/solr/select/?q=design&version=2.2&start=0&rows=10&indent=on
>>>
>>> Response:
>>>
>>> [response XML stripped by the mail archive; the responseHeader showed status=0,
>>> QTime=16, and the first doc returned was product_208619]
>>>
>>>
>>>
>>>
>>>
>>> EG:
>>> http://localhost:7550/solr/select/?q=Principle&version=2.2&start=0&rows=10&indent=on
>>>
>>> [response XML stripped by the mail archive; the responseHeader showed status=0,
>>> QTime=94, and the first doc returned was product_56926]
>>>
>>>
>>>
>>> As I am querying a single core and the other cores are not being queried at the
>>> same time, the QTime looks good.
>>>
>>> But when I query the distributed node (for this case, 6422ms is still not a bad
>>> one; many cost ~20s):
>>>
>>> URL:
>>> http://localhost:7499/solr/select/?q=the+first+world+war&version=2.2&start=0&rows=10&indent=on&debugQuery=true
>>>
>>> Response:
>>>
>>> [response XML stripped by the mail archive; the responseHeader showed status=0,
>>> QTime=6422 for the query "the first world war" with debugQuery=true]
>>>
>>>
>>> Actually I am thinking about and testing a solution: I believe the bottleneck
>>> is the hard disk, and all our indexes add up to about 10-15G. What if I just
>>> add another 16G of memory to my server, then use "MemDisk" to map a memory disk
>>> and put all my indexes into it? Then each time solr/jetty needs to load the
>>> index from the hard disk, it is loading it from memory. This should give solr
>>> the most throughput and avoid the hard-disk access delay. I am testing 
>>>
>>> But if there is a way to make solr better use our limited resources and
>>> avoid adding new ones, that would be great.
>>>
>>>
>>>
>>>
>>>
>>>
>>> 
>>   
>
>
>-- 
>http://karussell.wordpress.com/
>