Re: SEVERE: Could not start SOLR. Check solr/home property

2010-04-26 Thread Siddhant Goel
Did you by any chance set up multicore? Try passing in the path to the Solr
home directory as -Dsolr.solr.home=/path/to/solr/home while you start Solr.
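
For example, with the Jetty-based example setup that ships with Solr (the
path is just a placeholder for your actual Solr home):

   java -Dsolr.solr.home=/path/to/solr/home -jar start.jar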

On Mon, Apr 26, 2010 at 1:04 PM, Jon Drukman  wrote:

> What does this error mean?
>
> SEVERE: Could not start SOLR. Check solr/home property
>
> I've had this solr installation working before, but I haven't looked at it
> in a few months.  I checked it today and the web side is returning a 500
> error, the log file shows this when starting up:
>
>
> SEVERE: Could not start SOLR. Check solr/home property
>  java.lang.RuntimeException: java.io.IOException: read past EOF
>   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:579)
>   at
> org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
>   at
> org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
>   at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
>   at
> org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
>   at
> org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
>   at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
>   at
> org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
>
> For the record, I've never explicitly set "solr/home" ever.  It always "just
> worked".
>
> -jsd-
>
>


-- 
- Siddhant


Re: What hardware do I need ?

2010-04-24 Thread Siddhant Goel
If it's worth mentioning here, in my case the disk read speeds seemed to have
a really noticeable effect on the query times. What disks are you planning
on using? Also, as Otis has already pointed out, I doubt a single box of
that capacity can handle 100-700 queries per second.

On Fri, Apr 23, 2010 at 1:32 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

> Xavier,
>
> 100-700 QPS is still high.  I'm guessing your 1 box won't handle that
> without sweating a lot (read: slow queries).
>  Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>
>
> - Original Message 
> > From: Xavier Schepler 
> > To: solr-user@lucene.apache.org
> > Sent: Fri, April 23, 2010 11:53:23 AM
> > Subject: Re: What hardware do I need ?
> >
> > On 23/04/2010 17:08, Otis Gospodnetic wrote:
> > > Xavier,
> > >
> > > 0-1000 QPS is a pretty wide range.  Plus, it depends on how good your
> > > auto-complete is, which depends on types of queries it issues, among
> > > other things.  100K short docs is small, so that will all fit in RAM
> > > nicely, assuming those other processes leave enough RAM for the OS to
> > > cache the index.
> > >
> > > That said, you do need more than 1 box if you want your auto-complete
> > > more fault tolerant.
> > >
> > > Otis
> > > 
> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> > > Lucene ecosystem search :: http://search-lucene.com/
> > >
> > > - Original Message 
> > >> From: Xavier Schepler <xavier.schep...@sciences-po.fr>
> > >> To: solr-user@lucene.apache.org
> > >> Sent: Fri, April 23, 2010 11:01:24 AM
> > >> Subject: What hardware do I need ?
> > >>
> > >> Hi,
> > >>
> > >> I'm working with Solr 1.4.
> > >> My schema has about 50 fields.
> > >>
> > >> I'm using full text search in short strings (~ 30-100 terms) and
> > >> facetted search.
> > >>
> > >> My index will have 100 000 documents.
> > >>
> > >> The number of requests per second will be low. Let's say between 0 and
> > >> 1000 because of auto-complete.
> > >>
> > >> Is a standard server (3ghz proc, 4gb ram) with the client application
> > >> (apache + php5 + ZF + apc) and Tomcat + Solr enough ???
> > >>
> > >> Do I need more hardware ?
> > >>
> > >> Thanks in advance,
> > >>
> > >> Xavier S.
> >
> > Well, my auto-complete is built on the facet prefix search component.
> > I think that 100-700 requests per second is maybe a better approximation.
>



-- 
- Siddhant


Re: exclude words?

2010-03-31 Thread Siddhant Goel
I think you can use something like "q=hello world -books". Should do.
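
If you want all documents that do *not* contain a word, you can also combine
the negative clause with a match-all query, e.g. (assuming the example
server's default URL):

   http://localhost:8983/solr/select?q=*:* -books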

On Wed, Mar 31, 2010 at 7:34 PM, Sebastian Funk wrote:

> Hey there,
>
> I'm sure this is a pretty easy thing, but I can't find the solution:
> can I search for text with one word (e.g. "books") specifically not in it,
> so Solr returns all documents that don't have "books" somewhere in them?
>
> thanks for the help,
> sebastian
>



-- 
- Siddhant


Re: jmap output help

2010-03-29 Thread Siddhant Goel
Gentle bounce

On Sun, Mar 28, 2010 at 11:31 AM, Siddhant Goel wrote:

> Hi everyone,
>
> The output of "jmap -histo:live 27959 | head -30" is something like the
> following :
>
> num     #instances         #bytes  class name
> ----------------------------------------------
>    1:       448441      180299464  [C
>    2:         5311      135734480  [I
>    3:         3623       68389720  [B
>    4:       445669       17826760  java.lang.String
>    5:       391739       15669560  org.apache.lucene.index.TermInfo
>    6:       417442       13358144  org.apache.lucene.index.Term
>    7:        58767        5171496  org.apache.lucene.index.FieldsReader$LazyField
>    8:        32902        5049760
>    9:        32902        3955920
>   10:         2843        3512688
>   11:         2397        3128048  [Lorg.apache.lucene.index.Term;
>   12:           35        3053592  [J
>   13:            3        3044288  [Lorg.apache.lucene.index.TermInfo;
>   14:        55671        2707536
>   15:        27282        2701352  [Ljava.lang.Object;
>   16:         2843        2212384
>   17:         2343        2132224
>   18:        26424        1056960  java.util.ArrayList
>   19:        16423        1051072  java.util.LinkedHashMap$Entry
>   20:         2039        1028944
>   21:        14336         917504  org.apache.lucene.document.Field
>   22:        29587         710088  java.lang.Integer
>   23:         3171         583464  java.lang.Class
>   24:          813         492880  [Ljava.util.HashMap$Entry;
>   25:         8471         474376  org.apache.lucene.search.PhraseQuery
>   26:         4184         402848  [[I
>   27:         4277         380704  [S
>
> Is it ok to assume that the top 3 entries (character/integer/byte arrays)
> are referring to the entries inside the solr cache?
>
> Thanks,
>
>
> --
> - Siddhant
>



-- 
- Siddhant


jmap output help

2010-03-27 Thread Siddhant Goel
Hi everyone,

The output of "jmap -histo:live 27959 | head -30" is something like the
following :

num     #instances         #bytes  class name
----------------------------------------------
   1:       448441      180299464  [C
   2:         5311      135734480  [I
   3:         3623       68389720  [B
   4:       445669       17826760  java.lang.String
   5:       391739       15669560  org.apache.lucene.index.TermInfo
   6:       417442       13358144  org.apache.lucene.index.Term
   7:        58767        5171496  org.apache.lucene.index.FieldsReader$LazyField
   8:        32902        5049760
   9:        32902        3955920
  10:         2843        3512688
  11:         2397        3128048  [Lorg.apache.lucene.index.Term;
  12:           35        3053592  [J
  13:            3        3044288  [Lorg.apache.lucene.index.TermInfo;
  14:        55671        2707536
  15:        27282        2701352  [Ljava.lang.Object;
  16:         2843        2212384
  17:         2343        2132224
  18:        26424        1056960  java.util.ArrayList
  19:        16423        1051072  java.util.LinkedHashMap$Entry
  20:         2039        1028944
  21:        14336         917504  org.apache.lucene.document.Field
  22:        29587         710088  java.lang.Integer
  23:         3171         583464  java.lang.Class
  24:          813         492880  [Ljava.util.HashMap$Entry;
  25:         8471         474376  org.apache.lucene.search.PhraseQuery
  26:         4184         402848  [[I
  27:         4277         380704  [S

Is it ok to assume that the top 3 entries (character/integer/byte arrays)
are referring to the entries inside the solr cache?

Thanks,


-- 
- Siddhant


Re: Solr Performance Issues

2010-03-17 Thread Siddhant Goel
Hi,

Apparently the bottleneck seems to be the periods when the CPU is waiting on
I/O. Out of all the numbers I can see, the CPU I/O wait times are the
highest. I've allocated 4GB to Solr out of the total 8GB available. There's
only 47MB free on the machine, so I assume the rest of the memory is being
used for OS disk caches. In addition, the hit ratio for the queryResultCache
isn't going beyond 20%. So I think the problem is not at Solr's end. Are
there any pointers available on how I can resolve such disk I/O issues?
Does this mean I need more memory overall? Or would reducing the amount of
memory allocated to Solr, so that the OS disk cache gets more, help?
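For what it's worth, I've been watching the disk with something like

   iostat -x 5

and looking at the %iowait and await columns to see how long requests are
waiting on the disk.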

Thanks,

On Fri, Mar 12, 2010 at 11:21 PM, Erick Erickson wrote:

> Sounds like you're pretty well on your way then. This is pretty typical
> of multi-threaded situations... Threads 1-n wait around on I/O and
> increasing the number of threads increases throughput without
> changing (much) the individual response time.
>
> Threads n+1 - p don't change throughput much, but increase
> the response time for each request. On aggregate, though, the
> throughput doesn't change (much).
>
> Adding threads after p+1 *decreases* throughput while
> *increasing* individual response time as your processors start
> spending way too much time context-switching and/or memory
> swapping.
>
> The trick is finding out what n and p are.
>
> Best
> Erick
>
> On Fri, Mar 12, 2010 at 12:06 PM, Siddhant Goel wrote:
>
> > Hi,
> >
> > Thanks for your responses. It actually feels good to be able to locate
> > where
> > the bottlenecks are.
> >
> > I've created two sets of data - in the first one I'm measuring the time
> > taken purely on Solr's end, and in the other one I'm including network
> > latency (just for reference). The data that I'm posting below contains
> > the time taken purely by Solr.
> >
> > I'm running 10 threads simultaneously and the average response time (for
> > each query in each thread) remains close to 40 to 50 ms. But as soon as I
> > increase the number of threads to something like 100, the response time
> > goes up to ~600ms, and further up when the number of threads is close to
> > 500. Yes, the average time definitely depends on the number of concurrent
> > requests.
> >
> > > Going from memory, debugQuery=on will let you know how much time
> > > was spent in various operations in SOLR. It's important to know
> > > whether it was the searching, assembling the response, or
> > > transmitting the data back to the client.
> >
> >
> > I just tried this. The information that it gives me for a query that took
> > 7165ms is - http://pastebin.ca/1835644
> >
> > So out of the total time of 7165ms, QueryComponent took most of the time.
> > Plus I can see the load average going up when the number of threads is
> > really high. So it actually makes sense. (I didn't add any other component
> > while searching; it was a plain /select?q=query call.)
> > Like I mentioned earlier in this mail, I'm maintaining separate sets for
> > data with/without network latency, and I don't think it's the bottleneck.
> >
> >
> > > How many threads does it take to peg the CPU? And what
> > > response times are you getting when your number of threads is
> > > around 10?
> > >
> >
> > If the number of threads is greater than 100, that really takes its toll
> > on the CPU. So probably that's the number.
> >
> > When the number of threads is around 10, the response times average to
> > something like 60ms (and 95% of the queries fall within 100ms of that
> > value).
> >
> > Thanks,
> >
> >
> >
> >
> > >
> > > Erick
> > >
> > > On Fri, Mar 12, 2010 at 3:39 AM, Siddhant Goel wrote:
> > >
> > > > I've allocated 4GB to Solr, so the rest of the 4GB is free for the
> > > > OS disk caching.
> > > >
> > > > I think that at any point of time, there can be a maximum of <number
> > > > of threads> concurrent requests, which happens to make sense btw (does
> > > > it?).
> > > >
> > > > As I increase the number of threads, the load average shown by top
> > > > goes up to as high as 80%. But if I keep the number of threads low
> > > > (~10), the load average never goes beyond ~8. So probably that's the
> > > > number of requests I

Re: Solr Performance Issues

2010-03-12 Thread Siddhant Goel
Hi,

Thanks for your responses. It actually feels good to be able to locate where
the bottlenecks are.

I've created two sets of data - in the first one I'm measuring the time taken
purely on Solr's end, and in the other one I'm including network latency
(just for reference). The data that I'm posting below contains the time taken
purely by Solr.

I'm running 10 threads simultaneously and the average response time (for
each query in each thread) remains close to 40 to 50 ms. But as soon as I
increase the number of threads to something like 100, the response time goes
up to ~600ms, and further up when the number of threads is close to 500. Yes,
the average time definitely depends on the number of concurrent requests.

> Going from memory, debugQuery=on will let you know how much time
> was spent in various operations in SOLR. It's important to know
> whether it was the searching, assembling the response, or
> transmitting the data back to the client.


I just tried this. The information that it gives me for a query that took
7165ms is - http://pastebin.ca/1835644

So out of the total time of 7165ms, QueryComponent took most of the time. Plus
I can see the load average going up when the number of threads is really
high. So it actually makes sense. (I didn't add any other component while
searching; it was a plain /select?q=query call.)
Like I mentioned earlier in this mail, I'm maintaining separate sets for
data with/without network latency, and I don't think it's the bottleneck.
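For reference, the timing breakdown above came from simply appending the
debug parameter to the query, along these lines:

   http://localhost:8983/solr/select?q=some+query&debugQuery=on

with the per-component timings showing up under the "timing" section of the
debug output.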


> How many threads does it take to peg the CPU? And what
> response times are you getting when your number of threads is
> around 10?
>

If the number of threads is greater than 100, that really takes its toll on
the CPU. So probably that's the number.

When the number of threads is around 10, the response times average to
something like 60ms (and 95% of the queries fall within 100ms of that
value).

Thanks,




>
> Erick
>
> On Fri, Mar 12, 2010 at 3:39 AM, Siddhant Goel wrote:
>
> > I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS
> > disk caching.
> >
> > I think that at any point of time, there can be a maximum of <number of
> > threads> concurrent requests, which happens to make sense btw (does it?).
> >
> > As I increase the number of threads, the load average shown by top goes
> > up to as high as 80%. But if I keep the number of threads low (~10), the
> > load average never goes beyond ~8. So probably that's the number of
> > requests I can expect Solr to serve concurrently on this index size with
> > this hardware.
> >
> > Can anyone give a general opinion as to how much hardware should be
> > sufficient for a Solr deployment with an index size of ~43GB, containing
> > around 2.5 million documents? I'm expecting it to serve at least 20
> > requests
> > per second. Any experiences?
> >
> > Thanks
> >
> > On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West wrote:
> >
> > >
> > > How much of your memory are you allocating to the JVM and how much are
> > > you leaving free?
> > >
> > > If you don't leave enough free memory for the OS, the OS won't have a
> > > large enough disk cache, and you will be hitting the disk for lots of
> > > queries.
> > >
> > > You might want to monitor your Disk I/O using iostat and look at the
> > > iowait.
> > >
> > > If you are doing phrase queries and your *prx file is significantly
> > > larger than the available memory, then when a slow phrase query hits
> > > Solr, the contention for disk I/O with other queries could be slowing
> > > everything down.
> > > You might also want to look at the 90th and 99th percentile query
> > > times in addition to the average. For our large indexes, we found at
> > > least an order of magnitude difference between the average and 99th
> > > percentile queries.
> > > Again, if Solr gets hit with a few of those 99th percentile slow
> > > queries and you're not hitting your caches, chances are you will see
> > > serious contention for disk I/O.
> > >
> > > Of course if you don't see any waiting on i/o, then your bottleneck is
> > > probably somewhere else:)
> > >
> > > See
> > > http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
> > > for more background on our experience.
> > >
> > > Tom Burton-West
> > > University of Michigan Library
> > > www.hathitrust.org
> > >

Re: Solr Performance Issues

2010-03-12 Thread Siddhant Goel
I've allocated 4GB to Solr, so the rest of the 4GB is free for the OS disk
caching.

I think that at any point of time, there can be a maximum of <number of
threads> concurrent requests, which happens to make sense btw (does it?).

As I increase the number of threads, the load average shown by top goes up
to as high as 80%. But if I keep the number of threads low (~10), the load
average never goes beyond ~8. So probably that's the number of requests I
can expect Solr to serve concurrently on this index size with this hardware.

Can anyone give a general opinion as to how much hardware should be
sufficient for a Solr deployment with an index size of ~43GB, containing
around 2.5 million documents? I'm expecting it to serve at least 20 requests
per second. Any experiences?

Thanks

On Fri, Mar 12, 2010 at 12:47 AM, Tom Burton-West wrote:

>
> How much of your memory are you allocating to the JVM and how much are you
> leaving free?
>
> If you don't leave enough free memory for the OS, the OS won't have a large
> enough disk cache, and you will be hitting the disk for lots of queries.
>
> You might want to monitor your Disk I/O using iostat and look at the
> iowait.
>
> If you are doing phrase queries and your *prx file is significantly larger
> than the available memory then when a slow phrase query hits Solr, the
> contention for disk I/O with other queries could be slowing everything
> down.
> You might also want to look at the 90th and 99th percentile query times in
> addition to the average. For our large indexes, we found at least an order
> of magnitude difference between the average and 99th percentile queries.
> Again, if Solr gets hit with a few of those 99th percentile slow queries
> and you're not hitting your caches, chances are you will see serious
> contention for disk I/O.
>
> Of course if you don't see any waiting on i/o, then your bottleneck is
> probably somewhere else:)
>
> See
>
> http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1
> for more background on our experience.
>
> Tom Burton-West
> University of Michigan Library
> www.hathitrust.org
>
>
>
> >
> > On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel wrote:
> >
> > > Hi everyone,
> > >
> > > I have an index corresponding to ~2.5 million documents. The index
> > > size is 43GB. The configuration of the machine which is running Solr
> > > is - Dual Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x
> > > 12MB cache, 8GB RAM, and 250 GB HDD.
> > >
> > > I'm observing a strange trend in the queries that I send to Solr. The
> > > query times for queries that I send earlier are much lower than for the
> > > queries I send afterwards. For instance, if I write a script to query
> > > Solr 5000 times (with 5000 distinct queries, most of them containing
> > > not more than 3-5 words) with 10 threads running in parallel, the
> > > average query time goes from ~50ms in the beginning to ~6000ms. Is this
> > > expected, or is there something wrong with my configuration? Currently
> > > I've configured the queryResultCache and the documentCache to contain
> > > 2048 entries (hit ratios for both are close to 50%).
> > >
> > > Apart from this, a general question that I want to ask is whether such
> > > hardware is enough for this scenario. I'm aiming at achieving around 20
> > > queries per second with the hardware mentioned above.
> > >
> > > Thanks,
> > >
> > > Regards,
> > >
> > > --
> > > - Siddhant
> > >
> >
>
>
>
> --
> - Siddhant
>
>
>
> --
> View this message in context:
> http://old.nabble.com/Solr-Performance-Issues-tp27864278p27868456.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
- Siddhant


Re: field length normalization

2010-03-11 Thread Siddhant Goel
Did you reindex after setting omitNorms to false? I'm not sure whether or
not it is needed, but it makes sense.

On Thu, Mar 11, 2010 at 5:34 PM, muneeb  wrote:

>
> Hi,
>
> In my schema, the document title field has "omitNorms=false", which, if I
> am not wrong, causes the length of titles to be counted in the scoring.
>
> But when I query with "word1 word2 word3", I don't know why the top two
> documents' titles have these words plus other words, whereas the document
> whose title has exactly and only these query words comes in third place.
>
> Setting omitNorms to false should bring the titles with the exact words to
> the top, shouldn't it?
>
> Also, I realized when I debugged the query that all three top documents
> have the same score; shouldn't this be different, as they have different
> title lengths?
>
> Thanks very much.
> -A
> --
> View this message in context:
> http://old.nabble.com/field-length-normalization-tp27862618p27862618.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
- Siddhant


Re: Solr Performance Issues

2010-03-11 Thread Siddhant Goel
Hi Erick,

The way the load test works is that it picks up 5000 queries, splits them
according to the number of threads (so if we have 10 threads, it schedules
10 threads - each one sending 500 queries). So it might be possible that the
number of queries at a later point in time is greater than the number of
queries earlier in time. I'm not very sure about that though. It's a simple
Ruby script that starts up threads, calls the search function in each
thread, and then waits for each of them to exit.

How many queries per second can we expect Solr to serve, given this kind of
hardware? If what you suggest is true, then is it possible that while Solr
is serving a query, another query hits it, which increases the response time
even further? I'm not sure about it. But yes I can observe the query times
going up as I increase the number of threads.

Thanks,

Regards,

On Thu, Mar 11, 2010 at 8:30 PM, Erick Erickson wrote:

> How many outstanding queries do you have at a time? Is it possible
> that when you start, you have only a few queries executing concurrently
> but as your test runs you have hundreds?
>
> This really is a question of how your load test is structured. You might
> get a better sense of how it works if your tester had a limited number
> of threads running so the max concurrent requests SOLR was serving
> at once were capped (30, 50, whatever).
>
> But no, I wouldn't expect SOLR to bog down the way you're describing
> just because it was running for a while.
>
> HTH
> Erick
>
> On Thu, Mar 11, 2010 at 9:39 AM, Siddhant Goel wrote:
>
> > Hi everyone,
> >
> > I have an index corresponding to ~2.5 million documents. The index size
> > is 43GB. The configuration of the machine which is running Solr is - Dual
> > Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache,
> > 8GB RAM, and 250 GB HDD.
> >
> > I'm observing a strange trend in the queries that I send to Solr. The
> > query times for queries that I send earlier are much lower than for the
> > queries I send afterwards. For instance, if I write a script to query
> > Solr 5000 times (with 5000 distinct queries, most of them containing not
> > more than 3-5 words) with 10 threads running in parallel, the average
> > query time goes from ~50ms in the beginning to ~6000ms. Is this expected,
> > or is there something wrong with my configuration? Currently I've
> > configured the queryResultCache and the documentCache to contain 2048
> > entries (hit ratios for both are close to 50%).
> >
> > Apart from this, a general question that I want to ask is whether such
> > hardware is enough for this scenario. I'm aiming at achieving around 20
> > queries per second with the hardware mentioned above.
> >
> > Thanks,
> >
> > Regards,
> >
> > --
> > - Siddhant
> >
>



-- 
- Siddhant


Solr Performance Issues

2010-03-11 Thread Siddhant Goel
Hi everyone,

I have an index corresponding to ~2.5 million documents. The index size is
43GB. The configuration of the machine which is running Solr is - Dual
Processor Quad Core Xeon 5430 - 2.66GHz (Harpertown) - 2 x 12MB cache, 8GB
RAM, and 250 GB HDD.

I'm observing a strange trend in the queries that I send to Solr. The query
times for queries that I send earlier are much lower than for the queries I
send afterwards. For instance, if I write a script to query Solr 5000 times
(with 5000 distinct queries, most of them containing not more than 3-5 words)
with 10 threads running in parallel, the average query time goes from ~50ms
in the beginning to ~6000ms. Is this expected, or is there something wrong
with my configuration? Currently I've configured the queryResultCache and
the documentCache to contain 2048 entries (hit ratios for both are close to
50%).

Apart from this, a general question that I want to ask is whether such
hardware is enough for this scenario. I'm aiming at achieving around 20
queries per second with the hardware mentioned above.

Thanks,

Regards,

-- 
- Siddhant


Re: Question about fieldNorms

2010-03-08 Thread Siddhant Goel
Wonderful! That explains it. Thanks a lot!
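(Working through the numbers: with no boosts, lengthNorm = 1/sqrt(5) ~ 0.447
for the five-term example, and the single-byte encoding rounds that down to
the 0.4375 shown in the debug output - assuming I've applied the formula
correctly.)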

Regards,

On Mon, Mar 8, 2010 at 6:39 AM, Jay Hill  wrote:

> Yes, if omitNorms=true, then no lengthNorm calculation will be done, and
> the
> fieldNorm value will be 1.0, and lengths of the field in question will not
> be a factor in the score.
>
> To see an example of this you can do a quick test. Add two "text" fields,
> and on one omitNorms:
>
>   <field name="foo" type="text"/>
>   <field name="bar" type="text" omitNorms="true"/>
>
> Index a doc with the same value for both fields:
>  <field name="foo">1 2 3 4 5</field>
>  <field name="bar">1 2 3 4 5</field>
>
> Set &debugQuery=true and do two queries: &q=foo:5   &q=bar:5
>
> in the "explain" section of the debug output note that the fieldNorm value
> for the "foo" query is this:
>
>0.4375 = fieldNorm(field=foo, doc=1)
>
> and the value for the "bar" query is this:
>
>1.0 = fieldNorm(field=bar, doc=1)
>
> A simplified description of how the fieldNorm value is calculated:
> fieldNorm = lengthNorm * documentBoost * documentFieldBoosts
>
> and the lengthNorm is calculated like this: lengthNorm  =
> 1/(numTermsInField)**.5
> [note that the value is encoded as a single byte, so there is some
> precision
> loss]
>
> When omitNorms=true no norm calculation is done, so fieldNorm will always
> be
> one on those fields.
>
> You can also use the Luke utility to view the document in the index, and it
> will show that there is a norm value for the foo field, but not the bar
> field.
>
> -Jay
> http://www.lucidimagination.com
>
>
> On Sun, Mar 7, 2010 at 5:55 AM, Siddhant Goel wrote:
>
> > Hi everyone,
> >
> > Is the fieldNorm calculation altered by the omitNorms factor? I saw on
> > this page (http://old.nabble.com/Question-about-fieldNorm-td17782701.html)
> > the formula for calculation of fieldNorms (fieldNorm =
> > fieldBoost/sqrt(numTermsForField)).
> >
> > Does this mean that for a document containing a string like "A B C D E"
> > in its field, its fieldNorm would be boost/sqrt(5), and for another
> > document containing the string "A B C" in the same field, its fieldNorm
> > would be boost/sqrt(3)? Is that correct?
> >
> > If yes, then is *this* what omitNorms affects?
> >
> > Thanks,
> >
> > --
> > - Siddhant
> >
>



-- 
- Siddhant


Question about fieldNorms

2010-03-07 Thread Siddhant Goel
Hi everyone,

Is the fieldNorm calculation altered by the omitNorms factor? I saw on this
page (http://old.nabble.com/Question-about-fieldNorm-td17782701.html) the
formula for calculation of fieldNorms (fieldNorm =
fieldBoost/sqrt(numTermsForField)).

Does this mean that for a document containing a string like "A B C D E" in
its field, its fieldNorm would be boost/sqrt(5), and for another document
containing the string "A B C" in the same field, its fieldNorm would be
boost/sqrt(3)? Is that correct?

If yes, then is *this* what omitNorms affects?

Thanks,

-- 
- Siddhant


Re: Free Webinar: Mastering Solr 1.4 with Yonik Seeley

2010-03-07 Thread Siddhant Goel
Now that I missed attending it, where can I view it? :-)

Thanks

On Fri, Feb 26, 2010 at 10:11 PM, Jay Hill  wrote:

> Yes, it will be recorded and available to view after the presentation.
>
> -Jay
>
>
> On Thu, Feb 25, 2010 at 2:19 PM, Bernadette Houghton <
> bernadette.hough...@deakin.edu.au> wrote:
>
> > Yonik, can you please advise whether this event will be recorded and
> > available for later download? (It starts 5am our time ;-)  )
> >
> > Regards
> > Bern
> >
> > -Original Message-
> > From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
> > Seeley
> > Sent: Thursday, 25 February 2010 10:23 AM
> > To: solr-user@lucene.apache.org
> > Subject: Free Webinar: Mastering Solr 1.4 with Yonik Seeley
> >
> > I'd like to invite you to join me for an in-depth review of Solr's
> > powerful, versatile new features and functions. The free webinar,
> > sponsored by my company, Lucid Imagination, covers an intensive
> > "how-to" for the features you need to make the most of Solr for your
> > search application:
> >
> >* Faceting deep dive, from document fields to performance management
> >* Best practices for sharding, index partitioning and scaling
> >* How to construct efficient Range Queries and function queries
> >* Sneak preview: Solr 1.5 roadmap
> >
> > Join us for a free webinar
> > Thursday, March 4, 2010
> > 10:00 AM PST / 1:00 PM EST / 18:00 GMT
> > Follow this link to sign up
> >
> > http://www.eventsvc.com/lucidimagination/030410?trk=WR-MAR2010-AP
> >
> > Thanks,
> >
> > -Yonik
> > http://www.lucidimagination.com
> >
>



-- 
- Siddhant


Re: multiCore

2010-03-05 Thread Siddhant Goel
Can you provide the error message that you got?

On Sat, Mar 6, 2010 at 11:13 AM, Suram  wrote:

>
> Hi,
>
>
>  How can I send the XML file to Solr after creating the multicore? I tried,
> but it refuses to accept it.
> --
> View this message in context:
> http://old.nabble.com/multiCore-tp27802043p27802043.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
- Siddhant


Re: field not found for search

2010-03-04 Thread Siddhant Goel
Did you send a commit after indexing those files?
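If not, something like this against the update handler should do it (adjust
the host/port for your setup):

   curl http://localhost:8983/solr/update --data-binary '<commit/>' -H 'Content-type:text/xml; charset=utf-8'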

On Thu, Mar 4, 2010 at 6:30 PM, Suram  wrote:

>
> Hi,
>
>    I newly indexed some XML files, but they are not found by search or
> autosuggestion.
>
> My XML index file: http://old.nabble.com/file/p27780413/Nike.xml
>
> and my schema is: http://old.nabble.com/file/p27780413/schema.xml
>
> How can I achieve this?
> --
> View this message in context:
> http://old.nabble.com/field-not-found-for-search-tp27780413p27780413.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
- Siddhant


Re: Indexing HTML document

2010-03-02 Thread Siddhant Goel
There is an HTML filter documented here, which might be of some help -
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.HTMLStripCharFilterFactory

Control characters can be eliminated using code like this -
http://bitbucket.org/cogtree/python-solr/src/tip/pythonsolr/pysolr.py#cl-449
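A minimal sketch of wiring the HTML strip filter into schema.xml might look
like this (untested; the type name is just for illustration):

   <fieldType name="html_text" class="solr.TextField">
     <analyzer>
       <!-- strips HTML tags and decodes entities before tokenizing -->
       <charFilter class="solr.HTMLStripCharFilterFactory"/>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldType>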

On Tue, Mar 2, 2010 at 9:37 PM, György Frivolt wrote:

> Hi, how do I index HTML documents properly? All the documents are HTML, some
> containing characters encoded like ží ... Is there a character
> filter for filtering these codes? Is there a way to strip the HTML tags
> out?
> Does Solr weight the terms in the document based on where they appear?
> Words in headers (H1, H2, ..) would be supposed to describe the document
> more than words in paragraphs.
>
> Thanks for help,
>
>   Georg
>



-- 
- Siddhant


Re: fieldType "text"

2010-03-02 Thread Siddhant Goel
I think that's because of the internal tokenization that Solr does. If a
document contains HP1, and you're using the default text field type, Solr
would tokenize that into HP and 1, so that document figures in the list of
documents containing HP, and hence appears in the search results for HP.
Creating a separate field type which does not tokenize like that might be
what you want.

The various filter/tokenizer types are listed here -
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
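For instance, a sketch of a whitespace-only field type (untested; it simply
leaves out solr.WordDelimiterFilterFactory, which is what splits "HP1" into
"HP" and "1" in the default text type):

   <fieldType name="text_ws" class="solr.TextField">
     <analyzer>
       <!-- splits on whitespace only, so "HP1" stays a single token -->
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldType>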

On Tue, Mar 2, 2010 at 6:07 PM, Frederico Azeiteiro <
frederico.azeite...@cision.com> wrote:

> Hi,
>
> I'm using the default "text"  field type that comes with the example.
>
>
>
> When searching for simple words such as 'HP' or 'TCS', Solr is returning
> results that contain 'HP1' or 'T&CS'.
>
> Is there a solution to avoid this?
>
>
>
> Thanks,
>
> Frederico
>
>


-- 
- Siddhant


Re: updating particular field

2010-03-01 Thread Siddhant Goel
Yep. I think an update in Lucene means first a deletion and then an
addition, so the entire document needs to be sent to update.
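In other words, to change one field you re-post the whole document and
commit, along these lines (a sketch; field names taken from the example
below):

   <add>
     <doc>
       <field name="id">EN7800GTX/2DHTV/256M</field>
       <!-- ...all the other fields again, unchanged... -->
       <field name="inStock">true</field>
     </doc>
   </add>
   <commit/>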

On Mon, Mar 1, 2010 at 7:24 PM, Israel Ekpo  wrote:

> Unfortunately, because of how Lucene works internally, you will not be able
> to update just one or two fields. You have to resubmit the entire document.
>
> If you only send just one or two fields, then the updated document will
> only have the fields sent in the last update.
>
> On Mon, Mar 1, 2010 at 7:09 AM, Suram  wrote:
>
> >
> >
> >
> > Siddhant wrote:
> > >
> > > Yes. You can just re-add the document with your changes, and the rest
> of
> > > the
> > > fields in the document will remain unchanged.
> > >
> > > On Mon, Mar 1, 2010 at 5:09 PM, Suram  wrote:
> > >
> > >>
> > >> Hi,
> > >>
> > >> <doc>
> > >>   <field name="id">EN7800GTX/2DHTV/256M</field>
> > >>   <field name="manu">ASUS Computer Inc.</field>
> > >>   <field name="cat">electronics</field>
> > >>   <field name="cat">graphics card</field>
> > >>   <field name="features">NVIDIA GeForce 7800 GTX GPU/VPU clocked at
> > >> 486MHz</field>
> > >>   <field name="features">256MB GDDR3 Memory clocked at 1.35GHz</field>
> > >>   <field name="price">479.95</field>
> > >>   <field name="popularity">7</field>
> > >>   <field name="inStock">false</field>
> > >>   <field name="manufacturedate_dt">2006-02-13T15:26:37Z/DAY</field>
> > >> </doc>
> > >>
> > >> Can I possibly update <field name="inStock">true</field> without
> > >> affecting any field of my previous document?
> > >>
> > >> Thanks in advance
> > >> --
> > >> View this message in context:
> > >>
> > http://old.nabble.com/updating-particular-field-tp27742399p27742399.html
> > >> Sent from the Solr - User mailing list archive at Nabble.com.
> > >>
> > >>
> > >
> > >
> > > --
> > > - Siddhant
> > >
> > >
> >
> >
> > Hi,
> >   Here I don't want to reload the entire data; I just want to update the
> > field I need to change (i.e. one or more fields by id, not the whole
> > document).
> >
> >
> > --
> > View this message in context:
> > http://old.nabble.com/updating-particular-field-tp27742399p27742671.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> >
>
>
> --
> "Good Enough" is not good enough.
> To give anything less than your best is to sacrifice the gift.
> Quality First. Measure Twice. Cut Once.
> http://www.israelekpo.com/
>



-- 
- Siddhant


Re: updating particular field

2010-03-01 Thread Siddhant Goel
Yes. You can just re-add the document with your changes, and the rest of the
fields in the document will remain unchanged.

On Mon, Mar 1, 2010 at 5:09 PM, Suram  wrote:

>
> Hi,
>
> <doc>
>   <field name="id">EN7800GTX/2DHTV/256M</field>
>   <field name="manu">ASUS Computer Inc.</field>
>   <field name="cat">electronics</field>
>   <field name="cat">graphics card</field>
>   <field name="features">NVIDIA GeForce 7800 GTX GPU/VPU clocked at
> 486MHz</field>
>   <field name="features">256MB GDDR3 Memory clocked at 1.35GHz</field>
>   <field name="price">479.95</field>
>   <field name="popularity">7</field>
>   <field name="inStock">false</field>
>   <field name="manufacturedate_dt">2006-02-13T15:26:37Z/DAY</field>
> </doc>
>
> Can I possibly update <field name="inStock">true</field> without affecting
> any field of my previous document?
>
> Thanks in advance
> --
> View this message in context:
> http://old.nabble.com/updating-particular-field-tp27742399p27742399.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
- Siddhant


Re: CoreAdmin

2010-02-25 Thread Siddhant Goel
Hi,

Did you *really* go through this page -
http://wiki.apache.org/solr/CoreAdmin ?

On Thu, Feb 25, 2010 at 7:40 PM, Sudhakar_Thangavel wrote:

>
> Hi,
>    I am new to Solr, and I am not understanding the wiki clearly. Can
> anyone tell me how to configure CoreAdmin? I need step-by-step
> instructions.
>
>
>
> --
> View this message in context:
> http://old.nabble.com/CoreAdmin-tp27714440p27714440.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


-- 
- Siddhant


Re: Ruby client fails to build

2010-01-20 Thread Siddhant Goel
On Wed, Jan 20, 2010 at 4:19 PM, Erik Hatcher wrote:

> Where are you getting your solr-ruby code from?  You can simply "gem
> install" it to pull in an already pre-built gem.
>

I'm just picking it up from the 1.4 release. I also tried checking out the
latest copy from svn, but the results were the same.

So I just figured out I was using the pre-built gem the wrong way. It's
working fine here. Is there any documentation that you could point me to?
Right now I'm just figuring out how to use it on a hit-and-trial basis and
random googling. The wiki page doesn't tell me much about all the search
options supported.

Thanks,

-- 
- Siddhant


Ruby client fails to build

2010-01-20 Thread Siddhant Goel
Hi,

I'm using Solr 1.4 (and trying to use the Ruby client (solr-ruby) to access
it). The problem is that I just can't get it to work. :-)

If I run the tests (rake test), it fails giving me the following output -
/path/to/solr-ruby/test/unit/delete_test.rb:52: invalid multibyte char
(US-ASCII)
/path/to/solr-ruby/test/unit/delete_test.rb:52: syntax error, unexpected
$end, expecting ')'
request = Solr::Request::Delete.new(:query => 'ëäïöü')
 ^
from
/home/mango/.gem/ruby/1.9.1/gems/rake-0.8.7/lib/rake/rake_test_loader.rb:5:in
`block in '
from
/home/mango/.gem/ruby/1.9.1/gems/rake-0.8.7/lib/rake/rake_test_loader.rb:5:in
`each'
from
/home/mango/.gem/ruby/1.9.1/gems/rake-0.8.7/lib/rake/rake_test_loader.rb:5:in
`'
rake aborted!
Command failed with status (1): [/usr/bin/ruby -I"lib" -r solr -r
test/unit...]


And if I try to build the gem anyway, it fails giving me the following error
(after quite a few lines of output) -
rake aborted!
private method `rm_f' called for File:Class
/path/to/solr-ruby/Rakefile:79:in `block (2 levels) in '


Could anyone please tell me what am I missing here?

Thanks,


-- 
- Siddhant


Re: Queries of type field:value not functioning

2010-01-13 Thread Siddhant Goel
Hi,

Thanks for the responses.
q.alt did the job. Turns out that the dismax query parser was at fault and
wasn't able to handle queries of the type *:*. Putting the query in q.alt,
or adding defType=lucene (as pointed out to me on the IRC channel), worked.
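For anyone else who runs into this, either of the following worked for me
(sketches against the standard select handler):

   /select?q.alt=*:*              (dismax, leaving q out entirely)
   /select?q=*:*&defType=lucene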

Thanks,


-- 
- Siddhant


Queries of type field:value not functioning

2010-01-13 Thread Siddhant Goel
Hi all,

Any query I make of the type field:value does not return any documents. The
same is the case for the *:* query - it doesn't return any results either.
The index size is close to 1GB now, so it should be returning some
documents. The rest of the queries are functioning properly. Any help?

Thanks,

-- 
- Siddhant


Re: Reload synonyms

2010-01-05 Thread Siddhant Goel
On Tue, Jan 5, 2010 at 2:24 PM, Peter A. Kirk  wrote:

> Thanks for the answer. How does one "reload" a core? Is there an API, or a
> url one can use?
>

I think this should be it - http://wiki.apache.org/solr/CoreAdmin#RELOAD
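i.e. something along the lines of:

   http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0

with your own core name in place of core0.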

-- 
- Siddhant


Re: Adaptive search?

2009-12-22 Thread Siddhant Goel
On Tue, Dec 22, 2009 at 12:01 PM, Ryan Kennedy  wrote:

> This approach will be limited to applying a "global" rank to all the
> documents, which may have some unintended consequences. The most
> popular document in your index will be the most popular, even for
> queries for which it was never clicked on.


Right. Makes so much sense. Thanks for sharing.

-- 
- Siddhant


Re: Adaptive search?

2009-12-17 Thread Siddhant Goel
Let's say we have a search engine (a simple front end - a web-app kind of
thing - responsible for querying Solr and then displaying the results in
human-readable form) based on Solr. If a user searches for something, gets
quite a few search results, and then clicks on one such result - is there
any mechanism by which we can notify Solr to boost the score/relevance of
that particular result in future searches? If not, then any pointers on how
to go about doing that would be very helpful.

Thanks,

On Thu, Dec 17, 2009 at 7:50 PM, Paul Libbrecht  wrote:

> What can it mean to "adapt to user clicks" ? Quite many things in my head.
> Do you have maybe a citation that inspires you here?
>
> paul
>
>
> On 17 Dec 2009, at 13:52, Siddhant Goel wrote:
>
>
>> Does Solr provide adaptive searching? Can it adapt to user clicks within
>> the search results it provides? Or does that have to be done externally?
>>
>
>


-- 
- Siddhant


Adaptive search?

2009-12-17 Thread Siddhant Goel
Hi,

Does Solr provide adaptive searching? Can it adapt to user clicks within the
search results it provides? Or does that have to be done externally?

I couldn't find anything on googling for it.

Thanks,

-- 
- Siddhant