LockObtainFailedException and older segment_N files are present

2012-10-23 Thread jame vaalet
Hi,
I have a searcher server replicating its index from a master server. Recently
I noticed a huge difference in index size between master and slave, followed
by a LockObtainFailedException in the catalina.out log. When I looked into the
searcher's index folder, I could see more than 100 segments_N files in it.
After debugging I found the root cause to be a misconfigured solrconfig.xml:
I was using Solr 3.4, and the file had an indexConfig section instead of the
mainIndex section, so Solr fell back to the simple file lock rather than the
configured native lock (
http://wiki.apache.org/solr/SolrConfigXml#indexConfig). Rectifying this
configuration fixed the error; replication then worked again, the older
segment files got deleted, and the searcher core size finally matched the
indexer core size.
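
For reference, a rough sketch of the relevant 3.x settings (the values are
only illustrative, and in 3.x the lock settings live under
mainIndex/indexDefaults, not under indexConfig as in 4.x):

  <mainIndex>
    <!-- "native" uses NativeFSLockFactory; "simple" uses SimpleFSLockFactory -->
    <lockType>native</lockType>
    <!-- milliseconds a writer waits for the lock before LockObtainFailedException -->
    <writeLockTimeout>1000</writeLockTimeout>
  </mainIndex>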

What I would like to understand here is:

   1. Why would the simple file lock cause a lock obtain timeout? (Maybe the
   default writeLockTimeout was too short.)
   2. What happens when this LockObtainFailedException occurs during
   replication - will it fail to replicate the docs? I have observed that the
   searcher reports the same numFound as the indexer, which means clients
   searching wouldn't see any missing documents.
   3. Why did the index size bloat, and why were the older segments_N files
   still present?


thanks in advance !



-- 

-JAME


Re: segment number during optimize of index

2012-10-11 Thread jame vaalet
Hi Lance,
My earlier point may be misleading:

|    1. Segments are independent sub-indexes in separate files; while indexing
|    it's better to create a new segment as it doesn't have to modify an
|    existing file, whereas while searching, the *smaller the segment* the
|    better it is, since you open x (not exactly x, but x*n, a value
|    proportional to x) physical files to search if you have got x segments
|    in the index.

The "smaller" was referring to the segment number rather than the segment
size.

When you said "Large Pages", do you mean the segment size should be below a
threshold for better performance from the OS point of view? My main concern
here is: what would be the main disadvantage (for indexing or searching) if I
merge my entire 150 GB index (right now 100 segments) into a single segment?
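
For what it's worth, an optimize does not have to merge all the way down to
one segment; a sketch of a partial optimize (host, core path and the target
segment count of 10 are placeholders only):

  curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=10'

or, posted to /update as XML: <optimize maxSegments="10"/>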





On 11 October 2012 07:28, Lance Norskog goks...@gmail.com wrote:

 Study index merging. This is awesome.

 http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

 Jame- opening lots of segments is not a problem. A major performance
 problem you will find is 'Large Pages'. This is an operating-system
 strategy for managing servers with 10s of gigabytes of memory. Without it,
 all large programs run much more slowly than they could. It is not a Solr
 or JVM problem.


 - Original Message -
 | From: jun Wang wangjun...@gmail.com
 | To: solr-user@lucene.apache.org
 | Sent: Wednesday, October 10, 2012 6:36:09 PM
 | Subject: Re: segment number during optimize of index
 |
 | I have an other question, does the number of segment affect speed for
 | update index?
 |
 | 2012/10/10 jame vaalet jamevaa...@gmail.com
 |
 |  Guys,
 |  thanks for all the inputs, I was continuing my research to know
 |  more about
 |  segments in Lucene. Below are my conclusion, please correct me if
 |  am wrong.
 | 
 | 1. Segments are independent sub-indexes in seperate file, while
 | indexing
 | its better to create new segment as it doesnt have to modify an
 | existing
 | file. where as while searching, smaller the segment the better
 | it is
 |  since
 | you open x (not exactly x but xn a value proportional to x)
 | physical
 |  files
 | to search if you have got x segments in the index.
 | 2. since lucene has memory map concept, for each file/segment in
 | index a
 | new m-map file is created and mapped to the physcial file in
 | disk. Can
 | someone explain or correct this in detail, i am sure there are
 | lot many
 | people wondering how m-map works while you merge or optimze
 | index
 |  segments.
 | 
 | 
 | 
 |  On 6 October 2012 07:41, Otis Gospodnetic
 |  otis.gospodne...@gmail.com
 |  wrote:
 | 
 |   If I were you and not knowing all your details...
 |  
 |   I would optimize indices that are static (not being modified) and
 |   would optimize down to 1 segment.
 |   I would do it when search traffic is low.
 |  
 |   Otis
 |   --
 |   Search Analytics -
 |   http://sematext.com/search-analytics/index.html
 |   Performance Monitoring - http://sematext.com/spm/index.html
 |  
 |  
 |   On Fri, Oct 5, 2012 at 4:27 PM, jame vaalet
 |   jamevaa...@gmail.com
 |  wrote:
 |Hi Eric,
 |I  am in a major dilemma with my index now. I have got 8 cores
 |each
 |   around
 |300 GB in size and half of them are deleted documents in it and
 |above
 |   that
 |each has got around 100 segments as well. Do i issue a
 |expungeDelete
 |  and
 |allow the merge policy to take care of the segments or optimize
 |them
 |  into
 |single segment. Search performance is not at par compared to
 |usual solr
 |speed.
 |If i have to optimize what segment number should i choose? my
 |RAM size
 |around 120 GB and JVM heap is around 45 GB (oldGen being 30
 |GB). Pleas
 |advice !
 |   
 |thanks.
 |   
 |   
 |On 6 October 2012 00:00, Erick Erickson
 |erickerick...@gmail.com
 |  wrote:
 |   
 |because eventually you'd run out of file handles. Imagine a
 |long-running server with 100,000 segments. Totally
 |unmanageable.
 |   
 |I think shawn was emphasizing that RAM requirements don't
 |depend on the number of segments. There are other
 |resources that file consume however.
 |   
 |Best
 |Erick
 |   
 |On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet
 |jamevaa...@gmail.com
 |   wrote:
 | hi Shawn,
 | thanks for the detailed explanation.
 | I have got one doubt, you said it doesn matter how many
 | segments
 |  index
 |have
 | but then why does solr has this merge policy which merges
 | segments
 | frequently?  why can it leave the segments as it is rather
 | than
 |   merging
 | smaller one's into bigger one?
 |
 | thanks
 | .
 |
 | On 5 October 2012 05:46, Shawn Heisey s...@elyograg.org
 | wrote:
 |
 | On 10/4/2012 3:22 PM, jame vaalet wrote:
 |
 | so

Re: segment number during optimize of index

2012-10-10 Thread jame vaalet
Guys,
thanks for all the inputs. I was continuing my research to learn more about
segments in Lucene. Below are my conclusions, please correct me if I am wrong.

   1. Segments are independent sub-indexes in separate files; while indexing
   it's better to create a new segment as it doesn't have to modify an
   existing file, whereas while searching, the smaller the segment the better
   it is, since you open x (not exactly x, but x*n, a value proportional to x)
   physical files to search if you have got x segments in the index.
   2. Since Lucene has a memory-map concept, for each file/segment in the
   index a new m-mapped file is created and mapped to the physical file on
   disk (see the config sketch below). Can someone explain or correct this in
   detail? I am sure there are lots of people wondering how m-map works while
   you merge or optimize index segments.
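
A rough sketch of how the directory implementation is selected in
solrconfig.xml (the default StandardDirectoryFactory lets Lucene pick, which
on 64-bit JVMs usually ends up memory-mapped; forcing it explicitly as below
is only an illustration):

   <directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory"/>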



On 6 October 2012 07:41, Otis Gospodnetic otis.gospodne...@gmail.comwrote:

 If I were you and not knowing all your details...

 I would optimize indices that are static (not being modified) and
 would optimize down to 1 segment.
 I would do it when search traffic is low.

 Otis
 --
 Search Analytics - http://sematext.com/search-analytics/index.html
 Performance Monitoring - http://sematext.com/spm/index.html


 On Fri, Oct 5, 2012 at 4:27 PM, jame vaalet jamevaa...@gmail.com wrote:
  Hi Eric,
  I  am in a major dilemma with my index now. I have got 8 cores each
 around
  300 GB in size and half of them are deleted documents in it and above
 that
  each has got around 100 segments as well. Do i issue a expungeDelete and
  allow the merge policy to take care of the segments or optimize them into
  single segment. Search performance is not at par compared to usual solr
  speed.
  If i have to optimize what segment number should i choose? my RAM size
  around 120 GB and JVM heap is around 45 GB (oldGen being 30 GB). Pleas
  advice !
 
  thanks.
 
 
  On 6 October 2012 00:00, Erick Erickson erickerick...@gmail.com wrote:
 
  because eventually you'd run out of file handles. Imagine a
  long-running server with 100,000 segments. Totally
  unmanageable.
 
  I think shawn was emphasizing that RAM requirements don't
  depend on the number of segments. There are other
  resources that file consume however.
 
  Best
  Erick
 
  On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet jamevaa...@gmail.com
 wrote:
   hi Shawn,
   thanks for the detailed explanation.
   I have got one doubt, you said it doesn matter how many segments index
  have
   but then why does solr has this merge policy which merges segments
   frequently?  why can it leave the segments as it is rather than
 merging
   smaller one's into bigger one?
  
   thanks
   .
  
   On 5 October 2012 05:46, Shawn Heisey s...@elyograg.org wrote:
  
   On 10/4/2012 3:22 PM, jame vaalet wrote:
  
   so imagine i have merged the 150 Gb index into single segment, this
  would
   make a single segment of 150 GB in memory. When new docs are
 indexed it
   wouldn't alter this 150 Gb index unless i update or delete the older
  docs,
   right? will 150 Gb single segment have problem with memory swapping
 at
  OS
   level?
  
  
   Supplement to my previous reply:  the real memory mentioned in the
 last
   paragraph does not include the memory that the OS uses to cache disk
   access.  If more memory is needed and all the free memory is being
 used
  by
   the disk cache, the OS will throw away part of the disk cache (a
   near-instantaneous operation that should never involve disk I/O) and
  give
   that memory to the application that requests it.
  
   Here's a very good breakdown of how memory gets used with
 MMapDirectory
  in
   Solr.  It's applicable to any program that uses memory mapping, not
 just
   Solr:
  
  
 http://java.dzone.com/**articles/use-lucene%E2%80%99s-**mmapdirectory
  http://java.dzone.com/articles/use-lucene%E2%80%99s-mmapdirectory
  
   Thanks,
   Shawn
  
  
  
  
   --
  
   -JAME
 
 
 
 
  --
 
  -JAME




-- 

-JAME


Re: segment number during optimize of index

2012-10-05 Thread jame vaalet
hi Shawn,
thanks for the detailed explanation.
I have got one doubt: you said it doesn't matter how many segments the index
has, but then why does Solr have this merge policy which merges segments
frequently? Why can't it leave the segments as they are rather than merging
the smaller ones into bigger ones?

thanks.

On 5 October 2012 05:46, Shawn Heisey s...@elyograg.org wrote:

 On 10/4/2012 3:22 PM, jame vaalet wrote:

 so imagine i have merged the 150 Gb index into single segment, this would
 make a single segment of 150 GB in memory. When new docs are indexed it
 wouldn't alter this 150 Gb index unless i update or delete the older docs,
 right? will 150 Gb single segment have problem with memory swapping at OS
 level?


 Supplement to my previous reply:  the real memory mentioned in the last
 paragraph does not include the memory that the OS uses to cache disk
 access.  If more memory is needed and all the free memory is being used by
 the disk cache, the OS will throw away part of the disk cache (a
 near-instantaneous operation that should never involve disk I/O) and give
 that memory to the application that requests it.

 Here's a very good breakdown of how memory gets used with MMapDirectory in
 Solr.  It's applicable to any program that uses memory mapping, not just
 Solr:

 http://java.dzone.com/**articles/use-lucene%E2%80%99s-**mmapdirectoryhttp://java.dzone.com/articles/use-lucene%E2%80%99s-mmapdirectory

 Thanks,
 Shawn




-- 

-JAME


Re: segment number during optimize of index

2012-10-05 Thread jame vaalet
Hi Erick,
I am in a major dilemma with my index now. I have got 8 cores, each around
300 GB in size; about half of each is deleted documents, and on top of that
each has around 100 segments as well. Do I issue an expungeDeletes and let
the merge policy take care of the segments, or optimize them into a single
segment? Search performance is not on par with the usual Solr speed.
If I have to optimize, what segment number should I choose? My RAM size is
around 120 GB and the JVM heap is around 45 GB (oldGen being 30 GB). Please
advise!
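
For reference, a sketch of what the two options look like on the wire (host
and core name are placeholders only):

  curl 'http://localhost:8983/solr/coreN/update?commit=true&expungeDeletes=true'
  curl 'http://localhost:8983/solr/coreN/update?optimize=true&maxSegments=1'

expungeDeletes only merges the segments that contain deleted docs, while
optimize rewrites the whole index down to maxSegments.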

thanks.


On 6 October 2012 00:00, Erick Erickson erickerick...@gmail.com wrote:

 because eventually you'd run out of file handles. Imagine a
 long-running server with 100,000 segments. Totally
 unmanageable.

 I think shawn was emphasizing that RAM requirements don't
 depend on the number of segments. There are other
 resources that file consume however.

 Best
 Erick

 On Fri, Oct 5, 2012 at 1:08 PM, jame vaalet jamevaa...@gmail.com wrote:
  hi Shawn,
  thanks for the detailed explanation.
  I have got one doubt, you said it doesn matter how many segments index
 have
  but then why does solr has this merge policy which merges segments
  frequently?  why can it leave the segments as it is rather than merging
  smaller one's into bigger one?
 
  thanks
  .
 
  On 5 October 2012 05:46, Shawn Heisey s...@elyograg.org wrote:
 
  On 10/4/2012 3:22 PM, jame vaalet wrote:
 
  so imagine i have merged the 150 Gb index into single segment, this
 would
  make a single segment of 150 GB in memory. When new docs are indexed it
  wouldn't alter this 150 Gb index unless i update or delete the older
 docs,
  right? will 150 Gb single segment have problem with memory swapping at
 OS
  level?
 
 
  Supplement to my previous reply:  the real memory mentioned in the last
  paragraph does not include the memory that the OS uses to cache disk
  access.  If more memory is needed and all the free memory is being used
 by
  the disk cache, the OS will throw away part of the disk cache (a
  near-instantaneous operation that should never involve disk I/O) and
 give
  that memory to the application that requests it.
 
  Here's a very good breakdown of how memory gets used with MMapDirectory
 in
  Solr.  It's applicable to any program that uses memory mapping, not just
  Solr:
 
  http://java.dzone.com/**articles/use-lucene%E2%80%99s-**mmapdirectory
 http://java.dzone.com/articles/use-lucene%E2%80%99s-mmapdirectory
 
  Thanks,
  Shawn
 
 
 
 
  --
 
  -JAME




-- 

-JAME


Re: solr meger policy

2012-10-04 Thread jame vaalet
That's the first thing I tried, but it only had mergeFactor and maxMergeDocs
in it. We have different merge policies like:

   - LogMergePolicy
     http://lucene.apache.org/core/old_versioned_docs/versions/3_4_0/api/core/org/apache/lucene/index/LogMergePolicy.html
   - NoMergePolicy
     http://lucene.apache.org/core/old_versioned_docs/versions/3_4_0/api/core/org/apache/lucene/index/NoMergePolicy.html
   - TieredMergePolicy
     http://lucene.apache.org/core/old_versioned_docs/versions/3_4_0/api/core/org/apache/lucene/index/TieredMergePolicy.html
   - UpgradeIndexMergePolicy
     http://lucene.apache.org/core/old_versioned_docs/versions/3_4_0/api/core/org/apache/lucene/index/UpgradeIndexMergePolicy.html

Finally I found the default policy and values in Lucene 3.4:

   - the default policy is TieredMergePolicy
     (http://lucene.apache.org/core/old_versioned_docs/versions/3_4_0/api/core/org/apache/lucene/index/MergePolicy.html)
   - the default constants, unless specified, are at
     http://lucene.apache.org/core/old_versioned_docs/versions/3_4_0/api/core/constant-values.html
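
For reference, a rough sketch of how a non-default policy can be set per core
in the 3.x solrconfig.xml (the element goes under indexDefaults/mainIndex; the
named parameters map onto the policy's setters, and the values shown are just
the documented defaults):

   <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
     <int name="maxMergeAtOnce">10</int>
     <double name="segmentsPerTier">10.0</double>
   </mergePolicy>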


On 4 October 2012 18:29, Otis Gospodnetic otis.gospodne...@gmail.comwrote:

 Hi,

 Look for the word merge in solrconfig.xml :)

 Otis
 --
 Performance Monitoring - http://sematext.com/spm
 On Oct 4, 2012 7:50 AM, jame vaalet jamevaa...@gmail.com wrote:

  Hi,
  I would like to know the different merge policies lucene uses in
 different
  versions of SOLR. I have got 3.4 and 3.6 versions of solr running but how
  do i point them to use different merge policies?
  Thanks in advance !
 
  --
 
  -JAME
 




-- 

-JAME


Re: solr meger policy

2012-10-04 Thread jame vaalet
Thanks Tomás.

On 4 October 2012 19:56, Tomás Fernández Löbbe tomasflo...@gmail.comwrote:

 TieredMergePolicy is the default in Solr since 3.3. See
 https://issues.apache.org/jira/browse/SOLR-2567 It is still the default
 for
 4.0, so you should have the same MergePolicy in 3.4 and 3.6.



 On Thu, Oct 4, 2012 at 9:14 AM, jame vaalet jamevaa...@gmail.com wrote:

  Thats the first thing i tried, but it had only merge factor and
  maxmergedocs in it. We have different merge policies like
  LogMergePolicy
 
 http://lucene.apache.org/core/old_versioned_docs/versions/3_4_0/api/core/org/apache/lucene/index/LogMergePolicy.html
  
  , NoMergePolicy
 
 http://lucene.apache.org/core/old_versioned_docs/versions/3_4_0/api/core/org/apache/lucene/index/NoMergePolicy.html
  
  , TieredMergePolicy
 
 http://lucene.apache.org/core/old_versioned_docs/versions/3_4_0/api/core/org/apache/lucene/index/TieredMergePolicy.html
  
  , UpgradeIndexMergePolicy
 
 http://lucene.apache.org/core/old_versioned_docs/versions/3_4_0/api/core/org/apache/lucene/index/UpgradeIndexMergePolicy.html
  .
  finally i found what is the default policy and values in 3.4 lucene:
 
 - default policy is
  TieredMergePolicy
 
 http://lucene.apache.org/core/old_versioned_docs/versions/3_4_0/api/core/org/apache/lucene/index/TieredMergePolicy.html
  
  (
 
 
 http://lucene.apache.org/core/old_versioned_docs/versions/3_4_0/api/core/org/apache/lucene/index/MergePolicy.html
 )
 - default constants unless specified are
 
 
 http://lucene.apache.org/core/old_versioned_docs/versions/3_4_0/api/core/constant-values.html
 
 
  On 4 October 2012 18:29, Otis Gospodnetic otis.gospodne...@gmail.com
  wrote:
 
   Hi,
  
   Look for the word merge in solrconfig.xml :)
  
   Otis
   --
   Performance Monitoring - http://sematext.com/spm
   On Oct 4, 2012 7:50 AM, jame vaalet jamevaa...@gmail.com wrote:
  
Hi,
I would like to know the different merge policies lucene uses in
   different
versions of SOLR. I have got 3.4 and 3.6 versions of solr running but
  how
do i point them to use different merge policies?
Thanks in advance !
   
--
   
-JAME
   
  
 
 
 
  --
 
  -JAME
 




-- 

-JAME


Re: segment number during optimize of index

2012-10-04 Thread jame vaalet
So imagine I have merged the 150 GB index into a single segment; this would
make a single segment of 150 GB in memory. When new docs are indexed it
wouldn't alter this 150 GB index unless I update or delete the older docs,
right? Will a 150 GB single segment have problems with memory swapping at the
OS level?

On 5 October 2012 02:28, Otis Gospodnetic otis.gospodne...@gmail.comwrote:

 You can certainly optimize down to just 1 segment.

 Note that this is the most expensive option and that when you do that
 you may actually hurt performance for a bit because Solr/Lucene may
 need to re-read a bunch of data from the index for sorting and
 faceting purposes.  You will also invalidate the previously cached
 index data in the OS cache.

 Finally, if this index is being modified, it will be de-optimized
 again.  Note that Lucene periodically merges segments under the hood
 as documents are added to the index anyway.

 Otis
 --
 Search Analytics - http://sematext.com/search-analytics/index.html
 Performance Monitoring - http://sematext.com/spm/index.html


 On Thu, Oct 4, 2012 at 4:20 PM, jame vaalet jamevaa...@gmail.com wrote:
  Hi,
  I was about to do optimize on my index which has got around 100 segments
  right now, but am confused about the segment size that has to be chosen.
  would it have any trouble merging all the index into one single segment ?
  thanks in advance.
 
  --
 
  -JAME




-- 

-JAME


indexing key value pair into lucene solr index

2011-10-24 Thread jame vaalet
hi,
in my use case I have a list of key-value pairs in each document object. If I
index them as two separate index fields, then in the result doc object I will
get two arrays corresponding to my keys and values, and the problem I face is
that there won't be any mapping between those keys and values.

Is there any easy way to index this data in Solr? Thanks in advance ...

-- 

-JAME


Re: indexing key value pair into lucene solr index

2011-10-24 Thread jame vaalet
Thanks Karsten.
Can we preserve order within an index field? If yes, I can index keys and
values separately and map them using their order.
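
A sketch of the "one token per pair" idea from the reply quoted below,
assuming a multiValued string field named kv (the field and values are only
examples):

  <add>
    <doc>
      <field name="id">42</field>
      <field name="kv">color_red</field>
      <field name="kv">size_large</field>
    </doc>
  </add>

A query like kv:color_red then keeps each key tied to its value. If you
instead index keys and values as two parallel multiValued fields, the stored
values should come back in the order they were added, which is the ordering
trick asked about above.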

On 24 October 2011 17:32, karsten-s...@gmx.de wrote:

 Hi Jame,

 you can
  - generate one token for each pair (key, value) -- key_value
  - insert a gap between each pair and us phrase queries
  - use key as field-name (if you have a restricted set of keys)
  - wait for joins in Solr 4.0 (http://wiki.apache.org/solr/Join)
  - use position or payloads to connect key and value
  - tell the forum your exact use-case with examples

 Best regrads
  Karsten

  Original-Nachricht 
  Datum: Mon, 24 Oct 2011 17:11:49 +0530
  Von: jame vaalet jamevaa...@gmail.com
  An: solr-user@lucene.apache.org
  Betreff: indexing key value pair into lucene solr index

  hi,
  in my use case i have list of key value pairs in each document object, if
  i
  index them as separate index fields then in the result doc object i will
  get
  two arrays corresponding to my keys and values. The problem i face here
 is
  that there wont be any mapping between those keys and values.
 
  do we have any easy to index these data in solr ? thanks in advance ...
 
  --
 
  -JAME




-- 

-JAME


term vector parser in solr.NET

2011-09-19 Thread jame vaalet
hi,
I was wondering if there is any method to get back the term vector list from
Solr through solr.NET. Looking at the solr.NET source code, I couldn't find
any term vector parser in it.
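
For reference, term vectors can at least be fetched over plain HTTP with the
TermVectorComponent, assuming it is wired into the request handler and the
field was indexed with termVectors="true" (host, field and id below are
placeholders):

  http://localhost:8983/solr/select?q=id:42&tv=true&tv.fl=content&tv.tf=true&tv.positions=true

The raw response could then be parsed on the .NET side even without a
dedicated parser in the library.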

-- 

-JAME


Re: can i create filters of score range

2011-08-25 Thread jame vaalet
Well, when I said "client" I meant querying through solr.NET (in a way this
can be seen as posting it through a web browser URL).
So coming back to the issue: even if I am sorting by _docid_ I still need to
do paging (2 million docs in the result). How is it doing that internally?
When sorted by docid, don't we have the deep paging issue (getting all the
previous pages into memory to get the next page)? So what is the main
difference we gain by sorting on Lucene docids rather than normal fields?
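
For concreteness, a sketch of the chunked export being discussed (query,
field name and page size are placeholders only):

  /select?q=myquery&sort=_docid_+asc&fl=documentid&start=0&rows=10000
  /select?q=myquery&sort=_docid_+asc&fl=documentid&start=10000&rows=10000
  ...

The start parameter still increases chunk by chunk, but with sort=_docid_ asc
there should be no score-based sorting work to repeat for each page, as
suggested earlier in this thread.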

On 23 August 2011 22:41, Erick Erickson erickerick...@gmail.com wrote:

 Did you try exactly what Chris suggested? Appending
 sort=_docid_ asc to the query? When you say
 client I assume you're talking SolrJ, and I'm pretty
 sure that SolrQuery.setSortField is what you want.

 I suppose you could also set this as the default in your
 query handler.

 Best
 Erick

 On Tue, Aug 23, 2011 at 4:43 AM, jame vaalet jamevaa...@gmail.com wrote:
  okey, so this is something i was looking for .. the default order of
 result
  docs in lucene\solr ..
  and you are right, since i don care about the order in which i get the
 docs
  ideally i shouldn't ask solr to do any sorting on its raw result list
 ...
  though i understand your point, how do i do it as solr client ? by
 default
  if am not mentioning the sort parameter in query URL to solr, solr will
 try
  to sort it with respect to the score it calculated .. how do i prevent
 even
  this sorting ..do we have any setting as such in solr for this ?
 
 
  On 23 August 2011 03:29, Chris Hostetter hossman_luc...@fucit.org
 wrote:
 
 
  : before going into lucene doc id , i have got creationDate datetime
 field
  in
  : my index which i can use as page definition using filter query..
  : i have learned exposing lucene docid wont be a clever idea, as its
 again
  : relative to index instance.. where as my index date field will be
 unique
  : ..and i can definitely create ranges with that..
 
  i think you missunderstood me: i'm *not* suggesting you do any filtering
  on the internal lucene doc id.  I am suggesting that you forget all
 about
  trying to filter to work arround the issues with deep paging, and simply
  *sort* on _docid_ asc, which should make all inherient issues with deep
  paging go away (as far as i know).  At no point with the internal lucene
  doc ids be exposed to your client code, it's just a instruction to
  Solr/Lucene that it doesn't really need to do any sorting, it can just
  return the Nth-Mth docs as collected.
 
  : i ahve got on more doubt .. if i use filter query each time will it
  result
  : in memory problem like that we see in deep paging issues..
 
  it could, i'm not sure. that's why i said...
 
  :  I'm not sure if this would really gain you much though -- yes this
  would
  :  work arround some of the memory issues inherient in deep paging
 but
  it
  :  would still require a lot or rescoring of documents again and again.
 
 
  -Hoss
 
 
 
 
  --
 
  -JAME
 




-- 

-JAME


Re: can i create filters of score range

2011-08-23 Thread jame vaalet
Okay, so this is something I was looking for: the default order of result
docs in Lucene/Solr.
And you are right, since I don't care about the order in which I get the
docs, ideally I shouldn't ask Solr to do any sorting on its raw result list.
Though I understand your point, how do I do it as a Solr client? By default,
if I am not mentioning the sort parameter in the query URL, Solr will try to
sort by the score it calculated. How do I prevent even this sorting? Do we
have any setting in Solr for this?


On 23 August 2011 03:29, Chris Hostetter hossman_luc...@fucit.org wrote:


 : before going into lucene doc id , i have got creationDate datetime field
 in
 : my index which i can use as page definition using filter query..
 : i have learned exposing lucene docid wont be a clever idea, as its again
 : relative to index instance.. where as my index date field will be unique
 : ..and i can definitely create ranges with that..

 i think you missunderstood me: i'm *not* suggesting you do any filtering
 on the internal lucene doc id.  I am suggesting that you forget all about
 trying to filter to work arround the issues with deep paging, and simply
 *sort* on _docid_ asc, which should make all inherient issues with deep
 paging go away (as far as i know).  At no point with the internal lucene
 doc ids be exposed to your client code, it's just a instruction to
 Solr/Lucene that it doesn't really need to do any sorting, it can just
 return the Nth-Mth docs as collected.

 : i ahve got on more doubt .. if i use filter query each time will it
 result
 : in memory problem like that we see in deep paging issues..

 it could, i'm not sure. that's why i said...

 :  I'm not sure if this would really gain you much though -- yes this
 would
 :  work arround some of the memory issues inherient in deep paging but
 it
 :  would still require a lot or rescoring of documents again and again.


 -Hoss




-- 

-JAME


can i create filters of score range

2011-08-22 Thread jame vaalet
hi,
Is it possible to say fq=score[1 TO *] ?
I have tried it but Solr throws an error. Can this be done with some other
syntax?

-- 

-JAME


Re: can i create filters of score range

2011-08-22 Thread jame vaalet
Thanks Erick for the answer.
My index has around 20 million documents in it, and each of my queries will
yield around 1 million hits (numFound). For each query I store the hit
document ids into a database for further processing.
Retrieving 1 million docids from Solr through paging results in deep paging
issues, so I wonder if I can use filter queries to fetch the 1 million docids
chunk by chunk. For me the best filter would be score: if I can find the
maximum score I can filter out the other docs.

What is the minimum value of the Solr score? I don't think it will have
negative values, so if it's always above 0, my first chunk would be score
[0 TO *] with rows =1, and my next chunk would start from the max score of
the first chunk to *, with rows =1. This would ensure that while fetching the
1000th chunk Solr doesn't have to get all the previous doc ids into memory.



On 22 August 2011 19:51, Erick Erickson erickerick...@gmail.com wrote:

 I don't believe that this is possible, and I strongly question
 whether it's useful (not to mention the syntax error, score:[1 TO *],
 notice
 the colon).

 Scores really are dimensionless. A normalized score of 0.5 for a
 particular
 query doesn't really say anything about how good the document is, it just
 tells you that it's better than a doc of 0.4

 This smells like an XY problem, see:
 http://people.apache.org/~hossman/#xyproblem

 What is a higher-level statement of the problem you're trying to solve?

 Best
 Erick



 On Mon, Aug 22, 2011 at 7:28 AM, jame vaalet jamevaa...@gmail.com wrote:
  hi.
  Is it possible to say fq=score[1 TO *]
  i have tried but solr is throwing error ? can this be done with some
 other
  syntax ?
 
  --
 
  -JAME
 




-- 

-JAME


Re: can i create filters of score range

2011-08-22 Thread jame vaalet
Thanks Hoss, that's a really good explanation.
Well, I don't care about the sort order, I just want all of the docs; and
yes, score values may be duplicated, which would hurt my search performance.
Before going to the Lucene doc id: I have got a creationDate datetime field
in my index which I can use to define pages via a filter query. I have
learned that exposing the Lucene docid wouldn't be a clever idea, as it is
relative to the index instance, whereas my index date field will be unique,
and I can definitely create ranges with that.

I have got one more doubt: if I use a filter query each time, will it result
in memory problems like the ones we see with deep paging issues?
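
For concreteness, a sketch of the frange-on-score partitioning suggested
below (scores sorted ascending; the query, field list and chunk size are
placeholders, and duplicate scores at the boundary still have to be skipped
client-side):

  chunk 1: q=myquery&sort=score+asc&fl=documentid,score&start=0&rows=10000
  chunk 2: q=myquery&sort=score+asc&fl=documentid,score&start=0&rows=10000
           &fq={!frange l=<last score seen in chunk 1>}query($q)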

On 23 August 2011 01:05, Chris Hostetter hossman_luc...@fucit.org wrote:


 : retrieving 1 million docids from solr through paging is resulting in deep
 : pagin issues..so i wonder if i can use filter queries to fetch all the 1
 : mllion docids chunk by chunk .. so for me the best filter wiould score...
 if
 : i can find the maximum score i can filter out other docs ..
 :
 : what is the minimum value of solr score? i don think it will have
 negative
 : values.. so if its always above 0.. my first chunk wud be score [0 TO *]
 : rows =1 my next chunk will start from the max score from first chunk
 to
 : * with rows =1 .. this will ensure that while fetching the 1000th
 chunk
 : solr don have to get all the previous doc ids into memory ..

 a) given an arbitrary query, there is no min/max score (think about
 function queries, you could write a math based query that results in
 -10 being the highest score)

 b) you could use an frange query on score to partition your docs like
 this.  you'd need to start with an unfiltered query, record the docid and
 score for all of page #1 and then use the score of the last docid on
 page #1 as the min for your filter when asking for page #2 (still with
 start=0 though) .. but you'd have to manually ignore any docs you'd
 already seen because of duplicate scores.

 I'm not sure if this would really gain you much though -- yes this would
 work arround some of the memory issues inherient in deep paging but it
 would still require a lot or rescoring of documents again and again.

 If that type of appraoch works for you, then you'd probably be better off
 using your own ID field as the sort/filter instead of score (since there
 would be no duplicates)

 Based on your problem description though, it sounds like you don't
 actaully care about the scores -- and i don't see anything in your writup
 that suggests that the order actually matters to you -- you just want them
 all ... correct?

 in that case, have you considered jsut using sort=_docid_ asc ?

 that gives you the internal lucene doc id sorting which actually means
 no sorting work is needed, which i *think* means there is no in memory
 buffering needed for the deep paging situation.


 -Hoss




-- 

-JAME


Re: query cache result

2011-08-20 Thread jame vaalet
Thanks Tomás.
Can we set the queryResultWindowSize for a particular query through the URL?
Say I want only a particular set of queries' results to be cached and not
other queries: is it possible to control the query result cache and window
size for each query separately?


2011/8/19 Tomás Fernández Löbbe tomasflo...@gmail.com

 From my understanding, seeing the cache as a set of key-value pairs, this
 cache has the query as key and the list of IDs resulting from the query as
 values. When the exact same query is issued, it will be found as key in
 this
 cache, and Solr will already have the list of IDs that match it.
 If you set the size of this cache to 50, that means that Solr will keep in
 memory the last 50 queries with their list of resulting document IDs.

 The number of IDs per query can be configured with the parameter
 queryResultWindowSize
 http://wiki.apache.org/solr/SolrCaching#queryResultWindowSize

 On Fri, Aug 19, 2011 at 10:34 AM, jame vaalet jamevaa...@gmail.com
 wrote:

  wiki says *size
 
  The maximum number of entries in the cache.
  andqueryResultCache
 
  This cache stores ordered sets of document IDs — the top N results of a
  query ordered by some criteria.
  *
 
  doesn't it mean number of document ids rather than number of queries ?
 
 
 
 
 
  2011/8/19 Tomás Fernández Löbbe tomasflo...@gmail.com
 
   Hi Jame, the size for the queryResultCache is the number of queries
 that
   will fit into this cache. AutowarmCount is the number of queries that
 are
   going to be copyed from the old cache to the new cache when a commit
   occurrs
   (actually, the queries are going to be executed again agains the new
   IndexSearcher, as the results for them may have changed on the new
  Index).
   initial size is the initial size of the array, it will start to grow
 from
   that size up to size. You may want to see this page of the wiki:
   http://wiki.apache.org/solr/SolrCaching
  
   Regards,
  
   Tomás
   On Fri, Aug 19, 2011 at 8:39 AM, jame vaalet jamevaa...@gmail.com
  wrote:
  
hi,
i understand that queryResultCache tag in solrconfig is the one which
determines the cache size of SOLR in jvm.
   
queryResultCache class=*solr.LRUCache*
size=*${queryResultCacheSize:0}*initialSize
=*${queryResultCacheInitialSize:0}* autowarmCount=*
${queryResultCacheRows:0}* /
   
   
out of the different attributes what is size? Is it the amount of
  memory
reserved in bytes ? or number of doc ids cached ? or is it the number
  of
queries it will cache?
   
similarly wat is initial size and autowarm depicted in?
   
can some please reply ...
   
  
 
 
 
  --
 
  -JAME
 




-- 

-JAME


Re: paging size in SOLR

2011-08-19 Thread jame vaalet
1. What does this specify?

<queryResultCache class="solr.LRUCache"
                  size="${queryResultCacheSize:0}"
                  initialSize="${queryResultCacheInitialSize:0}"
                  autowarmCount="${queryResultCacheRows:0}"/>

2. When I say queryResultCacheSize=512, does it mean 512 queries can be
cached, or that 512 bytes are reserved for caching?

Can someone please give me an answer?
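
For reference, a commented sketch based on the answers elsewhere in this
archive (the numbers are only examples):

  <!-- size, initialSize and autowarmCount are counted in queries, not bytes;
       the doc ids kept per cached query are capped by queryResultWindowSize -->
  <queryResultCache class="solr.LRUCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="0"/>
  <queryResultWindowSize>50</queryResultWindowSize>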



On 14 August 2011 21:41, Erick Erickson erickerick...@gmail.com wrote:

 Yep.

 ResultWindowSize in
  solrconfig.xml
 
  Best
  Erick
 
  On Sun, Aug 14, 2011 at 8:35 AM, jame vaalet jamevaa...@gmail.com
 wrote:
   thanks erick ... that means it depends upon the memory allocated to
 the
  JVM
   .
  
   going back queryCacheResults factor i have got this doubt ..
   say, i have got 10 threads with 10 different queries ..and each of
 them
  in
   parallel are searching the same index with millions of docs in it
   (multisharded ) .
   now each of the queries have large number of results in it hence got
 to
  page
   them all..
   which all thread's (query ) result-set will be cached ? so that
  subsequent
   pages can be retrieved quickly ..?
  
   On 14 August 2011 17:40, Erick Erickson erickerick...@gmail.com
 wrote:
  
   There isn't an optimum page size that I know of, it'll vary with
 lots
  of
   stuff, not the least of which is whatever servlet container limits
 there
   are.
  
   But I suspect you can get quite a few (1000s) without
   too much problem, and you can always use the JSON response
   writer to pack in more pages with less overhead.
  
   You pretty much have to try it and see.
  
   Best
   Erick
  
   On Sun, Aug 14, 2011 at 5:42 AM, jame vaalet jamevaa...@gmail.com
  wrote:
speaking about pagesizes, what is the optimum page size that should
 be
retrieved each time ??
i understand it depends upon the data you are fetching back
 fromeach
  hit
document ... but lets say when ever a document is hit am fetching
 back
   100
bytes worth data from each of those docs in indexes (along with
 solr
response statements ) .
this will make 100*x bytes worth data in each page if x is the page
  size
   ..
what is the optimum value of this x that solr can return each time
   without
going into exceptions 
   
On 13 August 2011 19:59, Erick Erickson erickerick...@gmail.com
  wrote:
   
Jame:
   
You control the number via settings in solrconfig.xml, so it's
up to you.
   
Jonathan:
Hmmm, that's seems right, after all the deep paging penalty is
  really
about keeping a large sorted array in memory but at least you
  only
pay it once per 10,000, rather than 100 times (assuming page size
 is
100)...
   
Best
Erick
   
On Wed, Aug 10, 2011 at 10:58 AM, jame vaalet 
 jamevaa...@gmail.com
wrote:
 when you say queryResultCache, does it only cache n number of
  result
   for
the
 last one query or more than one queries?


 On 10 August 2011 20:14, simon mtnes...@gmail.com wrote:

 Worth remembering there are some performance penalties with
 deep
 paging, if you use the page-by-page approach. may not be too
 much
  of
   a
 problem if you really are only looking to retrieve 10K docs.

 -Simon

 On Wed, Aug 10, 2011 at 10:32 AM, Erick Erickson
 erickerick...@gmail.com wrote:
  Well, if you really want to you can specify start=0 and
  rows=1
   and
  get them all back at once.
 
  You can do page-by-page by incrementing the start parameter
 as
   you
  indicated.
 
  You can keep from re-executing the search by setting your
 queryResultCache
  appropriately, but this affects all searches so might be an
  issue.
 
  Best
  Erick
 
  On Wed, Aug 10, 2011 at 9:09 AM, jame vaalet 
  jamevaa...@gmail.com
   
 wrote:
  hi,
  i want to retrieve all the data from solr (say 10,000 ids )
 and
  my
page
 size
  is 1000 .
  how do i get back the data (pages) one after other ?do i
 have
  to
 increment
  the start value each time by the page size from 0 and do
 the
iteration
 ?
  In this case am i querying the index 10 time instead of one
 or
   after
 first
  query the result will be cached somewhere for the subsequent
  pages
   ?
 
 
  JAME VAALET
 
 




 --

 -JAME

   
   
   
   
--
   
-JAME
   
  
  
  
  
   --
  
   -JAME
  
 
 
 
 
  --
 
  -JAME
 




-- 

-JAME


query cache result

2011-08-19 Thread jame vaalet
hi,
I understand that the queryResultCache tag in solrconfig.xml is the one that
determines the query result cache size of Solr in the JVM:

<queryResultCache class="solr.LRUCache"
                  size="${queryResultCacheSize:0}"
                  initialSize="${queryResultCacheInitialSize:0}"
                  autowarmCount="${queryResultCacheRows:0}"/>

Out of the different attributes, what is size? Is it the amount of memory
reserved in bytes, the number of doc ids cached, or the number of queries it
will cache?

Similarly, what are initialSize and autowarmCount expressed in?

Can someone please reply ...


Re: query cache result

2011-08-19 Thread jame vaalet
The wiki says, for "size":

  "The maximum number of entries in the cache."

and for queryResultCache:

  "This cache stores ordered sets of document IDs — the top N results of a
  query ordered by some criteria."

Doesn't that mean the number of document ids rather than the number of
queries?





2011/8/19 Tomás Fernández Löbbe tomasflo...@gmail.com

 Hi Jame, the size for the queryResultCache is the number of queries that
 will fit into this cache. AutowarmCount is the number of queries that are
 going to be copyed from the old cache to the new cache when a commit
 occurrs
 (actually, the queries are going to be executed again agains the new
 IndexSearcher, as the results for them may have changed on the new Index).
 initial size is the initial size of the array, it will start to grow from
 that size up to size. You may want to see this page of the wiki:
 http://wiki.apache.org/solr/SolrCaching

 Regards,

 Tomás
 On Fri, Aug 19, 2011 at 8:39 AM, jame vaalet jamevaa...@gmail.com wrote:

  hi,
  i understand that queryResultCache tag in solrconfig is the one which
  determines the cache size of SOLR in jvm.
 
  queryResultCache class=*solr.LRUCache*
  size=*${queryResultCacheSize:0}*initialSize
  =*${queryResultCacheInitialSize:0}* autowarmCount=*
  ${queryResultCacheRows:0}* /
 
 
  out of the different attributes what is size? Is it the amount of memory
  reserved in bytes ? or number of doc ids cached ? or is it the number of
  queries it will cache?
 
  similarly wat is initial size and autowarm depicted in?
 
  can some please reply ...
 




-- 

-JAME


Re: paging size in SOLR

2011-08-14 Thread jame vaalet
Speaking about page sizes, what is the optimum page size that should be
retrieved each time?
I understand it depends on the data you are fetching back from each hit
document, but let's say that whenever a document is hit I am fetching back
100 bytes worth of data from each of those docs (along with the Solr
response itself).
This makes 100*x bytes worth of data in each page if x is the page size.
What is the optimum value of this x that Solr can return each time without
running into exceptions?

On 13 August 2011 19:59, Erick Erickson erickerick...@gmail.com wrote:

 Jame:

 You control the number via settings in solrconfig.xml, so it's
 up to you.

 Jonathan:
 Hmmm, that's seems right, after all the deep paging penalty is really
 about keeping a large sorted array in memory but at least you only
 pay it once per 10,000, rather than 100 times (assuming page size is
 100)...

 Best
 Erick

 On Wed, Aug 10, 2011 at 10:58 AM, jame vaalet jamevaa...@gmail.com
 wrote:
  when you say queryResultCache, does it only cache n number of result for
 the
  last one query or more than one queries?
 
 
  On 10 August 2011 20:14, simon mtnes...@gmail.com wrote:
 
  Worth remembering there are some performance penalties with deep
  paging, if you use the page-by-page approach. may not be too much of a
  problem if you really are only looking to retrieve 10K docs.
 
  -Simon
 
  On Wed, Aug 10, 2011 at 10:32 AM, Erick Erickson
  erickerick...@gmail.com wrote:
   Well, if you really want to you can specify start=0 and rows=1 and
   get them all back at once.
  
   You can do page-by-page by incrementing the start parameter as you
   indicated.
  
   You can keep from re-executing the search by setting your
  queryResultCache
   appropriately, but this affects all searches so might be an issue.
  
   Best
   Erick
  
   On Wed, Aug 10, 2011 at 9:09 AM, jame vaalet jamevaa...@gmail.com
  wrote:
   hi,
   i want to retrieve all the data from solr (say 10,000 ids ) and my
 page
  size
   is 1000 .
   how do i get back the data (pages) one after other ?do i have to
  increment
   the start value each time by the page size from 0 and do the
 iteration
  ?
   In this case am i querying the index 10 time instead of one or after
  first
   query the result will be cached somewhere for the subsequent pages ?
  
  
   JAME VAALET
  
  
 
 
 
 
  --
 
  -JAME
 




-- 

-JAME


Re: paging size in SOLR

2011-08-14 Thread jame vaalet
Thanks Erick ... that means it depends on the memory allocated to the JVM.

Going back to the queryResultCache factor, I have got this doubt:
say I have got 10 threads with 10 different queries, and each of them in
parallel is searching the same index with millions of docs in it
(multi-sharded). Now each of the queries has a large number of results, hence
I have got to page through them all.
Which threads' (queries') result sets will be cached, so that subsequent
pages can be retrieved quickly?

On 14 August 2011 17:40, Erick Erickson erickerick...@gmail.com wrote:

 There isn't an optimum page size that I know of, it'll vary with lots of
 stuff, not the least of which is whatever servlet container limits there
 are.

 But I suspect you can get quite a few (1000s) without
 too much problem, and you can always use the JSON response
 writer to pack in more pages with less overhead.

 You pretty much have to try it and see.

 Best
 Erick

 On Sun, Aug 14, 2011 at 5:42 AM, jame vaalet jamevaa...@gmail.com wrote:
  speaking about pagesizes, what is the optimum page size that should be
  retrieved each time ??
  i understand it depends upon the data you are fetching back fromeach hit
  document ... but lets say when ever a document is hit am fetching back
 100
  bytes worth data from each of those docs in indexes (along with solr
  response statements ) .
  this will make 100*x bytes worth data in each page if x is the page size
 ..
  what is the optimum value of this x that solr can return each time
 without
  going into exceptions 
 
  On 13 August 2011 19:59, Erick Erickson erickerick...@gmail.com wrote:
 
  Jame:
 
  You control the number via settings in solrconfig.xml, so it's
  up to you.
 
  Jonathan:
  Hmmm, that's seems right, after all the deep paging penalty is really
  about keeping a large sorted array in memory but at least you only
  pay it once per 10,000, rather than 100 times (assuming page size is
  100)...
 
  Best
  Erick
 
  On Wed, Aug 10, 2011 at 10:58 AM, jame vaalet jamevaa...@gmail.com
  wrote:
   when you say queryResultCache, does it only cache n number of result
 for
  the
   last one query or more than one queries?
  
  
   On 10 August 2011 20:14, simon mtnes...@gmail.com wrote:
  
   Worth remembering there are some performance penalties with deep
   paging, if you use the page-by-page approach. may not be too much of
 a
   problem if you really are only looking to retrieve 10K docs.
  
   -Simon
  
   On Wed, Aug 10, 2011 at 10:32 AM, Erick Erickson
   erickerick...@gmail.com wrote:
Well, if you really want to you can specify start=0 and rows=1
 and
get them all back at once.
   
You can do page-by-page by incrementing the start parameter as
 you
indicated.
   
You can keep from re-executing the search by setting your
   queryResultCache
appropriately, but this affects all searches so might be an issue.
   
Best
Erick
   
On Wed, Aug 10, 2011 at 9:09 AM, jame vaalet jamevaa...@gmail.com
 
   wrote:
hi,
i want to retrieve all the data from solr (say 10,000 ids ) and my
  page
   size
is 1000 .
how do i get back the data (pages) one after other ?do i have to
   increment
the start value each time by the page size from 0 and do the
  iteration
   ?
In this case am i querying the index 10 time instead of one or
 after
   first
query the result will be cached somewhere for the subsequent pages
 ?
   
   
JAME VAALET
   
   
  
  
  
  
   --
  
   -JAME
  
 
 
 
 
  --
 
  -JAME
 




-- 

-JAME


Re: paging size in SOLR

2011-08-14 Thread jame vaalet
My queryResultCache size=0 and queryResultWindowSize=50.
Does this mean that I am not caching any results?

On 14 August 2011 18:27, Erick Erickson erickerick...@gmail.com wrote:

 As many results will be cached as you ask. See solrconfig.xml,
 the queryResultCache. This cache is essentially a map of queries
 and result document IDs. The number of doc IDs cached for
 each query is controlled by queryResultWindowSize in
 solrconfig.xml

 Best
 Erick

 On Sun, Aug 14, 2011 at 8:35 AM, jame vaalet jamevaa...@gmail.com wrote:
  thanks erick ... that means it depends upon the memory allocated to the
 JVM
  .
 
  going back queryCacheResults factor i have got this doubt ..
  say, i have got 10 threads with 10 different queries ..and each of them
 in
  parallel are searching the same index with millions of docs in it
  (multisharded ) .
  now each of the queries have large number of results in it hence got to
 page
  them all..
  which all thread's (query ) result-set will be cached ? so that
 subsequent
  pages can be retrieved quickly ..?
 
  On 14 August 2011 17:40, Erick Erickson erickerick...@gmail.com wrote:
 
  There isn't an optimum page size that I know of, it'll vary with lots
 of
  stuff, not the least of which is whatever servlet container limits there
  are.
 
  But I suspect you can get quite a few (1000s) without
  too much problem, and you can always use the JSON response
  writer to pack in more pages with less overhead.
 
  You pretty much have to try it and see.
 
  Best
  Erick
 
  On Sun, Aug 14, 2011 at 5:42 AM, jame vaalet jamevaa...@gmail.com
 wrote:
   speaking about pagesizes, what is the optimum page size that should be
   retrieved each time ??
   i understand it depends upon the data you are fetching back fromeach
 hit
   document ... but lets say when ever a document is hit am fetching back
  100
   bytes worth data from each of those docs in indexes (along with solr
   response statements ) .
   this will make 100*x bytes worth data in each page if x is the page
 size
  ..
   what is the optimum value of this x that solr can return each time
  without
   going into exceptions 
  
   On 13 August 2011 19:59, Erick Erickson erickerick...@gmail.com
 wrote:
  
   Jame:
  
   You control the number via settings in solrconfig.xml, so it's
   up to you.
  
   Jonathan:
   Hmmm, that's seems right, after all the deep paging penalty is
 really
   about keeping a large sorted array in memory but at least you
 only
   pay it once per 10,000, rather than 100 times (assuming page size is
   100)...
  
   Best
   Erick
  
   On Wed, Aug 10, 2011 at 10:58 AM, jame vaalet jamevaa...@gmail.com
   wrote:
when you say queryResultCache, does it only cache n number of
 result
  for
   the
last one query or more than one queries?
   
   
On 10 August 2011 20:14, simon mtnes...@gmail.com wrote:
   
Worth remembering there are some performance penalties with deep
paging, if you use the page-by-page approach. may not be too much
 of
  a
problem if you really are only looking to retrieve 10K docs.
   
-Simon
   
On Wed, Aug 10, 2011 at 10:32 AM, Erick Erickson
erickerick...@gmail.com wrote:
 Well, if you really want to you can specify start=0 and
 rows=1
  and
 get them all back at once.

 You can do page-by-page by incrementing the start parameter as
  you
 indicated.

 You can keep from re-executing the search by setting your
queryResultCache
 appropriately, but this affects all searches so might be an
 issue.

 Best
 Erick

 On Wed, Aug 10, 2011 at 9:09 AM, jame vaalet 
 jamevaa...@gmail.com
  
wrote:
 hi,
 i want to retrieve all the data from solr (say 10,000 ids ) and
 my
   page
size
 is 1000 .
 how do i get back the data (pages) one after other ?do i have
 to
increment
 the start value each time by the page size from 0 and do the
   iteration
?
 In this case am i querying the index 10 time instead of one or
  after
first
 query the result will be cached somewhere for the subsequent
 pages
  ?


 JAME VAALET


   
   
   
   
--
   
-JAME
   
  
  
  
  
   --
  
   -JAME
  
 
 
 
 
  --
 
  -JAME
 




-- 

-JAME


paging size in SOLR

2011-08-10 Thread jame vaalet
hi,
i want to retrieve all the data from Solr (say 10,000 ids) and my page size
is 1000.
How do I get back the data (pages) one after the other? Do I have to
increment the start value each time by the page size, starting from 0, and
iterate?
In this case, am I querying the index 10 times instead of once, or after the
first query will the result be cached somewhere for the subsequent pages?
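
For concreteness, a sketch of the page-by-page pattern being asked about
(query and handler are placeholders):

  /select?q=myquery&start=0&rows=1000
  /select?q=myquery&start=1000&rows=1000
  ...
  /select?q=myquery&start=9000&rows=1000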


JAME VAALET


Re: paging size in SOLR

2011-08-10 Thread jame vaalet
When you say queryResultCache, does it only cache n results for the last
query, or for more than one query?


On 10 August 2011 20:14, simon mtnes...@gmail.com wrote:

 Worth remembering there are some performance penalties with deep
 paging, if you use the page-by-page approach. may not be too much of a
 problem if you really are only looking to retrieve 10K docs.

 -Simon

 On Wed, Aug 10, 2011 at 10:32 AM, Erick Erickson
 erickerick...@gmail.com wrote:
  Well, if you really want to you can specify start=0 and rows=1 and
  get them all back at once.
 
  You can do page-by-page by incrementing the start parameter as you
  indicated.
 
  You can keep from re-executing the search by setting your
 queryResultCache
  appropriately, but this affects all searches so might be an issue.
 
  Best
  Erick
 
  On Wed, Aug 10, 2011 at 9:09 AM, jame vaalet jamevaa...@gmail.com
 wrote:
  hi,
  i want to retrieve all the data from solr (say 10,000 ids ) and my page
 size
  is 1000 .
  how do i get back the data (pages) one after other ?do i have to
 increment
  the start value each time by the page size from 0 and do the iteration
 ?
  In this case am i querying the index 10 time instead of one or after
 first
  query the result will be cached somewhere for the subsequent pages ?
 
 
  JAME VAALET
 
 




-- 

-JAME


proximity within phrases

2011-07-26 Thread Jame Vaalet
How do you write a Solr query to specify proximity between two phrases?

"dance jockey" should appear within 10 words before "video jockey".

(dance jockey) (video jockey)~10

This isn't working. Can someone suggest a way?


-JAME


in fragsize whats the pre hit number and post hit number

2011-07-25 Thread jame vaalet
hi,
while searching for the word SOLR, the highlighting in Solr can be
manipulated with fragsize=10.

How is the fragment decided? How many characters are taken before the word
SOLR and after the word SOLR?


jame


highlighting fragsize

2011-07-25 Thread jame vaalet
hi
When you highlight and get back snippet fragments, can you override the
default hl.regex.pattern through the URL? Can someone quote an example URL of
that sort?

What if I pass hl.slop=0 - will this stop the regex pattern being considered
at all?
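
For reference, a sketch of overriding the regex fragmenter per request (field
name and pattern are placeholders, the pattern must be URL-encoded, and the
slop knob for the regex fragmenter is hl.regex.slop rather than hl.slop):

  /select?q=SOLR&hl=true&hl.fl=content&hl.fragsize=100
      &hl.fragmenter=regex&hl.regex.pattern=<url-encoded pattern>&hl.regex.slop=0.5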




-- 

-JAME


difference between shard and core in solr

2011-07-17 Thread jame vaalet
hi ,

I just want to be clear on the concepts of core and shard.

A single core is an index with a single schema - is this what a core really is?

Can a single core contain two separate indexes with different schemas in it?

Does a shard refer to a collection of indexes on a single physical machine?
Can a single core be present in different shards?






-- 

-JAME


Re: Start parameter messes with rows

2011-07-16 Thread jame vaalet
hi ,
I just want to be clear on the concepts of core and shard.
A single core is an index with a single schema - is this what a core really is?
Can a single core contain two separate indexes with different schemas in it?
Does a shard refer to a collection of indexes on a single physical machine?
Can a single core be present in different shards?



- JAME


performance variation with respect to the index size

2011-07-08 Thread jame vaalet
hi,

Is there any performance degradation (response time etc.) if the index has
the document content text stored in it (stored=true)?

-JAME


Re: performance variation with respect to the index size

2011-07-08 Thread jame vaalet
I would prefer every setting to be at its default and to compare the results
with stored=true and stored=false.

2011/7/8 François Schiettecatte fschietteca...@gmail.com

 Hi

 I don't think that anyone has run such benchmarks, in fact this topic came
 up two weeks ago and I volunteered some time to do that because I have some
 spare time this week, so I am going to run some benchmarks this weekend and
 report back.

 The machine I have to do this a core i7 960, 24GB, 4TB of disk. I am going
 to run SOLR 3.3 under Tomcat 7.0.16. I have three databases I can use for
 this, icwsm-2009 (38.5GB compressed), cdip (24GB compressed), trec vlc2
 (31GB compressed). I could also use a copy of wikipedia. I have lots of user
 searches I can use (saved from Feedster days).

 I would like some input on a couple of things to make this test as
 real-world as possible. One is any optimizations I should set in
 solrconfig.xml, and the other are the heap/GC settings I should set for
 tomcat. Anything else?

 Cheers

 François

 On Jul 8, 2011, at 4:08 AM, jame vaalet wrote:

  hi,
 
  is there any performance degradation (response time etc ) if the index
 has
  document content text stored in it  (stored=true)?
 
  -JAME




-- 

-JAME


searching a subset of SOLR index

2011-07-05 Thread Jame Vaalet
Hi,
Let's say I have got 10^10 documents in an index, with the unique id being a
document id assigned to each of them from 1 to 10^10.
Now I want to search a particular query string in a subset of these documents,
say document ids 100 to 1000.

The question here is: will Solr be able to search just this set of documents
rather than the entire index? If yes, what should the query be to limit the
search to this subset?
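
For reference, a sketch of the filter-query form this usually takes (the
field name below is a placeholder for whatever the unique document id field
is called in the schema):

  q=<query string>&fq=documentid:[100 TO 1000]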

Regards,
JAME VAALET
Software Developer
EXT :8108
Capital IQ



RE: searching a subset of SOLR index

2011-07-05 Thread Jame Vaalet
Thanks.
But does this range query just limit the universe logically, or does it have any
mechanism to limit it physically as well? Do we gain anything in query time by
using the range query?

Regards,
JAME VAALET


-Original Message-
From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi 
Kant
Sent: Tuesday, July 05, 2011 2:26 PM
To: solr-user@lucene.apache.org
Subject: Re: searching a subset of SOLR index

Range query
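For example, the range restriction is usually put in a filter query, which Solr
caches and reuses across requests (the unique key field is assumed here to be a
numeric field named id):

http://localhost:8983/solr/select?q=some+query+string&fq=id:[100+TO+1000]

The fq clause narrows the candidate set logically and its result is kept in the
filterCache; it does not carve out a physically separate index.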


On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet jvaa...@capitaliq.com wrote:
 Hi,
 Let's say I have got 10^10 documents in an index, with the unique id being the
 document id, which is assigned to each of them from 1 to 10^10.
 Now I want to search for a particular query string in a subset of these
 documents, say document ids 100 to 1000.

 The question here is: will Solr be able to search just in this set of documents
 rather than the entire index? If yes, what should the query be to limit the
 search to this subset?

 Regards,
 JAME VAALET
 Software Developer
 EXT :8108
 Capital IQ




RE: searching a subset of SOLR index

2011-07-05 Thread Jame Vaalet
I have got two applications:

1. Website
The website will enable any user to search the document repository,
and the set they search on is known as the website-presentable set.
2. Windows service
The windows service will search all the documents in the repository
for a fixed set of keywords and store the results in a database. This set
is the universal set of documents in the doc repository, including the
website-presentable ones.


The website is a high-priority app which should work smoothly without any
interference, whereas the windows service should run all day long continuously
without a break, saving results from incoming docs.
The problem here is that the website set is predefined and I don't want the
windows service requests to Solr to slow down the website requests.

Suppose I segregate the website-presentable docs into a particular
core and the rest of them into a different core; will it solve the problem?
I have also read about multiple ports for listening to requests from different
apps; can this be used?



Regards,
JAME VAALET


-Original Message-
From: Pierre GOSSE [mailto:pierre.go...@arisem.com] 
Sent: Tuesday, July 05, 2011 3:52 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

The limit will always be logical if you have all documents in the same index.
But filters are very efficient when working with a subset of your index,
especially if you reuse the same filter for many queries, since there is a cache.

If your subsets are always the same subsets, maybe you could use shards. But
we would need to know more about what you intend to do, to point you to an
adequate solution.

Pierre

-Original Message-
From: Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Sent: Tuesday, July 05, 2011 11:10 AM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

Thanks.
But does this range query just limit the universe logically, or does it have any
mechanism to limit it physically as well? Do we gain anything in query time by
using the range query?

Regards,
JAME VAALET


-Original Message-
From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi 
Kant
Sent: Tuesday, July 05, 2011 2:26 PM
To: solr-user@lucene.apache.org
Subject: Re: searching a subset of SOLR index

Range query


On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet jvaa...@capitaliq.com wrote:
 Hi,
 Let's say I have got 10^10 documents in an index, with the unique id being the
 document id, which is assigned to each of them from 1 to 10^10.
 Now I want to search for a particular query string in a subset of these
 documents, say document ids 100 to 1000.

 The question here is: will Solr be able to search just in this set of documents
 rather than the entire index? If yes, what should the query be to limit the
 search to this subset?

 Regards,
 JAME VAALET
 Software Developer
 EXT :8108
 Capital IQ




RE: searching a subset of SOLR index

2011-07-05 Thread Jame Vaalet
But in case the website docs contribute around 50% of the entire docs, why
recreate the indexes? Don't you think it's redundant?
Can two web apps (Solr instances) share a single index file and search on it
without interfering with each other?


Regards,
JAME VAALET
Software Developer 
EXT :8108
Capital IQ


-Original Message-
From: Pierre GOSSE [mailto:pierre.go...@arisem.com] 
Sent: Tuesday, July 05, 2011 5:12 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

From what you tell us, I guess a separate index for website docs would be the
best. If you fear that requests from the windows service would cripple your
website's performance, why not have a totally separate index on another server,
and have your website documents indexed in both indexes?

Pierre
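One way (not the only one) to keep the website documents in both indexes
without feeding them twice by hand is Solr's built-in index replication; a
rough Solr 3.x solrconfig.xml sketch, with the master URL made up:

<!-- on the indexing (master) server -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- on the search-only (slave) server -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/website/replication</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>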

-Original Message-
From: Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Sent: Tuesday, July 05, 2011 1:14 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

I have got two applications:

1. Website
The website will enable any user to search the document repository,
and the set they search on is known as the website-presentable set.
2. Windows service
The windows service will search all the documents in the repository
for a fixed set of keywords and store the results in a database. This set
is the universal set of documents in the doc repository, including the
website-presentable ones.


The website is a high-priority app which should work smoothly without any
interference, whereas the windows service should run all day long continuously
without a break, saving results from incoming docs.
The problem here is that the website set is predefined and I don't want the
windows service requests to Solr to slow down the website requests.

Suppose I segregate the website-presentable docs into a particular
core and the rest of them into a different core; will it solve the problem?
I have also read about multiple ports for listening to requests from different
apps; can this be used?



Regards,
JAME VAALET


-Original Message-
From: Pierre GOSSE [mailto:pierre.go...@arisem.com] 
Sent: Tuesday, July 05, 2011 3:52 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

The limit will always be logical if you have all documents in the same index.
But filters are very efficient when working with a subset of your index,
especially if you reuse the same filter for many queries, since there is a cache.

If your subsets are always the same subsets, maybe you could use shards. But
we would need to know more about what you intend to do, to point you to an
adequate solution.

Pierre

-Original Message-
From: Jame Vaalet [mailto:jvaa...@capitaliq.com] 
Sent: Tuesday, July 05, 2011 11:10 AM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

Thanks.
But does this range query just limit the universe logically, or does it have any
mechanism to limit it physically as well? Do we gain anything in query time by
using the range query?

Regards,
JAME VAALET


-Original Message-
From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi 
Kant
Sent: Tuesday, July 05, 2011 2:26 PM
To: solr-user@lucene.apache.org
Subject: Re: searching a subset of SOLR index

Range query


On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet jvaa...@capitaliq.com wrote:
 Hi,
 Let's say I have got 10^10 documents in an index, with the unique id being the
 document id, which is assigned to each of them from 1 to 10^10.
 Now I want to search for a particular query string in a subset of these
 documents, say document ids 100 to 1000.

 The question here is: will Solr be able to search just in this set of documents
 rather than the entire index? If yes, what should the query be to limit the
 search to this subset?

 Regards,
 JAME VAALET
 Software Developer
 EXT :8108
 Capital IQ




what s the optimum size of SOLR indexes

2011-07-04 Thread Jame Vaalet
Hi,

What would be the maximum size of a single Solr index that still gives optimum 
search time?
In case I have to index all the documents in my repository (which is TB in 
size), what would be the ideal architecture to follow, distributed Solr?

Regards,
JAME VAALET
Software Developer
EXT :8108
Capital IQ