Using recency rord on /distrib

2009-09-24 Thread Pooja Verlani
Hi,
I have to add a recency boost, using the recip and rord functions, to an app
that uses the /distrib request handler.
Can I put the bf param in /distrib and directly call a URL like:
http://localhost:8983/solr/distrib/?q=cable

where in the /distrib request handler bf is defined as:
<str name="bf">
recip(rord(last_sold_date),1,1000,1000)^0.7
</str>
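
For what it's worth, bf is a dismax parameter, so a handler definition along
these lines should put it into effect (a minimal sketch; the qf value and
field names are illustrative):

<requestHandler name="/distrib" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">text</str>
    <str name="bf">recip(rord(last_sold_date),1,1000,1000)^0.7</str>
  </lst>
</requestHandler>

If the handler uses the standard query parser instead, bf is ignored, which
would match the symptom below.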

I am not able to see any difference in the results with or without the bf
param defined.

Please share your views.

regards,
Pooja


Re: Can solr build on top of HBase

2009-09-24 Thread 梁景明
hi, thanks. Now I can index data from HBase into the Solr server using the
Nutch core, but the index data is stored locally, and that is what I worry
about: it may grow too large on local disk.

I have never used MountableHDFS, and I am not sure whether Solr can write the
index into HDFS; I doubt it can work without implementing Writable in HDFS.

I think the point is reading and writing the index files in HDFS just like in
the local filesystem. Could you make a new index file format that can be used
in HDFS? If so, I think that would be a great help for distributed indexing.

Since Solr is built on top of Lucene, would it be easy to implement the HDFS
file format?

2009/9/24 Amit Nithian anith...@gmail.com

 Would FUSE (http://wiki.apache.org/hadoop/MountableHDFS) be of use?
 I wonder if you could take the data from HBase and index it into a Lucene
 index stored on HDFS.

 2009/9/23 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

  can hbase be mounted on the filesystem? Solr can only read data from a
   filesystem
 
  On Thu, Sep 24, 2009 at 7:27 AM, 梁景明 futur...@gmail.com wrote:
   hi, i use hbase and solr. now i have a large amount of data to index,
   which means the solr index will be large, and as the data increases it
   will grow larger still.
   so for solrconfig.xml's <dataDir>/solrhome/data/</dataDir>, can i set it
   from the api and point it to my distributed hbase data storage?
   and if the index is too large, will it be slow?
   thanks.
  
 
 
 
  --
  -
  Noble Paul | Principal Engineer| AOL | http://aol.com
 



Re: Can solr build on top of HBase

2009-09-24 Thread Grant Ingersoll
I don't think using HDFS or HBase will perform for this kind of thing  
at all.  If you are that large, you should look into distributing your  
index into shards and using Solr's distributed search capabilities.
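
As a sketch of what a distributed request looks like (host names here are
illustrative), the query just lists the shards to aggregate over:

http://localhost:8983/solr/select?q=cable&shards=shard1.example.com:8983/solr,shard2.example.com:8983/solr

Each shard holds a slice of the index, and the node receiving the request
merges the per-shard results before returning them.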


-Grant

On Sep 24, 2009, at 3:25 AM, 梁景明 wrote:

hi, thanks. Now I can index data from HBase into the Solr server using the
Nutch core, but the index data is stored locally, and that is what I worry
about: it may grow too large on local disk.

I have never used MountableHDFS, and I am not sure whether Solr can write the
index into HDFS; I doubt it can work without implementing Writable in HDFS.

I think the point is reading and writing the index files in HDFS just like in
the local filesystem. Could you make a new index file format that can be used
in HDFS? If so, I think that would be a great help for distributed indexing.

Since Solr is built on top of Lucene, would it be easy to implement the HDFS
file format?

2009/9/24 Amit Nithian anith...@gmail.com

 Would FUSE (http://wiki.apache.org/hadoop/MountableHDFS) be of use?
 I wonder if you could take the data from HBase and index it into a Lucene
 index stored on HDFS.

 2009/9/23 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

  can hbase be mounted on the filesystem? Solr can only read data from a
  filesystem

  On Thu, Sep 24, 2009 at 7:27 AM, 梁景明 futur...@gmail.com wrote:
   hi, i use hbase and solr. now i have a large amount of data to index,
   which means the solr index will be large, and as the data increases it
   will grow larger still.
   so for solrconfig.xml's <dataDir>/solrhome/data/</dataDir>, can i set it
   from the api and point it to my distributed hbase data storage?
   and if the index is too large, will it be slow?
   thanks.





--
-
Noble Paul | Principal Engineer| AOL | http://aol.com





--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: define index at search time

2009-09-24 Thread DHast

No, I am talking about having multiple indexes. I want to send the index name
to the searcher so it will search that index, rather than use the one
defined in the schema/solrconfig.

This has nothing to do with multiple cores; I mean different indexes entirely,
with completely different content.




Avlesh Singh wrote:
 
 Are you talking about multiple cores?
 
 Cheers
 Avlesh
 
 On Mon, Sep 21, 2009 at 9:15 PM, DHast hastings.recurs...@gmail.com
 wrote:
 

 is there a way i can actually tell solr which index i want it to search
 against with the query? I know it will cost a bit in performance, but it
 would be helpful.
 i have many indexes and it would be nice to determine which one should be
 used by the user.
 thanks
 --
 View this message in context:
 http://www.nabble.com/define-index-at-search-time-tp25530378p25530378.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 

-- 
View this message in context: 
http://www.nabble.com/define-index-at-search-time-tp25530378p25564438.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: define index at search time

2009-09-24 Thread DHast

well, after looking at http://wiki.apache.org/solr/CoreAdmin,
perhaps multiple cores is what i want.
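
For anyone following along, a minimal sketch of that approach (core names and
paths are illustrative): each index lives in its own core, and the query URL
names the core, so choosing the index at search time reduces to choosing the
core in the URL:

http://localhost:8983/solr/admin/cores?action=CREATE&name=myindex&instanceDir=/solr/myindex
http://localhost:8983/solr/myindex/select?q=some+query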

DHast wrote:
 
 No, I am talking about having multiple indexes. I want to send the index
 name to the searcher so it will search that index, rather than use the one
 defined in the schema/solrconfig.

 This has nothing to do with multiple cores; I mean different indexes
 entirely, with completely different content.
 
 
 
 
 Avlesh Singh wrote:
 
 Are you talking about multiple cores?
 
 Cheers
 Avlesh
 
 On Mon, Sep 21, 2009 at 9:15 PM, DHast hastings.recurs...@gmail.com
 wrote:
 

 is there a way i can actually tell solr which index i want it to search
 against with the query? I know it will cost a bit in performance, but it
 would be helpful.
 i have many indexes and it would be nice to determine which one should
 be used by the user.
 thanks
 --
 View this message in context:
 http://www.nabble.com/define-index-at-search-time-tp25530378p25530378.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/define-index-at-search-time-tp25530378p25564937.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Multivalue Field Cache

2009-09-24 Thread Grant Ingersoll

Have a look at UninvertedField.java.  I think that might help.

On Sep 23, 2009, at 2:35 PM, Amit Nithian wrote:

Are there any good implementations of a field cache that will return all
values of a multivalued field? I am in the process of writing one for my
immediate needs, but I was wondering if there is a complete implementation
that handles the different field types. If not, then I can continue
on with mine and donate it back.
Thanks!
Amit


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Finding near duplicates while searching Documents

2009-09-24 Thread Grant Ingersoll


On Sep 23, 2009, at 2:55 PM, Jason Rutherglen wrote:


I don't think this handles near duplicates, which would require some of
the methods mentioned recently on the Mahout list.


It's pluggable and I believe the TextProfileSignature is a fuzzy  
implementation in Solr that was brought over from Nutch.


Agree on the Mahout discussion, too, though: 
http://www.lucidimagination.com/search/document/9d7ad3a882e2a944/finding_the_similarity_of_documents_using_mahout_for_deduplication#b0321c0f25f835a0
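
For reference, wiring the signature-based deduplication into solrconfig.xml
looks something like this (a sketch following the Deduplication wiki page;
the field list here is illustrative):

<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">title,body</str>
    <str name="signatureClass">solr.processor.TextProfileSignature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

With overwriteDupes set to true, documents whose fuzzy signatures collide
overwrite each other instead of accumulating as near-identical copies; the
chain then has to be hooked into the update handler to take effect.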



On Wed, Sep 23, 2009 at 2:59 AM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
On Wed, Sep 23, 2009 at 3:14 PM, Ninad Raut hbase.user.ni...@gmail.com 
wrote:



Hi,
When we have news content crawled, we face a problem of the same content
being repeated in many documents. We want to add a near-duplicate document
filter to detect such documents. Is there a way to do that in SOLR?



Look at http://wiki.apache.org/solr/Deduplication

--
Regards,
Shalin Shekhar Mangar.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Showcase: Facetted Search for Wine using Solr

2009-09-24 Thread marian.steinbach
Hello everybody!

The purpose of this mail is to say thank you to the creators of Solr
and to the community that supports it.

We released our first project using Solr several weeks ago, after
having tested Solr for several months.

The project I'm talking about is a product search for an online wine
shop (sorry, german user interface only):

   http://www.koelner-weinkeller.de/index.php?id=sortiment

Our client offers about 3000 different wines and other related products.

Before we introduced Solr, the products were searched via complicated and
slow SQL statements, with all kinds of problems related to that. No full
text indexing, no stemming etc.

We are happy to make use of several built-in features which solve
problems that bugged us: faceted search, German accents and stemming
and synonyms being the most important ones.

The surrounding website is TYPO3 driven. We integrated Solr by
creating our own frontend plugin which talks to the Solr webservice
(and we're very happy about the PHP output type!).

I'd be glad about your comments.

Cheers,

Marian


Sorting/paging problem

2009-09-24 Thread Charlie Jackson
I've run into a strange issue with my Solr installation. I'm running
queries that are sorting by a DateField field but from time to time, I'm
seeing individual records very much out of order. What's more, they
appear on multiple pages of my result set. Let me give an example.
Starting with a basic query, I sort on the date that the document was
added to the index and see these rows on the first page (I'm just
showing the date field here):

 

<doc><date name="indexed_date">2009-09-23T19:24:47.419Z</date></doc>
<doc><date name="indexed_date">2009-09-23T19:25:03.229Z</date></doc>
<doc><date name="indexed_date">2009-09-23T19:25:03.400Z</date></doc>
<doc><date name="indexed_date">2009-09-23T19:25:19.951</date></doc>
<doc><date name="indexed_date">2009-09-23T20:10:07.919Z</date></doc>

 

Note how the last document's date jumps a bit. Not necessarily a
problem, but the next page looks like this:

 

<doc><date name="indexed_date">2009-09-23T19:26:16.022Z</date></doc>
<doc><date name="indexed_date">2009-09-23T19:26:32.547Z</date></doc>
<doc><date name="indexed_date">2009-09-23T19:27:45.470Z</date></doc>
<doc><date name="indexed_date">2009-09-23T19:27:45.592Z</date></doc>
<doc><date name="indexed_date">2009-09-23T20:10:07.919Z</date></doc>

 

So, not only is the date sorting wrong, but the exact same document
shows up on the next page, also still out of date order. I've seen the
same document show up in 4-5 pages in some cases. It's always the last
record on the page, too. If I change the page size, the problem seems to
disappear for a while, but then starts up again later. Also, running the
same query/queries later on doesn't show the same behavior. 

 

Could it be some sort of page boundary issue with the cache? Has anyone
else run into a problem like this? I'm using the Sept 22 nightly build. 

 

- Charlie



Re: Can we point a Solr server to index directory dynamically at runtime..

2009-09-24 Thread Michael
Using a multicore approach, you could send a "create a core named
'core3weeksold' pointing to '/datadirs/3weeksold'" command to a live Solr,
which would spin it up on the fly.  Then you query it, and maybe keep it
spun up until it's not queried for 60 seconds or something, then send a
"remove core 'core3weeksold'" command.
See http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler .
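
A sketch of those two commands as CoreAdmin requests (names and paths are
illustrative), assuming the core's config and schema already exist under the
instance dir:

http://localhost:8983/solr/admin/cores?action=CREATE&name=core3weeksold&instanceDir=/datadirs/3weeksold
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=core3weeksold

UNLOAD stops serving the core but leaves its index on disk, so the same core
can be re-created later on demand.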

Michael

On Thu, Sep 24, 2009 at 12:31 AM, Silent Surfer silentsurfe...@yahoo.comwrote:

 Hi,

 Is there any way to dynamically point the Solr servers to an index/data
 directories at run time?

 We are generating 200 GB worth of index per day and we want to retain the
 index for approximately 1 month. So our idea is to keep the first week of
 index available at any time for the users, i.e. have a set of Solr servers
 up and running to handle requests for the past week of data.

 But when a user tries to query data which is older than 7 days, we want
 to dynamically point the existing Solr instances to the inactive/dormant
 indexes and get the results.

 The main intention is to limit the number of Solr slave instances and
 thereby limit the number of servers required.

 If the index directory and Solr instances are tightly coupled, then most of
 the Solr instances are just up and running and may hardly be used, as most
 of the users are mainly interested in the past week of data and not beyond
 that.

 Any thoughts or any other approaches to tackle this would be greatly
 appreciated.

 Thanks,
 sS







Alphanumeric Wild Card Search Question

2009-09-24 Thread Carr, Adrian
Hello Solr Users,
I've tried to find the answer to this question, and have tried changing my 
configuration several times, but to no avail. I think someone on this list will 
know the answer.

Here's my question:
I have some products that I want to allow people to search for with wild cards.
For example, if my product is "YBM354", I'd like for users to be able to search
on "YBM*", "YBM3*", "YBM35*" and for any of these searches to return that
product. I've found that I can search for "YBM*" and get the product, just not
the other combinations.

I found this: 
http://www.nabble.com/Can%C2%B4t-use-wildcard-%22*%22-on-alphanumeric-values--td24369209.html,
 but adding preserveOriginal=1 doesn't seem to make a difference.

I found an example here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
 that is close, but I want to do the opposite. The example is:
"Super-Duper-XL500-42-AutoCoder!" -> 0:"Super", 1:"Duper", 2:"XL",
2:"SuperDuperXL", 3:"500", 4:"42", ...

In this example, I want to be able to find this record by searching for "XL5*".

I appreciate the help. Please let me know if there are any questions.

Thanks,
Adrian Carr




RE: Alphanumeric Wild Card Search Question

2009-09-24 Thread Ensdorf Ken
 Here's my question:
 I have some products that I want to allow people to search for with
 wild cards. For example, if my product is "YBM354", I'd like for users to
 be able to search on "YBM*", "YBM3*", "YBM35*" and for any of these
 searches to return that product. I've found that I can search for
 "YBM*" and get the product, just not the other combinations.

Are you using WordDelimiterFilterFactory?  That would explain this behavior.

If so, do you need it - for the queries you describe you don't need that kind 
of tokenization.
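
For part-number style searching, a more verbatim field type is often enough;
a minimal sketch (the fieldType name is illustrative):

<fieldType name="partno" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The whole value stays one token, so prefix queries match directly. One caveat:
wildcard terms are not analyzed, so with the lowercase filter in place the
application should lowercase the prefix itself (ybm3* rather than YBM3*).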

Also, have you played with the analysis tool on the admin page? It is a great
help in debugging things like this.

-Ken


download pre-release nightly solr 1.4

2009-09-24 Thread michael8

Hi,

I know Solr 1.4 is going to be released any day now pending the Lucene 2.9
release.  Is there anywhere one can download a pre-release nightly
build of Solr 1.4 just for getting familiar with new features (e.g. field
collapsing)?

Thanks,
Michael
-- 
View this message in context: 
http://www.nabble.com/download-pre-release-nightly-solr-1.4-tp25590281p25590281.html
Sent from the Solr - User mailing list archive at Nabble.com.



unsubcribe

2009-09-24 Thread Rafeek Raja
unsubcribe


Re: download pre-release nightly solr 1.4

2009-09-24 Thread Mark Miller
michael8 wrote:
 Hi,

 I know Solr 1.4 is going to be released any day now pending the Lucene 2.9
 release.  Is there anywhere one can download a pre-release nightly
 build of Solr 1.4 just for getting familiar with new features (e.g. field
 collapsing)?

 Thanks,
 Michael
   
You can download nightlies here:
http://people.apache.org/builds/lucene/solr/nightly/

field collapsing won't be in 1.4 though. You have to build from svn
after applying the patch for that.

-- 
- Mark

http://www.lucidimagination.com





Looking for suggestion of WordDelimiter filter config and 'ALMA awards'

2009-09-24 Thread michael8

Hi,

I have this situation that I believe is very common but was curious if
anyone knows the right way to go about solving it.  

I have a document with 'ALMA awards' in it.  However, when user searches for
'aLMA awards', it ends up with no results found.  However, when I search for
'alma awards' or 'ALMA awards', the right results came back as expected.  

I immediately went to solr/admin/analysis to see what is going on with the
indexing of 'ALMA awards' and the query parsing of 'aLMA awards', and it looks
like WordDelimiter is the one causing the mismatch. WordDelimiter, with
splitOnCaseChange=1, will turn my search query 'aLMA awards' into 'a' and
'LMA' and 'awards', which is exactly what splitOnCaseChange does.  Is there a
proper way to handle this situation, whereby the user simply got the case
wrong for the 1st letter, or maybe n letters?
I like the benefits that the WordDelimiter filter with splitOnCaseChange
provides me, but I am not sure what the proper way is to solve this without
compromising on the other benefits this filter provides.  I also tried
preserveOriginal=1, hoping that 'aLMA' would be preserved and later turned
into all-lowercase 'alma' by another filter, but with no luck.
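
One sketch of a query-side chain along those lines, assuming a Solr version
whose WordDelimiterFilterFactory supports preserveOriginal (the attribute
values here are illustrative):

<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
          generateNumberParts="1" catenateWords="0" catenateNumbers="0"
          catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>

The point is the ordering: 'aLMA' yields the parts 'a' and 'LMA' plus the
original 'aLMA', and the lowercase filter that runs afterwards turns the
original into 'alma', which should match the indexed term.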

P.S.: I am basically using the standard config for 'text' fieldtype for my
default search field. (solr 1.3)

Thanks,
Michael
-- 
View this message in context: 
http://www.nabble.com/Looking-for-suggestion-of-WordDelimiter-filter-config-and-%27ALMA-awards%27-tp25591381p25591381.html
Sent from the Solr - User mailing list archive at Nabble.com.



Solr highlighting doesn't respect quotes

2009-09-24 Thread Paul Tomblin
If I do a query for a couple of words in quotes, Solr correctly only returns
pages where those words appear exactly as quoted.  But the
highlighting acts as if the words were given separately, and stems them and
everything.  For example, if I search for "knee pain", it returns a document
that has the phrase "knee pain", and doesn't return documents that have "knee"
and "pain" with other words between them.  However, with highlighting
turned on, the highlighted field will have "knee", "knees", "pain" and
"pains" highlighted even when they aren't next to each other.
For instance:
<response><lst name='responseHeader'><int name='status'>0</int>
<int name='QTime'>45</int>
<lst name='params'><str name='explainOther'/>
<str name='fl'>*,score</str>
<str name='indent'>on</str>
<str name='start'>0</str>
<str name='q'>"knee pain"</str>
<str name='hl.fl'>text</str>
<str name='qt'>standard</str>
<str name='wt'>standard</str>
<str name='hl'>on</str>
<str name='rows'>10</str>
<str name='version'>2.2</str>
</lst>
</lst>

<lst name='2:
http://news.prnewswire.com/DisplayReleaseContent.aspx?ACCT=ind_focus.story&STORY=/www/story/09-24-2009/0005100306&EDATE=
'><arr name='text'><str>I had one injection in each <em>knee</em> and
my doctor said it could relieve my <em>knee</em> <em>pain</em>
for up to six</str>
</arr>
</lst>

-- 
http://www.linkedin.com/in/paultomblin


OutOfMemoryError due to auto-warming

2009-09-24 Thread didier deshommes
Hi there,
We are running Solr with 1GB allocated to it and we keep having
OutOfMemoryErrors. We get messages like this:

Error during auto-warming of
key:org.apache.solr.search.queryresult...@c785194d:java.lang.OutOfMemoryError:
Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:216)
at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
at 
org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:169)
at 
org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:701)
at 
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:208)
at 
org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:676)
at 
org.apache.solr.search.MissingLastOrdComparator.setNextReader(MissingStringLastComparatorSource.java:181)
at 
org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:94)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:252)
at org.apache.lucene.search.Searcher.search(Searcher.java:173)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
at 
org.apache.solr.search.SolrIndexSearcher.access$000(SolrIndexSearcher.java:51)
at 
org.apache.solr.search.SolrIndexSearcher$3.regenerateItem(SolrIndexSearcher.java:332)
at org.apache.solr.search.LRUCache.warm(LRUCache.java:194)
at 
org.apache.solr.search.SolrIndexSearcher.warm(SolrIndexSearcher.java:1481)
at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1154)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

And like this:
   Error during auto-warming of
key:org.apache.solr.search.queryresult...@33cf792:java.lang.OutOfMemoryError:
Java heap space

We've searched and one suggestion was to reduce the size of the
various caches that do sorting in solrconfig.xml
(http://osdir.com/ml/solr-user.lucene.apache.org/2009-05/msg01043.html).
Does this solution generally work?  Can anyone think of any other
cause for this problem?
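
For reference, the kind of change that suggestion points at lives in
solrconfig.xml (the sizes here are illustrative); shrinking the caches, or
setting autowarmCount to 0, disables exactly the warming shown in the stack
trace above:

<queryResultCache
  class="solr.LRUCache"
  size="512"
  initialSize="512"
  autowarmCount="0"/>

With autowarmCount=0 the cache is simply dropped on commit instead of being
regenerated, at the cost of cold caches after each commit.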

didier


RE: OutOfMemoryError due to auto-warming

2009-09-24 Thread Francis Yakin
You also can increase the JVM heap size if you have enough physical memory;
for example, if you have 4GB physical, give the JVM a heap size of 2GB or
2.5GB.

Francis

-Original Message-
From: didier deshommes [mailto:dfdes...@gmail.com] 
Sent: Thursday, September 24, 2009 3:32 PM
To: solr-user@lucene.apache.org
Cc: Andrew Montalenti
Subject: OutOfMemoryError due to auto-warming

Hi there,
We are running Solr with 1GB allocated to it and we keep having
OutOfMemoryErrors. We get messages like this:

Error during auto-warming of
key:org.apache.solr.search.queryresult...@c785194d:java.lang.OutOfMemoryError:
Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:216)
at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
at 
org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:169)
at 
org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:701)
at 
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:208)
at 
org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:676)
at 
org.apache.solr.search.MissingLastOrdComparator.setNextReader(MissingStringLastComparatorSource.java:181)
at 
org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:94)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:252)
at org.apache.lucene.search.Searcher.search(Searcher.java:173)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
at 
org.apache.solr.search.SolrIndexSearcher.access$000(SolrIndexSearcher.java:51)
at 
org.apache.solr.search.SolrIndexSearcher$3.regenerateItem(SolrIndexSearcher.java:332)
at org.apache.solr.search.LRUCache.warm(LRUCache.java:194)
at 
org.apache.solr.search.SolrIndexSearcher.warm(SolrIndexSearcher.java:1481)
at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1154)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

And like this:
   Error during auto-warming of
key:org.apache.solr.search.queryresult...@33cf792:java.lang.OutOfMemoryError:
Java heap space

We've searched and one suggestion was to reduce the size of the
various caches that do sorting in solrconfig.xml
(http://osdir.com/ml/solr-user.lucene.apache.org/2009-05/msg01043.html).
Does this solution generally work?  Can anyone think of any other
cause for this problem?

didier


Re: OutOfMemoryError due to auto-warming

2009-09-24 Thread didier deshommes
On Thu, Sep 24, 2009 at 5:40 PM, Francis Yakin fya...@liquid.com wrote:
 You also can increase the JVM heap size if you have enough physical memory;
 for example, if you have 4GB physical, give the JVM a heap size of 2GB or
 2.5GB.

Thanks,
we can definitely do that (we have 4GB available). I also forgot to
add that we're running a development version of solr (git clone from ~
3 weeks ago).

Thanks,
didier


 Francis

 -Original Message-
 From: didier deshommes [mailto:dfdes...@gmail.com]
 Sent: Thursday, September 24, 2009 3:32 PM
 To: solr-user@lucene.apache.org
 Cc: Andrew Montalenti
 Subject: OutOfMemoryError due to auto-warming

 Hi there,
 We are running Solr with 1GB allocated to it and we keep having
 OutOfMemoryErrors. We get messages like this:

 Error during auto-warming of
 key:org.apache.solr.search.queryresult...@c785194d:java.lang.OutOfMemoryError:
 Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3209)
        at java.lang.String.<init>(String.java:216)
        at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
        at 
 org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:169)
        at 
 org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:701)
        at 
 org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:208)
        at 
 org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:676)
        at 
 org.apache.solr.search.MissingLastOrdComparator.setNextReader(MissingStringLastComparatorSource.java:181)
        at 
 org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:94)
        at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:252)
        at org.apache.lucene.search.Searcher.search(Searcher.java:173)
        at 
 org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
        at 
 org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
        at 
 org.apache.solr.search.SolrIndexSearcher.access$000(SolrIndexSearcher.java:51)
        at 
 org.apache.solr.search.SolrIndexSearcher$3.regenerateItem(SolrIndexSearcher.java:332)
        at org.apache.solr.search.LRUCache.warm(LRUCache.java:194)
        at 
 org.apache.solr.search.SolrIndexSearcher.warm(SolrIndexSearcher.java:1481)
        at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1154)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

 And like this:
   Error during auto-warming of
 key:org.apache.solr.search.queryresult...@33cf792:java.lang.OutOfMemoryError:
 Java heap space

 We've searched and one suggestion was to reduce the size of the
 various caches that do sorting in solrconfig.xml
 (http://osdir.com/ml/solr-user.lucene.apache.org/2009-05/msg01043.html).
 Does this solution generally work?  Can anyone think of any other
 cause for this problem?

 didier



RE: OutOfMemoryError due to auto-warming

2009-09-24 Thread Francis Yakin
 
I reduced the size of the queryResultCache in solrconfig, which seems to fix
the issue as well:

<!-- Maximum number of documents to cache for any entry in the
     queryResultCache. -->
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>

reduced from 500:

<!-- Maximum number of documents to cache for any entry in the
     queryResultCache. -->
<queryResultMaxDocsCached>500</queryResultMaxDocsCached>

Francis

-Original Message-
From: didier deshommes [mailto:dfdes...@gmail.com] 
Sent: Thursday, September 24, 2009 3:32 PM
To: solr-user@lucene.apache.org
Cc: Andrew Montalenti
Subject: OutOfMemoryError due to auto-warming

Hi there,
We are running Solr with 1GB allocated to it and we keep having
OutOfMemoryErrors. We get messages like this:

Error during auto-warming of
key:org.apache.solr.search.queryresult...@c785194d:java.lang.OutOfMemoryError:
Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:216)
at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:122)
at 
org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:169)
at 
org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:701)
at 
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:208)
at 
org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:676)
at 
org.apache.solr.search.MissingLastOrdComparator.setNextReader(MissingStringLastComparatorSource.java:181)
at 
org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:94)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:252)
at org.apache.lucene.search.Searcher.search(Searcher.java:173)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
at 
org.apache.solr.search.SolrIndexSearcher.access$000(SolrIndexSearcher.java:51)
at 
org.apache.solr.search.SolrIndexSearcher$3.regenerateItem(SolrIndexSearcher.java:332)
at org.apache.solr.search.LRUCache.warm(LRUCache.java:194)
at 
org.apache.solr.search.SolrIndexSearcher.warm(SolrIndexSearcher.java:1481)
at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1154)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

And like this:
   Error during auto-warming of
key:org.apache.solr.search.queryresult...@33cf792:java.lang.OutOfMemoryError:
Java heap space

We've searched and one suggestion was to reduce the size of the
various caches that do sorting in solrconfig.xml
(http://osdir.com/ml/solr-user.lucene.apache.org/2009-05/msg01043.html).
Does this solution generally work?  Can anyone think of any other
cause for this problem?

didier


Re: Solr highlighting doesn't respect quotes

2009-09-24 Thread Koji Sekiguchi

Set the hl.usePhraseHighlighter parameter to true:

http://wiki.apache.org/solr/HighlightingParameters#hl.usePhraseHighlighter
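
A sketch of the request with that parameter added (query and field names taken
from the message below):

http://localhost:8983/solr/select?q=%22knee+pain%22&hl=on&hl.fl=text&hl.usePhraseHighlighter=true

With it enabled, only terms that participate in the phrase match get
highlighted.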

Koji

Paul Tomblin wrote:

If I do a query for a couple of words in quotes, Solr correctly only returns
pages where those words appear exactly as quoted.  But the
highlighting acts as if the words were given separately, and stems them and
everything.  For example, if I search for "knee pain", it returns a document
that has the phrase "knee pain", and doesn't return documents that have "knee"
and "pain" with other words between them.  However, with highlighting
turned on, the highlighted field will have "knee", "knees", "pain" and
"pains" highlighted even when they aren't next to each other.
For instance:
<response><lst name='responseHeader'><int name='status'>0</int>
<int name='QTime'>45</int>
<lst name='params'><str name='explainOther'/>
<str name='fl'>*,score</str>
<str name='indent'>on</str>
<str name='start'>0</str>
<str name='q'>"knee pain"</str>
<str name='hl.fl'>text</str>
<str name='qt'>standard</str>
<str name='wt'>standard</str>
<str name='hl'>on</str>
<str name='rows'>10</str>
<str name='version'>2.2</str>
</lst>
</lst>

<lst name='2:
http://news.prnewswire.com/DisplayReleaseContent.aspx?ACCT=ind_focus.story&STORY=/www/story/09-24-2009/0005100306&EDATE=
'><arr name='text'><str>I had one injection in each <em>knee</em> and
my doctor said it could relieve my <em>knee</em> <em>pain</em>
for up to six</str>
</arr>
</lst>

  




Re: Seattle / PNW Hadoop/Lucene/HBase Meetup, Wed Sep 30th

2009-09-24 Thread Bradford Stephens
Friendly Reminder! One week to go.

On Mon, Sep 14, 2009 at 11:35 AM, Bradford Stephens 
bradfordsteph...@gmail.com wrote:

 Greetings,

 It's time for another Hadoop/Lucene/Apache Cloud Stack meetup!
 This month it'll be on Wednesday, the 30th, at 6:45 pm.

 We should have a few interesting guests this time around -- someone from
 Facebook may be stopping by to talk about Hive :)

 We've had great attendance in the past few months, let's keep it up! I'm
 always
 amazed by the things I learn from everyone.

 We're back at the University of Washington, Allen Computer Science
 Center (not Computer Engineering)
 Map: http://www.washington.edu/home/maps/?CSE

 Room: 303 -or- the Entry level. If there are changes, signs will be posted.

 More Info:

 The meetup is about 2 hours (and there's usually food): we'll have two
 in-depth talks of 15-20 minutes each, and then several lightning talks
 of 5 minutes. If no one offers to speak, we'll just have general
 discussion and 'social time'. Let me know if you're interested in
 speaking or attending. We'd like to focus on education, so every
 presentation *needs* to ask some questions at the end. We can talk about
 these after the presentations, and I'll record what we've learned in a
 wiki and share it with the rest of us.

 Contact: Bradford Stephens, 904-415-3009, bradfordsteph...@gmail.com

 Cheers,
 Bradford
 --
 http://www.roadtofailure.com -- The Fringes of Scalability, Social
 Media, and Computer Science




-- 
http://www.roadtofailure.com -- The Fringes of Scalability, Social Media,
and Computer Science


Use cases for ReplicationHandler's backup facility?

2009-09-24 Thread Chris Harris
The ReplicationHandler (http://wiki.apache.org/solr/SolrReplication)
has support for backups, which can be triggered in one of two ways:

1. in response to startup/commit/optimize events (specified through the
backupAfter tag in the handler's requestHandler tag in solrconfig.xml)
2. by manually hitting http://master_host:port/solr/replication?command=backup

These backups get placed in directories named, e.g.
snapshot.20090924033521, inside the solr data directory.

According to the docs, these backups are not necessary for replication
to work. My question is: What use case *are* they meant to address?

The first potential use case that came to mind was that maybe I would
be able to restore my index from these snapshot directories should it
ever become corrupted. (I could just do something like rm -r data; mv
snapshot.20090924033521 data.) That appears not to be one of the
intended use cases, though; if it were, then I imagine the snapshot
directories would contain the entire index, whereas they seem to
contain only deltas of one form or another.

Thanks,
Chris


Re: Solrj possible deadlock

2009-09-24 Thread pof

Well, in the same process I am using a JDBC connection to get all the
relative paths to the documents I want to index. Then I parse the documents
to plain text using tons of open source libraries like POI, PDFBox,
etc. (which might account for java2d), then I add them to the index and
commit every 2000 documents.

I write a db row for each document I index so I can resume where I left off
after a crash or exception. My app will happily index for hours; then, after
it hangs, the resumed indexing run will only last a few additional minutes!
The thread dumps look the same.

Cheers.


ryantxu wrote:
 
 do you have anything custom going on?
 
 The fact that the lock is in java2d seems suspicious...
 
 
 On Sep 23, 2009, at 7:01 PM, pof wrote:
 

 I had the same problem again yesterday except the process halted  
 after about
 20mins this time.


 pof wrote:

 Hello, I was running a batch index the other day using the Solrj
 EmbeddedSolrServer when the process abruptly froze in its tracks after
 running for about 4-5 hours and indexing ~400K documents. There were no
 document locks so it would seem likely that there was some kind of thread
 deadlock. I was hoping someone might be able to tell me some information
 about the following thread dump taken at the time:

 Full thread dump OpenJDK Client VM (1.6.0-b09 mixed mode):

 "DestroyJavaVM" prio=10 tid=0x9322a800 nid=0xcef waiting on condition [0x..0x0018a044]
    java.lang.Thread.State: RUNNABLE

 "Java2D Disposer" daemon prio=10 tid=0x0a28cc00 nid=0xf1c in Object.wait() [0x0311d000..0x0311def4]
    java.lang.Thread.State: WAITING (on object monitor)
         at java.lang.Object.wait(Native Method)
         - waiting on <0x97a96840> (a java.lang.ref.ReferenceQueue$Lock)
         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
         - locked <0x97a96840> (a java.lang.ref.ReferenceQueue$Lock)
         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
         at sun.java2d.Disposer.run(Disposer.java:143)
         at java.lang.Thread.run(Thread.java:636)

 "pool-1-thread-1" prio=10 tid=0x93a26c00 nid=0xcf7 waiting on condition [0x08a6a000..0x08a6b074]
    java.lang.Thread.State: WAITING (parking)
         at sun.misc.Unsafe.park(Native Method)
         - parking to wait for <0x967acfd0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1978)
         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:386)
         at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
         at java.lang.Thread.run(Thread.java:636)

 "Low Memory Detector" daemon prio=10 tid=0x93a00c00 nid=0xcf5 runnable [0x..0x]
    java.lang.Thread.State: RUNNABLE

 "CompilerThread0" daemon prio=10 tid=0x09fe9800 nid=0xcf4 waiting on condition [0x..0x096a7af4]
    java.lang.Thread.State: RUNNABLE

 "Signal Dispatcher" daemon prio=10 tid=0x09fe8800 nid=0xcf3 waiting on condition [0x..0x]
    java.lang.Thread.State: RUNNABLE

 "Finalizer" daemon prio=10 tid=0x09fd7000 nid=0xcf2 in Object.wait() [0x005ca000..0x005caef4]
    java.lang.Thread.State: WAITING (on object monitor)
         at java.lang.Object.wait(Native Method)
         - waiting on <0x966e6d40> (a java.lang.ref.ReferenceQueue$Lock)
         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
         - locked <0x966e6d40> (a java.lang.ref.ReferenceQueue$Lock)
         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
         at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)

 "Reference Handler" daemon prio=10 tid=0x09fd2c00 nid=0xcf1 in Object.wait() [0x00579000..0x00579d74]
    java.lang.Thread.State: WAITING (on object monitor)
         at java.lang.Object.wait(Native Method)
         - waiting on <0x966e6dc8> (a java.lang.ref.Reference$Lock)
         at java.lang.Object.wait(Object.java:502)
         at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
         - locked <0x966e6dc8> (a java.lang.ref.Reference$Lock)

 "VM Thread" prio=10 tid=0x09fcf800 nid=0xcf0 runnable

 "VM Periodic Task Thread" prio=10 tid=0x93a02400 nid=0xcf6 waiting on condition

 JNI global references: 1072

 Heap
  def new generation   total 36288K, used 23695K [0x93f1, 0x9667, 0x9667)
   eden space 32256K,  73% used [0x93f1, 0x95633f60, 0x95e9)
   from space 4032K,   0% used [0x95e9, 0x95e9, 0x9628)
   to   space 4032K,   0% used [0x9628, 0x9628, 0x9667)
  tenured generation   total 483968K, used 72129K

Re: Parallel requests to Tomcat

2009-09-24 Thread Lance Norskog
Are you on Java 5, 6 or 7? Each release sees some tweaking of the Java
multithreading model as well as performance improvements (and bug
fixes) in the Sun HotSpot runtime.

You may be tripping over the TCP/IP multithreaded connection manager.
You might wish to create each client thread with a separate socket.

Also, here is a standard bit of benchmarking advice: include think
time. This means that instead of sending requests constantly, each
thread should time out for a few seconds before sending the next
request. This simulates a user stopping and thinking before clicking
the mouse again. This helps simulate the quantity of threads, etc.
which are stopped and waiting at each stage of the request pipeline.
As it is, you are trying to simulate the throughput behaviour without
simulating the horizontal volume. (Benchmarking is much harder than it
looks.)
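
A minimal sketch of a load-generating thread with think time (sendQuery is a
hypothetical stub standing in for one HTTP request to Solr):

import java.util.Random;

public class ThinkTimeClient implements Runnable {
    private final Random rnd = new Random();

    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                sendQuery(); // issue one search request (stub)
                // Think time: pause 1-4 seconds before the next request,
                // simulating a user reading results before clicking again.
                Thread.sleep(1000 + rnd.nextInt(3000));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void sendQuery() { /* an HTTP GET against Solr would go here */ }
}

Run one instance per simulated user; the pauses keep many threads alive and
waiting, which is closer to real traffic than a tight request loop.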

On Wed, Sep 23, 2009 at 9:43 AM, Grant Ingersoll gsing...@apache.org wrote:

 On Sep 23, 2009, at 12:09 PM, Michael wrote:

 On Wed, Sep 23, 2009 at 12:05 PM, Yonik Seeley
 yo...@lucidimagination.comwrote:

 On Wed, Sep 23, 2009 at 11:47 AM, Michael solrco...@gmail.com wrote:

 If this were IO bound, wouldn't I see the same results when sending my 8
 requests to 8 Tomcats?  There's only one disk (well, RAM) whether I'm
 querying 8 processes or 8 threads in 1 process, right?

 Right - I was thinking IO bound at the Lucene Directory level - which
 synchronized in the past and led to poor concurrency.  But your Solr
 version is recent enough to use the newer unsynchronized method by
 default (on non-windows)


 Ah, OK.  So it looks like comparing to Jetty is my only next step.
  Although
 I'm not sure what I'm going to do based on the result of that test -- if
 Jetty behaves differently, then I still don't know why the heck Tomcat is
 behaving badly! :)


 Have you done any profiling to see where hotspots are?  Have you looked at
 garbage collection?  Do you have any full collections occurring?  What
 garbage collector are you using?  How often are you updating/committing,
 etc?


 --
 Grant Ingersoll
 http://www.lucidimagination.com/

 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
 Solr/Lucene:
 http://www.lucidimagination.com/search





-- 
Lance Norskog
goks...@gmail.com


Re: Very big numbers

2009-09-24 Thread Lance Norskog
There is no bignum support in Solr at this time.

You can pick a fixed-length string representation with leading zeros, so that
all of your values are the same length, as below:
99,999,999,999,999.99
00,000,999,999,999.99

You can do sorted queries, range queries, and facets from this format.
Solr is generally not a math engine so you won't miss much.
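
A sketch of the padding (a hypothetical helper, assuming non-negative values
and two decimal places; the grouping commas from the example above are
omitted, since the fixed width is what matters for lexicographic ordering):

import java.math.BigDecimal;

public class SortableAmount {
    // Zero-pad to a fixed width of 18 characters, enough for values up to
    // 99999999999999.99, so that string order equals numeric order.
    static String toSortable(BigDecimal value) {
        return String.format("%018.2f", value);
    }

    public static void main(String[] args) {
        System.out.println(toSortable(new BigDecimal("99999999999999.99"))); // 099999999999999.99
        System.out.println(toSortable(new BigDecimal("999999999.99")));      // 000000999999999.99
    }
}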

On Wed, Sep 23, 2009 at 1:56 PM, Jonathan Ariel ionat...@gmail.com wrote:
 Hi! I need to index in solr very big numbers. Something like
 99,999,999,999,999.99
 Right now i'm using an sdouble field type because I need to make range
 queries on this field.
 The problem is that the field value is being returned in scientific
 notation. Is there any way to avoid that?

 Thanks!
 Jonathan




-- 
Lance Norskog
goks...@gmail.com


Re: Mixed field types and boolean searching

2009-09-24 Thread Lance Norskog
No- there are various analyzers. StandardAnalyzer is geared toward
searching bodies of text for interesting words -  punctuation is
ripped out. Other analyzers are more useful for concrete text. You
may have to work at finding one that leaves punctuation in.

On Wed, Sep 23, 2009 at 2:14 PM, Ensdorf Ken ensd...@zoominfo.com wrote:
 Hi-

 let's say you have two indexed fields, F1 and F2.  F1 uses the
 StandardAnalyzer, while F2 doesn't.  Now imagine you index a document where
 you have

 F1=A & B

 F2=C + D

 Now imagine you run a query:

 (F1:A OR F2:A) AND (F1:B OR F2:B)

 in other words, both A and B must exist in at least one of F1 or F2.
 This returns the document in question.  Now imagine you run another query:

 (F1:A OR F2:A) AND (F1:& OR F2:&)

 Since & is removed by the StandardAnalyzer, the parsed query looks like

 (F1:A OR F2:A) AND (F2:&)

 Now you don't match the document.  Is this a bug?

 Thanks!
 -Ken





-- 
Lance Norskog
goks...@gmail.com


Re: Solr http post performance seems slow - help?

2009-09-24 Thread Lance Norskog
In top, press the '1' key. This will give a list of the CPUs and how
much load is on each. The display is otherwise a little weird for
multi-cpu machines. But don't be surprised when Solr is I/O bound. The
biggest fanciest RAID is often a better investment than CPUs. On one
project we bought low-end rack servers that come with 6-8 disk bays and
filled them with 10k/15k RPM disks.

On Wed, Sep 23, 2009 at 2:47 PM, Dan A. Dickey dan.dic...@savvis.net wrote:
 On Friday 11 September 2009 11:06:20 am Dan A. Dickey wrote:
 ...
 Our JBoss expert and I will be looking into why this might be occurring.
 Does anyone know of any JBoss related slowness with Solr?
 And does anyone have any other sort of suggestions to speed indexing
 performance?   Thanks for your help all!  I'll keep you up to date with
 further progress.

 Ok, further progress... just to keep any interested parties up to date
 and for the record...

 I'm finding that using the example jetty setup (will be switching very
 very soon to a real jetty installation) is about the fastest.  Using
 several processes to send posts to Solr helps a lot, and we're seeing
 about 80 posts a second this way.

 We also stripped down JBoss to the bare bones and the Solr in it
 is running nearly as fast - about 50 posts a second.  It was our previous
 JBoss configuration that was making it appear slow for some reason.

 We will be running more tests and spreading out the pre-index workload
 across more machines and more processes. In our case we were seeing
 the bottleneck being one machine running 18 processes.
 The 2 quad core xeon system is experiencing about a 25% cpu load.
 And I'm not certain, but I think this may be actually 25% of one of the 8 
 cores.
 So, there's *lots* of room for Solr to be doing more work there.
        -Dan

 --
 Dan A. Dickey | Senior Software Engineer

 Savvis
 10900 Hampshire Ave. S., Bloomington, MN  55438
 Office: 952.852.4803 | Fax: 952.852.4951
 E-mail: dan.dic...@savvis.net




-- 
Lance Norskog
goks...@gmail.com


Re: solr caching problem

2009-09-24 Thread Lance Norskog
There are now two excellent books, Lucene in Action (2nd edition) and Solr 1.4
Enterprise Search Server, that describe the inner workings of these
technologies and how they fit together.

Otherwise Solr and Lucene knowledge is only available in a fragmented
form across many wiki pages, bug reports and email discussions.

But the direct answer is: before you commit your changes you will not
see them in queries. When you commit them, all caches are thrown away
and rebuilt when you do the same queries you did before. This
rebuilding process has various tools to control it in solrconfig.xml.

On Wed, Sep 23, 2009 at 8:27 PM, satya tosatyaj...@gmail.com wrote:
 Is there any way to analyze or see which documents are getting cached
 by the documentCache?

  <documentCache
      class="solr.LRUCache"
      size="512"
      initialSize="512"
      autowarmCount="0"/>



 On Wed, Sep 23, 2009 at 8:10 AM, satya tosatyaj...@gmail.com wrote:

 First of all, thanks a lot for the clarification. Is there any way to see
 how this cache is working internally, what objects are being stored,
 and how much memory it consumes, so that we can get a clear picture in
 mind? And how can we test the performance through the cache?


 On Tue, Sep 22, 2009 at 11:19 PM, Fuad Efendi f...@efendi.ca wrote:

  1) Then do you mean, if we delete a particular doc, that it is going to
  be deleted from the cache also?

 When you delete a document, and then COMMIT your changes, new caches will be
 warmed up (and prepopulated with some key-value pairs from the old
 instances), etc:

 <!-- documentCache caches Lucene Document objects (the stored fields for
      each document).
      Since Lucene internal document ids are transient, this cache will not
      be autowarmed. -->
    <documentCache
      class="solr.LRUCache"
      size="512"
      initialSize="512"
      autowarmCount="0"/>

 - this one won't be 'prepopulated'.




  2) In Solr, is the cache storing the entire document in memory, or only
  references to documents in memory?

 There are many different cache instances; the DocumentCache should store
 (ID, Document) pairs, etc.








-- 
Lance Norskog
goks...@gmail.com


Re: Solrj possible deadlock

2009-09-24 Thread Chris Hostetter

: Well, in the same process I am using a JDBC connection to get all the
: relative paths to the documents I want to index, then I parse the documents
: to plain text using tons of open source libraries like POI, PDFBox
: etc. (which might account for java2d), then I add them to the index and
: commit every 2000 documents.

Since nothing in your threaddumps refers to solrj or solr (either as the 
current method, or in the call stack suggesting that it's code called by 
Solr(J)), there's really no indication that the problem is even remotely 
solr related.

i suspect that if you commented out all of the code where you use SolrJ so
that you still did all of the parsing and then just wrote the resulting
data to /dev/null you would probably still see this behavior -- perhaps
one of the other libraries you are using has some semantics you aren't
obeying (ie: a parser that must be used single threaded, or an object that
must be closed so that it can reset some static state) that is causing
this problem only after some time has elapsed (or on particular
permutations of data)



-Hoss



Re: Sorting/paging problem

2009-09-24 Thread Lance Norskog
Which version of Java are you using?

Please try the standard tricks:
Do a fresh checkout of the Solr trunk.
Do 'ant clean dist' and use the newly built war & latest lucene libraries.
Try changing the JVM startup parameters which control how incremental
compilation works: -server and others. Also try changing the garbage
collection algorithms.

On Thu, Sep 24, 2009 at 9:49 AM, Charlie Jackson
charlie.jack...@cision.com wrote:
 I've run into a strange issue with my Solr installation. I'm running
 queries that are sorting by a DateField field but from time to time, I'm
 seeing individual records very much out of order. What's more, they
 appear on multiple pages of my result set. Let me give an example.
 Starting with a basic query, I sort on the date that the document was
 added to the index and see these rows on the first page (I'm just
 showing the date field here):



 <doc><date name="indexed_date">2009-09-23T19:24:47.419Z</date></doc>
 <doc><date name="indexed_date">2009-09-23T19:25:03.229Z</date></doc>
 <doc><date name="indexed_date">2009-09-23T19:25:03.400Z</date></doc>
 <doc><date name="indexed_date">2009-09-23T19:25:19.951</date></doc>
 <doc><date name="indexed_date">2009-09-23T20:10:07.919Z</date></doc>



 Note how the last document's date jumps a bit. Not necessarily a
 problem, but the next page looks like this:



 <doc><date name="indexed_date">2009-09-23T19:26:16.022Z</date></doc>
 <doc><date name="indexed_date">2009-09-23T19:26:32.547Z</date></doc>
 <doc><date name="indexed_date">2009-09-23T19:27:45.470Z</date></doc>
 <doc><date name="indexed_date">2009-09-23T19:27:45.592Z</date></doc>
 <doc><date name="indexed_date">2009-09-23T20:10:07.919Z</date></doc>



 So, not only is the date sorting wrong, but the exact same document
 shows up on the next page, also still out of date order. I've seen the
 same document show up in 4-5 pages in some cases. It's always the last
 record on the page, too. If I change the page size, the problem seems to
 disappear for a while, but then starts up again later. Also, running the
 same query/queries later on doesn't show the same behavior.



 Could it be some sort of page boundary issue with the cache? Has anyone
 else run into a problem like this? I'm using the Sept 22 nightly build.



 - Charlie





-- 
Lance Norskog
goks...@gmail.com


Re: Showcase: Facetted Search for Wine using Solr

2009-09-24 Thread Grant Ingersoll

Hi Marian,

Looks great!  Wish I could order some wine.  When you get a chance,  
please add the site to http://wiki.apache.org/solr/PublicServers!


Cheers,
Grant

On Sep 24, 2009, at 11:51 AM, marian.steinbach wrote:


Hello everybody!

The purpose of this mail is to say thank you to the creators of Solr
and to the community that supports it.

We released our first project using Solr several weeks ago, after
having tested Solr for several months.

The project I'm talking about is a product search for an online wine
shop (sorry, german user interface only):

  http://www.koelner-weinkeller.de/index.php?id=sortiment

Our client offers about 3000 different wines and other related  
products.


Before we introduced Solr, the products were searched via
complicated and slow SQL statements, with all kinds of problems related
to that. No full text indexing, no stemming etc.

We are happy to make use of several built-in features which solve
problems that bugged us: faceted search, German accents and stemming
and synonyms being the most important ones.

The surrounding website is TYPO3 driven. We integrated Solr by
creating our own frontend plugin which talks to the Solr webservice
(and we're very happy about the PHP output type!).

I'd be glad about your comments.

Cheers,

Marian


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Can we point a Solr server to index directory dynamically at runtime..

2009-09-24 Thread Chris Hostetter
: Using a multicore approach, you could send a "create a core named
: 'core3weeksold' pointing to '/datadirs/3weeksold'" command to a live Solr,
: which would spin it up on the fly.  Then you query it, and maybe keep it
: spun up until it's not queried for 60 seconds or something, then send a
: "remove core 'core3weeksold'" command.
: See http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler .

something that seems implicit in the question is what to do when the
request spans all of the data ... this is where (in theory) distributed
searching could help you out.

index each day's worth of data into its own core. that makes it really
easy to expire the old data (just UNLOAD and delete an entire core once
it's more than 30 days old). if your user is only searching current data
then your app can directly query the core containing the most current data
-- but if they want to query the last week, or last two weeks worth of
data, you do a distributed request for all of the shards needed to search
the appropriate amount of data.

Between the ALIAS and SWAP commands on the CoreAdmin screen it should
be pretty easy to have cores with names like "today", "1dayold", "2dayold" so
that your app can configure simple shard params for all the permutations
you'll need to query.
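
A sketch of such a windowed query (host and core names are illustrative); the
app just varies the shards list with the time range the user asked for, e.g.
the last three days:

http://localhost:8983/solr/today/select?q=foo&shards=localhost:8983/solr/today,localhost:8983/solr/1dayold,localhost:8983/solr/2dayold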


-Hoss



Re: Use cases for ReplicationHandler's backup facility?

2009-09-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Fri, Sep 25, 2009 at 4:57 AM, Chris Harris rygu...@gmail.com wrote:
 The ReplicationHandler (http://wiki.apache.org/solr/SolrReplication)
 has support for backups, which can be triggered in one of two ways:

 1. in response to startup/commit/optimize events (specified through
 the backupAfter tag specified in the handler's requestHandler tag in
 solrconfig.xml)
 2. by manually hitting http://master_host:port/solr/replication?command=backup

 These backups get placed in directories named, e.g.
 snapshot.20090924033521, inside the solr data directory.

 According to the docs, these backups are not necessary for replication
 to work. My question is: What use case *are* they meant to address?

 The first potential use case that came to mind was that maybe I would
 be able to restore my index from these snapshot directories should it
 ever become corrupted. (I could just do something like rm -r data; mv
 snapshot.20090924033521 data.) That appears not to be one of the
 intended use cases, though; if it were, then I imagine the snapshot
 directories would contain the entire index, whereas they seem to
 contain only deltas of one form or another.
Yes, the only reason to take a backup should be for restoration/archival.
They should contain all the files required for the latest commit point.



 Thanks,
 Chris




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Can we point a Solr server to index directory dynamically at runtime..

2009-09-24 Thread Silent Surfer
Hi,

Thank you Michael and Chris for the response. 

Today after the mail from Michael, we tested with the dynamic loading of cores 
and it worked well. So we need to go with the hybrid approach of Multicore and 
Distributed searching.

As per our testing, we found that a Solr instance with 20 GB of index (single
index or spread across multiple cores) can provide better performance when
compared to a Solr instance with, say, 40 or 50 GB of index (single index or
index spread across cores).

So the 200 GB of index on day 1 will be spread across 200/20 = 10 Solr slave
instances.

On day 2, 10 more Solr slave servers are required; cumulative Solr slave
instances = 200*2/20 = 20
...
..
..
On day 30, 10 more Solr slave servers are required; cumulative Solr slave
instances = 200*30/20 = 300

So with the above approach, we may need ~300 Solr slave instances, which
becomes very unmanageable.

But we know that most of the queries are for the past week, i.e. we
definitely need 70 Solr slaves containing the last 7 days worth of data up
and running.

Now for the rest of the 230 Solr instances, do we need to keep them running
for the odd query that can span all 30 days of data (30 * 200 GB = 6 TB),
which may come up only a couple of times a day?
This linear increase of Solr servers with the retention period doesn't seem
to be a very scalable solution.

So we are looking for a simpler approach to handle this scenario.

Appreciate any further inputs/suggestions.

Regards,
sS

--- On Fri, 9/25/09, Chris Hostetter hossman_luc...@fucit.org wrote:

 From: Chris Hostetter hossman_luc...@fucit.org
 Subject: Re: Can we point a Solr server to index directory dynamically at  
 runtime..
 To: solr-user@lucene.apache.org
 Date: Friday, September 25, 2009, 4:04 AM
 : Using a multicore approach, you could send a "create a core named
 : 'core3weeksold' pointing to '/datadirs/3weeksold'" command to a live
 : Solr, which would spin it up on the fly.  Then you query it, and maybe
 : keep it spun up until it's not queried for 60 seconds or something,
 : then send a "remove core 'core3weeksold'" command.
 : See http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler .

 something that seems implicit in the question is what to do when the
 request spans all of the data ... this is where (in theory) distributed
 searching could help you out.

 index each day's worth of data into its own core. that makes it really
 easy to expire the old data (just UNLOAD and delete an entire core once
 it's more than 30 days old). if your user is only searching current data
 then your app can directly query the core containing the most current data
 -- but if they want to query the last week, or last two weeks worth of
 data, you do a distributed request for all of the shards needed to search
 the appropriate amount of data.

 Between the ALIAS and SWAP commands on the CoreAdmin screen it should
 be pretty easy to have cores with names like "today", "1dayold", "2dayold"
 so that your app can configure simple shard params for all the permutations
 you'll need to query.


 -Hoss