solr index size
Hi, We built a Solr index on a set of documents a few times. Each time, we did an optimize to reduce the index to a single segment. The index sizes are slightly different across different runs. Even though the documents are not inserted in the same order across runs, it seems to me that the final optimized index should be identical. Running CheckIndex showed that the number of docs and fields is the same, but the number of terms is slightly different. Does anyone know how to explain this? Thanks, Jun IBM Almaden Research Center K55/B1, 650 Harry Road, San Jose, CA 95120-6099 jun...@almaden.ibm.com
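For anyone wanting to run the same comparison programmatically instead of reading CheckIndex's console output, here is a minimal sketch using Lucene's CheckIndex API (the index path is hypothetical; API as in the Lucene 2.9/3.x line that Solr of this era ships):

import java.io.File;
import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.store.FSDirectory;

public class IndexStats {
    public static void main(String[] args) throws Exception {
        // Open the on-disk index and run the same diagnostics as the
        // command-line CheckIndex tool, but capture the result object.
        FSDirectory dir = FSDirectory.open(new File("/path/to/solr/data/index"));
        CheckIndex checker = new CheckIndex(dir);
        CheckIndex.Status status = checker.checkIndex();
        System.out.println("clean=" + status.clean + " segments=" + status.numSegments);
        dir.close();
    }
}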
Index Version Number
Is it possible for a Solr client to determine if the index has changed since the last time it performed a query? For example, is it possible to query the current Lucene indexVersion? Thanks in advance for your help, Richard
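One approach, assuming the Luke request handler is available at /admin/luke (it is in the example solrconfig.xml): its response includes the Lucene index version, which a client can record and compare on the next call. A SolrJ sketch, with the URL hypothetical:

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;

public class IndexVersionCheck {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        // The Luke response carries index-level stats, including the
        // Lucene index version; poll it and compare with the last value.
        LukeResponse rsp = new LukeRequest().process(server);
        Object version = rsp.getIndexInfo().get("version");
        System.out.println("index version: " + version);
    }
}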
Re: Solr index
Hi, Solr doesn't include such functionality. But Nutch has: [o...@localhost src]$ ff \*Signature\*java ./test/org/apache/nutch/crawl/TestSignatureFactory.java ./java/org/apache/nutch/crawl/SignatureFactory.java ./java/org/apache/nutch/crawl/MD5Signature.java ./java/org/apache/nutch/crawl/Signature.java ./java/org/apache/nutch/crawl/TextProfileSignature.java ./java/org/apache/nutch/crawl/SignatureComparator.java Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: aidahaj > To: solr-user@lucene.apache.org > Sent: Friday, April 24, 2009 12:13:47 PM > Subject: Solr index > > > Hi , > I'm using Nutch to crawl a list of web sites. > Solr is my index(Nutch-1.0 integration with solr). > I'm working on detecting web site defacement(if there's any changes in the > text of a web page). > I want to know if solr may give me the possibility to detect the changes in > the Documents in the indexe before commiting or a log file or something like > that(the text that has been changed between two points of time ). > I'm looking for your help. Thanks a lot. > -- > View this message in context: > http://www.nabble.com/Solr-index-tp23219842p23219842.html > Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr index
Solr 1.4 (trunk) has a similar functionality. http://wiki.apache.org/solr/Deduplication On Fri, Apr 24, 2009 at 9:53 PM, Otis Gospodnetic < otis_gospodne...@yahoo.com> wrote: > > Hi, > > Solr doesn't include such functionality. But Nutch has: > > [o...@localhost src]$ ff \*Signature\*java > ./test/org/apache/nutch/crawl/TestSignatureFactory.java > ./java/org/apache/nutch/crawl/SignatureFactory.java > ./java/org/apache/nutch/crawl/MD5Signature.java > ./java/org/apache/nutch/crawl/Signature.java > ./java/org/apache/nutch/crawl/TextProfileSignature.java > ./java/org/apache/nutch/crawl/SignatureComparator.java > > Otis > -- > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch > > > > - Original Message > > From: aidahaj > > To: solr-user@lucene.apache.org > > Sent: Friday, April 24, 2009 12:13:47 PM > > Subject: Solr index > > > > > > Hi , > > I'm using Nutch to crawl a list of web sites. > > Solr is my index(Nutch-1.0 integration with solr). > > I'm working on detecting web site defacement(if there's any changes in > the > > text of a web page). > > I want to know if solr may give me the possibility to detect the changes > in > > the Documents in the indexe before commiting or a log file or something > like > > that(the text that has been changed between two points of time ). > > I'm looking for your help. Thanks a lot. > > -- > > View this message in context: > > http://www.nabble.com/Solr-index-tp23219842p23219842.html > > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Regards, Shalin Shekhar Mangar.
Re: Solr index
Thanks a lot, I have had a look at these classes. But what I exactly want to do is to detect whether a Document (in the Solr index) has changed when I recrawl a site with Nutch. Not to block deduplication, but to detect whether a Document has changed and extract the changes into a file without writing them over the old Document. After that I decide whether to rewrite the Document or to keep both of them, the old and the new one. I hope I have been more precise. Thanks, and pardon my poor English. -- View this message in context: http://www.nabble.com/Solr-index-tp23219842p23254601.html Sent from the Solr - User mailing list archive at Nabble.com.
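One lightweight way to get this without Solr-side support: store a content hash alongside each document at index time, and on recrawl compare the new page's hash against the stored one before deciding whether to overwrite. A sketch of the idea (not Nutch's own Signature code; the class name is made up):

import java.math.BigInteger;
import java.security.MessageDigest;

public class ContentSignature {
    // Hash the extracted page text; if the hash differs from the one
    // stored in the index for this URL, the page has changed.
    public static String md5(String pageText) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] digest = md.digest(pageText.getBytes("UTF-8"));
        return new BigInteger(1, digest).toString(16);
    }
}

Querying the stored hash by URL, comparing, and writing the old version out to a log before re-adding the document would give the "what changed between two points in time" trail without blocking deduplication.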
Multi-index Design
Hi All, I'm [still!] evaluating Solr and setting up a PoC. The requirements are to index the following objects: - people - name, status, date added, address, profile, other people-specific fields like group... - organisations - name, status, date added, address, profile, other organisation-specific fields like size... - products - name, status, date added, profile, other product-specific fields like product groups.. AND...I need to isolate indexes to a number of dynamic domains (customerA, customerB...) that will grow over time. So, my initial thoughts are to do the following: - flatten the searchable objects as much as I can - use a type field to distinguish - into a single index - use a multi-core approach to segregate domains of data So, a couple of questions on this: 1) Is this approach/design sensible and do others use it? 2) By flattening the data we will only index common fields; is it unreasonable to do a second database search and union the results when doing advanced searches on non-indexed fields? Do others do this? 3) I've read that I can dynamically add a new core - this fits well with the ability to dynamically add new domains; how scalable is this approach? Would it be unreasonable to have 20-30 dynamically created cores? I guess, redundancy aside and given our one-core-per-domain approach, we could easily spill onto other physical servers without the need for replication? Thanks again for your help! rotis
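On question 3: cores can be created on the fly through the CoreAdmin handler, so one core per customer domain is workable as long as each core's instance directory (with its conf files) already exists on disk. A hedged SolrJ sketch, with names and paths made up:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class NewCustomerCore {
    public static void main(String[] args) throws Exception {
        // CoreAdmin requests target the container root URL, not a core.
        SolrServer admin = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // The instanceDir must already contain conf/solrconfig.xml and
        // conf/schema.xml before the core is created.
        CoreAdminRequest.createCore("customerC", "/opt/solr/customerC", admin);
    }
}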
Splitting the index
Hi all, Do you have any idea how to split the index into different data directories? If so, kindly let me know please.. Thanks & regards Prabhu.K -- View this message in context: http://www.nabble.com/Splitting-the-index-tp23613882p23613882.html Sent from the Solr - User mailing list archive at Nabble.com.
Index size concerns
Salaam, We are using apache-solr to index our files for faster searches, and everything works without a problem; my only concern is the size of the index. The trend seems to be that if I index 1 GB of files the index grows to 800MB, i.e. we are seeing an 80% index-to-data size ratio. Is this normal, or am I missing something in the configuration of Solr? Thanks and regards, Muhammed Sameer
Locked Index files
Hi, my Solr index files are locked and I can’t index anything. How can I remove the lock file? I can’t delete it.
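If the lock file is left over from a crashed or killed writer (rather than held by a running Solr), two common options of this era are setting <unlockOnStartup>true</unlockOnStartup> in the mainIndex section of solrconfig.xml, or releasing it with Lucene directly while Solr is stopped. A cautious sketch against the Lucene 2.9/3.x API; the index path is hypothetical:

import java.io.File;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class Unlock {
    public static void main(String[] args) throws Exception {
        FSDirectory dir = FSDirectory.open(new File("/path/to/solr/data/index"));
        // Only do this when no IndexWriter can still be alive, i.e.
        // Solr is shut down; otherwise you risk index corruption.
        if (IndexWriter.isLocked(dir)) {
            IndexWriter.unlock(dir);
        }
        dir.close();
    }
}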
index pdf files
I wrote a simple Java program to import a PDF file. I can get a result when I do a *:* search from the admin page, but I get nothing if I search for a word. I wonder if I did something wrong or missed setting something. Here is part of the result I get when doing a *:* search: * - - Hristovski D - application/pdf - microarray analysis, literature-based discovery, semantic predications, natural language processing - Thu Aug 12 10:58:37 EDT 2010 - Combining Semantic Relations and DNA Microarray Data for Novel Hypotheses Generation Combining Semantic Relations and DNA Microarray Data for Novel Hypotheses Generation Dimitar Hristovski, PhD,1 Andrej Kastrin,2... * Please help me out if anyone has experience with pdf files. I really appreciate it! Thanks so much,
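A frequent cause of exactly this symptom: the text Tika extracts from the PDF lands in a field that is not the default search field, so *:* finds the document but a bare keyword does not. A hedged SolrJ sketch of posting a PDF through the extracting handler and mapping the body into a searchable field (the URL, id, and target field name "text" are assumptions to check against your schema; addFile(File) is the SolrJ 1.4-era signature):

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class IndexPdf {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        ContentStreamUpdateRequest up =
                new ContentStreamUpdateRequest("/update/extract");
        up.addFile(new File("paper.pdf"));
        up.setParam("literal.id", "paper-1");
        // Map Tika's extracted body into the "text" field so that plain
        // keyword queries hit it.
        up.setParam("fmap.content", "text");
        up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        server.request(up);
    }
}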
Index time boosting
Hi there, I'm having some issues with the relevancy of some results. I have 5 fields with varying boost values, all copied into a copyField "text" which is what gets searched on... I'm sending each of these fields with the boost values (i_title is 20, i_authors is 10 ... i_description is 1). Now the issue is that some items which the query matches in the description field (which is boosted at 1) are scored higher than results where the query is matched in the authors field (which is boosted at 10). This leads me to believe that the index-time boost isn't working correctly. Have I omitted norms correctly? The score for one result which is matched by the author field is nearly the same as the score for one result which is matched by the description field, so it is doubtful the boost is kept; maybe it is normalized away in the copyField. Unfortunately I cannot switch to the dismax parser at this late stage, so I cannot do query-time boosting unless there's another way of doing it on the standard parser. thanks joe -- View this message in context: http://lucene.472066.n3.nabble.com/Index-time-boosting-tp1411105p1411105.html Sent from the Solr - User mailing list archive at Nabble.com.
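For reference, this is what per-field index-time boosts look like through SolrJ (a sketch; the values follow the post). The first thing to check: index-time boosts are encoded into the field norms, so the boosted source fields and, crucially, the "text" copyField target must not have omitNorms="true" in schema.xml, otherwise the boost is silently discarded.

import org.apache.solr.common.SolrInputDocument;

public class BoostedDoc {
    public static SolrInputDocument build() {
        SolrInputDocument doc = new SolrInputDocument();
        // Per-field index-time boosts; Lucene folds these into norms.
        doc.addField("i_title", "Some title", 20.0f);
        doc.addField("i_authors", "Some author", 10.0f);
        doc.addField("i_description", "Some description", 1.0f);
        return doc;
    }
}

Also worth knowing: norms are stored with roughly one byte of precision, so boosts get quantized, but quantization alone would not make a 10x boost vanish.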
Index with ItalianStemmer
Hi all, I am experiencing a strange behavior while indexing Italian text (an indexed, not stored text field) when stemming with the Italian language. The analyzer chain includes: <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/> <filter class="solr.SnowballPorterFilterFactory" language="Italian" protected="protwords.txt"/>. If I try to index the text field with the value "mi voglio documentare su Solr e sulla sua storia" (which means "I want to study Solr and its history"), my search for "q=text:documentare" or for "q=text:documento" turns up nothing. The biggest issue is that the first one, which was intended to work both with and without stemming enabled, does not match any document. If I change the stemmer language to English and then reindex, the first of the queries above succeeds as expected because no stemming is applied. Does anyone know what could be the root cause, or if I am missing something? Thanks in advance for any help, Tommaso
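Solr's analysis page (admin/analysis.jsp) is the quickest way to see what each filter emits at index versus query time for "documentare". The same check can also be scripted against the Italian Snowball stemmer on its own; a sketch against the Lucene 2.9-era API (adjust the Version constant to your release, and in 3.1+ swap TermAttribute for CharTermAttribute):

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

public class StemCheck {
    public static void main(String[] args) throws Exception {
        SnowballAnalyzer analyzer = new SnowballAnalyzer(Version.LUCENE_29, "Italian");
        TokenStream ts = analyzer.tokenStream("text",
                new StringReader("mi voglio documentare su Solr e sulla sua storia"));
        TermAttribute term = ts.addAttribute(TermAttribute.class);
        ts.reset();
        // Print each stemmed token; run the same check on the query terms
        // to confirm both sides reduce to the same stems.
        while (ts.incrementToken()) {
            System.out.println(term.term());
        }
    }
}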
Index update issue
Dear All, I use Solr in Rails. I added a new item, and the index count update took a long time (one hour). For example, the count is now "97"; when I add a new item, it only becomes "98" after one hour. I checked all of the Solr config files, but I couldn't find the setting responsible for that. I commented out the autoCommit values (the 1 and 1000) in solrconfig.xml. But... Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Index-update-issue-tp1487956p1487956.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Optimize Index
On 11/4/2010 7:22 AM, stockiii wrote: how can i start an optimize by using DIH, but NOT after an delta- or full-import ? I'm not aware of a way to do this with DIH, though there might be something I'm not aware of. You can do it with an HTTP POST. Here's how to do it with curl: /usr/bin/curl "http://HOST:PORT/solr/CORE/update" \ -H "Content-Type: text/xml" \ --data-binary '<optimize/>' Shawn
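The same optimize through SolrJ, for clients already using it (URL and core name hypothetical):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class OptimizeCore {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr/core0");
        server.optimize();  // blocks until the optimize finishes by default
    }
}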
Re: Optimize Index
For what it's worth, the Solr class instructor at the Lucene Revolution conference recommended *against* optimizing, and instead suggested to just let the merge factor do its job. On Thu, Nov 4, 2010 at 2:55 PM, Shawn Heisey wrote: > On 11/4/2010 7:22 AM, stockiii wrote: > >> how can i start an optimize by using DIH, but NOT after an delta- or >> full-import ? >> > > I'm not aware of a way to do this with DIH, though there might be something > I'm not aware of. You can do it with an HTTP POST. Here's how to do it > with curl: > > /usr/bin/curl "http://HOST:PORT/solr/CORE/update" \ > -H "Content-Type: text/xml" \ > --data-binary '<optimize/>' > > Shawn > >
Re: Optimize Index
Huh? That's something new for me. Optimize removes documents that have been flagged for deletion. For relevancy it's important those are removed because document frequencies are not updated for deletes. Did I miss something? > For what it's worth, the Solr class instructor at the Lucene Revolution > conference recommended *against* optimizing, and instead suggested to just > let the merge factor do its job. > > On Thu, Nov 4, 2010 at 2:55 PM, Shawn Heisey wrote: > > On 11/4/2010 7:22 AM, stockiii wrote: > >> how can i start an optimize by using DIH, but NOT after an delta- or > >> full-import ? > > > > I'm not aware of a way to do this with DIH, though there might be > > something I'm not aware of. You can do it with an HTTP POST. Here's > > how to do it with curl: > > > > /usr/bin/curl "http://HOST:PORT/solr/CORE/update" \ > > -H "Content-Type: text/xml" \ > > --data-binary '<optimize/>' > > > > Shawn
Re: Optimize Index
What you can try is maxSegments=2 or more as a 'partial' optimize: "If the index is so large that optimizes are taking longer than desired or using more disk space during optimization than you can spare, consider adding the maxSegments parameter to the optimize command. In the XML message, this would be an attribute; the URL form and SolrJ have the corresponding option too. By default this parameter is 1, since an optimize results in a single Lucene "segment". By setting it larger than 1 but less than the mergeFactor, you permit partial optimization to no more than this many segments. Of course the index won't be fully optimized and therefore searches will be slower." from http://wiki.apache.org/solr/PacktBook2009 (I only found that link; there must be something on the real wiki for the maxSegments parameter ...) Hello. My index has ~30 million documents and an optimize=true is very heavy. It takes a long time ... how can I start an optimize by using DIH, but NOT after a delta- or full-import? I set my index to compound-index. thx -- http://jetwick.com twitter search prototype
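For SolrJ users, the quoted passage's "corresponding option" is the maxSegments argument to optimize; a sketch, assuming a SolrJ release that exposes the three-argument overload:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class PartialOptimize {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        // waitFlush=true, waitSearcher=true, maxSegments=2: merge down to
        // at most 2 segments instead of a full single-segment optimize.
        server.optimize(true, true, 2);
    }
}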
Re: Optimize Index
No, you didn't miss anything. The comment at Lucene Revolution was more along the lines that optimize didn't actually improve much *absent* deletes. Plus, on a significant size corpus, the doc frequencies won't change that much by deleting documents, but that's a case-by-case thing. Best Erick On Thu, Nov 4, 2010 at 4:31 PM, Markus Jelsma wrote: > Huh? That's something new for me. Optimize removes documents that have been > flagged for deletion. For relevancy it's important those are removed > because > document frequencies are not updated for deletes. > > Did I miss something? > > > For what it's worth, the Solr class instructor at the Lucene Revolution > > conference recommended *against* optimizing, and instead suggested to > just > > let the merge factor do its job. > > > > On Thu, Nov 4, 2010 at 2:55 PM, Shawn Heisey wrote: > > > On 11/4/2010 7:22 AM, stockiii wrote: > > >> how can i start an optimize by using DIH, but NOT after an delta- or > > >> full-import ? > > > > > > I'm not aware of a way to do this with DIH, though there might be > > > something I'm not aware of. You can do it with an HTTP POST. Here's > > > how to do it with curl: > > > > > > /usr/bin/curl "http://HOST:PORT/solr/CORE/update" \ > > > -H "Content-Type: text/xml" \ > > > --data-binary '<optimize/>' > > > > > > Shawn >
Export Index Data.
Hi, Is it possible to export a set of documents indexed in one Solr server, to do a synchronization with another Solr server? Thanks
only index synonyms
Hi, Can the following use case be achieved? Value to be analysed at index time: "this is a pretty line of text". Synonym list is: pretty => scenic, text => words. Value placed in the index is "scenic words". That is to say, only the matching synonyms. Basically I want to produce a normalised set of phrases for faceting. Cheers Lee C
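There is no stock "keep only the synonyms" option, but a small custom TokenFilter can do it. The sketch below assumes the synonym filter marks injected tokens with the type "SYNONYM" - true of Lucene's later synonym filters, but worth verifying for the SynonymFilterFactory version in use (some older versions copy the original token's type instead):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

public final class KeepOnlySynonymsFilter extends TokenFilter {
    private final TypeAttribute typeAtt = addAttribute(TypeAttribute.class);

    public KeepOnlySynonymsFilter(TokenStream input) {
        super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
        // Pass through only the tokens the synonym filter injected;
        // everything else is silently dropped.
        while (input.incrementToken()) {
            if ("SYNONYM".equals(typeAtt.type())) {
                return true;
            }
        }
        return false;
    }
}

Placed after the synonym filter in an index-time analyzer chain (via a small factory), this would leave only "scenic words" in the index for the example above.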
Optimize a Index
Hello, I have an index with 800,000 documents, and I hope it will be faster if I optimize the index; it sounds good ;-) But I can't find an example of how to optimize one of my multiple cores, or all cores.. Maybe one of you has a little example for that .. King
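There is no single call that optimizes every core, but looping over the cores does it; a SolrJ sketch with hypothetical core names and URL:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class OptimizeAllCores {
    public static void main(String[] args) throws Exception {
        // List the core names from your solr.xml here.
        String[] cores = {"core0", "core1"};
        for (String core : cores) {
            SolrServer server =
                    new CommonsHttpSolrServer("http://localhost:8983/solr/" + core);
            server.optimize();  // one optimize per core
        }
    }
}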
SCHEMA-INDEX-MISMATCH
Hi all, I use Lucene's NumericField to index the "price" field, and query it with solr.TrieDoubleField. When I use "price:[1 TO 5000]" to search, it returns all results whose price is between 1 and 5000, but the stored price value returned is: ERROR:SCHEMA-INDEX-MISMATCH,stringValue=2000.0. Anybody know why? -- View this message in context: http://old.nabble.com/SCHEMA-INDEX-MISMATCH-tp26897605p26897605.html Sent from the Solr - User mailing list archive at Nabble.com.
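A hedged guess at the cause: Solr's TrieDoubleField reads stored values back in its own format, so a stored value written directly with Lucene's NumericField can satisfy range queries (the indexed trie terms line up) while the stored value fails to decode, which is what the SCHEMA-INDEX-MISMATCH marker signals. Writing through Solr lets the field type create both the indexed terms and the stored value consistently; a minimal SolrJ sketch (URL and id hypothetical):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AddPrice {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "item-1");
        // Hand Solr a Double and let the TrieDoubleField build both the
        // indexed trie terms and the stored value, so they always agree.
        doc.addField("price", 2000.0);
        server.add(doc);
        server.commit();
    }
}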
Re: Corrupted Index
what version of solr are you running? On Jan 7, 2010, at 3:08 PM, Jake Brownell wrote: Hi all, Our application uses solrj to communicate with our solr servers. We started a fresh index yesterday after upping the maxFieldLength setting in solrconfig. Our task indexes content in batches and all appeared to be well until noonish today, when after 40k docs, I started seeing errors. I've placed three stack traces below, the first occurred once and was the initial error, the second occurred a few times before the third started occurring on each request. I'd really appreciate any insight into what could have caused this, a missing file and then a corrupt index. If you know we'll have to nuke the entire index and start over I'd like to know that too-oddly enough searches against the index appear to be working. Thanks! Jake #1 January 7, 2010 12:10:06 PM CST Caught error; TaskWrapper block 1 January 7, 2010 12:10:07 PM CST solr-home/core0/data/index/ _fsk_1uj.del (No such file or directory) solr-home/core0/data/index/_fsk_1uj.del (No such file or directory) request: /core0/update solr-home/core0/data/index/_fsk_1uj.del (No such file or directory) solr-home/core0/data/index/_fsk_1uj.del (No such file or directory) request: /core0/update January 7, 2010 12:10:07 PM CST solr-home/core0/data/index/ _fsk_1uj.del (No such file or directory) solr-home/core0/data/index/_fsk_1uj.del (No such file or directory) request: /core0/update solr-home/core0/data/index/_fsk_1uj.del (No such file or directory) solr-home/core0/data/index/_fsk_1uj.del (No such file or directory) request: /core0/update org.benetech.exception.WrappedException org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(424) org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(243) org .apache.solr.client.solrj.request.AbstractUpdateRequest#process(105) org.apache.solr.client.solrj.SolrServer#commit(86) org.apache.solr.client.solrj.SolrServer#commit(75) org.bookshare.search.solr.SolrSearchServerWrapper#add(63) org.bookshare.search.solr.SolrSearchEngine#index(232) org .bookshare .service.task.SearchEngineIndexingTask#initialInstanceLoad(95) org.bookshare.service.task.SearchEngineIndexingTask#run(53) org.bookshare.service.scheduler.TaskWrapper#run(233) java.util.TimerThread#mainLoop(512) java.util.TimerThread#run(462) Caused by: solr-home/core0/data/index/_fsk_1uj.del (No such file or directory) solr-home/core0/data/index/_fsk_1uj.del (No such file or directory) request: /core0/update org.apache.solr.common.SolrException org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(424) org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(243) org .apache.solr.client.solrj.request.AbstractUpdateRequest#process(105) org.apache.solr.client.solrj.SolrServer#commit(86) org.apache.solr.client.solrj.SolrServer#commit(75) org.bookshare.search.solr.SolrSearchServerWrapper#add(63) org.bookshare.search.solr.SolrSearchEngine#index(232) org .bookshare .service.task.SearchEngineIndexingTask#initialInstanceLoad(95) org.bookshare.service.task.SearchEngineIndexingTask#run(53) org.bookshare.service.scheduler.TaskWrapper#run(233) java.util.TimerThread#mainLoop(512) java.util.TimerThread#run(462) #2 January 7, 2010 12:10:10 PM CST Caught error; TaskWrapper block 1 January 7, 2010 12:10:10 PM CST org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _hug: fieldsReader shows 8 but segmentInfo shows 2 org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _hug: fieldsReader shows 8 but 
segmentInfo shows 2 request: /core0/update org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _hug: fieldsReader shows 8 but segmentInfo shows 2 org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _hug: fieldsReader shows 8 but segmentInfo shows 2 request: /core0/update January 7, 2010 12:10:10 PM CST org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _hug: fieldsReader shows 8 but segmentInfo shows 2 org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _hug: fieldsReader shows 8 but segmentInfo shows 2 request: /core0/update org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _hug: fieldsReader shows 8 but segmentInfo shows 2 org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _hug: fieldsReader shows 8 but segmentInfo shows 2 request: /core0/update org.benetech.exception.WrappedException org.apache.solr.client.solrj.impl.CommonsHttpSo
RE: Corrupted Index
Yes, that would be helpful to include, sorry, the official 1.4. -Original Message- From: Ryan McKinley [mailto:ryan...@gmail.com] Sent: Thursday, January 07, 2010 2:15 PM To: solr-user@lucene.apache.org Subject: Re: Corrupted Index what version of solr are you running? On Jan 7, 2010, at 3:08 PM, Jake Brownell wrote: > Hi all, > > Our application uses solrj to communicate with our solr servers. We > started a fresh index yesterday after upping the maxFieldLength > setting in solrconfig. Our task indexes content in batches and all > appeared to be well until noonish today, when after 40k docs, I > started seeing errors. I've placed three stack traces below, the > first occurred once and was the initial error, the second occurred a > few times before the third started occurring on each request. I'd > really appreciate any insight into what could have caused this, a > missing file and then a corrupt index. If you know we'll have to > nuke the entire index and start over I'd like to know that too-oddly > enough searches against the index appear to be working. > > Thanks! > Jake > > #1 > > January 7, 2010 12:10:06 PM CST Caught error; TaskWrapper block 1 > January 7, 2010 12:10:07 PM CST solr-home/core0/data/index/ > _fsk_1uj.del (No such file or directory) > > solr-home/core0/data/index/_fsk_1uj.del (No such file or directory) > > request: /core0/update solr-home/core0/data/index/_fsk_1uj.del (No > such file or directory) > > solr-home/core0/data/index/_fsk_1uj.del (No such file or directory) > > request: /core0/update > January 7, 2010 12:10:07 PM CST solr-home/core0/data/index/ > _fsk_1uj.del (No such file or directory) > > solr-home/core0/data/index/_fsk_1uj.del (No such file or directory) > > request: /core0/update solr-home/core0/data/index/_fsk_1uj.del (No > such file or directory) > > solr-home/core0/data/index/_fsk_1uj.del (No such file or directory) > > request: /core0/update > org.benetech.exception.WrappedException > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(424) > > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(243) > > org > .apache.solr.client.solrj.request.AbstractUpdateRequest#process(105) >org.apache.solr.client.solrj.SolrServer#commit(86) >org.apache.solr.client.solrj.SolrServer#commit(75) >org.bookshare.search.solr.SolrSearchServerWrapper#add(63) >org.bookshare.search.solr.SolrSearchEngine#index(232) > > org > .bookshare > .service.task.SearchEngineIndexingTask#initialInstanceLoad(95) >org.bookshare.service.task.SearchEngineIndexingTask#run(53) > org.bookshare.service.scheduler.TaskWrapper#run(233) > java.util.TimerThread#mainLoop(512) >java.util.TimerThread#run(462) > Caused by: > solr-home/core0/data/index/_fsk_1uj.del (No such file or directory) > > solr-home/core0/data/index/_fsk_1uj.del (No such file or directory) > > request: /core0/update > org.apache.solr.common.SolrException > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(424) > > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(243) > > org > .apache.solr.client.solrj.request.AbstractUpdateRequest#process(105) > org.apache.solr.client.solrj.SolrServer#commit(86) >org.apache.solr.client.solrj.SolrServer#commit(75) >org.bookshare.search.solr.SolrSearchServerWrapper#add(63) >org.bookshare.search.solr.SolrSearchEngine#index(232) > > org > .bookshare > .service.task.SearchEngineIndexingTask#initialInstanceLoad(95) >org.bookshare.service.task.SearchEngineIndexingTask#run(53) >org.bookshare.service.scheduler.TaskWrapper#run(233) 
>java.util.TimerThread#mainLoop(512) >java.util.TimerThread#run(462) > > #2 > > January 7, 2010 12:10:10 PM CST Caught error; TaskWrapper block 1 > January 7, 2010 12:10:10 PM CST > org.apache.lucene.index.CorruptIndexException: doc counts differ for > segment _hug: fieldsReader shows 8 but segmentInfo shows 2 > > org.apache.lucene.index.CorruptIndexException: doc counts differ for > segment _hug: fieldsReader shows 8 but segmentInfo shows 2 > > request: /core0/update > org.apache.lucene.index.CorruptIndexException: doc counts differ for > segment _hug: fieldsReader shows 8 but segmentInfo shows 2 > > org.apache.lucene.index.CorruptIndexException: doc counts differ for > segment _hug: fieldsReader shows 8 but segmentInfo shows 2 > > request: /core0/upda
Re: Corrupted Index
If you need to fix the index and maybe lose some data (in bad segments), check Lucene's CheckIndex (cmd-line app) Otis -- Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch - Original Message > From: Jake Brownell > To: "solr-user@lucene.apache.org" > Sent: Thu, January 7, 2010 3:08:55 PM > Subject: Corrupted Index > > Hi all, > > Our application uses solrj to communicate with our solr servers. We started a > fresh index yesterday after upping the maxFieldLength setting in solrconfig. > Our > task indexes content in batches and all appeared to be well until noonish > today, > when after 40k docs, I started seeing errors. I've placed three stack traces > below, the first occurred once and was the initial error, the second occurred > a > few times before the third started occurring on each request. I'd really > appreciate any insight into what could have caused this, a missing file and > then > a corrupt index. If you know we'll have to nuke the entire index and start > over > I'd like to know that too-oddly enough searches against the index appear to > be > working. > > Thanks! > Jake > > #1 > > January 7, 2010 12:10:06 PM CST Caught error; TaskWrapper block 1 > January 7, 2010 12:10:07 PM CST solr-home/core0/data/index/_fsk_1uj.del (No > such > file or directory) > > solr-home/core0/data/index/_fsk_1uj.del (No such file or directory) > > request: /core0/update solr-home/core0/data/index/_fsk_1uj.del (No such file > or > directory) > > solr-home/core0/data/index/_fsk_1uj.del (No such file or directory) > > request: /core0/update > January 7, 2010 12:10:07 PM CST solr-home/core0/data/index/_fsk_1uj.del (No > such > file or directory) > > solr-home/core0/data/index/_fsk_1uj.del (No such file or directory) > > request: /core0/update solr-home/core0/data/index/_fsk_1uj.del (No such file > or > directory) > > solr-home/core0/data/index/_fsk_1uj.del (No such file or directory) > > request: /core0/update > org.benetech.exception.WrappedException > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(424) > > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(243) > > org.apache.solr.client.solrj.request.AbstractUpdateRequest#process(105) > org.apache.solr.client.solrj.SolrServer#commit(86) > org.apache.solr.client.solrj.SolrServer#commit(75) > org.bookshare.search.solr.SolrSearchServerWrapper#add(63) > org.bookshare.search.solr.SolrSearchEngine#index(232) > > org.bookshare.service.task.SearchEngineIndexingTask#initialInstanceLoad(95) > org.bookshare.service.task.SearchEngineIndexingTask#run(53) > org.bookshare.service.scheduler.TaskWrapper#run(233) > java.util.TimerThread#mainLoop(512) > java.util.TimerThread#run(462) > Caused by: > solr-home/core0/data/index/_fsk_1uj.del (No such file or directory) > > solr-home/core0/data/index/_fsk_1uj.del (No such file or directory) > > request: /core0/update > org.apache.solr.common.SolrException > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(424) > > org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(243) > > org.apache.solr.client.solrj.request.AbstractUpdateRequest#process(105) > org.apache.solr.client.solrj.SolrServer#commit(86) > org.apache.solr.client.solrj.SolrServer#commit(75) > org.bookshare.search.solr.SolrSearchServerWrapper#add(63) > org.bookshare.search.solr.SolrSearchEngine#index(232) > > org.bookshare.service.task.SearchEngineIndexingTask#initialInstanceLoad(95) > org.bookshare.service.task.SearchEngineIndexingTask#run(53) > org.bookshare.service.scheduler.TaskWrapper#run(233) > 
java.util.TimerThread#mainLoop(512) > java.util.TimerThread#run(462) > > #2 > > January 7, 2010 12:10:10 PM CST Caught error; TaskWrapper block 1 > January 7, 2010 12:10:10 PM CST > org.apache.lucene.index.CorruptIndexException: > doc counts differ for segment _hug: fieldsReader shows 8 but segmentInfo > shows 2 > > org.apache.lucene.index.CorruptIndexException: doc counts differ for segment > _hug: fieldsReader shows 8 but segmentInfo shows 2 > > request: /core0/update org.apache.lucene.index.CorruptIndexException: doc > counts > differ for segment _hug: fieldsReader shows 8 but segmentInfo shows 2 > > org.apache.lucene.index.CorruptIndexException: doc counts differ for segment > _hug: fieldsReader shows 8
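Following up on Otis's pointer: CheckIndex can also be driven from code, and its fixIndex() drops the segments it cannot repair, so copy the index directory aside first. A sketch against the Lucene 2.9-era API, with the path taken from the traces above:

import java.io.File;
import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.store.FSDirectory;

public class FixIndex {
    public static void main(String[] args) throws Exception {
        FSDirectory dir = FSDirectory.open(new File("solr-home/core0/data/index"));
        CheckIndex checker = new CheckIndex(dir);
        CheckIndex.Status status = checker.checkIndex();
        if (!status.clean) {
            // Permanently discards unrecoverable segments - back up the
            // index directory before running this.
            checker.fixIndex(status);
        }
        dir.close();
    }
}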
update solr index
Hi, I am running Solr in Tomcat and I have about 35 indexes (between 2 and 80 million documents each). Currently, if I try to update a few documents in an index (say the one which contains 80 million documents) while Tomcat is running and therefore receiving requests, I get a few very long garbage collections (about 60 sec). I am running Tomcat with -Xms10g -Xmx10g -Xmn2g -XX:PermSize=256m -XX:MaxPermSize=256m. I'm using ConcMarkSweepGC. I have 2 questions: 1. Is Solr doing something specific while an index is being updated, like updating something in memory, which would cause the garbage collection? 2. Any idea how I could solve this problem? Currently I stop Tomcat, update the index, and start Tomcat. I would like to be able to update my index while Tomcat is running. I was thinking about running more Tomcat instances with less memory for each, each running a few of my indexes. Do you think that would be the best way to go? Thanks, Marc
ERROR:SCHEMA-INDEX-MISMATCH
Hi, I upgraded Solr v1.3 to v1.4, but in the new version I still use the old index. I changed the new schema to use the old fields also. I have fields in my schema - but after upgrading, when I am searching, I get results like this - * ERROR:SCHEMA-INDEX-MISMATCH,stringValue=4194304 ERROR:SCHEMA-INDEX-MISMATCH,stringValue=0 ERROR:SCHEMA-INDEX-MISMATCH,stringValue=4 ERROR:SCHEMA-INDEX-MISMATCH,stringValue=78 ERROR:SCHEMA-INDEX-MISMATCH,stringValue=5 ERROR:SCHEMA-INDEX-MISMATCH,stringValue=228 * Please help me understand why this error is coming up, how I can solve this problem, and how I can use the old index in Solr v1.4. -- DEEPAK AGRAWAL +91-9379433455 GOOD LUCK.
Re: Index size
It depends on many factors - how big those docs are (compare a tweet to a news article to a book chapter) whether you store the data or just index it, whether you compress it, how and how much you analyze the data, etc. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Hadoop ecosystem search :: http://search-hadoop.com/ - Original Message > From: Jean-Sebastien Vachon > To: solr-user@lucene.apache.org > Sent: Wed, February 24, 2010 8:57:21 AM > Subject: Index size > > Hi All, > > I'm currently looking on integrating Solr and I'd like to have some hints on > the > size of the index (number of documents) I could possibly host on a server > running a Double-Quad server (16 cores) with 48Gb of RAM running Linux. > Basically, I need to determine how many of these servers would be required to > host about half a billion documents. Should I setup multiple Solr instances > (in > Virtual Machines or not) or should I run a single instance (with multicores > or > not) using all available memory as the cache ? > > I also made some tests with shardings on this same server and I could not see > any improvement (at least not with 4.5 millions documents). Should all the > shards be hosted on different servers? I shall try with more documents in the > following days. > > Thx
Re: Index size
Hi, Each document can be up to 10K. Most of it comes from a single field which is both indexed and stored. The data is uncompressed because compression would eat up too much CPU considering the volume we have. We have around 30 fields in all. We also need to compute some facets as well as collapse the documents forming the result set and to be able to sort them on any field. Thx On 2010-02-25, at 5:50 PM, Otis Gospodnetic wrote: > It depends on many factors - how big those docs are (compare a tweet to a > news article to a book chapter) whether you store the data or just index it, > whether you compress it, how and how much you analyze the data, etc. > > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Hadoop ecosystem search :: http://search-hadoop.com/ > > > > - Original Message >> From: Jean-Sebastien Vachon >> To: solr-user@lucene.apache.org >> Sent: Wed, February 24, 2010 8:57:21 AM >> Subject: Index size >> >> Hi All, >> >> I'm currently looking on integrating Solr and I'd like to have some hints on >> the >> size of the index (number of documents) I could possibly host on a server >> running a Double-Quad server (16 cores) with 48Gb of RAM running Linux. >> Basically, I need to determine how many of these servers would be required >> to >> host about half a billion documents. Should I setup multiple Solr instances >> (in >> Virtual Machines or not) or should I run a single instance (with multicores >> or >> not) using all available memory as the cache ? >> >> I also made some tests with shardings on this same server and I could not >> see >> any improvement (at least not with 4.5 millions documents). Should all the >> shards be hosted on different servers? I shall try with more documents in >> the >> following days. >> >> Thx >
Re: Optimize Index
My very first guess would be that you're removing an index that isn't the one your SOLR configuration points at. Second guess would be that your browser is caching the results of your first query and not going to SOLR at all. Stranger things have happened . Third guess is you've mis-identified the core in your URL. Can you check those three things and let us know if you still have the problem? Erick On Tue, Mar 2, 2010 at 7:36 AM, Lee Smith wrote: > Hi All > > Is there a post request method to clean the index? > > I have removed my index folder and restarted solr and its still showing > documents in the stats. > > I have run this post request: > http://localhost:8983/solr/core1/update?optimize=true > > I get no errors but the stats are still show my 4 documents > > Hope you can advise. > > Thanks
Re: Optimize Index
Ha, now I feel stupid!! I had a misspelling in the data path and you were correct. Can I ask, Erick, was the command correct though? Thank you Lee On 2 Mar 2010, at 13:54, Erick Erickson wrote: > My very first guess would be that you're removing an index that isn't > the one your SOLR configuration points at. > > Second guess would be that your browser is caching the results of > your first query and not going to SOLR at all. Stranger things have > happened . > > Third guess is you've mis-identified the core in your URL. > > Can you check those three things and let us know if you still > have the problem? > > Erick > > On Tue, Mar 2, 2010 at 7:36 AM, Lee Smith wrote: > >> Hi All >> >> Is there a post request method to clean the index? >> >> I have removed my index folder and restarted solr and its still showing >> documents in the stats. >> >> I have run this post request: >> http://localhost:8983/solr/core1/update?optimize=true >> >> I get no errors but the stats are still show my 4 documents >> >> Hope you can advise. >> >> Thanks
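For the record, optimize=true merges segments but does not remove documents; clearing an index is a delete-by-query followed by a commit. A SolrJ sketch (core URL follows the thread):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class CleanIndex {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr/core1");
        server.deleteByQuery("*:*");  // flag every document as deleted
        server.commit();              // make the deletion visible
    }
}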
Fwd: index merge
Hi, I have created 2 identical cores coreX and coreY (both have different dataDir values, but their index is the same). coreX - always serves the request when a user performs a search. coreY - the updates will happen to this core and then I need to synchronize it with coreX after the update process, so that coreX also has the latest data in it. After coreX and coreY are synchronized, both should be identical again. For this purpose I tried core merging of coreX and coreY once coreY is updated with the latest set of data. But I find coreX to be containing double the record count of coreY. (coreX = coreX+coreY) Is there a problem in using the MERGE concept here? If it is wrong, can someone please suggest the best approach. I tried the various merges explained in my previous mail. Any help is deeply appreciated. Thanks and Rgds, Mark. -- Forwarded message -- From: Mark Fletcher Date: Sat, Mar 6, 2010 at 9:17 AM Subject: index merge To: solr-user@lucene.apache.org Cc: goks...@gmail.com Hi, I have a doubt regarding index merging:- I have set up 2 cores COREX and COREY. COREX - always serves user requests COREY - gets updated with the latest values (dataDir is in a different location from COREX) I tried merging coreX and coreY at the end of COREY getting updated with the latest data values, so that COREX and COREY both have the latest data and the user who always queries COREX gets the latest data. Please find the various approaches I followed and the commands used. I tried these merges:- COREX = COREX and COREY merged curl ' http://localhost:8983/solr/admin/cores?action=mergeindexes&core=coreX&indexDir=/opt/solr/coreX/data/index&indexDir=/opt1/solr/coreY/data/index ' COREX = COREY and COREY merged curl ' http://localhost:8983/solr/admin/cores?action=mergeindexes&core=coreX&indexDir=/opt/solr/coreY/data/index&indexDir=/opt1/solr/coreY/data/index ' COREX = COREY and COREA merged (COREA just contains the initial 2 seed segments.. a dummy core) curl ' http://localhost:8983/solr/admin/cores?action=mergeindexes&core=coreX&indexDir=/opt/solr/coreY/data/index&indexDir=/opt1/solr/coreA/data/index ' When I check the record count in COREX and COREY, COREX always contains about double of what COREY has. Is everything fine here and just the record count is different, or is there something wrong? Note:- I have only 2 cores here and I tried the X=X+Y approach, X=Y+Y and X=Y+A approach, where A is a dummy index. Never have the record counts matched after the merging is done. Can someone please help me understand why this record count difference occurs, and whether there is anything fundamentally wrong in my approach. Thanks and Rgds, Mark.
Re: index merge
Hi Mark, On Sun, Mar 7, 2010 at 6:20 PM, Mark Fletcher wrote: > > I have created 2 identical cores coreX and coreY (both have different > dataDir values, but their index is same). > coreX - always serves the request when a user performs a search. > coreY - the updates will happen to this core and then I need to synchronize > it with coreX after the update process, so that coreX also has the > latest data in it. After coreX and coreY are synchronized, both > should again be identical again. > > For this purpose I tried core merging of coreX and coreY once coreY is > updated with the latest set of data. But I find coreX to be containing > double the record count as in coreY. > (coreX = coreX+coreY) > > Is there a problem in using MERGE concept here. If it is wrong can some one > pls suggest the best approach. I tried the various merges explained in my > previous mail. > > Index merge happens at the Lucene level which has no idea about uniqueKeys. Therefore when you merge two indexes containing exactly the same documents (by uniqueKey), you get double the document count. Looking at your scenario, it seems to me that what you want to do is a swap operation. coreX is serving the requests, coreY is updated and now you can swap coreX with coreY so that new requests hit the updated index. I suggest you look at the swap operation instead of index merge. -- Regards, Shalin Shekhar Mangar.
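The swap itself can be issued over HTTP (admin/cores?action=SWAP&core=coreX&other=coreY) or through SolrJ; a sketch, with the admin URL hypothetical:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.common.params.CoreAdminParams.CoreAdminAction;

public class SwapCores {
    public static void main(String[] args) throws Exception {
        // CoreAdmin requests target the container root, not a core.
        SolrServer admin = new CommonsHttpSolrServer("http://localhost:8983/solr");
        CoreAdminRequest req = new CoreAdminRequest();
        req.setAction(CoreAdminAction.SWAP);
        req.setCoreName("coreX");
        req.setOtherCoreName("coreY");
        req.process(admin);
    }
}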
Re: index merge
Hi Shalin, Thank you for the reply. I got your point. So I understand merge will just duplicate things. I ran the SWAP command. Now:- COREX has the dataDir pointing to the updated dataDir of COREY. So COREX has the latest. Again, COREY (on which the update regularly runs) is pointing to the old index of COREX. So this now doesn't have the most updated index. Now shouldn't I update the index of COREY (now pointing to the old COREX) so that it has the latest footprint as in COREX (having the latest COREY index), so that when the update again happens to COREY, it has the latest and I again do the SWAP? Is a physical copying of the index named COREY (the latest and now dataDir of COREX after SWAP) to the index COREX (now the dataDir of COREY.. the original non-updated index of COREX) the best way for this, or is there any other better option? Once again, later when COREY is again updated with the latest, I will run the SWAP again and it will be fine with COREX again pointing to its original dataDir (now the updated one). So every even SWAP command run will point COREX back to its original dataDir. (same case with COREY). My only concern is after the SWAP is done, updating the old index (which was serving previously and now replaced by the new index). What is the best way to do that? Physically copy the latest index to the old one and make it in sync with the latest one, so that by the time it is to get the latest updates it has the latest in it, so that the new ones can be added to this and it becomes the latest and is again swapped? Please share your opinion. Once again your help is appreciated. I have been going in circles with multiple indexes for some days! Thanks and Rgds, Mark. On Mon, Mar 8, 2010 at 7:45 AM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > Hi Mark, > > On Sun, Mar 7, 2010 at 6:20 PM, Mark Fletcher > wrote: > > > > > I have created 2 identical cores coreX and coreY (both have different > > dataDir values, but their index is same). > > coreX - always serves the request when a user performs a search. > > coreY - the updates will happen to this core and then I need to > synchronize > > it with coreX after the update process, so that coreX also has the > > latest data in it. After coreX and coreY are synchronized, > both > > should again be identical again. > > > > For this purpose I tried core merging of coreX and coreY once coreY is > > updated with the latest set of data. But I find coreX to be containing > > double the record count as in coreY. > > (coreX = coreX+coreY) > > > > Is there a problem in using MERGE concept here. If it is wrong can some > one > > pls suggest the best approach. I tried the various merges explained in my > > previous mail. > > > > > Index merge happens at the Lucene level which has no idea about uniqueKeys. > Therefore when you merge two indexes containing exactly the same documents > (by uniqueKey), you get double the document count. > > Looking at your scenario, it seems to me that what you want to do is a swap > operation. coreX is serving the requests, coreY is updated and now you can > swap coreX with coreY so that new requests hit the updated index. I suggest > you look at the swap operation instead of index merge. > > -- > Regards, > Shalin Shekhar Mangar. >
Re: index merge
Hi Mark, On Mon, Mar 8, 2010 at 7:38 PM, Mark Fletcher wrote: > > I ran the SWAP command. Now:- > COREX has the dataDir pointing to the updated dataDir of COREY. So COREX > has the latest. > Again, COREY (on which the update regularly runs) is pointing to the old > index of COREX. So this now doesnt have the most updated index. > > Now shouldn't I update the index of COREY (now pointing to the old COREX) > so that it has the latest footprint as in COREX (having the latest COREY > index)so that when the update again happens to COREY, it has the latest and > I again do the SWAP. > > Is a physical copying of the index named COREY (the latest and now datDir > of COREX after SWAP) to the index COREX (now the dataDir of COREY.. the > orginal non-updated index of COREX) the best way for this or is there any > other better option. > > Once again, later when COREY is again updated with the latest, I will run > the SWAP again and it will be fine with COREX again pointing to its original > dataDir (now the updated one).So every even SWAP command run will point > COREX back to its original dataDir. (same case with COREY). > > My only concern is after the SWAP is done, updating the old index (which > was serving previously and now replaced by the new index). What is the best > way to do that? Physically copy the latest index to the old one and make it > in sync with the latest one so that by the time it is to get the latest > updates it has the latest in it so that the new ones can be added to this > and it becomes the latest and is again swapped? > Perhaps it is best if we take a step back and understand why you need two identical cores? -- Regards, Shalin Shekhar Mangar.
Re: index merge
Hi Shalin, Thank you for the mail. My main purpose of having 2 identical cores (COREX - always serves user requests; COREY - once every day, takes the updates/latest data and passes it on to COREX) is this: Suppose I have only one COREY, and suppose a request comes to COREY while the update of the latest data is happening on it. Wouldn't it degrade performance of the requests at that point of time? So I was planning to keep COREX and COREY always identical. Once COREY has the latest it should somehow sync with COREX so that COREX also now has the latest. COREY keeps on getting the updates at a particular time of day and it will again pass it on to COREX. This process continues every day. What is the best possible way to implement this? Thanks, Mark. On Mon, Mar 8, 2010 at 9:53 AM, Shalin Shekhar Mangar < shalinman...@gmail.com> wrote: > Hi Mark, > > On Mon, Mar 8, 2010 at 7:38 PM, Mark Fletcher < > mark.fletcher2...@gmail.com> wrote: > >> >> I ran the SWAP command. Now:- >> COREX has the dataDir pointing to the updated dataDir of COREY. So COREX >> has the latest. >> Again, COREY (on which the update regularly runs) is pointing to the old >> index of COREX. So this now doesnt have the most updated index. >> >> Now shouldn't I update the index of COREY (now pointing to the old COREX) >> so that it has the latest footprint as in COREX (having the latest COREY >> index)so that when the update again happens to COREY, it has the latest and >> I again do the SWAP. >> >> Is a physical copying of the index named COREY (the latest and now datDir >> of COREX after SWAP) to the index COREX (now the dataDir of COREY.. the >> orginal non-updated index of COREX) the best way for this or is there any >> other better option. >> >> Once again, later when COREY is again updated with the latest, I will run >> the SWAP again and it will be fine with COREX again pointing to its original >> dataDir (now the updated one).So every even SWAP command run will point >> COREX back to its original dataDir. (same case with COREY). >> >> My only concern is after the SWAP is done, updating the old index (which >> was serving previously and now replaced by the new index). What is the best >> way to do that? >> > > Perhaps it is best if we take a step back and understand why you need two > identical cores? > > -- > Regards, > Shalin Shekhar Mangar. >
Re: index merge
Hi Mark, On Mon, Mar 8, 2010 at 9:23 PM, Mark Fletcher wrote: > > My main purpose of having 2 identical cores > COREX - always serves user request > COREY - every day once, takes the updates/latest data and passess it on to > COREX. > is:- > > Suppose say I have only one COREY and suppose a request comes to COREY > while the update of the latest data is happening on to it. Wouldn't it > degrade performance of the requests at that point of time? > The thing to note is that both reads and writes are happening on the same box. So when you swap cores, the OS has to cache the hot segments of the new (inactive) index. If you were just re-opening the same (active) index, at least some of the existing files could remain in the OS's file cache. I think that may just degrade performance further so you should definitely benchmark before going through with this. The best practice is to use a master/slave architecture and separate the writes and reads. > So I was planning to keep COREX and COREY always identical. Once COREY has > the latest it should somehow sync with COREX so that COREX also now has the > latest. COREY keeps on getting the updates at a particular time of day and > it will again pass it on to COREX. This process continues everyday. > You could use the same approach that Solr 1.3's snapinstaller script used. It deletes the files and creates hard links to the new index files. -- Regards, Shalin Shekhar Mangar.
Re: index merge
On 03/08/2010 10:53 AM, Mark Fletcher wrote: Hi Shalin, Thank you for the mail. My main purpose of having 2 identical cores (COREX - always serves user requests; COREY - once every day, takes the updates/latest data and passes it on to COREX) is this: Suppose I have only one COREY, and suppose a request comes to COREY while the update of the latest data is happening on it. Wouldn't it degrade performance of the requests at that point of time? Yes - but you're not going to help anything by using two indexes - the best you can do is use two boxes. 2 indexes on the same box will actually be worse than one if they are identical and you are swapping between them. Writes on an index will not affect reads in the way you are thinking - only in that it uses IO and CPU that the read process can't. That's going to happen with 2 indexes on the same box too - except now you have way more data to cache and flip between, and you can't take any advantage of things just being written possibly being in the cache for reads. Lucene indexes use a write-once strategy - when writing new segments, you are not touching the segments being read from. Lucene is already doing the index juggling for you at the segment level. So I was planning to keep COREX and COREY always identical. Once COREY has the latest it should somehow sync with COREX so that COREX also now has the latest. COREY keeps on getting the updates at a particular time of day and it will again pass it on to COREX. This process continues every day. What is the best possible way to implement this? Thanks, Mark. On Mon, Mar 8, 2010 at 9:53 AM, Shalin Shekhar Mangar< shalinman...@gmail.com> wrote: Hi Mark, On Mon, Mar 8, 2010 at 7:38 PM, Mark Fletcher< mark.fletcher2...@gmail.com> wrote: I ran the SWAP command. Now:- COREX has the dataDir pointing to the updated dataDir of COREY. So COREX has the latest. Again, COREY (on which the update regularly runs) is pointing to the old index of COREX. So this now doesnt have the most updated index. Now shouldn't I update the index of COREY (now pointing to the old COREX) so that it has the latest footprint as in COREX (having the latest COREY index)so that when the update again happens to COREY, it has the latest and I again do the SWAP. Is a physical copying of the index named COREY (the latest and now datDir of COREX after SWAP) to the index COREX (now the dataDir of COREY.. the orginal non-updated index of COREX) the best way for this or is there any other better option. Once again, later when COREY is again updated with the latest, I will run the SWAP again and it will be fine with COREX again pointing to its original dataDir (now the updated one).So every even SWAP command run will point COREX back to its original dataDir. (same case with COREY). My only concern is after the SWAP is done, updating the old index (which was serving previously and now replaced by the new index). What is the best way to do that? Physically copy the latest index to the old one and make it in sync with the latest one so that by the time it is to get the latest updates it has the latest in it so that the new ones can be added to this and it becomes the latest and is again swapped? Perhaps it is best if we take a step back and understand why you need two identical cores? -- Regards, Shalin Shekhar Mangar. -- - Mark http://www.lucidimagination.com
Re: index merge
Hi All, Thank you for the very valuable suggestions. I am planning to try using the Master - Slave configuration. Best Rgds, Mark. On Mon, Mar 8, 2010 at 11:17 AM, Mark Miller wrote: > On 03/08/2010 10:53 AM, Mark Fletcher wrote: > >> Hi Shalin, >> >> Thank you for the mail. >> My main purpose of having 2 identical cores >> COREX - always serves user request >> COREY - every day once, takes the updates/latest data and passess it on to >> COREX. >> is:- >> >> Suppose say I have only one COREY and suppose a request comes to COREY >> while >> the update of the latest data is happening on to it. Wouldn't it degrade >> performance of the requests at that point of time? >> >> > Yes - but your not going to help anything by using two indexes - best you > can do it use two boxes. 2 indexes on the same box will actually > be worse than one if they are identical and you are swapping between them. > Writes on an index will not affect reads in the way you are thinking - only > in that its uses IO and CPU that the read process cant. Thats going to > happen with 2 indexes on the same box too - except now you have way more > data to cache and flip between, and you can't take any advantage of things > just being written possibly being in the cache for reads. > > Lucene indexes use a write once strategy - when writing new segments, you > are not touching the segments being read from. Lucene is already doing the > index juggling for you at the segment level. > > > So I was planning to keep COREX and COREY always identical. Once COREY has >> the latest it should somehow sync with COREX so that COREX also now has >> the >> latest. COREY keeps on getting the updates at a particular time of day and >> it will again pass it on to COREX. This process continues everyday. >> >> What is the best possible way to implement this? >> >> Thanks, >> >> Mark. >> >> >> On Mon, Mar 8, 2010 at 9:53 AM, Shalin Shekhar Mangar< >> shalinman...@gmail.com> wrote: >> >> >> >>> Hi Mark, >>> >>> On Mon, Mar 8, 2010 at 7:38 PM, Mark Fletcher< >>> mark.fletcher2...@gmail.com> wrote: >>> >>> >>> >>>> I ran the SWAP command. Now:- >>>> COREX has the dataDir pointing to the updated dataDir of COREY. So COREX >>>> has the latest. >>>> Again, COREY (on which the update regularly runs) is pointing to the old >>>> index of COREX. So this now doesnt have the most updated index. >>>> >>>> Now shouldn't I update the index of COREY (now pointing to the old >>>> COREX) >>>> so that it has the latest footprint as in COREX (having the latest COREY >>>> index)so that when the update again happens to COREY, it has the latest >>>> and >>>> I again do the SWAP. >>>> >>>> Is a physical copying of the index named COREY (the latest and now >>>> datDir >>>> of COREX after SWAP) to the index COREX (now the dataDir of COREY.. the >>>> orginal non-updated index of COREX) the best way for this or is there >>>> any >>>> other better option. >>>> >>>> Once again, later when COREY is again updated with the latest, I will >>>> run >>>> the SWAP again and it will be fine with COREX again pointing to its >>>> original >>>> dataDir (now the updated one).So every even SWAP command run will point >>>> COREX back to its original dataDir. (same case with COREY). >>>> >>>> My only concern is after the SWAP is done, updating the old index (which >>>> was serving previously and now replaced by the new index). What is the >>>> best >>>> way to do that? 
Physically copy the latest index to the old one and make >>>> it >>>> in sync with the latest one so that by the time it is to get the latest >>>> updates it has the latest in it so that the new ones can be added to >>>> this >>>> and it becomes the latest and is again swapped? >>>> >>>> >>>> >>> Perhaps it is best if we take a step back and understand why you need two >>> identical cores? >>> >>> -- >>> Regards, >>> Shalin Shekhar Mangar. >>> >>> >>> >> >> > > > -- > - Mark > > http://www.lucidimagination.com > > > >
Index field untokenized
Hi All, I want to index some data untokenized (e.g. url), but I can't find a way to do it. I know there is a way to do it in solr configuration but I want to specify this options directly in my solr xml. This is a fragment of the xml that i post in slr and I want to know if is possible to add to some field (e.g. modsCollection.name.xlink:href) an extra attribute in some other way the information about how to index it.// /// http://www.fao.org/faooa/schemas/eims/v0.9"; xmlns:mods="http://www.loc.gov/mods/v3"; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; xmlns:eims="http://www.fao.org/faooa/schemas/eims/v0.9"; xmlns:xlink="http://www.w3.org/1999/xlink"; xmlns:xalan="http://xml.apache.org/xalan"; xmlns:l="http://lang.data"; xmlns:fn="http://www.w3.org/2005/xpath-functions"; xmlns:dcterms="http://purl.org/dc/terms/"; xmlns:ags="http://www.fao.org/agris/agmes/schemas/0.1/"; xmlns:uvalibadmin="http://dl.lib.virginia.edu/bin/admin/admin.dtd/"; xmlns:uvalibdesc="http://dl.lib.virginia.edu/bin/dtd/descmeta/descmeta.dtd"; xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"; xmlns:dc="http://purl.org/dc/elements/1.1/"; xmlns:foxml="info:fedora/fedora-system:def/foxml#" xmlns:zs="http://www.loc.gov/zing/srw/";> eims-document:1960 . http://aims.fao.org/aos/v01/corporatebody/c_1962 iso639-2b /Regards, Alessandro http://www.fao.org/faooa/schemas/eims/v0.9"; xmlns:mods="http://www.loc.gov/mods/v3"; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"; xmlns:eims="http://www.fao.org/faooa/schemas/eims/v0.9"; xmlns:xlink="http://www.w3.org/1999/xlink"; xmlns:xalan="http://xml.apache.org/xalan"; xmlns:l="http://lang.data"; xmlns:fn="http://www.w3.org/2005/xpath-functions"; xmlns:dcterms="http://purl.org/dc/terms/"; xmlns:ags="http://www.fao.org/agris/agmes/schemas/0.1/"; xmlns:uvalibadmin="http://dl.lib.virginia.edu/bin/admin/admin.dtd/"; xmlns:uvalibdesc="http://dl.lib.virginia.edu/bin/dtd/descmeta/descmeta.dtd"; xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"; xmlns:dc="http://purl.org/dc/elements/1.1/"; xmlns:foxml="info:fedora/fedora-system:def/foxml#" xmlns:zs="http://www.loc.gov/zing/srw/";> eims-document:1960 Active Note relative à la réforme de l'ONU et de la FAO 2010-03-11T13:37:44.537Z 2010-03-11T13:39:15.819Z 2 AUDREC1 Fedora API-M modifyDatastreamByValue DC fedoraAdmin 2010-03-11T13:37:44.801Z Initial Import of this Object AUDREC2 Fedora API-M addDatastream MODS fedoraAdmin 2010-03-11T13:39:09.348Z AUDREC3 Fedora API-M addDatastream AGRISFO fedoraAdmin 2010-03-11T13:39:11.931Z AUDREC4 Fedora API-M addDatastream EIMS fedoraAdmin 2010-03-11T13:39:13.434Z AUDREC5 Fedora API-M addDatastream SKOS fedoraAdmin 2010-03-11T13:39:15.819Z fr Note relative à la réforme de l'ONU et de la FAO pubid.fao.org:210159 FAO info:fedora/eims-document:1960 faooa:FRBR-EXPRESSION J8010 3.3 2006-06-29 fr Note relative à la réforme de l'ONU et de la FAO fao-aos-corporatebody corporate http://aims.fao.org/aos/v01/corporatebody/c_1962 en FAO, Rome (Italy). Fisheries and Aquaculture Dept. marcrelator text Author marcrelator text conference en FAO Committee on Fisheries. Sub-Committee on Aquaculture (Sess. 4 : 6-10 Oct 2008 : Puerto Varas, Chile) marcrelator text Author marcrelator text type Conference type type Non-conventional type iso639-2b code fra iso639-2b code text French text jn J8010 jn rn 210159 0 3 en KC 1 en Publication
Separate index files
Hello, there - How often in Solr used possibility to store index in separate files for different things, for example, products (at the one Solr instance)? The aim is maintain separate files for backup, independent re-indexing, something else(?). And in what extent useful that solutions? Thanks Sergei
Re: index merge
Hi All, I am running solr in 64 bit HP-UX system. The total index size is about 5GB and when i try load any new document, solr tries to merge the existing segments first and results in following error. I could see a temp file is growng within index dir around 2GB in size and later it fails with this exception. It looks like, by reaching Integer.MAXVALUE, the exception occurs. Exception in thread "Lucene Merge Thread #0" org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: File too large (errno:27) at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:351) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:315) Caused by: java.io.IOException: File too large (errno:27) at java.io.RandomAccessFile.writeBytes(Native Method) at java.io.RandomAccessFile.write(RandomAccessFile.java:456) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.flushBuffer(SimpleFSDirectory.java:192) at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96) at org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:85) at org.apache.lucene.store.BufferedIndexOutput.close(BufferedIndexOutput.java:109) at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexOutput.close(SimpleFSDirectory.java:199) at org.apache.lucene.index.FieldsWriter.close(FieldsWriter.java:144) at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:357) at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:153) at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5029) at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4614) at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235) at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291) --- The solrconfig.xml contains default values for , sections as below. ^M ^M false^M ^M 10^M ^M ^M ^M ^M 32^M ^M 1^M 1000^M 1^M ^M ^M ^ ^M ^M false^M 32^M 10^M ^M ^M ^M ^ Could anyone help me to resolve this exception? Regards, Uma -- View this message in context: http://lucene.472066.n3.nabble.com/index-merge-tp472904p829810.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: index merge
> I am running solr in 64 bit HP-UX system. The total > index size is about > 5GB and when i try load any new document, solr tries to > merge the existing > segments first and results in following error. I could see > a temp file is > growng within index dir around 2GB in size and later it > fails with this > exception. It looks like, by reaching Integer.MAXVALUE, the > exception > occurs. 32 isn't 32MB ramBufferSizeMB too small?
Re: index merge
Hi All, The problem is resolved. It is purely due to filesystem. My filesystem is of 32-bit, running on 64 bit OS. I changed to 64 bit filesystem and all works as expected. Uma -- View this message in context: http://lucene.472066.n3.nabble.com/index-merge-tp472904p832053.html Sent from the Solr - User mailing list archive at Nabble.com.
Rebuild an index
Hi, We use Drupal as the CMS and Solr for our search engine needs and are planning to have Solr Master-Slave replication setup across the data centers. I am in the process of testing my replication - what is the best means to delete the index on the Solr slave and then replicate a fresh copy from Master? We use Solr 1.3. Thanks, Sai Thumuluri My Master solrconfig.xml is startup commit commit schema.xml,synonyms.txt,stopwords.txt,elevate.xml And my slave solrconfig.xml http://masterURL:8080/solr/replication 01:00:00
stemming the index
My index contains data of 2 different languages, English & German. Now which analyzer & stemmer should be applied on this data before feeding to index -Sarfaraz
Update the index
Hi, Can some one point me to a location where it describes how to update an already indexed document? I was thinking there is and tag explained somewhere but cant find it. Thanks, -- Gavin Selvaratnam, Project Leader hSenid Mobile Solutions Phone: +94-11-2446623/4 Fax: +94-11-2307579 Web: http://www.hSenidMobile.com Make it happen Disclaimer: This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to which they are addressed. The content and opinions contained in this email are not necessarily those of hSenid Software International. If you have received this email in error please contact the sender.
Index time synonyms
I have a hard time understanding the synonyms behaviour..especially because i don't have the syn filter at index time. If i have this synonym at index time Alternative Sentence,Probation before Judgement,Pretrial Diversion does all occurrence of 'alternative sentence' also get indexed as 'probation judgement' and 'pretrial diversion' ? or does it do this wierd grouping (alternative probation pretrial)(sentence diversion)judgement so all occurrences of 'alternative' will be indexed as 'sentence' and 'diversion' ? Then what about the word 'judgement'? Please someone help me understand this. I have another question related to synonyms posted here http://www.nabble.com/solr-synonyms-behaviour-td15051211.html ..please help with that too... -- View this message in context: http://www.nabble.com/Index-time-synonyms-tp15073889p15073889.html Sent from the Solr - User mailing list archive at Nabble.com.
Lucene index verifier
(Sorry, my Lucene java-user access is wonky.) I would like to verify that my snapshots are not corrupt before I enable them. What is the simplest program to verify that a Lucene index is not corrupt? Or, what is a Solr query that will verify that there is no corruption? With the minimum amount of time? Thanks, Lance Norskog
Shared index base
I know there was such discussions about the subject, but I want to ask again if somebody could share more information. We are planning to have several separate servers for our search engine. One of them will be index/search server, and all others are search only. We want to use SAN (BTW: should we consider something else?) and give access to it from all servers. So all servers will use the same index base, without any replication, same files. Is this a good practice? Did somebody do the same? Any problems noticed? Or any suggestions, even about different configurations are highly appreciated. Thanks, Gene
Merging Solr index
Hi- http://wiki.apache.org/solr/MergingSolrIndexes recommends using the Lucene contributed app IndexMergeTool to merge two Solr indexes. What happens if both indexes have records with the same unique key? Will they both go into the new index? Is the implementation of unique IDs in the Solr java or in Lucene? If it is in Solr, how would I hackup a Solr IndexMergeTool? Cheers, Lance Norskog
Re: Index splitting
Hi Nico, I don't think there is a tool to split an existing Lucene index, though I imagine one could write such a tool using http://lucene.apache.org/java/2_3_1/fileformats.html as a guide. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Nico Heid <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Tuesday, April 29, 2008 4:10:09 AM > Subject: Index splitting > > Hi, > Let me first roughly describe the scenario :-) > > We're trying to index online stored data for some thousand users. > The schema.xml has a custom identifier for the user, so FQ can be applied > and further filtering is only done for the user (more important, the user > doesn't get to see results from data not belonging to him) > > Unfortunatelly, the Index might become quite big ( we're indexing more that > 50 TB Data, all kind of files, full text (indexed only, not stored) where > possible, elsewhere fileinfos (size, date) and meta if available) > > So Question the is: > > We're thinking of starting out with multiple Solr instances (either in their > own containers or MultiCore, guess that's not the important point), on 1 to > n machines. Lets just pretend: we do modulo 5 on the user number and assign > it to one of the two machines. The index gets distributed on QuerySlaves ( > 1-m dependend on the need). > > So now the Question: > Is there a way to split a too big index into smaller ones? Do I have to > create more instances at the beginning, so that I will not run out of power > and space? (which will ad quite a bit of redundance of data) > Lets say I miscalculated and used only 2 indices, but now I see I need at > least 4. > > Any idea will be very welcome, > > Thanks, > Nico > > >
Re: Index splitting
I seem to recall Doug C. commenting on this: http://lucene.markmail.org/search/?q=FilterIndexReader#query :FilterIndexReader%20from%3A%22Doug%20Cutting%22+page:1+mid:y673avueo43ufwhm+state:results Not sure if that is exactly what you are looking for, but sounds similar. -Grant On Apr 29, 2008, at 1:10 PM, Otis Gospodnetic wrote: Hi Nico, I don't think there is a tool to split an existing Lucene index, though I imagine one could write such a tool using http://lucene.apache.org/java/2_3_1/fileformats.html as a guide. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Nico Heid <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Tuesday, April 29, 2008 4:10:09 AM Subject: Index splitting Hi, Let me first roughly describe the scenario :-) We're trying to index online stored data for some thousand users. The schema.xml has a custom identifier for the user, so FQ can be applied and further filtering is only done for the user (more important, the user doesn't get to see results from data not belonging to him) Unfortunatelly, the Index might become quite big ( we're indexing more that 50 TB Data, all kind of files, full text (indexed only, not stored) where possible, elsewhere fileinfos (size, date) and meta if available) So Question the is: We're thinking of starting out with multiple Solr instances (either in their own containers or MultiCore, guess that's not the important point), on 1 to n machines. Lets just pretend: we do modulo 5 on the user number and assign it to one of the two machines. The index gets distributed on QuerySlaves ( 1-m dependend on the need). So now the Question: Is there a way to split a too big index into smaller ones? Do I have to create more instances at the beginning, so that I will not run out of power and space? (which will ad quite a bit of redundance of data) Lets say I miscalculated and used only 2 indices, but now I see I need at least 4. Any idea will be very welcome, Thanks, Nico -- Grant Ingersoll Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
Re: Index splitting
On Tue, 29 Apr 2008 10:10:09 +0200 "Nico Heid" <[EMAIL PROTECTED]> wrote: > So now the Question: > Is there a way to split a too big index into smaller ones? Do I have to > create more instances at the beginning, so that I will not run out of power > and space? (which will ad quite a bit of redundance of data) > Lets say I miscalculated and used only 2 indices, but now I see I need at > least 4. Hi Nico, being able to split the index without having to reindex the lot would be a nice option :) One approach we use in a project I am working on is to split up the full extent of your domain (user IDs) in equal parts from the start - with this we get n clusters and it is as much as we will need to grow outwards . Then we grow each cluster in depth as needed. It obviously helps if you have an equal (or random) distribution across your clusters (we do). Given that you probably won't know how many users you'll get your case is different to ours. To even out your distribution of user-ids to cluster you can use a function of the user-id (ie, md5(user_id) ) instead of user_id itself. HIH, B _ {Beto|Norberto|Numard} Meijome Percusive Maintenance - The art of tuning or repairing equipment by hitting it. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Multiple Index creation
Hi All, I tried to search within the SOLR archive, but could not find the answer of how can I create multiple index within SOLR. In case of lucene I can create an IndexWriter with a new Index, and hence can have multiple Index, I can allow search on that multiple index. How can I create in Solr a multiple Index. --Thanks and Regards Vaijanath
SOLR index size
Hi, I'm using SOLR to keep track of customer complaints. I only need to keep recent complaints, but I want to keep as many as I can fit on my hard drive. Is there any way I can configure SOLR to dump old entries in the index when the index reaches a certain size? I'm using a month old version from trunk. Thanks, Marshall
Re: Index structuring
You could have been more specific on the dataset size. If your data volumes are growing you can partition your index into multiple shards. http://wiki.apache.org/solr/DistributedSearch --Noble On Sat, May 31, 2008 at 9:02 PM, Ritesh Ambastha <[EMAIL PROTECTED]> wrote: > > Dear Readers, > > I am a newbie in solr world. I have successfully deployed solr on my > machine, and I am able to index a large DB table. I am pretty sure that > internal index structure of solr is much capable to handle large data sets. > > But, say my data size keeps growing at jet speed, then what should be the > index structure? Do I need to follow some specific index structuring > patterns/algos for handling such massive data? > > I am sorry as I may be sounding novice in this area. I would appreciate your > thoughts/suggestions. > > Regards, > Ritesh Ambastha > -- > View this message in context: > http://www.nabble.com/Index-structuring-tp17576449p17576449.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- --Noble Paul
Re: Index structuring
For the datasize you are proposing , single index should be fine .Just give the m/c enough RAM Distributed search involves multiple requests made between shards which may be an unncessary overhead. --Noble On Wed, Jun 4, 2008 at 4:02 PM, Ritesh Ambastha <[EMAIL PROTECTED]> wrote: > > Thanks Noble, > > I maintain two separate indexes on my disk for two different search > services. > The index size of two are: 91MB and 615MB. I am pretty sure that these index > size will grow in future, and may reach 10GB. > > My doubts : > > 1. When should I start partitioning my index? > 2. Is there any performance issue with partitioning? For eg: A query on 1GB > and 500MB indexed data will take same time to give the result? Or lesser the > index size, lesser the response time? > > > Regards, > Ritesh Ambastha > > Noble Paul നോബിള് नोब्ळ् wrote: >> >> You could have been more specific on the dataset size. >> >> If your data volumes are growing you can partition your index into >> multiple shards. >> http://wiki.apache.org/solr/DistributedSearch >> --Noble >> >> On Sat, May 31, 2008 at 9:02 PM, Ritesh Ambastha <[EMAIL PROTECTED]> >> wrote: >>> >>> Dear Readers, >>> >>> I am a newbie in solr world. I have successfully deployed solr on my >>> machine, and I am able to index a large DB table. I am pretty sure that >>> internal index structure of solr is much capable to handle large data >>> sets. >>> >>> But, say my data size keeps growing at jet speed, then what should be the >>> index structure? Do I need to follow some specific index structuring >>> patterns/algos for handling such massive data? >>> >>> I am sorry as I may be sounding novice in this area. I would appreciate >>> your >>> thoughts/suggestions. >>> >>> Regards, >>> Ritesh Ambastha >>> -- >>> View this message in context: >>> http://www.nabble.com/Index-structuring-tp17576449p17576449.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >>> >> >> >> >> -- >> --Noble Paul >> >> > > -- > View this message in context: > http://www.nabble.com/Index-structuring-tp17576449p17643690.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- --Noble Paul
Re: Index structuring
Thanks Noble, That means, I can go ahead with single Index for long. :) Regards, Ritesh Ambastha Noble Paul നോബിള് नोब्ळ् wrote: > > For the datasize you are proposing , single index should be fine .Just > give the m/c enough RAM > > Distributed search involves multiple requests made between shards > which may be an unncessary overhead. > --Noble > > On Wed, Jun 4, 2008 at 4:02 PM, Ritesh Ambastha <[EMAIL PROTECTED]> > wrote: >> >> Thanks Noble, >> >> I maintain two separate indexes on my disk for two different search >> services. >> The index size of two are: 91MB and 615MB. I am pretty sure that these >> index >> size will grow in future, and may reach 10GB. >> >> My doubts : >> >> 1. When should I start partitioning my index? >> 2. Is there any performance issue with partitioning? For eg: A query on >> 1GB >> and 500MB indexed data will take same time to give the result? Or lesser >> the >> index size, lesser the response time? >> >> >> Regards, >> Ritesh Ambastha >> >> Noble Paul നോബിള് नोब्ळ् wrote: >>> >>> You could have been more specific on the dataset size. >>> >>> If your data volumes are growing you can partition your index into >>> multiple shards. >>> http://wiki.apache.org/solr/DistributedSearch >>> --Noble >>> >>> On Sat, May 31, 2008 at 9:02 PM, Ritesh Ambastha >>> <[EMAIL PROTECTED]> >>> wrote: >>>> >>>> Dear Readers, >>>> >>>> I am a newbie in solr world. I have successfully deployed solr on my >>>> machine, and I am able to index a large DB table. I am pretty sure that >>>> internal index structure of solr is much capable to handle large data >>>> sets. >>>> >>>> But, say my data size keeps growing at jet speed, then what should be >>>> the >>>> index structure? Do I need to follow some specific index structuring >>>> patterns/algos for handling such massive data? >>>> >>>> I am sorry as I may be sounding novice in this area. I would appreciate >>>> your >>>> thoughts/suggestions. >>>> >>>> Regards, >>>> Ritesh Ambastha >>>> -- >>>> View this message in context: >>>> http://www.nabble.com/Index-structuring-tp17576449p17576449.html >>>> Sent from the Solr - User mailing list archive at Nabble.com. >>>> >>>> >>> >>> >>> >>> -- >>> --Noble Paul >>> >>> >> >> -- >> View this message in context: >> http://www.nabble.com/Index-structuring-tp17576449p17643690.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > > -- > --Noble Paul > > -- View this message in context: http://www.nabble.com/Index-structuring-tp17576449p17643798.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Index structuring
Thanks Noble, I maintain two separate indexes on my disk for two different search services. The index size of two are: 91MB and 615MB. I am pretty sure that these index size will grow in future, and may reach 10GB. My doubts : 1. When should I start partitioning my index? 2. Is there any performance issue with partitioning? For eg: A query on 1GB and 500MB indexed data will take same time to give the result? Or lesser the index size, lesser the response time? Regards, Ritesh Ambastha Noble Paul നോബിള് नोब्ळ् wrote: > > You could have been more specific on the dataset size. > > If your data volumes are growing you can partition your index into > multiple shards. > http://wiki.apache.org/solr/DistributedSearch > --Noble > > On Sat, May 31, 2008 at 9:02 PM, Ritesh Ambastha <[EMAIL PROTECTED]> > wrote: >> >> Dear Readers, >> >> I am a newbie in solr world. I have successfully deployed solr on my >> machine, and I am able to index a large DB table. I am pretty sure that >> internal index structure of solr is much capable to handle large data >> sets. >> >> But, say my data size keeps growing at jet speed, then what should be the >> index structure? Do I need to follow some specific index structuring >> patterns/algos for handling such massive data? >> >> I am sorry as I may be sounding novice in this area. I would appreciate >> your >> thoughts/suggestions. >> >> Regards, >> Ritesh Ambastha >> -- >> View this message in context: >> http://www.nabble.com/Index-structuring-tp17576449p17576449.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > > -- > --Noble Paul > > -- View this message in context: http://www.nabble.com/Index-structuring-tp17576449p17643690.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Index structuring
A lot of this also depends on the number of documents. But we have successfully used Solr with upto 10-12 million documents. On Wed, Jun 4, 2008 at 4:10 PM, Ritesh Ambastha <[EMAIL PROTECTED]> wrote: > > Thanks Noble, > > That means, I can go ahead with single Index for long. > :) > > Regards, > Ritesh Ambastha > > Noble Paul നോബിള് नोब्ळ् wrote: > > > > For the datasize you are proposing , single index should be fine .Just > > give the m/c enough RAM > > > > Distributed search involves multiple requests made between shards > > which may be an unncessary overhead. > > --Noble > > > > On Wed, Jun 4, 2008 at 4:02 PM, Ritesh Ambastha <[EMAIL PROTECTED]> > > wrote: > >> > >> Thanks Noble, > >> > >> I maintain two separate indexes on my disk for two different search > >> services. > >> The index size of two are: 91MB and 615MB. I am pretty sure that these > >> index > >> size will grow in future, and may reach 10GB. > >> > >> My doubts : > >> > >> 1. When should I start partitioning my index? > >> 2. Is there any performance issue with partitioning? For eg: A query on > >> 1GB > >> and 500MB indexed data will take same time to give the result? Or lesser > >> the > >> index size, lesser the response time? > >> > >> > >> Regards, > >> Ritesh Ambastha > >> > >> Noble Paul നോബിള് नोब्ळ् wrote: > >>> > >>> You could have been more specific on the dataset size. > >>> > >>> If your data volumes are growing you can partition your index into > >>> multiple shards. > >>> http://wiki.apache.org/solr/DistributedSearch > >>> --Noble > >>> > >>> On Sat, May 31, 2008 at 9:02 PM, Ritesh Ambastha > >>> <[EMAIL PROTECTED]> > >>> wrote: > >>>> > >>>> Dear Readers, > >>>> > >>>> I am a newbie in solr world. I have successfully deployed solr on my > >>>> machine, and I am able to index a large DB table. I am pretty sure > that > >>>> internal index structure of solr is much capable to handle large data > >>>> sets. > >>>> > >>>> But, say my data size keeps growing at jet speed, then what should be > >>>> the > >>>> index structure? Do I need to follow some specific index structuring > >>>> patterns/algos for handling such massive data? > >>>> > >>>> I am sorry as I may be sounding novice in this area. I would > appreciate > >>>> your > >>>> thoughts/suggestions. > >>>> > >>>> Regards, > >>>> Ritesh Ambastha > >>>> -- > >>>> View this message in context: > >>>> http://www.nabble.com/Index-structuring-tp17576449p17576449.html > >>>> Sent from the Solr - User mailing list archive at Nabble.com. > >>>> > >>>> > >>> > >>> > >>> > >>> -- > >>> --Noble Paul > >>> > >>> > >> > >> -- > >> View this message in context: > >> http://www.nabble.com/Index-structuring-tp17576449p17643690.html > >> Sent from the Solr - User mailing list archive at Nabble.com. > >> > >> > > > > > > > > -- > > --Noble Paul > > > > > > -- > View this message in context: > http://www.nabble.com/Index-structuring-tp17576449p17643798.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Regards, Shalin Shekhar Mangar.
Re: Index structuring
The number of docs I have indexed till now is : 1,633,570 I am bit afraid as the number of indexed docs will grow atleast 5-10 times in very near future. Regards, Ritesh Ambastha Shalin Shekhar Mangar wrote: > > A lot of this also depends on the number of documents. But we have > successfully used Solr with upto 10-12 million documents. > > On Wed, Jun 4, 2008 at 4:10 PM, Ritesh Ambastha <[EMAIL PROTECTED]> > wrote: > >> >> Thanks Noble, >> >> That means, I can go ahead with single Index for long. >> :) >> >> Regards, >> Ritesh Ambastha >> >> Noble Paul നോബിള് नोब्ळ् wrote: >> > >> > For the datasize you are proposing , single index should be fine .Just >> > give the m/c enough RAM >> > >> > Distributed search involves multiple requests made between shards >> > which may be an unncessary overhead. >> > --Noble >> > >> > On Wed, Jun 4, 2008 at 4:02 PM, Ritesh Ambastha >> <[EMAIL PROTECTED]> >> > wrote: >> >> >> >> Thanks Noble, >> >> >> >> I maintain two separate indexes on my disk for two different search >> >> services. >> >> The index size of two are: 91MB and 615MB. I am pretty sure that these >> >> index >> >> size will grow in future, and may reach 10GB. >> >> >> >> My doubts : >> >> >> >> 1. When should I start partitioning my index? >> >> 2. Is there any performance issue with partitioning? For eg: A query >> on >> >> 1GB >> >> and 500MB indexed data will take same time to give the result? Or >> lesser >> >> the >> >> index size, lesser the response time? >> >> >> >> >> >> Regards, >> >> Ritesh Ambastha >> >> >> >> Noble Paul നോബിള് नोब्ळ् wrote: >> >>> >> >>> You could have been more specific on the dataset size. >> >>> >> >>> If your data volumes are growing you can partition your index into >> >>> multiple shards. >> >>> http://wiki.apache.org/solr/DistributedSearch >> >>> --Noble >> >>> >> >>> On Sat, May 31, 2008 at 9:02 PM, Ritesh Ambastha >> >>> <[EMAIL PROTECTED]> >> >>> wrote: >> >>>> >> >>>> Dear Readers, >> >>>> >> >>>> I am a newbie in solr world. I have successfully deployed solr on my >> >>>> machine, and I am able to index a large DB table. I am pretty sure >> that >> >>>> internal index structure of solr is much capable to handle large >> data >> >>>> sets. >> >>>> >> >>>> But, say my data size keeps growing at jet speed, then what should >> be >> >>>> the >> >>>> index structure? Do I need to follow some specific index structuring >> >>>> patterns/algos for handling such massive data? >> >>>> >> >>>> I am sorry as I may be sounding novice in this area. I would >> appreciate >> >>>> your >> >>>> thoughts/suggestions. >> >>>> >> >>>> Regards, >> >>>> Ritesh Ambastha >> >>>> -- >> >>>> View this message in context: >> >>>> http://www.nabble.com/Index-structuring-tp17576449p17576449.html >> >>>> Sent from the Solr - User mailing list archive at Nabble.com. >> >>>> >> >>>> >> >>> >> >>> >> >>> >> >>> -- >> >>> --Noble Paul >> >>> >> >>> >> >> >> >> -- >> >> View this message in context: >> >> http://www.nabble.com/Index-structuring-tp17576449p17643690.html >> >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> >> >> >> > >> > >> > >> > -- >> > --Noble Paul >> > >> > >> >> -- >> View this message in context: >> http://www.nabble.com/Index-structuring-tp17576449p17643798.html >> Sent from the Solr - User mailing list archive at Nabble.com. >> >> > > > -- > Regards, > Shalin Shekhar Mangar. > > -- View this message in context: http://www.nabble.com/Index-structuring-tp17576449p17643909.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Index structuring
Fot 16 mil docs it may not be necessary. Add the shards when you see that perf is degrading. --Noble On Wed, Jun 4, 2008 at 4:17 PM, Ritesh Ambastha <[EMAIL PROTECTED]> wrote: > > The number of docs I have indexed till now is : 1,633,570 > I am bit afraid as the number of indexed docs will grow atleast 5-10 times > in very near future. > > Regards, > Ritesh Ambastha > > > > Shalin Shekhar Mangar wrote: >> >> A lot of this also depends on the number of documents. But we have >> successfully used Solr with upto 10-12 million documents. >> >> On Wed, Jun 4, 2008 at 4:10 PM, Ritesh Ambastha <[EMAIL PROTECTED]> >> wrote: >> >>> >>> Thanks Noble, >>> >>> That means, I can go ahead with single Index for long. >>> :) >>> >>> Regards, >>> Ritesh Ambastha >>> >>> Noble Paul നോബിള് नोब्ळ् wrote: >>> > >>> > For the datasize you are proposing , single index should be fine .Just >>> > give the m/c enough RAM >>> > >>> > Distributed search involves multiple requests made between shards >>> > which may be an unncessary overhead. >>> > --Noble >>> > >>> > On Wed, Jun 4, 2008 at 4:02 PM, Ritesh Ambastha >>> <[EMAIL PROTECTED]> >>> > wrote: >>> >> >>> >> Thanks Noble, >>> >> >>> >> I maintain two separate indexes on my disk for two different search >>> >> services. >>> >> The index size of two are: 91MB and 615MB. I am pretty sure that these >>> >> index >>> >> size will grow in future, and may reach 10GB. >>> >> >>> >> My doubts : >>> >> >>> >> 1. When should I start partitioning my index? >>> >> 2. Is there any performance issue with partitioning? For eg: A query >>> on >>> >> 1GB >>> >> and 500MB indexed data will take same time to give the result? Or >>> lesser >>> >> the >>> >> index size, lesser the response time? >>> >> >>> >> >>> >> Regards, >>> >> Ritesh Ambastha >>> >> >>> >> Noble Paul നോബിള് नोब्ळ् wrote: >>> >>> >>> >>> You could have been more specific on the dataset size. >>> >>> >>> >>> If your data volumes are growing you can partition your index into >>> >>> multiple shards. >>> >>> http://wiki.apache.org/solr/DistributedSearch >>> >>> --Noble >>> >>> >>> >>> On Sat, May 31, 2008 at 9:02 PM, Ritesh Ambastha >>> >>> <[EMAIL PROTECTED]> >>> >>> wrote: >>> >>>> >>> >>>> Dear Readers, >>> >>>> >>> >>>> I am a newbie in solr world. I have successfully deployed solr on my >>> >>>> machine, and I am able to index a large DB table. I am pretty sure >>> that >>> >>>> internal index structure of solr is much capable to handle large >>> data >>> >>>> sets. >>> >>>> >>> >>>> But, say my data size keeps growing at jet speed, then what should >>> be >>> >>>> the >>> >>>> index structure? Do I need to follow some specific index structuring >>> >>>> patterns/algos for handling such massive data? >>> >>>> >>> >>>> I am sorry as I may be sounding novice in this area. I would >>> appreciate >>> >>>> your >>> >>>> thoughts/suggestions. >>> >>>> >>> >>>> Regards, >>> >>>> Ritesh Ambastha >>> >>>> -- >>> >>>> View this message in context: >>> >>>> http://www.nabble.com/Index-structuring-tp17576449p17576449.html >>> >>>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >>>> >>> >>>> >>> >>> >>> >>> >>> >>> >>> >>> -- >>> >>> --Noble Paul >>> >>> >>> >>> >>> >> >>> >> -- >>> >> View this message in context: >>> >> http://www.nabble.com/Index-structuring-tp17576449p17643690.html >>> >> Sent from the Solr - User mailing list archive at Nabble.com. 
>>> >> >>> >> >>> > >>> > >>> > >>> > -- >>> > --Noble Paul >>> > >>> > >>> >>> -- >>> View this message in context: >>> http://www.nabble.com/Index-structuring-tp17576449p17643798.html >>> Sent from the Solr - User mailing list archive at Nabble.com. >>> >>> >> >> >> -- >> Regards, >> Shalin Shekhar Mangar. >> >> > > -- > View this message in context: > http://www.nabble.com/Index-structuring-tp17576449p17643909.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- --Noble Paul
Re: Updating index
Mihails, Update is done as delete + re-add. You may also want to look at SOLR-139 in Solr JIRA. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Mihails Agafonovs <[EMAIL PROTECTED]> > To: solr-user@lucene.apache.org > Sent: Tuesday, June 17, 2008 6:25:26 AM > Subject: Updating index > > Hi! > > Updating index with post.jar just replaces the index with the defined > xml's. But if there are, for example, two fields in all xml's that > were changed, is there a way to update only these fields (incremental > update)? If there are a lot of large xml's, it would be performance > slowdown each time rewriting the index, and also an unreal job to > change the fields manually. > Ar cieņu, Mihails
Deleting Solr index
How can I clear the whole Solr index? Ar cieņu, Mihails
Automated Index Creation
Hi, Sorry if this question sounds daft but I was wondering if there was anything built into Solr that allows you to automate the creation of new indexes once they reach a certain size or point in time. I looked briefly at the documentation on CollectionDestribution, but it seems more geared to towards replicatting to other production servers...I'm looking for something that is more along the lines of archiving indexes for later use... Thanks, Willie
Re: Index partioning
That wiki page is purely an idea proposal at this time, not a feature of Solr (yet or perhaps ever). Erik On Sep 1, 2008, at 2:23 AM, sanraj25 wrote: Hi I read the doument on http://wiki.apache.org/solr/IndexPartitioning Now i want partition my solr index into two. Based on that document I changed solrconfig.xml. But I can't visible any partitioned folder other than default one. I need help on index partitioning.give some suggestion to this. Thanks in advance -Santhanaraj -- View this message in context: http://www.nabble.com/Index-partioning-tp19249441p19249441.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Index partioning
That wiki page is purely an idea proposal at this time, not a feature of Solr (yet or perhaps ever). Erik I found this thread in the archive... I'm responsible for a number of ruby on rails websites, all of which need search. Solr seems to have everything I need, but I am wondering what's the best way to maintain multiple indexes? Multiple Solr instances on different ports? Any help, much appreciated. -- John P.S., I don't know how up-to-date it is, but I have Erik's book on order from Amazon... hoping that will help.
Re: Index partioning
: I found this thread in the archive... : : I'm responsible for a number of ruby on rails websites, all of which need : search. Solr seems to have everything I need, but I am wondering what's the : best way to maintain multiple indexes? : : Multiple Solr instances on different ports? having multiple indexes is a different beast from the "Index Partitioning" topic this thread was discussing ... there's some good info on the wiki about the various options (they each have their trade offs to consider) http://wiki.apache.org/solr/MultipleIndexes -Hoss
Re: Index partioning
We do both #2 and #4 from the Wiki page. If the schemas have a lot of overlap and you don't foresee the need to scale to multiple machines (either due to index size or amount of traffic), it may be best to put all the data in a single index with different type fields (#4); this certainly minimizes maintenance. #1 or #2 seems like a better choice if it is likely you will eventually need to physically separate the indexes. Jason On Mon, Sep 8, 2008 at 6:15 PM, Chris Hostetter <[EMAIL PROTECTED]>wrote: > : I found this thread in the archive... > : > : I'm responsible for a number of ruby on rails websites, all of which need > : search. Solr seems to have everything I need, but I am wondering what's > the > : best way to maintain multiple indexes? > : > : Multiple Solr instances on different ports? > > having multiple indexes is a different beast from the "Index Partitioning" > topic this thread was discussing ... there's some good info on the wiki > about the various options (they each have their trade offs to consider) > >http://wiki.apache.org/solr/MultipleIndexes > > -Hoss > > -- Jason Rennie Head of Machine Learning Technologies, StyleFeeder http://www.stylefeeder.com/ Samantha's blog & pictures: http://samanthalyrarennie.blogspot.com/
Re: Lucene index
On Tue, Sep 23, 2008 at 5:33 PM, Dinesh Gupta <[EMAIL PROTECTED]>wrote: > > Hi, > Current we are using Lucene api to create index. > > It creates index in a directory with 3 files like > > xxx.cfs , deletable & segments. > > If I am creating Lucene indexes from Solr, these file will be created or > not? The lucene index will be created in the solr_home inside the data/index directory. > Please give me example on MySQL data base instead of hsqldb > If you are talking about DataImportHandler then there is no difference in the configuration except for using the MySql driver instead of hsqldb. -- Regards, Shalin Shekhar Mangar.
RE: Lucene index
atalogues doc.add(new Field("clg",(String) data.get("Catalogues"),Field.Store.YES,Field.Index.TOKENIZED)); //doc.add(Field.Text("clg", (String) data.get("Catalogues"))); //Product Delivery Cities doc.add(new Field("dcty",(String) data.get("DelCities"),Field.Store.YES,Field.Index.TOKENIZED)); // Additional Information //Top Selling Count String sellerCount=((Long)data.get("SellCount")).toString(); doc.add(new Field("bsc",sellerCount,Field.Store.YES,Field.Index.TOKENIZED)); I am preparing data from querying databse. Please tell me how can I migrate my logic to Solr. I have spend more than a week. But have got nothing. Please help me. Can I attach my files here? Thanks in Advance Regards Dinesh Gupta > Date: Tue, 23 Sep 2008 18:53:07 +0530 > From: [EMAIL PROTECTED] > To: solr-user@lucene.apache.org > Subject: Re: Lucene index > > On Tue, Sep 23, 2008 at 5:33 PM, Dinesh Gupta <[EMAIL PROTECTED]>wrote: > > > > > Hi, > > Current we are using Lucene api to create index. > > > > It creates index in a directory with 3 files like > > > > xxx.cfs , deletable & segments. > > > > If I am creating Lucene indexes from Solr, these file will be created or > > not? > > > The lucene index will be created in the solr_home inside the data/index > directory. > > > > Please give me example on MySQL data base instead of hsqldb > > > > If you are talking about DataImportHandler then there is no difference in > the configuration except for using the MySql driver instead of hsqldb. > > -- > Regards, > Shalin Shekhar Mangar. _ Want to explore the world? Visit MSN Travel for the best deals. http://in.msn.com/coxandkings
Re: Lucene index
Hi Dinesh, This seems straightforward for Solr. You can use the embedded jetty server for a start. Look at the tutorial on how to get started. You'll need to modify the schema.xml to define all the fields that you want to index. The wiki page at http://wiki.apache.org/solr/SchemaXml is a good start on how to do that. Each field in your code will have a counterpart in the schema.xml with appropriate flags (indexed/stored/tokenized etc.) Once that is complete, try to modify the DataImportHandler's hsqldb example for your mysql database. On Tue, Sep 23, 2008 at 7:01 PM, Dinesh Gupta <[EMAIL PROTECTED]>wrote: > > Hi Shalin Shekhar, > > Let me explain my issue. > > I have some tables in my database like > > Product > Category > Catalogue > Keywords > Seller > Brand > Country_city_group > etc. > I have a class that represent product document as > > Document doc = new Document(); >// Keywords which can be used directly for search >doc.add(new Field("id",(String) > data.get("PRN"),Field.Store.YES,Field.Index.UN_TOKENIZED)); > >// Sorting fields] >String priceString = (String) data.get("Price"); >if (priceString == null) >priceString = "0"; >long price = 0; >try { >price = (long) Double.parseDouble(priceString); >} catch (Exception e) { > >} > >doc.add(new > Field("prc",NumberUtils.pad(price),Field.Store.YES,Field.Index.UN_TOKENIZED)); >Date createDate = (Date) data.get("CreateDate"); >if (createDate == null) createDate = new Date(); > >doc.add(new Field("cdt",String.valueOf(createDate.getTime()), > Field.Store.NO,Field.Index.UN_TOKENIZED)); > >Date modiDate = (Date) data.get("ModiDate"); >if (modiDate == null) modiDate = new Date(); > >doc.add(new Field("mdt",String.valueOf(modiDate.getTime()), > Field.Store.NO,Field.Index.UN_TOKENIZED)); >//doc.add(Field.UnStored("cdt", > String.valueOf(createDate.getTime(; > >// Additional fields for search >doc.add(new Field("bnm",(String) > data.get("Brand"),Field.Store.YES,Field.Index.TOKENIZED)); >doc.add(new Field("bnm1",(String) data.get("Brand1"),Field.Store.NO > ,Field.Index.UN_TOKENIZED)); >//doc.add(Field.Text("bnm", (String) data.get("Brand"))); > //Tokenized and Unstored >doc.add(new Field("bid",(String) > data.get("BrandId"),Field.Store.YES,Field.Index.UN_TOKENIZED)); >//doc.add(Field.Keyword("bid", (String) data.get("BrandId"))); // > untokenized & >doc.add(new Field("grp",(String) data.get("Group"),Field.Store.NO > ,Field.Index.TOKENIZED)); >//doc.add(Field.Text("grp", (String) data.get("Group"))); >doc.add(new Field("gid",(String) > data.get("GroupId"),Field.Store.YES,Field.Index.UN_TOKENIZED)); >//doc.add(Field.Keyword("gid", (String) data.get("GroupId"))); //New >doc.add(new Field("snm",(String) > data.get("Seller"),Field.Store.YES,Field.Index.UN_TOKENIZED)); >//doc.add(Field.Text("snm", (String) data.get("Seller"))); >doc.add(new Field("sid",(String) > data.get("SellerId"),Field.Store.YES,Field.Index.UN_TOKENIZED)); >//doc.add(Field.Keyword("sid", (String) data.get("SellerId"))); // > New >doc.add(new Field("ttl",(String) > data.get("Title"),Field.Store.YES,Field.Index.TOKENIZED)); >//doc.add(Field.UnStored("ttl", (String) data.get("Title"), true)); > >String title1 = (String) data.get("Title"); >title1 = removeSpaces(title1); >doc.add(new Field("ttl1",title1,Field.Store.NO > ,Field.Index.UN_TOKENIZED)); > >doc.add(new Field("ttl2",title1,Field.Store.NO > ,Field.Index.TOKENIZED)); >//doc.add(Field.UnStored("ttl", (String) data.get("Title"), true)); > >// ColumnC - Product Sequence >String productSeq = (String) data.get("ProductSeq"); >if 
(productSeq == null) productSeq = ""; >doc.add(new Field("seq",productSeq,Field.Store.NO > ,Field.Index.UN_TOKENIZED)); >//doc.add(Field.Keyword("seq", productSeq)); > >// New Added >doc.add(new Field("sdc",(String) data.get("SpecialDescript
RE: Lucene index
Hi Shalin Shekhar, First of all thanks to you for quick replying. I have done the things that you have explained here Since I am creating indexes in multi threads and it takes 6-10 hours to creating for approx. 3 lac products I am using hibernate to access DB & applying custom logic to prepare data and putting in a map and finally writing to index. Now can I achieve this. I am able to search by using solr web admin but not able to add. Please tell me how can I attach my file to you. Thanks Regards, Dinesh Gupta > Date: Tue, 23 Sep 2008 19:36:22 +0530 > From: [EMAIL PROTECTED] > To: solr-user@lucene.apache.org > Subject: Re: Lucene index > > Hi Dinesh, > > This seems straightforward for Solr. You can use the embedded jetty server > for a start. Look at the tutorial on how to get started. > > You'll need to modify the schema.xml to define all the fields that you want > to index. The wiki page at http://wiki.apache.org/solr/SchemaXml is a good > start on how to do that. Each field in your code will have a counterpart in > the schema.xml with appropriate flags (indexed/stored/tokenized etc.) > > Once that is complete, try to modify the DataImportHandler's hsqldb example > for your mysql database. > > On Tue, Sep 23, 2008 at 7:01 PM, Dinesh Gupta <[EMAIL PROTECTED]>wrote: > > > > > Hi Shalin Shekhar, > > > > Let me explain my issue. > > > > I have some tables in my database like > > > > Product > > Category > > Catalogue > > Keywords > > Seller > > Brand > > Country_city_group > > etc. > > I have a class that represent product document as > > > > Document doc = new Document(); > >// Keywords which can be used directly for search > >doc.add(new Field("id",(String) > > data.get("PRN"),Field.Store.YES,Field.Index.UN_TOKENIZED)); > > > >// Sorting fields] > >String priceString = (String) data.get("Price"); > >if (priceString == null) > >priceString = "0"; > >long price = 0; > >try { > >price = (long) Double.parseDouble(priceString); > >} catch (Exception e) { > > > >} > > > >doc.add(new > > Field("prc",NumberUtils.pad(price),Field.Store.YES,Field.Index.UN_TOKENIZED)); > >Date createDate = (Date) data.get("CreateDate"); > >if (createDate == null) createDate = new Date(); > > > >doc.add(new Field("cdt",String.valueOf(createDate.getTime()), > > Field.Store.NO,Field.Index.UN_TOKENIZED)); > > > >Date modiDate = (Date) data.get("ModiDate"); > >if (modiDate == null) modiDate = new Date(); > > > >doc.add(new Field("mdt",String.valueOf(modiDate.getTime()), > > Field.Store.NO,Field.Index.UN_TOKENIZED)); > >//doc.add(Field.UnStored("cdt", > > String.valueOf(createDate.getTime(; > > > >// Additional fields for search > >doc.add(new Field("bnm",(String) > > data.get("Brand"),Field.Store.YES,Field.Index.TOKENIZED)); > >doc.add(new Field("bnm1",(String) data.get("Brand1"),Field.Store.NO > > ,Field.Index.UN_TOKENIZED)); > >//doc.add(Field.Text("bnm", (String) data.get("Brand"))); > > //Tokenized and Unstored > >doc.add(new Field("bid",(String) > > data.get("BrandId"),Field.Store.YES,Field.Index.UN_TOKENIZED)); > >//doc.add(Field.Keyword("bid", (String) data.get("BrandId"))); // > > untokenized & > >doc.add(new Field("grp",(String) data.get("Group"),Field.Store.NO > > ,Field.Index.TOKENIZED)); > >//doc.add(Field.Text("grp", (String) data.get("Group"))); > >doc.add(new Field("gid",(String) > > data.get("GroupId"),Field.Store.YES,Field.Index.UN_TOKENIZED)); > >//doc.add(Field.Keyword("gid", (String) data.get("GroupId"))); //New > >doc.add(new Field("snm",(String) > > 
data.get("Seller"),Field.Store.YES,Field.Index.UN_TOKENIZED)); > >//doc.add(Field.Text("snm", (String) data.get("Seller"))); > >doc.add(new Field("sid",(String) > > data.get("SellerId"),Field.Store.YES,Field.Index.UN_TOKENIZED)); > >//doc.add(Field.Keyword("sid", (String) data.get("SellerId"))); // > > New
Re: Lucene index
Hi Dinesh, There are two ways in which you can import data from databases. 1. Use your custom code with the Solrj client library to upload documents to Solr -- http://wiki.apache.org/solr/Solrj 2. Use DataImportHandler and write data-config.xml and custom Transformers -- http://wiki.apache.org/solr/DataImportHandler Take a look at both and use the one which suits you best. On Wed, Sep 24, 2008 at 6:37 PM, Dinesh Gupta <[EMAIL PROTECTED]>wrote: > > Hi Shalin Shekhar, > > First of all thanks to you for quick replying. > > I have done the things that you have explained here > > Since I am creating indexes in multi threads and it takes 6-10 hours to > creating for approx. 3 lac products > > I am using hibernate to access DB & applying custom logic to prepare data > and putting in a map > and finally writing to index. > > Now can I achieve this. > > I am able to search by using solr web admin > but not able to add. > Please tell me how can I attach my file to you. > > Thanks > > Regards, > Dinesh Gupta > > > Date: Tue, 23 Sep 2008 19:36:22 +0530 > > From: [EMAIL PROTECTED] > > To: solr-user@lucene.apache.org > > Subject: Re: Lucene index > > > > Hi Dinesh, > > > > This seems straightforward for Solr. You can use the embedded jetty > server > > for a start. Look at the tutorial on how to get started. > > > > You'll need to modify the schema.xml to define all the fields that you > want > > to index. The wiki page at http://wiki.apache.org/solr/SchemaXml is a > good > > start on how to do that. Each field in your code will have a counterpart > in > > the schema.xml with appropriate flags (indexed/stored/tokenized etc.) > > > > Once that is complete, try to modify the DataImportHandler's hsqldb > example > > for your mysql database. > > > > On Tue, Sep 23, 2008 at 7:01 PM, Dinesh Gupta < > [EMAIL PROTECTED]>wrote: > > > > > > > > Hi Shalin Shekhar, > > > > > > Let me explain my issue. > > > > > > I have some tables in my database like > > > > > > Product > > > Category > > > Catalogue > > > Keywords > > > Seller > > > Brand > > > Country_city_group > > > etc. 
> > > I have a class that represent product document as > > > > > > Document doc = new Document(); > > >// Keywords which can be used directly for search > > >doc.add(new Field("id",(String) > > > data.get("PRN"),Field.Store.YES,Field.Index.UN_TOKENIZED)); > > > > > >// Sorting fields] > > >String priceString = (String) data.get("Price"); > > >if (priceString == null) > > >priceString = "0"; > > >long price = 0; > > >try { > > >price = (long) Double.parseDouble(priceString); > > >} catch (Exception e) { > > > > > >} > > > > > >doc.add(new > > > > Field("prc",NumberUtils.pad(price),Field.Store.YES,Field.Index.UN_TOKENIZED)); > > >Date createDate = (Date) data.get("CreateDate"); > > >if (createDate == null) createDate = new Date(); > > > > > >doc.add(new Field("cdt",String.valueOf(createDate.getTime()), > > > Field.Store.NO,Field.Index.UN_TOKENIZED)); > > > > > >Date modiDate = (Date) data.get("ModiDate"); > > >if (modiDate == null) modiDate = new Date(); > > > > > >doc.add(new Field("mdt",String.valueOf(modiDate.getTime()), > > > Field.Store.NO,Field.Index.UN_TOKENIZED)); > > >//doc.add(Field.UnStored("cdt", > > > String.valueOf(createDate.getTime(; > > > > > >// Additional fields for search > > >doc.add(new Field("bnm",(String) > > > data.get("Brand"),Field.Store.YES,Field.Index.TOKENIZED)); > > >doc.add(new Field("bnm1",(String) data.get("Brand1"), > Field.Store.NO > > > ,Field.Index.UN_TOKENIZED)); > > >//doc.add(Field.Text("bnm", (String) data.get("Brand"))); > > > //Tokenized and Unstored > > >doc.add(new Field("bid",(String) > > > data.get("BrandId"),Field.Store.YES,Field.Index.UN_TOKENIZED)); > > >//doc.add(Field.Keyword("bid", (String) data.get("BrandId"))); > // &g
Re: Index partitioning
: I want to partition my index based on category information. Also, while : indexing I want to store particular category data to corresponding index : partition. In the same way I need to search for category information on : corresponding partition.. I found some information on wiki link : http://wiki.apache.org/solr/IndexPartitioning. But it couldn't help much I've updated that document to reflect it's status as an "whiteboard" that hasn't been updated in a while, and that some of the ideas expressed there can achieved using a combination of distributed search and multicore. deciding which "core" to update when a new doc comes in (ie: based on category in your example) or which core to search against when a query comes in is something your application would need do -- the distributed searching functionality doesn't provide a feature like that. how it would work would be somewhat dependent on your use case (if you want every category to have it's own 'partition' it's trivial; if you want to hash on the category name it's a little more code but straightforward; if you have complex rules about how certain categories shoudl be grouped together in the same partition -- then you need to implement those special rules in your client code. -Hoss
Re: Index Builder
On 3/5/06, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > What/where is the Index Builder that is referred to in > http://wiki.apache.org/solr/CollectionBuilding? It's currently client-supplied (i.e. there isn't one). Having all Solr users have to write their own builders (code that gets data from a source and posts XML documents) certainly isn't optimal. It would be nice if we could give Solr a database URL with some SQL, and have it automatically slurp and index the records. It would also be nice to be able to grab documents from a CSV or other simple structured text file and index them. These ideas are on already on the task list on the (currently down) Wiki. -Yonik
Re: Index Builder
I had a feeling that was the case. So, I was thinking I could write a driver program that takes in my files and then calls the API directly. Is this doable? How do you guys do it on your live site? Do you do it all through HTTP requests or through a driver that calls the API? I think I would prefer the API calls for bulk loading. Where should I look for these? -Grant Yonik Seeley <[EMAIL PROTECTED]> wrote: On 3/5/06, Grant Ingersoll wrote: > What/where is the Index Builder that is referred to in > http://wiki.apache.org/solr/CollectionBuilding? It's currently client-supplied (i.e. there isn't one). Having all Solr users have to write their own builders (code that gets data from a source and posts XML documents) certainly isn't optimal. It would be nice if we could give Solr a database URL with some SQL, and have it automatically slurp and index the records. It would also be nice to be able to grab documents from a CSV or other simple structured text file and index them. These ideas are on already on the task list on the (currently down) Wiki. -Yonik -- Grant Ingersoll http://www.grantingersoll.com - Yahoo! Mail Use Photomail to share photos without annoying attachments.
Re: Index Builder
: I had a feeling that was the case. So, I was thinking I could write a : driver program that takes in my files and then calls the API directly. : Is this doable? How do you guys do it on your live site? Do you do it : all through HTTP requests or through a driver that calls the API? I : think I would prefer the API calls for bulk loading. Where should I : look for these? Once upon a time, I agrued for having an robust update API, and a way to write "updater plugins" that would run within the Solr JVM ... and I was talked out of it in favor doing everything over HTTP. So yeah ... that's what I do: build/update entirely over HTTP. >From what i remember of the internal update API, you could probably write a new subclass of UpdateHandler that you register in the solrconfig.xml which pulled most of the data from wherever you want -- but it would still need to be triggered by (minimal) "" messages over HTTP. alternately, you could write your own Servlet with load-on-startup="true" that used the internal update methods directly. -Hoss
Re: Index Builder
On 3/5/06, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > So, I was thinking I could write a driver program that takes in my files and > then calls the API directly. Is this doable? It's doable... While it will be more efficient, it's not clear how much you will gain, esp if you run with multiple CPUs (IndexWriting is highly synchronized). Check out the UpdateHandler abstract class: public abstract int addDoc(AddUpdateCommand cmd) throws IOException; public abstract void delete(DeleteUpdateCommand cmd) throws IOException; public abstract void deleteByQuery(DeleteUpdateCommand cmd) throws IOException; public abstract void commit(CommitUpdateCommand cmd) throws IOException; public abstract void close() throws IOException; While the implementation of the UpdateHandler is pluggable, there isn't a place to plug in different client handlers (like there is with RequestHandler). You could create another servlet in the same webapp and get the current UpdateHandler (SolrCore.updateHandler) and use that to update the index. Seems like there isn't a getter for SolrCore.updateHandler... feel free to sumbit a patch if you want to go this route. You could even drop down to a lower level and use DocumentBuilder to create your own Lucene Document instances and write them with an IndexWriter yourself. -Yonik > Do you do it all through HTTP requests or through a driver that calls the > API? > I think I would prefer the API calls for bulk loading. Where should I look > for these? > > -Grant > > Yonik Seeley <[EMAIL PROTECTED]> wrote: On 3/5/06, Grant Ingersoll wrote: > > What/where is the Index Builder that is referred to in > > http://wiki.apache.org/solr/CollectionBuilding? > > It's currently client-supplied (i.e. there isn't one). > > Having all Solr users have to write their own builders (code that gets > data from a source and posts XML documents) certainly isn't optimal. > > It would be nice if we could give Solr a database URL with some SQL, > and have it automatically slurp and index the records. It would also > be nice to be able to grab documents from a CSV or other simple > structured text file and index them. > > These ideas are on already on the task list on the (currently down) Wiki. > > -Yonik
add/update index
Hi, I have created a process which uses xsl to convert my data to the form indicated in the examples so that it can be added to the index as the solr tutorial indicates: value ... In some cases the xsl process will create a field element with no data. (ie ) Is this considered bad input and will not be accepted? Or is this something that solr should deal with? Currently for each field element with no data I receive the message: java.lang.NullPointerException at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:78) at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:74) at org.apache.solr.core.SolrCore.readDoc(SolrCore.java:917) at org.apache.solr.core.SolrCore.update(SolrCore.java:685) at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:52) ... Just curious if the gurus out there think I should deal with the null values in my xsl process or if this can be dealt with in solr itself? Thanks, Tricia ps. Thanks for the timely fix for the UTF-8 issue!
Index-time Boosting
I'm trying to figure out how to set per-field boosts in Solr at index time. For example, if I want the title to be boosted by a factor of 8, I could do that in a query, or I could add the title text with a boost of 8 to the default text field along with the body text (with a boost of 1). For other engines I've worked with, this gives a lot more performance at the cost of some flexibility -- you need to reindex to change the weightings. I don't see an obvious way to do this in a Solr schema, though it might make sense to add a boost attribute to copyField. Any ideas? Did I miss something? wunder -- Walter Underwood Search Guru, Netflix
Starting an index...
I have played with the "example" directory for a while. Everything seems to work well. Now I'd like to start my own index and I have a few questions. 1. I suppose I can start from copying the whole example directory and name it myindex. I understand that I need to modify the solr/conf/schema.xml to suit my data. Besides that, is there anything else that I must/should change? I'll take a look at the stopwords.txt, etc. to see if any changes is required. How about solr.war? Anything else I need to customize? (I'm not a heavy java developer.) 2. For each index, do I need to copy this directory and start a solr instance? Is it possible to run one solr instance for multiple indices? 3. solr comes with jetty and it seems to work pretty well. Is there any reason that I should switch to tomcat for production servers? -- Thanks, Jack __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
Auto index update
Hello, Can anybody suggest me of what is the best method to implement auto index update on SOLR from mysql database. thanks and regards aditya
Re: index problem
On 3/29/07, James liu <[EMAIL PROTECTED]> wrote: i use freebsd6, tomcat 6(without install)+jdk1.5_07+php5+mssql i debug my program and data is ok before update to do index and index process is ok. no error. but i find index file not what i wanna. it have changed. tomcat6's server.xml,,i added "URIEncoding="UTF-8" data send to solr do index by curl (with utf-8) anyone know how to fix it? Can you reproduce with a small example that shows 1) the document being sent to Solr (complete with HTTP headers, or at least the curl command) 2) the query that shows the problem -Yonik
Re: Index Files
On 3/29/07, Michael Beccaria <[EMAIL PROTECTED]> wrote: Simple curious question from a newbie: Can I have another computer index my data, then copy the index folder files into my live system and run a commit? The project idea I have is for a library catalog which will update holdings information (whether a book is checked out) for an item record (also a solr\lucene record). My collection is small enough that it can index the entire library collection in about 20 minutes. If I have another computer continually indexing, then copy those files to my live system and commit, will that successfully update the index? Yes, this is essentially what Solr's distribution scripts do in an automated way. -Yonik
Re: index problem
Problem i fix it. Thks,Yonik. 2007/3/30, Yonik Seeley <[EMAIL PROTECTED]>: On 3/29/07, James liu <[EMAIL PROTECTED]> wrote: > i use freebsd6, tomcat 6(without install)+jdk1.5_07+php5+mssql > > i debug my program and data is ok before update to do index > > and index process is ok. no error. > > but i find index file not what i wanna. it have changed. > > tomcat6's server.xml,,i added "URIEncoding="UTF-8" > > data send to solr do index by curl (with utf-8) > > > anyone know how to fix it? Can you reproduce with a small example that shows 1) the document being sent to Solr (complete with HTTP headers, or at least the curl command) 2) the query that shows the problem -Yonik -- regards jl
Question: index performance
i find it will be OutOfMemory when i get more that 10k records. so now i index 10k records( 5k / record) if i use for to index more data. it always show OutOfMemory. i use top to moniter and find index finish, free memory is 125m,,and sometime it will be 218m it show me solr index finish and use sometime free memory? how can i index more data than 10k records and doesn't stop by OutOfMemory. tomcat i set memory 512m. -- regards jl
Re: Index corruptions?
In additional to snapshot, you can also make backup copies of your Solr index using the backup script. Backup are created the same way as snapshots using hard links. Each one is a viable full index. Bill On 5/3/07, Charlie Jackson <[EMAIL PROTECTED]> wrote: I have a couple of questions regarding index corruptions. 1) Has anyone using Solr in a production environment ever experienced an index corruption? If so, how frequently do they occur? 2) It seems like the CollectionDistribution setup would be a good way to put in place a recovery plan for (or at least have some viable backups of) the index. However, I have a small concern that if the index gets corrupted on the master server, the corruption would propagate down to the slave servers as well. Is this concern unfounded? Also, each of the snapshots taken by snapshooter are viable full indexes, correct? If so, that means I'd have a backup of the index each and every time a commit (or optimize for that matter) is done, which would be awesome. One of our biggest requirements for the indexing process is to have a good backup/recover strategy in place and I want to make sure Solr will be able to provide that. Thanks in advance! Charlie
Re: Index corruptions?
Hi Charlie, On 5/3/07, Charlie Jackson <[EMAIL PROTECTED]> wrote: I have a couple of questions regarding index corruptions. 1) Has anyone using Solr in a production environment ever experienced an index corruption? If so, how frequently do they occur? I once had all slaves complain about a missing file in the index. The master never had a problem. The problem went away at the next snapshot. Is the "cp-lr" in snapshot really guaranteed to be atomic? Or is it just fast, and unlikely to be interrupted? This has only occurred once over the last 5 months. 2) It seems like the CollectionDistribution setup would be a good way to put in place a recovery plan for (or at least have some viable backups of) the index. However, I have a small concern that if the index gets corrupted on the master server, the corruption would propagate down to the slave servers as well. Is this concern unfounded? I would expect this to be true. Also, each of the snapshots taken by snapshooter are viable full indexes, correct? If so, that means I'd have a backup of the index each and every time a commit (or optimize for that matter) is done, which would be awesome. That's my understanding. Tom
Re: Index corruptions?
On 5/7/07, Tom Hill <[EMAIL PROTECTED]> wrote: Is the "cp-lr" in snapshot really guaranteed to be atomic? Or is it just fast, and unlikely to be interrupted? It's called from Solr within a synchronized context, and it's guaranteed that no index changes (via Solr at least) will happen concurrently. -Yonik
Re: Index Concurrency
On 5/9/07, joestelmach <[EMAIL PROTECTED]> wrote: My first intuition is to give each user their own index. My thinking here is that querying would be faster (since each user's index would be much smaller than one big index,) and, more importantly, that I would dodge any concurrency issues stemming from multiple threads trying to update the same index simultaneously. I realize that Lucene implements a locking mechanism to protect against concurrent access, but I seem to hit the lock access timeout quite easily with only a couple threads. After looking at solr, I would really like to take advantage of the many features it adds to Lucene, but it doesn't look like I'll be able to achieve multiple indexes. No, not currently. Start your implementation with just a single index... unless it is very large, it will likely be fast enough. Solr also handles all the concurrency issues, and you should never hit "lock access timeout" when updating from multiple threads. -Yonik
Re: Index Concurrency
Yonik, Thanks for your fast reply. > No, not currently. Start your implementation with just a single > index... unless it is very large, it will likely be fast enough. My index will get quite large > Solr also handles all the concurrency issues, and you should never hit > "lock access timeout" when updating from multiple threads. Does solr provide any additional concurrency control over what Lucene provides? In my simple testing of indexing 2,000 messages, solr would issue lock access timeouts with as little as 10 threads. Running all 2,000 messages through sequentially yields no problems at all. Actually, I'm able churn through over 100,000 messages when no threads are involved. Am I missing some concurrency settings? Thanks, Joe -- View this message in context: http://www.nabble.com/Index-Concurrency-tf3718634.html#a10406382 Sent from the Solr - User mailing list archive at Nabble.com.
Re: Index Concurrency
On 5/9/07, joestelmach <[EMAIL PROTECTED]> wrote: Does solr provide any additional concurrency control over what Lucene provides? Yes, coordination between the main index searcher, the index writer, and the index reader needed to delete other documents. In my simple testing of indexing 2,000 messages, solr would issue lock access timeouts with as little as 10 threads. That's weird... I've never seen that. The lucene write lock is only obtained when the IndexWriter is created. Can you post the relevant part of the log file where the exception happens? Also, unless you have at least 6 CPU cores or so, you are unlikely to see greater throughput with 10 threads. If you add multiple documents per HTTP-POST (such that HTTP latency is minimized), the best setting would probably be nThreads == nCores. For a single doc per POST, more threads will serve to cover the latency and keep Solr busy. -Yonik
Re: Index Concurrency
Though, isn't there a recent patch to allow multiple indices under a single Solr instance in JIRA? Otis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simpy -- http://www.simpy.com/ - Tag - Search - Share - Original Message From: Yonik Seeley <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent: Wednesday, May 9, 2007 6:32:33 PM Subject: Re: Index Concurrency On 5/9/07, joestelmach <[EMAIL PROTECTED]> wrote: > My first intuition is to give each user their own index. My thinking here is > that querying would be faster (since each user's index would be much smaller > than one big index,) and, more importantly, that I would dodge any > concurrency issues stemming from multiple threads trying to update the same > index simultaneously. I realize that Lucene implements a locking mechanism > to protect against concurrent access, but I seem to hit the lock access > timeout quite easily with only a couple threads. > > After looking at solr, I would really like to take advantage of the many > features it adds to Lucene, but it doesn't look like I'll be able to achieve > multiple indexes. No, not currently. Start your implementation with just a single index... unless it is very large, it will likely be fast enough. Solr also handles all the concurrency issues, and you should never hit "lock access timeout" when updating from multiple threads. -Yonik
Re: Index Concurrency
> Yes, coordination between the main index searcher, the index writer, > and the index reader needed to delete other documents. Can you point me to any documentation/code that describes this implementation? > That's weird... I've never seen that. > The lucene write lock is only obtained when the IndexWriter is created. > Can you post the relevant part of the log file where the exception > happens? After doing some more testing, I believe it was a stale lock file that was causing me to have these lock issues yesterday - sorry for the false alarm :) > Also, unless you have at least 6 CPU cores or so, you are unlikely to > see greater throughput with 10 threads. If you add multiple documents > per HTTP-POST (such that HTTP latency is minimized), the best setting > would probably be nThreads == nCores. For a single doc per POST, more > threads will serve to cover the latency and keep Solr busy. I agree with your thinking here. My requirement for a large number of threads is somewhat of an artifact of my current system design. I'm trying not to serialize the system's processing at the point of indexing. -- View this message in context: http://www.nabble.com/Index-Concurrency-tf3718634.html#a10424207 Sent from the Solr - User mailing list archive at Nabble.com.
Re: Index Concurrency
On 5/10/07, joestelmach <[EMAIL PROTECTED]> wrote: > Yes, coordination between the main index searcher, the index writer, > and the index reader needed to delete other documents. Can you point me to any documentation/code that describes this implementation? Look at SolrCore.getSearcher() and DirectUpdateHandler2. -Yonik
Delete entire index
Hi, Is there a way to have Solr completely remove the current index? ? We're still in development and so our schema is wavering. Anytime we make a change and want to re-index we first have to: stop tomcat (or the solr webapp) manually remove the data/index restart tomcat (or the solr webapp) The removing of the data/index directory is where we have the most trouble, because of the file permissions. The data/index directory is owned by tomcat/tomcat so in order to remove it, we have to issue sudo rm which we'd like to avoid. Ideally if we could just tell Solr to delete all data without having to do anymore manual work, it'd be great! : ) Something else that would help is if we tell Tomcat/Solr which user/ group and/or permission to use on the data/index directory when it's created. Any thoughts on this? Matt
solr index problem
when i index 1.7m docs and 4k-5k per doc. OutOfMemory happen when it finish index ~1.13m docs I just restart tomcat , delete all lock and restart do index. No error or warning infor until it finish. anyone know why? or have the same error? -- regards jl
Re: Optimize index
You optimize by sending a command to the SOLR update handler. I'm not sure about the different index formats though.. ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On Aug 8, 2007, at 12:32 PM, Jae Joo wrote: Does anyone know how to optimize the index and what the difference between compound format and stand format? Thanks, Jae Joo