Re: Index time boost

2009-03-24 Thread Shalin Shekhar Mangar
On Wed, Mar 25, 2009 at 10:14 AM, Gargate, Siddharth wrote:

> Hi all,
>Can we specify the index-time boost value for a particular field in
> schema.xml?
>

No. You can specify it along with the document when you add it to Solr.
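For example, with the XML update message (a sketch; the id and field values
here are made up):

<add>
  <doc boost="2.0">
    <field name="id">doc1</field>
    <field name="name" boost="3.0">some value</field>
  </doc>
</add>

The boost attribute on <doc> applies to the whole document; on a <field> it
applies to that field only.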

-- 
Regards,
Shalin Shekhar Mangar.


Snapinstaller + Overlapping onDeckSearchers Problems

2009-03-24 Thread Cloude Porteus
We have been running our solr slaves without autowarming our new searchers
for a long time, but that was causing 50-75 requests to take 20+ seconds
after every update on the slaves. I have turned on autowarming, and that has
fixed our slow response times, but I'm now running into occasional
Overlapping onDeckSearchers warnings.

We have replication setup and are using the snapinstaller script every 10
minutes:

/home/solr/bin/snappuller -M util01 -P 18984 -D /home/solr/write/data -S
/home/solr/logs -d /home/solr/read/data -u instruct;
/home/solr/bin/snapinstaller -M util01 -S /home/solr/write/logs -d
/home/solr/read/data -u instruct

Here's what a successful update/commit log looks like:

[14:13:02.510] start
commit(optimize=false,waitFlush=false,waitSearcher=true)
[14:13:02.522] Opening Searcher@e9b4bb main
[14:13:02.524] end_commit_flush
[14:13:02.525] autowarming Searcher@e9b4bb main from Searcher@159e6e8 main
[14:13:02.525]
filterCache{lookups=1809739,hits=1766607,hitratio=0.97,inserts=43211,evictions=0,
size=43154,cumulative_lookups=1809739,cumulative_hits=1766607,cumulative_hitratio=0.97,cumulative_inserts=43211,cumulative_evictions=0}
--
[14:15:42.372] {commit=} 0 159964
[14:15:42.373] /update  0 159964

Here's what an unsuccessful update/commit log looks like, where the /update
took too long and we started another commit:

[21:03:03.829] start
commit(optimize=false,waitFlush=false,waitSearcher=true)
[21:03:03.836] Opening Searcher@b2f2d6 main
[21:03:03.836] end_commit_flush
[21:03:03.836] autowarming Searcher@b2f2d6 main from Searcher@103c520 main
[21:03:03.836]
filterCache{lookups=1062196,hits=1062160,hitratio=0.99,inserts=49144,evictions=0,size=48353,cumulative_lookups=259485564,cumulative_hits=259426904,cumulative_hitratio=0.99,cumulative_inserts=68467,cumulative_evictions=0}
--
[21:23:04.794] start
commit(optimize=false,waitFlush=false,waitSearcher=true)
[21:23:04.794] PERFORMANCE WARNING: Overlapping onDeckSearchers=2
[21:23:04.802] Opening Searcher@f11bc main
[21:23:04.802] end_commit_flush
--
[21:24:55.987] {commit=} 0 1312158
[21:24:55.987] /update  0 1312158


I don't understand why it sometimes takes two minutes between the start
commit and the /update finishing, and sometimes takes 20 minutes. One of our
caches has ~40,000 items, but I can't imagine it taking 20 minutes to
autowarm a searcher.

It would be super handy if the Snapinstaller script would wait until the
previous one was done before starting a new one, but I'm not sure how to
make that happen.
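
One lightweight way to get that behavior (a sketch only, untested, reusing
the same paths as above) is to guard the pull/install with a lock so a new
run exits while the previous one is still going:

#!/bin/sh
# Serialize snappuller/snapinstaller: skip this run if the previous
# one is still in progress.
LOCKDIR=/home/solr/logs/snapinstaller.lock
if ! mkdir "$LOCKDIR" 2>/dev/null; then
    echo "previous snapshot install still running, skipping" >&2
    exit 0
fi
# Remove the lock on exit, even if the scripts below fail.
trap 'rmdir "$LOCKDIR"' EXIT

/home/solr/bin/snappuller -M util01 -P 18984 -D /home/solr/write/data -S /home/solr/logs -d /home/solr/read/data -u instruct && \
/home/solr/bin/snapinstaller -M util01 -S /home/solr/write/logs -d /home/solr/read/data -u instruct

mkdir is atomic, so only one run can hold the lock at a time.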

Thanks for any help with this.

best,
cloude

-- 
VP of Product Development
Instructables.com

http://www.instructables.com/member/lebowski


Index time boost

2009-03-24 Thread Gargate, Siddharth
Hi all,
Can we specify the index-time boost value for a particular field in
schema.xml?
 
Thanks,
Siddharth


Re: Not able to configure multicore

2009-03-24 Thread mitulpatel



hossman wrote:
> 
> 
> : I am facing a problem related to multiple-core configuration. I have placed
> : a solr.xml file in the solr.home directory. Even so, when I am trying to
> : access http://localhost:8983/solr/admin/cores it gives me a Tomcat error.
> : 
> : Can anyone tell me what the possible issue with this could be?
> 
> not without knowing exactly what the tomcat error message is, what your 
> solr.xml file looks like, what log messages you see on startup, etc...
> 
> -Hoss
> 
> 
Hello Hoss,

Thanks for reply.

Here is the error message shown on browser:
HTTP Status 404 - /solr2/admin/cores
type Status report
message /solr2/admin/cores
description The requested resource (/solr2/admin/cores) is not available.

and here is the solr.xml file:

[solr.xml markup stripped by the mailing list archive]



-- 
View this message in context: 
http://www.nabble.com/Not-able-to-configure-multicore-tp22682691p22695098.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Delta import

2009-03-24 Thread Shalin Shekhar Mangar
On Wed, Mar 25, 2009 at 2:25 AM, AlexxelA wrote:

>
> OK, I accept that Solr is going to do X requests to the database for X
> updates, but when I try to run the delta-import command with 2 rows to
> update, is it normal that it's really slow (~1 document fetched/sec)?
>
>
Not really, I've seen 1000x faster. Try firing a few of those queries on the
database directly. Are they slow? Is the database remote?

-- 
Regards,
Shalin Shekhar Mangar.


Re: lucene-java version mismatches

2009-03-24 Thread Shalin Shekhar Mangar
On Wed, Mar 25, 2009 at 3:23 AM, Paul Libbrecht  wrote:

>
> could I suggest that the Maven repositories be populated the next time a
> release of the "Solr-specific Lucenes" is made?


But they are already. They are inside the org.apache.solr group, since those
lucene jars are released by Solr -- http://repo2.maven.org/maven2/org/apache/solr/

-- 
Regards,
Shalin Shekhar Mangar.


Re: Solr index deletion

2009-03-24 Thread Otis Gospodnetic

Hm, you are not saying much about what you've tried.  Could it be your Solr 
home is wrong and not even pointing to the index you just checked?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Nasseam Elkarra 
> To: solr-user@lucene.apache.org
> Sent: Tuesday, March 24, 2009 7:47:08 PM
> Subject: Re: Solr index deletion
> 
> The tool says there are no problems. Solr is pointing to the right directory 
> so 
> not sure what is preventing it from returning any results. Any ideas? Here is 
> the output:
> 
> Segments file=segments_2 numSegments=1 version=FORMAT_USER_DATA [Lucene 2.9]
>   1 of 1: name=_0 docCount=18021
> compound=false
> hasProx=true
> numFiles=9
> size (MB)=8.389
> has deletions [delFileName=_0_1.del]
> test: open reader.OK [18 deleted docs]
> test: fields, norms...OK [35 fields]
> test: terms, freq, prox...OK [60492 terms; 1157700 terms/docs pairs; 
> 1224063 
> tokens]
> test: stored fields...OK [386828 total field count; avg 21.487 fields 
> per doc]
> test: term vectorsOK [0 total vector count; avg 0 term/freq 
> vector 
> fields per doc]
> 
> No problems were detected with this index.
> 
> --
> 
> Thanks,
> Nasseam
> 
> 
> On Mar 24, 2009, at 1:34 PM, Otis Gospodnetic wrote:
> 
> > 
> > There is, it's called CheckIndex and it is a part of Lucene (and Lucene 
> > jars 
> that come with Solr, I believe):
> > 
> > 
> http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/index/CheckIndex.html
> > 
> > 
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > 
> > 
> > 
> > - Original Message 
> >> From: Nasseam Elkarra 
> >> To: solr-user@lucene.apache.org
> >> Sent: Tuesday, March 24, 2009 4:21:50 PM
> >> Subject: Re: Solr index deletion
> >> 
> >> Correction: index was not deleted. The folder is still there with the index
> >> files in it but a *:* query returns 0 results. Is there a tool to check the
> >> health of an index?
> >> 
> >> Thanks,
> >> Nasseam
> >> 
> >> On Mar 24, 2009, at 11:49 AM, Otis Gospodnetic wrote:
> >> 
> >>> 
> >>> Somehow that sounds very unlikely.  Have you looked at logs?  What have 
> >>> you
> >> found from Solr there?  I am not checking the sources, but I don't think 
> there
> >> is any place in Solr where the index directory gets deleted.
> >>> 
> >>> Otis
> >>> --
> >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>> 
> >>> 
> >>> 
> >>> - Original Message 
>  From: Nasseam Elkarra
>  To: solr-user@lucene.apache.org
>  Sent: Tuesday, March 24, 2009 2:35:22 PM
>  Subject: Solr index deletion
>  
>  On a few occasions, our development server crashed and in the process 
>  solr
>  deleted the index folder. We are suspecting another app on the server 
> caused
> >> an
>  OutOfMemoryException on Tomcat causing all apps including solr to crash.
>  
>  So my question is why is solr deleting the index? We are not doing any
> >> updates
>  to the index only reading from it so any insight would be appreciated.
>  
>  Thank you,
>  Nasseam
> >>> 
> > 



Re: autocommit and crashing tomcat

2009-03-24 Thread Jacob Singh
Hi Yonik,

Thanks for the response.  If I shut down tomcat cleanly, does it
commit all uncommitted documents?

Best,
Jacob


-- Forwarded message --
From: Yonik Seeley 
Date: Tue, Mar 24, 2009 at 8:48 PM
Subject: Re: autocommit and crashing tomcat
To: solr-user@lucene.apache.org


On Tue, Mar 24, 2009 at 5:52 AM, Jacob Singh  wrote:
> If I'm using autocommit, and I have a crash of tomcat (or the whole
> machine) while there are still docs pending, will I lose those
> documents in limbo

Yep.

> If the answer is "they go away": Is there anyway to ensure integrity
> of an update?

You can only be sure that the docs are on the disk after you have done a commit.

An optional transaction log would be part of high-availability for
writes; it's something we should eventually get to, though.

-Yonik
http://www.lucidimagination.com



-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com


get all facets

2009-03-24 Thread Ashish P

Can I get all the facets in QueryResponse??
Thanks,
Ashish
-- 
View this message in context: 
http://www.nabble.com/get-all-facets-tp22693809p22693809.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr index deletion

2009-03-24 Thread Nasseam Elkarra
The tool says there are no problems. Solr is pointing to the right  
directory so not sure what is preventing it from returning any  
results. Any ideas? Here is the output:


Segments file=segments_2 numSegments=1 version=FORMAT_USER_DATA  
[Lucene 2.9]

  1 of 1: name=_0 docCount=18021
compound=false
hasProx=true
numFiles=9
size (MB)=8.389
has deletions [delFileName=_0_1.del]
test: open reader.OK [18 deleted docs]
test: fields, norms...OK [35 fields]
test: terms, freq, prox...OK [60492 terms; 1157700 terms/docs  
pairs; 1224063 tokens]
test: stored fields...OK [386828 total field count; avg  
21.487 fields per doc]
test: term vectorsOK [0 total vector count; avg 0 term/ 
freq vector fields per doc]


No problems were detected with this index.

--

Thanks,
Nasseam


On Mar 24, 2009, at 1:34 PM, Otis Gospodnetic wrote:



There is, it's called CheckIndex and it is a part of Lucene (and  
Lucene jars that come with Solr, I believe):


http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/index/CheckIndex.html


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Nasseam Elkarra 
To: solr-user@lucene.apache.org
Sent: Tuesday, March 24, 2009 4:21:50 PM
Subject: Re: Solr index deletion

Correction: index was not deleted. The folder is still there with  
the index
files in it but a *:* query returns 0 results. Is there a tool to  
check the

health of an index?

Thanks,
Nasseam

On Mar 24, 2009, at 11:49 AM, Otis Gospodnetic wrote:



Somehow that sounds very unlikely.  Have you looked at logs?  What  
have you
found from Solr there?  I am not checking the sources, but I don't  
think there

is any place in Solr where the index directory gets deleted.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Nasseam Elkarra
To: solr-user@lucene.apache.org
Sent: Tuesday, March 24, 2009 2:35:22 PM
Subject: Solr index deletion

On a few occasions, our development server crashed and in the  
process solr
deleted the index folder. We are suspecting another app on the  
server caused

an
OutOfMemoryException on Tomcat causing all apps including solr to  
crash.


So my question is why is solr deleting the index? We are not  
doing any

updates
to the index only reading from it so any insight would be  
appreciated.


Thank you,
Nasseam








Re: Hardware Questions...

2009-03-24 Thread Shashi Kant
Have you looked at http://wiki.apache.org/solr/SolrPerformanceData ?

On Tue, Mar 24, 2009 at 4:51 PM, solr  wrote:

> We have three Solr servers (several two processor Dell PowerEdge
> servers). I'd like to get three newer servers and I wanted to see what
> we should be getting. I'm thinking the following...
>
>
>
> Dell PowerEdge 2950 III
>
> 2x2.33GHz/12M 1333MHz Quad Core
>
> 16GB RAM
> 6 x 146GB 15K RPM RAID-5 drives
>
>
>
> How do people spec out servers, especially CPU, memory and disk? Is this
> all based on the number of doc's, indexes, etc...
>
>
>
> Also, what are people using for benchmarking and monitoring Solr? Thanks
> - Mike
>
>


Re: Trivial question: request for id when indexing using CURL & ExtractingRequestHandler

2009-03-24 Thread Erik Hatcher
If your text field is not stored, then it won't be available in  
results.  That's the likely explanation.  Seems like all is well.
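
In the example schema the catch-all "text" field is declared along these
lines; stored="false" is why nothing shows up, and flipping it to
stored="true" (and reindexing) would make the extracted content visible in
results:

   <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>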


Erik

On Mar 24, 2009, at 11:34 PM, Chris Muktar wrote:


Fantastic thank you!

I'm executing this:
curl -F "te...@zheng.doc" -F 'commit=true'
http://localhost:8983/solr/update/extract?ext.def.fl=text 
\&ext.literal.id=2


however performing the query
http://localhost:8983/solr/select?q=id:2

produces the output but without a text field. I'm not sure if it's  
being
extracted & indexed correctly. The commit is going through though.  
This is

using the example schema. Any thoughts? XML response follows...


[XML response with markup stripped by the archive: responseHeader with
status 0 and QTime 2, params q=id:2, and a single matching document (id 2,
timestamp 2009-03-24T22:27:00.714Z) with no text field]







Re: delta-import commit=false doesn't seems to work

2009-03-24 Thread sunnyfr

Hi,
Sorry, I still don't know what I should do. I can see in my logs that an
optimize clearly happens somewhere, even though my command is
delta-import&optimize=false. Is it a parameter to add to the commit, or to
the snappuller, or something else?
is it a parameter to add to the commit or to the snappuller or ???


Mar 24 23:02:44 search-01 jsvc.exec[22812]: Mar 24, 2009 11:02:44 PM
org.apache.solr.handler.dataimport.SolrWriter persistStartTime INFO: Wrote
last indexed time to dataimport.properties
Mar 24 23:02:44 search-01 jsvc.exec[22812]: Mar 24, 2009 11:02:44 PM
org.apache.solr.handler.dataimport.DocBuilder commit INFO: Full Import
completed successfully
Mar 24 23:02:44 search-01 jsvc.exec[22812]: Mar 24, 2009 11:02:44 PM
org.apache.solr.update.DirectUpdateHandler2 commit INFO: start
commit(optimize=true,waitFlush=false,waitSearcher=true)

thanks a lot for your help


sunnyfr wrote:
> 
> As you can see, I did that and I have no information in my DIH status, but you can
> see in my logs and even in my segments
> that an optimize is fired automatically.
> 
> 
> Noble Paul നോബിള്‍  नोब्ळ् wrote:
>> 
>> just hit the DIH without any command and you may be able to see the
>> status of the last import. It can tell you whether a commit/optimize
>> was performed
>> 
>> On Fri, Mar 20, 2009 at 7:07 PM, sunnyfr  wrote:
>>>
>>> Thanks I gave more information there :
>>> http://www.nabble.com/Problem-for-replication-%3A-segment-optimized-automaticly-td22601442.html
>>>
>>> thanks a lot Paul
>>>
>>>
>>> Noble Paul നോബിള്‍  नोब्ळ् wrote:

 sorry, the whole thing was commented out. I did not notice that. I'll
 look into that

 2009/3/20 Noble Paul നോബിള്‍  नोब्ळ् :
> you have set autoCommit every x minutes. It must have invoked commit
> automatically
>
>
> On Thu, Mar 19, 2009 at 4:17 PM, sunnyfr  wrote:
>>
>> Hi,
>>
>> Even if I hit command=delta-import&commit=false&optimize=false
>> I still have commit set in my logs and sometimes even optimize=true,
>>
>> About the optimize, I wonder if it comes from commits that are too close
>> together so that one is not done, but I don't really know.
>>
>> Any idea?
>>
>> Thanks a lot,
>> --
>> View this message in context:
>> http://www.nabble.com/delta-import-commit%3Dfalse-doesn%27t-seems-to-work-tp22597630p22597630.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
>
>
> --
> --Noble Paul
>



 --
 --Noble Paul


>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Re%3A-delta-import-commit%3Dfalse-doesn%27t-seems-to-work-tp22614216p22620439.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>> 
>> 
>> 
>> -- 
>> --Noble Paul
>> 
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Re%3A-delta-import-commit%3Dfalse-doesn%27t-seems-to-work-tp22614216p22691417.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Trivial question: request for id when indexing using CURL & ExtractingRequestHandler

2009-03-24 Thread Chris Muktar
Fantastic thank you!

I'm executing this:
curl -F "te...@zheng.doc" -F 'commit=true'
http://localhost:8983/solr/update/extract?ext.def.fl=text\&ext.literal.id=2

however performing the query
http://localhost:8983/solr/select?q=id:2

produces the output but without a text field. I'm not sure if it's being
extracted & indexed correctly. The commit is going through though. This is
using the example schema. Any thoughts? XML response follows...


[XML response with markup stripped by the archive: responseHeader with
status 0 and QTime 2, params q=id:2, and a single matching document (id 2,
timestamp 2009-03-24T22:27:00.714Z) with no text field]





Re: lucene-java version mismatches

2009-03-24 Thread Paul Libbrecht




On 24 Mar 2009, at 11:14, Shalin Shekhar Mangar wrote:

On Tue, Mar 24, 2009 at 3:30 PM, Paul Libbrecht  
 wrote:

Is there a Lucene version that solr-lucene-core-1.3.0 corresponds to?


The lucene jars shipped with Solr 1.3.0 were 2.4-dev built from svn  
revision
r691741. You can check out the source from lucene's svn using that  
revision

number.


thanks,

that's useful,

could I suggest that the Maven repositories be populated the next time a
release of the "Solr-specific Lucenes" is made?


paul



Re: Trivial question: request for id when indexing using CURL & ExtractingRequestHandler

2009-03-24 Thread Chris Hostetter

Deja-Vu...

http://www.nabble.com/Missing-required-field%3A-id-Using-ExtractingRequestHandler-to22611039.html

: I'm performing this operation:
: 
: curl http://localhost:8983/solr/update/extract?ext.def.fl=text --data-binary
: @ZOLA.doc -H 'Content-type:text/html'
: 
: in order to index word document ZOLA.doc into Solr using the example
: schema.xml. It says I have not provided an 'id', which is a required field.
: I'm not sure how (syntactically) to provide the id- should it be part of the
: query string? And if so, how?


-Hoss



Re: Not able to configure multicore

2009-03-24 Thread Chris Hostetter

: I am facing a problem related to multiple-core configuration. I have placed
: a solr.xml file in the solr.home directory. Even so, when I am trying to
: access http://localhost:8983/solr/admin/cores it gives me a Tomcat error.
: 
: Can anyone tell me what the possible issue with this could be?

not without knowing exactly what the tomcat error message is, what your 
solr.xml file looks like, what log messages you see on startup, etc...




-Hoss



Re: Response schema for an update.

2009-03-24 Thread Chris Hostetter

: Subject: Response schema for an update.
: In-Reply-To: 
: References: <69de18140903230141t38dbcd28n40bbcc944ddb0...@mail.gmail.com>
:  

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is "hidden" in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking





-Hoss



RE: Exact Match

2009-03-24 Thread Chris Hostetter

: Depending on your needs, you might want to do some sort of minimal
: analysis on the field (ignore punctuation, lowercase,...) Here's the
: text_exact field that I use:

Dean's reply is a great example of how "exact" is a vague term.

with a TextField you can get an "exact" match using a simple phrase query
(ie: putting quotes around the input) assuming your meaning of "exact" is
that all the tokens appear together in sequence, and assuming your
analyzer doesn't change things in a way that makes a phrase search match
in a way that you don't consider "exact enough"

if you want to ensure that the documents contains exactly what the user 
queried for, no more and no less, then using a copyField into StrField is 
really the best way to do that.
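
a sketch of that in schema.xml (field names are illustrative; "string" is
the solr.StrField type in the example schema):

  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="title_exact" type="string" indexed="true" stored="false"/>
  <copyField source="title" dest="title_exact"/>

queries that need the strict match go against title_exact, while normal
searches keep hitting the analyzed title field.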




-Hoss



Re: Commit is taking very long time

2009-03-24 Thread Chris Hostetter

: My application is in prod and quite frequently getting a NullPointerException.
...
: java.lang.NullPointerException
: at 
com.fm.search.incrementalindex.service.AuctionCollectionServiceImpl.indexData(AuctionCollectionServiceImpl.java:251)
: at 
com.fm.search.incrementalindex.service.AuctionCollectionServiceImpl.process(AuctionCollectionServiceImpl.java:135)
: at 
com.fm.search.job.SearchIndexingJob.executeInternal(SearchIndexingJob.java:68)
: at 
org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
: at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
: at 
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:529)

that stack trace doesn't suggest anything remotely related to Solr.  none
of those classes are in the Solr code base -- without having any idea what
the code on line 251 of your AuctionCollectionServiceImpl class looks
like, no one could even begin to speculate what is causing the NPE.  Even
if we knew what line 251 looks like, understanding why some reference on
that line is null would probably require knowing a whole lot more about
your application.




-Hoss


Re: Delta import

2009-03-24 Thread AlexxelA

OK, I accept that Solr is going to do X requests to the database for X
updates, but when I try to run the delta-import command with 2 rows to
update, is it normal that it's really slow (~1 document fetched/sec)?



Noble Paul നോബിള്‍  नोब्ळ् wrote:
> 
> not possible really,
> 
> that may not be useful to a lot of users because there may be too many
> changed ids and the 'IN' part can be really long.
> 
> You can raise an issue anyway
> 
> 
> 
> On Mon, Mar 23, 2009 at 9:30 PM, AlexxelA 
> wrote:
>>
>> I'm using the delta-import command.
>>
>> Here's the deltaQuery and deltaImportQuery i use :
>>
>> select uid from profil_view where last_modified >
>> '${dataimporter.last_index_time}'
>> select * from profil_view where uid='${dataimporter.delta.uid}'
>>
>> When i look at the delta import status i see that the total request to
>> datasource equal the number of modification i had.  Is it possible to
>> make
>> only one request to database and fetch all modification ?
>>
>> select * from profil_view where uid in ('${dataimporter.delta.ALLuid}')
>> (something like that).
>> --
>> View this message in context:
>> http://www.nabble.com/Delta-import-tp22663196p22663196.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> --Noble Paul
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Delta-import-tp22663196p22689588.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Field tokenizer question

2009-03-24 Thread Chris Hostetter

: as far as I know solr.StrField is not analyzed but it is indexed as-is
: (verbatim).

correct ... but there is definitely a bug here if the analysis.jsp 
is implying that an analyzer is being used...

https://issues.apache.org/jira/browse/SOLR-1086




-Hoss



Hardware Questions...

2009-03-24 Thread solr
We have three Solr servers (several two processor Dell PowerEdge
servers). I'd like to get three newer servers and I wanted to see what
we should be getting. I'm thinking the following...

 

Dell PowerEdge 2950 III 

2x2.33GHz/12M 1333MHz Quad Core 

16GB RAM 
6 x 146GB 15K RPM RAID-5 drives

 

How do people spec out servers, especially CPU, memory and disk? Is this
all based on the number of doc's, indexes, etc...

 

Also, what are people using for benchmarking and monitoring Solr? Thanks
- Mike



Re: Solr index deletion

2009-03-24 Thread Otis Gospodnetic

There is, it's called CheckIndex and it is a part of Lucene (and Lucene jars 
that come with Solr, I believe):

http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/index/CheckIndex.html

 
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Nasseam Elkarra 
> To: solr-user@lucene.apache.org
> Sent: Tuesday, March 24, 2009 4:21:50 PM
> Subject: Re: Solr index deletion
> 
> Correction: index was not deleted. The folder is still there with the index 
> files in it but a *:* query returns 0 results. Is there a tool to check the 
> health of an index?
> 
> Thanks,
> Nasseam
> 
> On Mar 24, 2009, at 11:49 AM, Otis Gospodnetic wrote:
> 
> > 
> > Somehow that sounds very unlikely.  Have you looked at logs?  What have you 
> found from Solr there?  I am not checking the sources, but I don't think 
> there 
> is any place in Solr where the index directory gets deleted.
> > 
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > 
> > 
> > 
> > - Original Message 
> >> From: Nasseam Elkarra 
> >> To: solr-user@lucene.apache.org
> >> Sent: Tuesday, March 24, 2009 2:35:22 PM
> >> Subject: Solr index deletion
> >> 
> >> On a few occasions, our development server crashed and in the process solr
> >> deleted the index folder. We are suspecting another app on the server 
> >> caused 
> an
> >> OutOfMemoryException on Tomcat causing all apps including solr to crash.
> >> 
> >> So my question is why is solr deleting the index? We are not doing any 
> updates
> >> to the index only reading from it so any insight would be appreciated.
> >> 
> >> Thank you,
> >> Nasseam
> > 



Re: Solr index deletion

2009-03-24 Thread Nasseam Elkarra
Correction: index was not deleted. The folder is still there with the  
index files in it but a *:* query returns 0 results. Is there a tool  
to check the health of an index?


Thanks,
Nasseam

On Mar 24, 2009, at 11:49 AM, Otis Gospodnetic wrote:



Somehow that sounds very unlikely.  Have you looked at logs?  What  
have you found from Solr there?  I am not checking the sources, but  
I don't think there is any place in Solr where the index directory  
gets deleted.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Nasseam Elkarra 
To: solr-user@lucene.apache.org
Sent: Tuesday, March 24, 2009 2:35:22 PM
Subject: Solr index deletion

On a few occasions, our development server crashed and in the  
process solr
deleted the index folder. We are suspecting another app on the  
server caused an
OutOfMemoryException on Tomcat causing all apps including solr to  
crash.


So my question is why is solr deleting the index? We are not doing  
any updates
to the index only reading from it so any insight would be  
appreciated.


Thank you,
Nasseam






Re: Multi-select on more than one facet field

2009-03-24 Thread Yonik Seeley
On Tue, Mar 24, 2009 at 2:29 PM, Nasseam Elkarra  wrote:
> Looking at the example here:
> http://wiki.apache.org/solr/SimpleFacetParameters#head-4ba81c89b265c3b5992e3292718a0d100f7251ef
>
> This being the query for selecting PDF:
> q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt}doctype
>
> How would you do the query for selecting PDF OR Excel, AND author is Mike
> (assuming there is another facet field named author)?

If author is not a multi-select facet (i.e. you already selected
author:Mike and hence wish to no longer get other counts for the
author field) then:

q=mainquery
&fq=status:public
&fq={!tag=dt}doctype:(PDF OR Excel)
&fq=author:Mike
&facet=on&facet.field={!ex=dt}doctype

If author *is* multi-select, then you wish to get facet counts for the
author field, ignoring the author:Mike restriction for the author
facet only:

q=mainquery
&fq=status:public
&fq={!tag=dt}doctype:(PDF OR Excel)
&fq={!tag=auth}author:Mike
&facet=on&facet.field={!ex=dt}doctype
&facet.field={!ex=auth}author


-Yonik
http://www.lucidimagination.com


Re: How to Index IP address

2009-03-24 Thread Alexandre Rafalovitch
Well,

A log file is theoretically structured. Every log record is a - very -
flat set of fields. So, every log file line would be a Lucene
document. Then, one could use Solr to search, filter and facet
records.

Of course, this requires parsing log file back into record components.
Most log files were created for output, not for re-input. But if you
can parse it back, you might be able to do custom data import. Or, if
you can intercept log file before it hits serialization, you might be
able to index the fields directly.

Or you could just buy Splunk ( http://www.splunk.com/ ) and be done
with it. Parsing and visualizing log files is exactly what they set
out to deal with. No (great) open source solution yet.

Regards,
Alex.
Personal blog: http://blog.outerthoughts.com/
Research group: http://www.clt.mq.edu.au/Research/
- I think age is a very high price to pay for maturity (Tom Stoppard)


On Tue, Mar 24, 2009 at 2:40 PM, Matthew Runo  wrote:
> Well, I think you'll have the same problem. Lucene, and Solr (since it's
> built on Lucene) are both going to expect a structured document as input.
> Once you send in a bunch of documents, you can then query them for whatever
> you want to find.


Re: Problem with Facet Date Query

2009-03-24 Thread Chris Hostetter
: 
: This is my query: 
: 
q=productPublicationDate_product_dt:[*%20TO%20NOW]&facet=true&facet.field=productPublicationDate_product_dt:[*%20TO%20NOW]&qt=dismaxrequest

that specific error is happening because you are passing this string...

productPublicationDate_product_dt:[*%20TO%20NOW]

...to the facet.field param.  that parameter expects the name of a field, 
and it will then facet on all the indexed values.  what you are passing it 
isn't the name of a field, you are passing it a query string.  if you want 
the faceting count for a query string, use the facet.query param, which 
you already seem to be doing with a different range of dates by hardcoding 
it into your solrconfig...
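
e.g. a request parameter along these lines (a sketch, reusing the field
from your query above):

...&facet=true&facet.query=productPublicationDate_product_dt:[*%20TO%20NOW]

each facet.query gets its own count back in the facet_queries section of
the response.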

: I have entered this field in solrConfig.xml also in the below manner.
: 
: <str name="facet.field">cat</str>
: <str name="facet.field">manu_exact</str>
: <str name="facet.query">price:[* TO 500]</str>
: <str name="facet.query">price:[500 TO *]</str>
: <str name="facet.query">productPublicationDate_product_dt:[* TO NOW/DAY-1MONTH]^2.2</str>
: 

I'm not entirely sure what it is you are trying to do, but you're also 
going to have problems because you are using the "standard" query syntax 
in your q param, but you have specified qt=dismax.

Please explain what your *goal* is and then people can help you explain 
how to achieve your goal ... what you've got here in your example makes no 
sense, and it's not clear what advice to give you to get it to make 
sense without knowing what it is you want to do.  This is similar to an XY 
Problem...

http://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341




-Hoss



Re: Trivial question: request for id when indexing using CURL & ExtractingRequestHandler

2009-03-24 Thread Chris Muktar
I've tried this too, still no luck:
curl http://localhost:8983/solr/update/extract?ext.def.fl=text -F id=123 -F
te...@zola.doc


2009/3/24 Chris Muktar 

> I'm performing this operation:
>
> curl http://localhost:8983/solr/update/extract?ext.def.fl=text --data-binary
> @ZOLA.doc -H 'Content-type:text/html'
>
> in order to index word document ZOLA.doc into Solr using the example
> schema.xml. It says I have not provided an 'id', which is a required field.
> I'm not sure how (syntactically) to provide the id- should it be part of the
> query string? And if so, how?
>
> Any help much appreciated!
> Thanks!
> Chris.
>


Re: Solr index deletion

2009-03-24 Thread Otis Gospodnetic

Somehow that sounds very unlikely.  Have you looked at logs?  What have you 
found from Solr there?  I am not checking the sources, but I don't think there 
is any place in Solr where the index directory gets deleted.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Nasseam Elkarra 
> To: solr-user@lucene.apache.org
> Sent: Tuesday, March 24, 2009 2:35:22 PM
> Subject: Solr index deletion
> 
> On a few occasions, our development server crashed and in the process solr 
> deleted the index folder. We are suspecting another app on the server caused 
> an 
> OutOfMemoryException on Tomcat causing all apps including solr to crash.
> 
> So my question is why is solr deleting the index? We are not doing any 
> updates 
> to the index only reading from it so any insight would be appreciated.
> 
> Thank you,
> Nasseam



Re: How to Index IP address

2009-03-24 Thread Matthew Runo
Well, I think you'll have the same problem. Lucene, and Solr (since  
it's built on Lucene) are both going to expect a structured document  
as input. Once you send in a bunch of documents, you can then query  
them for whatever you want to find.


A quick search of the internets found me this Apache Labs project -  
called Pinpoint. It's designed to take log data in, and build an index  
out of it. I'm not sure how developed it is, but it might be a good  
starting point for you. There are probably other projects out there  
along the same lines.. Here's Pinpoint: http://svn.apache.org/repos/asf/labs/pinpoint/trunk/


Why do you want to use Solr / Lucene to look through your files? If  
you have a huge dataset, some people are using Hadoop (a version of  
Google's MapReduce) to look through very large sets of logfiles: http://www.lexemetech.com/2008/01/hadoop-and-log-file-analysis.html


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Mar 24, 2009, at 10:28 AM, nga pham wrote:

Do you think Lucene is better for filtering out a particular IP address
from a txt file?

Thank you Runo,
Nga

On Tue, Mar 24, 2009 at 10:21 AM, Matthew Runo   
wrote:


I don't think that Solr is the best thing to use for searching a  
text file.

I'd use grep myself, if you're on a unix-like system.

To use solr, you'd need to throw each network 'event' (GET, POST,  
etc etc)
into an XML document, and post those into Solr so it could generate  
the

index. You could then do things like
ip:10.206.158.154 to find a specific IP address, or even ip:10.206.158* to
get a subnet.

Perhaps the thing that's building your text file could post to Solr
instead?

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833


On Mar 24, 2009, at 9:32 AM, nga pham wrote:

Hi All,


I have a txt file, that captured all of my network traffic.  How  
can I use

Solr to filter out a particular IP address?

Thank you,
Nga.








Solr index deletion

2009-03-24 Thread Nasseam Elkarra
On a few occasions, our development server crashed and in the process  
solr deleted the index folder. We are suspecting another app on the  
server caused an OutOfMemoryException on Tomcat causing all apps  
including solr to crash.


So my question is why is solr deleting the index? We are not doing any  
updates to the index only reading from it so any insight would be  
appreciated.


Thank you,
Nasseam


Multi-select on more than one facet field

2009-03-24 Thread Nasseam Elkarra

Looking at the example here:
http://wiki.apache.org/solr/SimpleFacetParameters#head-4ba81c89b265c3b5992e3292718a0d100f7251ef

This being the query for selecting PDF:
q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt}doctype


How would you do the query for selecting PDF OR Excel, AND author is Mike
(assuming there is another facet field named author)?


Thank you,
Nasseam


Trivial question: request for id when indexing using CURL & ExtractingRequestHandler

2009-03-24 Thread Chris Muktar
I'm performing this operation:

curl http://localhost:8983/solr/update/extract?ext.def.fl=text --data-binary
@ZOLA.doc -H 'Content-type:text/html'

in order to index word document ZOLA.doc into Solr using the example
schema.xml. It says I have not provided an 'id', which is a required field.
I'm not sure how (syntactically) to provide the id- should it be part of the
query string? And if so, how?

Any help much appreciated!
Thanks!
Chris.


Re: Update field values without re-extracting text?

2009-03-24 Thread Otis Gospodnetic

Hi Dan,

We should turn this into a FAQ.  In the mean time, have a look at SOLR-139 and 
the issue linked to that one.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Dan A. Dickey 
> To: solr-user@lucene.apache.org
> Sent: Tuesday, March 24, 2009 11:43:35 AM
> Subject: Update field values without re-extracting text?
> 
> I'd like to be able to index various documents and have the text extracted
> from them using the DataImportHandler.  I think I have this working just fine.
> 
> However, I'd later like to be able to update a field value or several, without
> re-extracting the text all over again with the DIH.  Yes - and if possible, 
> only
> update one or a few of the field values and leave the rest as is.
> 
> I haven't seen a way to do this - can it be done?
> What do I need to read yet to accomplish this?  Can someone point me in
> the right direction to do this?  Thanks!
> -Dan
> 
> -- 
> Dan A. Dickey | Senior Software Engineer



Re: How to Index IP address

2009-03-24 Thread nga pham
Do you think Lucene is better for filtering out a particular IP address from a
txt file?

Thank you Runo,
Nga

On Tue, Mar 24, 2009 at 10:21 AM, Matthew Runo  wrote:

> I don't think that Solr is the best thing to use for searching a text file.
> I'd use grep myself, if you're on a unix-like system.
>
> To use solr, you'd need to throw each network 'event' (GET, POST, etc etc)
> into an XML document, and post those into Solr so it could generate the
> index. You could then do things like
> ip:10.206.158.154 to find a specific IP address, or even ip:10.206.158* to
> get a subnet.
>
> Perhaps the thing that's building your text file could post to Solr
> instead?
>
> Thanks for your time!
>
> Matthew Runo
> Software Engineer, Zappos.com
> mr...@zappos.com - 702-943-7833
>
>
> On Mar 24, 2009, at 9:32 AM, nga pham wrote:
>
> Hi All,
>>
>> I have a txt file, that captured all of my network traffic.  How can I use
>> Solr to filter out a particular IP address?
>>
>> Thank you,
>> Nga.
>>
>
>


Re: How to Index IP address

2009-03-24 Thread Matthew Runo
I don't think that Solr is the best thing to use for searching a text  
file. I'd use grep myself, if you're on a unix-like system.


To use solr, you'd need to throw each network 'event' (GET, POST, etc  
etc) into an XML document, and post those into Solr so it could  
generate the index. You could then do things like
ip:10.206.158.154 to find a specific IP address, or even ip:10.206.158* to
get a subnet.


Perhaps the thing that's building your text file could post to Solr  
instead?
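
A sketch of what one such event document could look like (the field names
are invented and would need matching entries in schema.xml):

<add>
  <doc>
    <field name="id">event-000001</field>
    <field name="ip">10.206.158.154</field>
    <field name="method">GET</field>
    <field name="path">/index.html</field>
    <field name="timestamp">2009-03-24T12:00:00Z</field>
  </doc>
</add>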


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Mar 24, 2009, at 9:32 AM, nga pham wrote:


Hi All,

I have a txt file, that captured all of my network traffic.  How can  
I use

Solr to filter out a particular IP address?

Thank you,
Nga.




Streaming results of analysis to shards ... possible?

2009-03-24 Thread Cass Costello
Hello all,

Our application involves a high index write rate - anywhere from a few
dozen to many thousands of docs per sec.  The write rate is frequently
higher than the read rate (though not always), and our index must be
as fresh as possible (we'd like search results to be no more than a
couple of seconds out of date). We're considering many approaches to
achieving our desired TCO.

We've noted that the indexing process can be quite costly.  Our latest
POC shards the total index over N machines which effectively
distributes the indexing load and keeps refresh and search
response times decent, but to maintain performance during peak write
rates, we've had to make N a much larger number than we'd like.

One idea we're floating would be to do all the analysis centrally,
perhaps on N/4 machines, and then stream the raw tokens and data
directly to the read "slaves," who would (hopefully) need to do
nothing more than manage segments and readers.

We have some very rough math that makes the approach compelling, but
before diving in wholesale, we thought we'd ask if anyone else has
taken a similar approach.   Thoughts?

Sincerely,

Cass Costello
www.stubhub.com


Re: external fields storage

2009-03-24 Thread Mark Miller

Andrey Klochkov wrote:

On Tue, Mar 24, 2009 at 4:43 PM, Mark Miller  wrote:

  

Thats a tall order. It almost sounds as if you want to be able to not use
the index to store fields, but have them still fully functional as if
indexed. That would be quite the magic trick.




Look here, people wanted exactly the same feature in 2004. Is it still not
implemented?

http://www.gossamer-threads.com/lists/lucene/java-user/8672

--
Andrew Klochkov

  
Right - I was exaggerating your description a bit. It reads as if you 
want it to have all the same power as an indexed field. So I made a bad 
joke. If you want to be able to search the field, its index entry needs 
to be updated anyway. I don't see how you get search on external stored 
fields without having to update and keep them in the index - external 
field storage is simple to add on your own, either using that skwish 
library, or even a basic database. You can then do id to offset mapping 
like that guy is looking for - simply add the id to Lucene and do your 
content updates with the external db.


--
- Mark

http://www.lucidimagination.com





How to Index IP address

2009-03-24 Thread nga pham
Hi All,

I have a txt file, that captured all of my network traffic.  How can I use
Solr to filter out a particular IP address?

Thank you,
Nga.


Re: Optimize

2009-03-24 Thread sunnyfr

Thanks for your answer. Then what fires merging? In my logs I see
optimize=true; if it's not an optimize (because I don't fire one), it must
be merging. How can I stop this?

Thanks a lot,


Shalin Shekhar Mangar wrote:
> 
> No, optimize is not automatic. You have to invoke it yourself just like
> commits.
> 
> Take a look at the following for examples:
> http://wiki.apache.org/solr/UpdateXmlMessages
> 
> On Thu, Oct 2, 2008 at 2:03 PM, sunnyfr  wrote:
> 
>>
>>
>> Hi,
>>
>> Can somebody explain a bit how optimize works?
>>  I read the docs but didn't really get what fires optimize.
>>
>> Thanks a lot,
>> --
>> View this message in context:
>> http://www.nabble.com/Optimize-tp19775320p19775320.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Optimize-tp19775320p22684113.html
Sent from the Solr - User mailing list archive at Nabble.com.



Fwd: multicore solrconfig issues

2009-03-24 Thread Audrey Foo
No problem Kimani. I am forwarding this message to the mailing list, in case
it can help others.
Audrey

-- Forwarded message --
From: Kimani Nielsen 
Date: Tue, Mar 24, 2009 at 8:57 AM
Subject: Re: multicore solrconfig issues
To: Audrey Foo 


Audrey,
  Yep that was my problem as well! Thank you so much for your helpful reply.
Funny thing was, the application never complained about a missing
elevate.xml config file. Thanks again.

- Kimani

On Tue, Mar 24, 2009 at 11:48, Audrey Foo  wrote:

> Hi Kimani
> Yes, I thought I had copied all xml files, but was missing elevate.xml
>
> Thanks
> Audrey
>
> On Tue, Mar 24, 2009 at 7:55 AM,  wrote:
>
>> Hi,
>>  I am running into the exact same error when setting up a multi-core
>> configuration using Websphere6.1. Were you able to find the solution to
>> this?
>>
>> - Kimani
>>
>> Audrey Foo-2 wrote:
>> >
>> > Hi
>> >
>> > I am using most recent drupal apachesolr module with solr 1.4 nightly
>> > build
>> >
>> > * solrconfig.xml ==>
>> >
>> http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/apachesolr/solrconfig.xml?revision=1.1.2.15&view=markup&pathrev=DRUPAL-6--1-0-BETA5
>> > * schema.xml ==>
>> >
>> http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.30&view=markup&pathrev=DRUPAL-6--1-0-BETA5
>> >
>> > and attempting to use the multicore functionality
>> > * copied the txt files from example/solr/conf to
>> > example/multicore/core0/conf
>> > * copied the xml files above to example/multicore/core0/conf
>> > * started jetty:  java -Dsolr.solr.home=multicore -jar start.jar
>> >
>> > It throws these severe errors on bootstrap
>> > SEVERE: java.lang.NullPointerException
>> > at
>> >
>> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
>> >  at
>> >
>> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>> > at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
>> >  at
>> >
>> org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:51)
>> > at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1163)
>> >  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
>> > at java.util.concurrent.FutureTask.run(FutureTask.java:123)
>> >  at
>> >
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
>> > at
>> >
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
>> >  at java.lang.Thread.run(Thread.java:613)
>> >
>> > Any suggestions, about what to try further?
>> >
>> > Thanks
>> > AF
>> >
>> >
>> Quoted from:
>> http://www.nabble.com/multicore-solrconfig-issues-tp22591761p22591761.html
>>
>>
>


Update field values without re-extracting text?

2009-03-24 Thread Dan A. Dickey
I'd like to be able to index various documents and have the text extracted
from them using the DataImportHandler.  I think I have this working just fine.

However, I'd later like to be able to update a field value or several, without
re-extracting the text all over again with the DIH.  Yes - and if possible, only
update one or a few of the field values and leave the rest as is.

I haven't seen a way to do this - can it be done?
What do I still need to read to accomplish this?  Can someone point me in
the right direction to do this?  Thanks!
-Dan

-- 
Dan A. Dickey | Senior Software Engineer


Re: Not able to configure multicore

2009-03-24 Thread Mark Miller

mitulpatel wrote:

Hello Friends,

I am a newbie to Solr, so sorry for the silly question.

I am facing a problem related to multiple-core configuration. I have placed
a solr.xml file in the solr.home directory. Even so, when I am trying to
access http://localhost:8983/solr/admin/cores it gives me a Tomcat error.

Can anyone tell me what the possible issue with this could be?

Thanks,
Mitul Patel
  

Have you set adminPath="/admin/cores" on the <cores> element in solr.xml?
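
For reference, a minimal multicore solr.xml looks something like this (core
names and instanceDir values are illustrative):

<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>

Without the adminPath attribute, the /admin/cores handler isn't exposed.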

--
- Mark

http://www.lucidimagination.com





Re: autocommit and crashing tomcat

2009-03-24 Thread Yonik Seeley
On Tue, Mar 24, 2009 at 5:52 AM, Jacob Singh  wrote:
> If I'm using autocommit, and I have a crash of tomcat (or the whole
> machine) while there are still docs pending, will I lose those
> documents in limbo

Yep.

> If the answer is "they go away": Is there anyway to ensure integrity
> of an update?

You can only be sure that the docs are on the disk after you have done a commit.
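
For example, an explicit commit can be issued against the example server
like this (a sketch):

curl http://localhost:8983/solr/update -H 'Content-type:text/xml' --data-binary '<commit/>'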

An optional transaction log would be part of high-availability for
writes; it's something we should eventually get to, though.

-Yonik
http://www.lucidimagination.com


Not able to configure multicore

2009-03-24 Thread mitulpatel

Hello Friends,

I am a newbie to Solr, so sorry for the silly question.

I am facing a problem related to multiple-core configuration. I have placed
a solr.xml file in the solr.home directory. Even so, when I am trying to
access http://localhost:8983/solr/admin/cores it gives me a Tomcat error.

Can anyone tell me what the possible issue with this could be?

Thanks,
Mitul Patel
-- 
View this message in context: 
http://www.nabble.com/Not-able-to-configure-multicore-tp22682691p22682691.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: external fields storage

2009-03-24 Thread Andrey Klochkov
On Tue, Mar 24, 2009 at 4:43 PM, Mark Miller  wrote:

> Thats a tall order. It almost sounds as if you want to be able to not use
> the index to store fields, but have them still fully functional as if
> indexed. That would be quite the magic trick.


Look here, people wanted exactly the same feature in 2004. Is it still not
implemented?

http://www.gossamer-threads.com/lists/lucene/java-user/8672

--
Andrew Klochkov


Re: external fields storage

2009-03-24 Thread Andrey Klochkov
>
>>
>> Our index could be much smaller if we could store some of fields not in
>> index directly but in some kind of external storage.
>> All I've found until now is ExternalFileField class which shows that it's
>> possible to implement such a storage, but I'm quite sure that the
>> requirement is common and there should be some existing implementations.
>> Also it would be good to be able to search using these fields, to include
>> them in the search result sets and to update them with standard Solr
>> update
>> handlers.
>>
>>
>>
> Thats a tall order. It almost sounds as if you want to be able to not use
> the index to store fields, but have them still fully functional as if
> indexed. That would be quite the magic trick.


Well, there are a number of posts on different mailing lists that talk about
the same requirements, so I wonder whether Lucene/Solr/something else
implements something like this.

For example, see this post:
http://markmail.org/message/t4lv2hqtret4p62g?q=lucene+storing+fields+in+external+storage&page=1&refer=bmode2h2dwjpymba#query:lucene%20storing%20fields%20in%20external%20storage+page:1+mid:t4lv2hqtret4p62g+state:results



> You might check out http://skwish.sourceforge.net/. Its a cool little
> library that lets you store arbitrary data keyed by an auto generated id.


We already have the storage (Coherence); we just want to make it accessible
through the standard Solr API rather than creating additional logic on top
of Solr, i.e. logic which post-processes result sets and adds fields to them
by taking values from the external storage. With that kind of custom
post-search logic we would also have to implement additional
filtering/ordering/etc. of result sets based on the values of those
"external" fields. So the question is: is it possible to use Solr/Lucene
features to use an external field storage for some of the fields?

-- 
Andrew Klochkov


Re: external fields storage

2009-03-24 Thread Mark Miller

Andrey Klochkov wrote:

Hi Solr users

Our index could be much smaller if we could store some of fields not in
index directly but in some kind of external storage.
All I've found until now is ExternalFileField class which shows that it's
possible to implement such a storage, but I'm quite sure that the
requirement is common and there should be some existing implementations.
Also it would be good to be able to search using these fields, to include
them in the search result sets and to update them with standard Solr update
handlers.

  
Thats a tall order. It almost sounds as if you want to be able to not 
use the index to store fields, but have them still fully functional as 
if indexed. That would be quite the magic trick.


You might check out http://skwish.sourceforge.net/. Its a cool little 
library that lets you store arbitrary data keyed by an auto generated id.


--
- Mark

http://www.lucidimagination.com





external fields storage

2009-03-24 Thread Andrey Klochkov
Hi Solr users

Our index could be much smaller if we could store some of the fields not in
the index directly but in some kind of external storage.
All I've found until now is the ExternalFileField class, which shows that
it's possible to implement such a storage, but I'm quite sure that the
requirement is common and there should be some existing implementations.
Also it would be good to be able to search on these fields, to include
them in search result sets, and to update them with the standard Solr update
handlers.
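
For reference, a sketch of how ExternalFileField can be declared in
schema.xml (the field and type names are illustrative; values are read from
an external_<fieldname> file in the index directory, and such a field is
usable in function queries only, not for searching or returning):

<fieldType name="file" class="solr.ExternalFileField" keyField="id"
           defVal="0" stored="false" indexed="false" valType="float"/>
<field name="popularity" type="file"/>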

-- 
Andrew Klochkov


Re: Combination of solr.xml and solrconfig.xml

2009-03-24 Thread Shalin Shekhar Mangar
On Tue, Mar 24, 2009 at 4:16 PM, Kraus, Ralf | pixelhouse GmbH <
r...@pixelhouse.de> wrote:

> Hi,
>
>> question ;-)
>>
>> <!DOCTYPE web-app SYSTEM "http://java.sun.com/dtd/web-app_2_3.dtd" [
>>
>>   <!ENTITY ... SYSTEM
>> "/var/lib/tomcat5.5/webapps/solr/default_solrconfig.xml">
>>
>> ]>
>>
>> Is there a chance to set the "home directory" using a variable ? For
>> example an unix enviroment variable ?
>>
>> Greets -Ralf-
>>
> No chance ?
>

One can use system variables in solrconfig.xml through the ${var-name}
syntax, but that is expanded only for DOM elements. It may not work for
entity includes, though I haven't tried.
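
For DOM elements that looks like this (property name and path are
illustrative):

<dataDir>${solr.data.dir}</dataDir>

with the container started along the lines of
java -Dsolr.data.dir=/var/solr/data -jar start.jar.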

-- 
Regards,
Shalin Shekhar Mangar.


Re: Combination of solr.xml and solrconfig.xml

2009-03-24 Thread Kraus, Ralf | pixelhouse GmbH

Hi,

question ;-)

<!DOCTYPE web-app SYSTEM "http://java.sun.com/dtd/web-app_2_3.dtd" [

   <!ENTITY ... SYSTEM "/var/lib/tomcat5.5/webapps/solr/default_solrconfig.xml">

]>

Is there a chance to set the "home directory" using a variable ? For 
example an unix enviroment variable ?


Greets -Ralf-

No chance ?

Greets -Ralf-


Re: lucene-java version mismatches

2009-03-24 Thread Shalin Shekhar Mangar
On Tue, Mar 24, 2009 at 3:30 PM, Paul Libbrecht  wrote:

>
> Is there a Lucene version that solr-lucene-core-1.3.0 corresponds to?


The lucene jars shipped with Solr 1.3.0 were 2.4-dev built from svn revision
r691741. You can check out the source from lucene's svn using that revision
number.

-- 
Regards,
Shalin Shekhar Mangar.


lucene-java version mismatches

2009-03-24 Thread Paul Libbrecht


Hello list,

I have a hard time in a project that's not yet fully converted to solr
with multiple versions of the lucene core classes. I can "switch over
to the ones of solr" (solr-lucene-core-1.3.0), but they are
incompatible with lucene-core-2.3.1, don't share the same version
numbering, and also don't make sources available.


Is there a Lucene version that solr-lucene-core-1.3.0 corresponds to?
Is there any danger in migrating some tools that use
lucene-core-2.3.1 to solr-lucene-core-1.3.0?


thanks

paul



autocommit and crashing tomcat

2009-03-24 Thread Jacob Singh
Hi,

If I'm using autocommit, and I have a crash of tomcat (or the whole
machine) while there are still docs pending, will I lose those
documents in limbo, or will I just be able to restart and then the
commit will run?

If the answer is "they go away": Is there anyway to ensure integrity
of an update?  I'd like to make a patch to help out with this, where
would one do it?

Thanks a bunch!
Jacob

-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com


Re:

2009-03-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
We do not set a conn_timeout/read_timeout for the HttpClient in the snappuller.

I guess it should be set to some very high value, say 1 hour for the
read timeout and 1 minute for the connection timeout, and we can make it
configurable.

--Noble

On Tue, Mar 24, 2009 at 2:13 PM, Shalin Shekhar Mangar
 wrote:
> We should obviously get to the bottom of this. But I was thinking, should we
> have some sort of timeouts on the SnapPuller in the slave to avoid such
> scenarios? Locking out snap pulls forever is not a good idea.
>
> On Mon, Mar 23, 2009 at 8:57 PM, Yonik Seeley 
> wrote:
>
>> So this is only one slave that hangs up and not the master?
>> Can you get thread dumps on both the master and the slave during a hang?
>>
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>>
>> On Mon, Mar 23, 2009 at 10:44 AM, Jeff Newburn 
>> wrote:
>> > We are having an intermittent problem with replication. We reindex
>> nightly
>> > which usually means there are 2 commits during replication then a final
>> > commit/optimize at the end.  For some reason the replication will hang
>> > occasionally with the following screenshot.  This is frustrating as it
>> will
>> > completely stall out any further replications. Additionally, it seems to
>> > only happen on reindex and it will strike 1 server randomly but not
>> always
>> > the same server.
>> >
>> >
>> > In case the screen shot doesn’t come through:
>> >
>> > Master        http://10.66.209.38:8080/solr/zeta-main/replication
>> >     Latest Index Version:1233423827699, Generation: 6237
>> >     Replicatable Index Version:0, Generation: 0
>> > Poll Interval     00:05:00
>> > Local Index     Index Version: 1233423827684, Generation: 6222
>> >     Location: /opt/solr-data/zeta-main/index
>> >     Size: 1.29 GB
>> >     Times Replicated Since Startup: 3591
>> >     Previous Replication Done At: Mon Mar 23 00:18:03 PDT 2009
>> >     Config Files Replicated At: Wed Mar 18 06:07:53 PDT 2009
>> >     Config Files Replicated: [synonyms.txt]
>> >     Times Config Files Replicated Since Startup: 4
>> >     Next Replication Cycle At: Mon Mar 23 00:27:55 PDT 2009
>> > Current Replication Status     Start Time: Mon Mar 23 00:22:55 PDT 2009
>> >     Files Downloaded: 12 / 163
>> >     Downloaded: 4.12 MB / 1.41 GB [0.0%]
>> >     Downloading File: _5no.tis, Downloaded: 0 bytes / 629.57 KB [0.0%]
>> >     Time Elapsed: 26371s, Estimated Time Remaining: 9216278s, Speed: 163
>> > bytes/s
>> >
>> >
>> >
>> > --
>> > Jeff Newburn
>> > Software Engineer, Zappos.com
>> > jnewb...@zappos.com - 702-943-7562
>> >
>>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
--Noble Paul


Re:

2009-03-24 Thread Shalin Shekhar Mangar
We should obviously get to the bottom of this. But I was thinking, should we
have some sort of timeouts on the SnapPuller in the slave to avoid such
scenarios? Locking out snap pulls forever is not a good idea.

On Mon, Mar 23, 2009 at 8:57 PM, Yonik Seeley wrote:

> So this is only one slave that hangs up and not the master?
> Can you get thread dumps on both the master and the slave during a hang?
>
>
> -Yonik
> http://www.lucidimagination.com
>
>
> On Mon, Mar 23, 2009 at 10:44 AM, Jeff Newburn 
> wrote:
> > We are having an intermittent problem with replication. We reindex
> nightly
> > which usually means there are 2 commits during replication then a final
> > commit/optimize at the end.  For some reason the replication will hang
> > occasionally with the following screenshot.  This is frustrating as it
> will
> > completely stall out any further replications. Additionally, it seems to
> > only happen on reindex and it will strike 1 server randomly but not
> always
> > the same server.
> >
> >
> > In case the screen shot doesn’t come through:
> >
> > Master        http://10.66.209.38:8080/solr/zeta-main/replication
> >     Latest Index Version:1233423827699, Generation: 6237
> >     Replicatable Index Version:0, Generation: 0
> > Poll Interval     00:05:00
> > Local Index     Index Version: 1233423827684, Generation: 6222
> >     Location: /opt/solr-data/zeta-main/index
> >     Size: 1.29 GB
> >     Times Replicated Since Startup: 3591
> >     Previous Replication Done At: Mon Mar 23 00:18:03 PDT 2009
> >     Config Files Replicated At: Wed Mar 18 06:07:53 PDT 2009
> >     Config Files Replicated: [synonyms.txt]
> >     Times Config Files Replicated Since Startup: 4
> >     Next Replication Cycle At: Mon Mar 23 00:27:55 PDT 2009
> > Current Replication Status     Start Time: Mon Mar 23 00:22:55 PDT 2009
> >     Files Downloaded: 12 / 163
> >     Downloaded: 4.12 MB / 1.41 GB [0.0%]
> >     Downloading File: _5no.tis, Downloaded: 0 bytes / 629.57 KB [0.0%]
> >     Time Elapsed: 26371s, Estimated Time Remaining: 9216278s, Speed: 163
> >     bytes/s
> >
> >
> >
> > --
> > Jeff Newburn
> > Software Engineer, Zappos.com
> > jnewb...@zappos.com - 702-943-7562
> >
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: search individual words but facet on delimiter

2009-03-24 Thread Shalin Shekhar Mangar
On Tue, Mar 24, 2009 at 2:03 PM, Ashish P  wrote:

>
> So it indexes A B, C D, E F properly... So I get facets
> A B (1)
> C D (1)
> E F (1)
> This is the exact output of facets I want.
>
> But I also want this document to match when I search for just an
> individual word like 'A' or 'D'.
> So I want facets exactly same as above but at the same time to be able to
> search on individual words also.
>

Yes, you can create another field whose type is text (or anything which can
tokenize on whitespace and punctuation). Use the copyField directive to copy
the contents into both your original and the above field. Search on the
above field and facet on the original field.
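
A sketch of what I mean in schema.xml (field and type names are just
examples):

   <field name="tags"        type="semicolonDelimited" indexed="true" stored="true"/>
   <field name="tags_search" type="text"               indexed="true" stored="false"/>
   <copyField source="tags" dest="tags_search"/>

Then facet on tags (whole "A B" style values) and run word-level queries
against tags_search.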

-- 
Regards,
Shalin Shekhar Mangar.


search individual words but facet on delimiter

2009-03-24 Thread Ashish P

I want the following output from Solr:
I index a field with a value like -> A B;C D;E F
I have applied a pattern tokenizer on this field, because I know the value
will contain ";", with roughly this field type:

<fieldType name="semicolonDelimited" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern=";"/>
  </analyzer>
</fieldType>
So it indexes A B, C D, E F properly... So I get facets 
A B (1)
C D (1)
E F (1)
This is the exact output of facets I want.

But I also want this document to match when I search for just an individual
word like 'A' or 'D'.
So I want facets exactly same as above but at the same time to be able to
search on individual words also.

Is there a way to achieve this???
Thanks in advance,
Ashish
-- 
View this message in context: 
http://www.nabble.com/search-individual-words-but-facet-on-delimiter-tp22676007p22676007.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Problem for replication : segment optimized automaticly

2009-03-24 Thread sunnyfr

How can I stop this?




Noble Paul നോബിള്‍  नोब्ळ् wrote:
> 
> if the DIH status does not say that it optimized, it is Lucene
> merging the segments
> 
> On Mon, Mar 23, 2009 at 8:15 PM, sunnyfr  wrote:
>>
>> I checked this out but it doesn't say anything about optimizing.
>> I'm sure it's the Lucene merging part, or I don't know...??
>>
>>
>> Noble Paul നോബിള്‍  नोब्ळ् wrote:
>>>
>>> the easiest way to find out what DIH did is to hit its status
>>> command. It will give you a brief description of what all it did
>>> during last import
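>>>
>>> for example (core name as in your logs; host and port illustrative):
>>>
>>>   http://localhost:8983/solr/video/dataimport?command=status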
>>>
>>> On Mon, Mar 23, 2009 at 12:59 AM, Shalin Shekhar Mangar
>>>  wrote:
 Lucene will automatically merge segments when they exceed the
 mergeFactor.
 This may be one reason but I'm not sure.

 I checked DataImportHandler's code again. It won't optimize if
 optimize=false is specified.

 On Mon, Mar 23, 2009 at 12:43 AM, sunnyfr  wrote:

>
> Do you have any idea ???
> :(
>
> cheer,
>
>
> sunnyfr wrote:
> >
> > Hi everybody ... still me :)
> > hoo happy day :)
> >
> > Just, I don't get where I missed something; I will try to be clear.
> >
> > this is my index folder (and you can see the evolution according to
> > the delta-import running every 30 min):
> >
> > r...@search-01:/data/solr# ls video/data/index/
> > _2bel.fdt  _2bel.fnm  _2bel.nrm  _2bel.tii  _2bel.tvd  _2bel.tvx
> > _2bem.fdt  _2bem.fnm  _2bem.nrm  _2bem.tii  _2bem.tvd  _2bem.tvx
> > _2ben.frq  _2ben.prx  _2ben.tis  _2beo.fdx  segments.gen
> > _2bel.fdx  _2bel.frq  _2bel.prx  _2bel.tis  _2bel.tvf  _2bel_1.del
> > _2bem.fdx  _2bem.frq  _2bem.prx  _2bem.tis  _2bem.tvf  _2ben.fnm
> > _2ben.nrm  _2ben.tii  _2beo.fdt  _2beo.fnm  segments_230x
> > r...@search-01:/data/solr# ls video/data/index/
> > _2bel.fdt  _2bel.frq  _2bel.tii  _2bel.tvf    _2bem.fdt  _2bem.frq
> > _2bem.tii  _2bem.tvf  _2ben.frq  _2ben.tii  _2beo.fdx  _2beo.nrm
> > _2beo.tis  _2beo.tvx
> > _2bel.fdx  _2bel.nrm  _2bel.tis  _2bel.tvx    _2bem.fdx  _2bem.nrm
> > _2bem.tis  _2bem.tvx  _2ben.nrm  _2ben.tis  _2beo.fnm  _2beo.prx
> > _2beo.tvd  segments.gen
> > _2bel.fnm  _2bel.prx  _2bel.tvd  _2bel_1.del  _2bem.fnm  _2bem.prx
> > _2bem.tvd  _2ben.fnm  _2ben.prx  _2beo.fdt  _2beo.frq  _2beo.tii
> > _2beo.tvf  segments_230x
> > r...@search-01:/data/solr# ls video/data/index/
> > _2beo.fdt  _2beo.fdx  _2beo.fnm  _2beo.frq  _2beo.nrm  _2beo.prx
> > _2beo.tii  _2beo.tis  _2beo.tvd  _2beo.tvf  _2beo.tvx  segments.gen
> > segments_230y
> >
> > So as you can see, my segments increased, which is perfect for my
> > replication (faster to fetch JUST the last segments).
> > But as you can see in my last ls, my segments have been optimized.
> >
> > As I can see in my log:
> > Mar 19 15:42:37 search-01 jsvc.exec[23255]: Mar 19, 2009 3:42:37 PM
> > org.apache.solr.handler.dataimport.DocBuilder commit INFO: Full
> Import
> > completed successfully without optimization
> > Mar 19 15:42:37 search-01 jsvc.exec[23255]: Mar 19, 2009 3:42:37 PM
> > org.apache.solr.update.DirectUpdateHandler2 commit INFO: start
> > commit(optimize=true,waitFlush=false,waitSearcher=true)
> >
> > But I didn't fire any optimize; my delta-import is fired like this:
> > /solr/video/dataimport?command=delta-import&optimize=false
> >
> > Solrconfig.xml: autocommit turned off
> > 
> > 
> > 
> >
> > Maybe it comes from lucene parameters?
> >     
> >     false
> >     50
> >     50
> >     
> >     
> >     2147483647
> >     1
> >
> > Thanks a lot for your help,
> > Sunny
> >
> >
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Problem-for-replication-%3A-segment-optimized-automaticly-tp22601442p22649412.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


 --
 Regards,
 Shalin Shekhar Mangar.

>>>
>>>
>>>
>>> --
>>> --Noble Paul
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Problem-for-replication-%3A-segment-optimized-automaticly-tp22601442p22661545.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> --Noble Paul
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Problem-for-replication-%3A-segment-optimized-automaticly-tp22601442p22675729.html
Sent from the Solr - User mailing list archive at Nabble.com.



Dynamic range Facets

2009-03-24 Thread Ashish P

My documents (products) have a price field, and I want to have
a "dynamically" calculated range facet for it in the response.

E.g. I want to have this in the response 
price:[* TO 20]  -> 23 
price:[20 TO 40] -> 42 
price:[40 TO *]  -> 33 
if prices are between 0 and 60 
but 
price:[* TO 100]   -> 23 
price:[100 TO 200] -> 42 
price:[200 TO *]   -> 33 
if prices are between 0 and 300 

So the question is: how do I get such dynamic range facets in the response from Solr?
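
The closest I can get with stock Solr seems to be facet.query with hand-built
ranges, e.g. (a sketch; the endpoints would still have to be computed
client-side from the min/max first):

   http://localhost:8983/solr/select?q=*:*&rows=0&facet=true
       &facet.query=price:[* TO 20]
       &facet.query=price:[20 TO 40]
       &facet.query=price:[40 TO *]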

This is the same question as one posted back in 2007, but it still awaits an
answer.
Is there any solution for this?
-- 
View this message in context: 
http://www.nabble.com/Dynamic-range-Facets-tp22675413p22675413.html
Sent from the Solr - User mailing list archive at Nabble.com.