caching

2009-03-24 Thread revas
If I don't explicitly set any default query in solrconfig.xml for
caching and just use the default config file, does Solr do the caching
automatically based on the query?

Thanks
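
For reference: the stock solrconfig.xml defines standard caches
(filterCache, queryResultCache, documentCache) that are applied
automatically; a sketch along the lines of the example config, with
illustrative sizes:

  <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="256"/>
  <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>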


Dynamic range Facets

2009-03-24 Thread Ashish P

my documents (products) have a price field, and I want to have 
a dynamically calculated range facet for that in the response. 

E.g. I want to have this in the response 
price:[* TO 20]  - 23 
price:[20 TO 40] - 42 
price:[40 TO *]  - 33 
if prices are between 0 and 60 
but 
price:[* TO 100]   - 23 
price:[100 TO 200] - 42 
price:[200 TO *]   - 33 
if prices are between 0 and 300 

So the question is how to get the dynamic facets response from solr.

This is the same question as one posted back in 2007, but it still awaits
an answer.
Is there any solution for this?
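
One workaround (a sketch; Solr has no built-in dynamic range faceting as
of this writing) is to fetch the price extremes first and compute the
facet.query ranges on the client:

  1) find min/max, e.g. by sorting:
     /select?q=*:*&rows=1&fl=price&sort=price+asc
     /select?q=*:*&rows=1&fl=price&sort=price+desc
  2) compute bucket boundaries client-side, then facet:
     /select?q=*:*&facet=true
        &facet.query=price:[* TO 100]
        &facet.query=price:[100 TO 200]
        &facet.query=price:[200 TO *]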
-- 
View this message in context: 
http://www.nabble.com/Dynamic-range-Facets-tp22675413p22675413.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Problem for replication : segment optimized automaticly

2009-03-24 Thread sunnyfr

How can I stop this?




Noble Paul നോബിള്‍  नोब्ळ् wrote:
 
 if the DIH status does not say that it optimized, it is Lucene
 merging the segments
 
 On Mon, Mar 23, 2009 at 8:15 PM, sunnyfr johanna...@gmail.com wrote:

 I checked this out but it doesn't say anything about optimizing.
 I'm sure it's the Lucene merging part, or... I don't know??


 Noble Paul നോബിള്‍  नोब्ळ् wrote:

 the easiest way to find out what DIH did is to hit its status
 command. It will give you a brief description of everything it did
 during the last import

 On Mon, Mar 23, 2009 at 12:59 AM, Shalin Shekhar Mangar
 shalinman...@gmail.com wrote:
 Lucene will automatically merge segments when they exceed the
 mergeFactor.
 This may be one reason but I'm not sure.

 I checked DataImportHandler's code again. It won't optimize if
 optimize=false is specified.

 On Mon, Mar 23, 2009 at 12:43 AM, sunnyfr johanna...@gmail.com wrote:


 Do you have any idea ???
 :(

 cheer,


 sunnyfr wrote:
 
  Hi everybody ... still me :)
  hoo happy day :)
 
  Just, I don't get where I'm missing something; I will try to be clear.
 
  this is my index folder (and we can notice the evolution according to
  the delta-import every 30 min):
 
  r...@search-01:/data/solr# ls video/data/index/
  _2bel.fdt  _2bel.fnm  _2bel.nrm  _2bel.tii  _2bel.tvd  _2bel.tvx
  _2bem.fdt  _2bem.fnm  _2bem.nrm  _2bem.tii  _2bem.tvd  _2bem.tvx
  _2ben.frq  _2ben.prx  _2ben.tis  _2beo.fdx  segments.gen
  _2bel.fdx  _2bel.frq  _2bel.prx  _2bel.tis  _2bel.tvf  _2bel_1.del
  _2bem.fdx  _2bem.frq  _2bem.prx  _2bem.tis  _2bem.tvf  _2ben.fnm
  _2ben.nrm  _2ben.tii  _2beo.fdt  _2beo.fnm  segments_230x
  r...@search-01:/data/solr# ls video/data/index/
  _2bel.fdt  _2bel.frq  _2bel.tii  _2bel.tvf    _2bem.fdt  _2bem.frq
  _2bem.tii  _2bem.tvf  _2ben.frq  _2ben.tii  _2beo.fdx  _2beo.nrm
  _2beo.tis  _2beo.tvx
  _2bel.fdx  _2bel.nrm  _2bel.tis  _2bel.tvx    _2bem.fdx  _2bem.nrm
  _2bem.tis  _2bem.tvx  _2ben.nrm  _2ben.tis  _2beo.fnm  _2beo.prx
  _2beo.tvd  segments.gen
  _2bel.fnm  _2bel.prx  _2bel.tvd  _2bel_1.del  _2bem.fnm  _2bem.prx
  _2bem.tvd  _2ben.fnm  _2ben.prx  _2beo.fdt  _2beo.frq  _2beo.tii
  _2beo.tvf  segments_230x
  r...@search-01:/data/solr# ls video/data/index/
  _2beo.fdt  _2beo.fdx  _2beo.fnm  _2beo.frq  _2beo.nrm  _2beo.prx
  _2beo.tii  _2beo.tis  _2beo.tvd  _2beo.tvf  _2beo.tvx  segments.gen
  segments_230y
 
  So as you can notice my segments increased, which is perfect for my
  replication (faster to fetch JUST the last segments).
  But as you can see in my last ls, my segments have been optimized.

  As I can notice in my log:
  Mar 19 15:42:37 search-01 jsvc.exec[23255]: Mar 19, 2009 3:42:37 PM
  org.apache.solr.handler.dataimport.DocBuilder commit INFO: Full
 Import
  completed successfully without optimization
  Mar 19 15:42:37 search-01 jsvc.exec[23255]: Mar 19, 2009 3:42:37 PM
  org.apache.solr.update.DirectUpdateHandler2 commit INFO: start
  commit(optimize=true,waitFlush=false,waitSearcher=true)
 
  But I didn't fire any optimize; my delta-import is fired like:
  /solr/video/dataimport?command=delta-import&optimize=false
 
  Solrconfig.xml: autocommit turned off
  <!--<luceneAutoCommit>false</luceneAutoCommit>-->
  <!-- <commitLockTimeout>18500</commitLockTimeout>-->
  <!-- autocommit pending docs if certain criteria are met
    <autoCommit>
      <maxDocs>1</maxDocs>
    </autoCommit>
  -->
 
  Maybe it comes from the Lucene parameters?
      <!-- options specific to the main on-disk lucene index -->
      <useCompoundFile>false</useCompoundFile>
      <ramBufferSizeMB>50</ramBufferSizeMB>
      <mergeFactor>50</mergeFactor>
      <!-- Deprecated -->
      <!--<maxBufferedDocs>1000</maxBufferedDocs>-->
      <maxMergeDocs>2147483647</maxMergeDocs>
      <maxFieldLength>1</maxFieldLength>
 
  Thanks a lot for your help,
  Sunny
 
 
 
 

 --
 View this message in context:
 http://www.nabble.com/Problem-for-replication-%3A-segment-optimized-automaticly-tp22601442p22649412.html
 Sent from the Solr - User mailing list archive at Nabble.com.




 --
 Regards,
 Shalin Shekhar Mangar.




 --
 --Noble Paul



 --
 View this message in context:
 http://www.nabble.com/Problem-for-replication-%3A-segment-optimized-automaticly-tp22601442p22661545.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 -- 
 --Noble Paul
 
 

-- 
View this message in context: 
http://www.nabble.com/Problem-for-replication-%3A-segment-optimized-automaticly-tp22601442p22675729.html
Sent from the Solr - User mailing list archive at Nabble.com.



search individual words but facet on delimiter

2009-03-24 Thread Ashish P

I want the following output from Solr:
I index a field with the value: A B;C D;E F
I have applied a pattern tokenizer on this field because I know the value
will contain ';':
<fieldtype name="conditionText" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern=";" />
  </analyzer>
</fieldtype>

So it indexes A B, C D, E F properly... So I get facets 
A B (1)
C D (1)
E F (1)
This is the exact output of facets I want.

But I also want to search this document when I just search individual word
'A' or 'D' etc. 
So I want facets exactly same as above but at the same time to be able to
search on individual words also.

Is there a way to achieve this???
Thanks in advance,
Ashish
-- 
View this message in context: 
http://www.nabble.com/search-individual-words-but-facet-on-delimiter-tp22676007p22676007.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: search individual words but facet on delimiter

2009-03-24 Thread Shalin Shekhar Mangar
On Tue, Mar 24, 2009 at 2:03 PM, Ashish P ashish.ping...@gmail.com wrote:


 So it indexes A B, C D, E F properly... So I get facets
 A B (1)
 C D (1)
 E F (1)
 This is the exact output of facets I want.

 But I also want to search this document when I just search individual word
 'A' or 'D' etc.
 So I want facets exactly same as above but at the same time to be able to
 search on individual words also.


Yes, you can create another field whose type is text (or anything that
tokenizes on whitespace and punctuation). Use the copyField directive to
copy the contents of your original field into the new one. Search on the
new field and facet on the original field.
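
A sketch of what that might look like in schema.xml (field names are
illustrative, reusing the conditionText type from the question):

  <field name="condition" type="conditionText" indexed="true" stored="true"/>
  <field name="condition_text" type="text" indexed="true" stored="false"/>
  <copyField source="condition" dest="condition_text"/>

Then query against condition_text and facet with facet.field=condition.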

-- 
Regards,
Shalin Shekhar Mangar.


Re: no subject

2009-03-24 Thread Shalin Shekhar Mangar
We should obviously get to the bottom of this. But I was thinking, should we
have some sort of timeouts on the SnapPuller in the slave to avoid such
scenarios? Locking out snap pulls forever is not a good idea.

On Mon, Mar 23, 2009 at 8:57 PM, Yonik Seeley yo...@lucidimagination.comwrote:

 So this is only one slave that hangs up and not the master?
 Can you get thread dumps on both the master and the slave during a hang?


 -Yonik
 http://www.lucidimagination.com


 On Mon, Mar 23, 2009 at 10:44 AM, Jeff Newburn jnewb...@zappos.com
 wrote:
  We are having an intermittent problem with replication. We reindex
 nightly
  which usually means there are 2 commits during replication then a final
  commit/optimize at the end.  For some reason the replication will hang
  occasionally with the following screenshot.  This is frustrating as it
 will
  completely stall out any further replications. Additionally, it seems to
  only happen on reindex and it will strike 1 server randomly but not
 always
  the same server.
 
 
  In case the screen shot doesn’t come through:
 
  Master        http://10.66.209.38:8080/solr/zeta-main/replication
  Latest Index Version:1233423827699, Generation: 6237
  Replicatable Index Version:0, Generation: 0
  Poll Interval 00:05:00
  Local Index Index Version: 1233423827684, Generation: 6222
  Location: /opt/solr-data/zeta-main/index
  Size: 1.29 GB
  Times Replicated Since Startup: 3591
  Previous Replication Done At: Mon Mar 23 00:18:03 PDT 2009
  Config Files Replicated At: Wed Mar 18 06:07:53 PDT 2009
  Config Files Replicated: [synonyms.txt]
  Times Config Files Replicated Since Startup: 4
  Next Replication Cycle At: Mon Mar 23 00:27:55 PDT 2009
  Current Replication Status Start Time: Mon Mar 23 00:22:55 PDT 2009
  Files Downloaded: 12 / 163
  Downloaded: 4.12 MB / 1.41 GB [0.0%]
  Downloading File: _5no.tis, Downloaded: 0 bytes / 629.57 KB [0.0%]
  Time Elapsed: 26371s, Estimated Time Remaining: 9216278s, Speed: 163
  bytes/s
 
 
 
  --
  Jeff Newburn
  Software Engineer, Zappos.com
  jnewb...@zappos.com - 702-943-7562
 




-- 
Regards,
Shalin Shekhar Mangar.


Re: no subject

2009-03-24 Thread Noble Paul നോബിള്‍ नोब्ळ्
We do not set a conn_timeout/read_timeout for the HttpClient in SnapPuller.

I guess it should be set to some fairly high value, say 1 hour for the
read timeout and 1 minute for the conn timeout, and we can make it
configurable.
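
For illustration, assuming SnapPuller uses commons-httpclient 3.x,
setting such timeouts would look roughly like this (a sketch, not the
actual patch):

  HttpClient client = new HttpClient();
  HttpConnectionManagerParams params = client.getHttpConnectionManager().getParams();
  params.setConnectionTimeout(60 * 1000);    // conn_timeout: 1 minute
  params.setSoTimeout(60 * 60 * 1000);       // read_timeout: 1 hour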

--Noble

On Tue, Mar 24, 2009 at 2:13 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
 We should obviously get to the bottom of this. But I was thinking, should we
 have some sort of timeouts on the SnapPuller in the slave to avoid such
 scenarios? Locking out snap pulls forever is not a good idea.

 On Mon, Mar 23, 2009 at 8:57 PM, Yonik Seeley 
 yo...@lucidimagination.comwrote:

 So this is only one slave that hangs up and not the master?
 Can you get thread dumps on both the master and the slave during a hang?


 -Yonik
 http://www.lucidimagination.com


 On Mon, Mar 23, 2009 at 10:44 AM, Jeff Newburn jnewb...@zappos.com
 wrote:
  We are having an intermittent problem with replication. We reindex
 nightly
  which usually means there are 2 commits during replication then a final
  commit/optimize at the end.  For some reason the replication will hang
  occasionally with the following screenshot.  This is frustrating as it
 will
  completely stall out any further replications. Additionally, it seems to
  only happen on reindex and it will strike 1 server randomly but not
 always
  the same server.
 
 
  In case the screen shot doesn’t come through:
 
  Master        http://10.66.209.38:8080/solr/zeta-main/replication
      Latest Index Version:1233423827699, Generation: 6237
      Replicatable Index Version:0, Generation: 0
  Poll Interval     00:05:00
  Local Index     Index Version: 1233423827684, Generation: 6222
      Location: /opt/solr-data/zeta-main/index
      Size: 1.29 GB
      Times Replicated Since Startup: 3591
      Previous Replication Done At: Mon Mar 23 00:18:03 PDT 2009
      Config Files Replicated At: Wed Mar 18 06:07:53 PDT 2009
      Config Files Replicated: [synonyms.txt]
      Times Config Files Replicated Since Startup: 4
      Next Replication Cycle At: Mon Mar 23 00:27:55 PDT 2009
  Current Replication Status     Start Time: Mon Mar 23 00:22:55 PDT 2009
      Files Downloaded: 12 / 163
      Downloaded: 4.12 MB / 1.41 GB [0.0%]
      Downloading File: _5no.tis, Downloaded: 0 bytes / 629.57 KB [0.0%]
      Time Elapsed: 26371s, Estimated Time Remaining: 9216278s, Speed: 163
  bytes/s
 
 
 
  --
  Jeff Newburn
  Software Engineer, Zappos.com
  jnewb...@zappos.com - 702-943-7562
 




 --
 Regards,
 Shalin Shekhar Mangar.




-- 
--Noble Paul


autocommit and crashing tomcat

2009-03-24 Thread Jacob Singh
Hi,

If I'm using autocommit, and I have a crash of tomcat (or the whole
machine) while there are still docs pending, will I lose those
documents in limbo, or will I just be able to restart and then the
commit will run?

If the answer is they go away: Is there anyway to ensure integrity
of an update?  I'd like to make a patch to help out with this, where
would one do it?

Thanks a bunch!
Jacob

-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com


lucene-java version mismatches

2009-03-24 Thread Paul Libbrecht


Hello list,

I'm having a hard time in a project that's not yet fully converted to Solr
with multiple versions of the Lucene core classes. I can switch over
to the ones from Solr (solr-lucene-core-1.3.0), but they are
incompatible with lucene-core-2.3.1, don't share the same version
numbering, and also don't make sources available.


Which Lucene version does solr-lucene-core-1.3.0 correspond to?
Is there any danger in migrating tools that use lucene-core-2.3.1
to solr-lucene-core-1.3.0?


thanks

paul



Re: lucene-java version mismatches

2009-03-24 Thread Shalin Shekhar Mangar
On Tue, Mar 24, 2009 at 3:30 PM, Paul Libbrecht p...@activemath.org wrote:


 Which Lucene version does solr-lucene-core-1.3.0 correspond to?


The lucene jars shipped with Solr 1.3.0 were 2.4-dev built from svn revision
r691741. You can check out the source from lucene's svn using that revision
number.
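
For example, something along these lines should work (a sketch; the path
follows the Lucene svn layout of that era):

  svn co -r 691741 http://svn.apache.org/repos/asf/lucene/java/trunk lucene-2.4-dev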

-- 
Regards,
Shalin Shekhar Mangar.


Re: Combination of solr.xml and solrconfig.xml

2009-03-24 Thread Kraus, Ralf | pixelhouse GmbH

Hi,

question ;-)

<!DOCTYPE config SYSTEM "http://java.sun.com/dtd/web-app_2_3.dtd" [

   <!ENTITY default_solrconfig SYSTEM
"/var/lib/tomcat5.5/webapps/solr/default_solrconfig.xml">


]>

Is there a way to set the home directory using a variable? For
example, a Unix environment variable?


Greets -Ralf-

No chance ?

Greets -Ralf-


Re: Combination of solr.xml and solrconfig.xml

2009-03-24 Thread Shalin Shekhar Mangar
On Tue, Mar 24, 2009 at 4:16 PM, Kraus, Ralf | pixelhouse GmbH 
r...@pixelhouse.de wrote:

 Hi,

 question ;-)

 <!DOCTYPE config SYSTEM "http://java.sun.com/dtd/web-app_2_3.dtd" [

   <!ENTITY default_solrconfig SYSTEM
 "/var/lib/tomcat5.5/webapps/solr/default_solrconfig.xml">

 ]>

 Is there a way to set the home directory using a variable? For
 example, a Unix environment variable?

 Greets -Ralf-

 No chance ?


One can use system variables in solrconfig.xml through the ${var-name}
syntax, but that is expanded only for DOM elements. It may not work for
entity includes, though I haven't tried.
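
For example (a sketch, using a made-up property name):

  <dataDir>${solr.data.dir:/var/data/solr}</dataDir>

with the property passed to the JVM, e.g.
java -Dsolr.data.dir=/var/data/solr -jar start.jar; the value after the
colon is the default used when the property is unset.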

-- 
Regards,
Shalin Shekhar Mangar.


external fields storage

2009-03-24 Thread Andrey Klochkov
Hi Solr users

Our index could be much smaller if we could store some of the fields not
in the index directly but in some kind of external storage.
All I've found so far is the ExternalFileField class, which shows that it's
possible to implement such a storage, but I'm quite sure that the
requirement is common and there should be some existing implementations.
It would also be good to be able to search on these fields, to include
them in search result sets, and to update them with the standard Solr
update handlers.

-- 
Andrew Klochkov


Re: external fields storage

2009-03-24 Thread Mark Miller

Andrey Klochkov wrote:

Hi Solr users

Our index could be much smaller if we could store some of fields not in
index directly but in some kind of external storage.
All I've found until now is ExternalFileField class which shows that it's
possible to implement such a storage, but I'm quite sure that the
requirement is common and there should be some existing implementations.
Also it would be good to be able to search using these fields, to include
them in the search result sets and to update them with standard Solr update
handlers.

  
That's a tall order. It almost sounds as if you want to be able to not
use the index to store fields, but have them still fully functional as
if indexed. That would be quite the magic trick.


You might check out http://skwish.sourceforge.net/. It's a cool little
library that lets you store arbitrary data keyed by an auto-generated id.


--
- Mark

http://www.lucidimagination.com





Re: external fields storage

2009-03-24 Thread Andrey Klochkov


 Our index could be much smaller if we could store some of fields not in
 index directly but in some kind of external storage.
 All I've found until now is ExternalFileField class which shows that it's
 possible to implement such a storage, but I'm quite sure that the
 requirement is common and there should be some existing implementations.
 Also it would be good to be able to search using these fields, to include
 them in the search result sets and to update them with standard Solr
 update
 handlers.



 Thats a tall order. It almost sounds as if you want to be able to not use
 the index to store fields, but have them still fully functional as if
 indexed. That would be quite the magic trick.


Well, there are a number of posts on different mailing lists that ask for
the same requirements, so I wonder whether Lucene/Solr/something else
implements something like this.

For example, see this post:
http://markmail.org/message/t4lv2hqtret4p62g?q=lucene+storing+fields+in+external+storage&page=1&refer=bmode2h2dwjpymba#query:lucene%20storing%20fields%20in%20external%20storage+page:1+mid:t4lv2hqtret4p62g+state:results



 You might check out http://skwish.sourceforge.net/. Its a cool little
 library that lets you store arbitrary data keyed by an auto generated id.


We already have the storage (Coherence); we just want to make it accessible
through the standard Solr API rather than creating additional logic on top
of Solr, i.e. logic that post-processes result sets and adds fields to them
by taking values from the external storage. With such custom post-search
logic we would also have to implement additional filtering/ordering/etc. of
result sets based on the values of those external fields. So the question
is: is it possible to use Solr/Lucene features to keep some fields in an
external storage?

-- 
Andrew Klochkov


Re: external fields storage

2009-03-24 Thread Andrey Klochkov
On Tue, Mar 24, 2009 at 4:43 PM, Mark Miller markrmil...@gmail.com wrote:

 Thats a tall order. It almost sounds as if you want to be able to not use
 the index to store fields, but have them still fully functional as if
 indexed. That would be quite the magic trick.


Look here, people wanted exactly the same feature in 2004. Is it still not
implemented?

http://www.gossamer-threads.com/lists/lucene/java-user/8672

--
Andrew Klochkov


Not able to configure multicore

2009-03-24 Thread mitulpatel

Hello Friends,

I am a newbie to Solr, so sorry for the silly question.

I am facing a problem with multiple-core configuration. I have placed a
solr.xml file in the solr.home directory; even so, when I try to access
http://localhost:8983/solr/admin/cores it gives me a Tomcat error.

Can anyone tell me what the possible issue is?

Thanks,
Mitul Patel
-- 
View this message in context: 
http://www.nabble.com/Not-able-to-configure-multicore-tp22682691p22682691.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Not able to configure multicore

2009-03-24 Thread Mark Miller

mitulpatel wrote:

Hello Friends,

I am newbee to solr. so sorry for silly question.

I am facing a problem related to multiple cores configuration. I have placed
a solr.xml file in solr.home directory. eventhough when I am trying to
access http://localhost:8983/solr/admin/cores it gives me tomcat error. 


Can anyone tell me what can be possible issue with this??

Thanks,
Mitul Patel
  

Have you set adminPath="/admin/cores" on the <cores> element in solr.xml?
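
For reference, a minimal multi-core solr.xml looks roughly like this
(core names illustrative):

  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <core name="core0" instanceDir="core0" />
      <core name="core1" instanceDir="core1" />
    </cores>
  </solr>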

--
- Mark

http://www.lucidimagination.com





Update field values without re-extracting text?

2009-03-24 Thread Dan A. Dickey
I'd like to be able to index various documents and have the text extracted
from them using the DataImportHandler.  I think I have this working just fine.

However, I'd later like to be able to update a field value or several, without
re-extracting the text all over again with the DIH.  Yes - and if possible, only
update one or a few of the field values and leave the rest as is.

I haven't seen a way to do this - can it be done?
What do I still need to read to accomplish this? Can someone point me in
the right direction? Thanks!
-Dan

-- 
Dan A. Dickey | Senior Software Engineer


Fwd: multicore solrconfig issues

2009-03-24 Thread Audrey Foo
No problem Kimani. I am forwarding this message to the mailing list, in case
it can help others.
Audrey

-- Forwarded message --
From: Kimani Nielsen kniel...@gmail.com
Date: Tue, Mar 24, 2009 at 8:57 AM
Subject: Re: multicore solrconfig issues
To: Audrey Foo chry...@gmail.com


Audrey,
  Yep that was my problem as well! Thank you so much for your helpful reply.
Funny thing was the application never complained about a missing
elevation.xml config file. Thanks again.

- Kimani

On Tue, Mar 24, 2009 at 11:48, Audrey Foo chry...@gmail.com wrote:

 Hi Kimani
 Yes, I thought I had copied all xml files, but was missing elevate.xml

 Thanks
 Audrey

 On Tue, Mar 24, 2009 at 7:55 AM, kniel...@gmail.com wrote:

 Hi,
  I am running into the exact same error when setting up a multi-core
 configuration using Websphere6.1. Were you able to find the solution to
 this?

 - Kimani

 Audrey Foo-2 wrote:
 
  Hi
 
  I am using most recent drupal apachesolr module with solr 1.4 nightly
  build
 
  * solrconfig.xml ==
  http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/apachesolr/solrconfig.xml?revision=1.1.2.15&view=markup&pathrev=DRUPAL-6--1-0-BETA5
  * schema.xml ==
  http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.30&view=markup&pathrev=DRUPAL-6--1-0-BETA5
 
  and attempting to use the multicore functionality
  * copied the txt files from example/solr/conf to
  example/multicore/core0/conf
  * copied the xml files above to example/multicore/core0/conf
  * started jetty:  java -Dsolr.solr.home=multicore -jar start.jar
 
  It throws these severe errors on bootstrap
  SEVERE: java.lang.NullPointerException
  at
 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:173)
   at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
   at
 
 org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:51)
  at org.apache.solr.core.SolrCore$4.call(SolrCore.java:1163)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
  at java.util.concurrent.FutureTask.run(FutureTask.java:123)
   at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
   at java.lang.Thread.run(Thread.java:613)
 
  Any suggestions, about what to try further?
 
  Thanks
  AF
 
 
 Quoted from:
 http://www.nabble.com/multicore-solrconfig-issues-tp22591761p22591761.html





Re: Optimize

2009-03-24 Thread sunnyfr

Thanks for your answer. Then what fires the merging? In my log I've got
optimize=true; if it's not an optimize (since I don't fire one), it must be
merging. How can I stop this?

Thanks a lot,


Shalin Shekhar Mangar wrote:
 
 No, optimize is not automatic. You have to invoke it yourself just like
 commits.
 
 Take a look at the following for examples:
 http://wiki.apache.org/solr/UpdateXmlMessages
 
 On Thu, Oct 2, 2008 at 2:03 PM, sunnyfr johanna...@gmail.com wrote:
 


 Hi,

  Can somebody explain to me a bit how optimize works?
   I read the doc but didn't really get what fires an optimize.

 Thanks a lot,
 --
 View this message in context:
 http://www.nabble.com/Optimize-tp19775320p19775320.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 
 

-- 
View this message in context: 
http://www.nabble.com/Optimize-tp19775320p22684113.html
Sent from the Solr - User mailing list archive at Nabble.com.
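
For reference: an explicit optimize is sent as an update message, e.g.

  curl http://localhost:8983/solr/update --data-binary '<optimize/>' -H 'Content-type:text/xml'

whereas segment merging is governed by mergeFactor in solrconfig.xml and
cannot be switched off, only tuned.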



How to Index IP address

2009-03-24 Thread nga pham
Hi All,

I have a txt file that captured all of my network traffic.  How can I use
Solr to filter out a particular IP address?

Thank you,
Nga.


Re: external fields storage

2009-03-24 Thread Mark Miller

Andrey Klochkov wrote:

On Tue, Mar 24, 2009 at 4:43 PM, Mark Miller markrmil...@gmail.com wrote:

  

That's a tall order. It almost sounds as if you want to be able to not use
the index to store fields, but have them still fully functional as if
indexed. That would be quite the magic trick.




Look here, people wanted exactly the same feature in 2004. Is it still not
implemented?

http://www.gossamer-threads.com/lists/lucene/java-user/8672

--
Andrew Klochkov

  
Right - I was exaggerating your description a bit. It reads as if you
want it to have all the same power as an indexed field, so I made a bad
joke. If you want to be able to search the field, its index entry needs
to be updated anyway. I don't see how you get search on externally stored
fields without having to update and keep them in the index. External
field storage is simple to add on your own, either using that skwish
library or even a basic database. You can then do id-to-offset mapping
like that guy is looking for: simply add the id to Lucene and do your
content updates with the external db.
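
A sketch of the lookup side with SolrJ (externalStore is a hypothetical
handle to whatever storage is used):

  CommonsHttpSolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
  QueryResponse rsp = solr.query(new SolrQuery("some query"));
  for (SolrDocument doc : rsp.getResults()) {
      String id = (String) doc.getFieldValue("id");
      byte[] content = externalStore.get(id);  // fetch the external fields by id
      // merge content into the result presented to the caller
  }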


--
- Mark

http://www.lucidimagination.com





Streaming results of analysis to shards ... possible?

2009-03-24 Thread Cass Costello
Hello all,

Our application involves a high index write rate - anywhere from a few
dozen to many thousands of docs per sec.  The write rate is frequently
higher than the read rate (though not always), and our index must be
as fresh as possible (we'd like search results to be no more than a
couple of seconds out of date). We're considering many approaches to
achieving our desired TCO.

We've noted that the indexing process can be quite costly.  Our latest
POC shards the total index over N machines, which effectively
distributes the indexing load and keeps refresh and search
response times decent, but to maintain performance during peak write
rates, we've had to make N a much larger number than we'd like.

One idea we're floating would be to do all the analysis centrally,
perhaps on N/4 machines, and then stream the raw tokens and data
directly to the read slaves, who would (hopefully) need to do
nothing more than manage segments and readers.

We have some very rough math that makes the approach compelling, but
before diving in wholesale, we thought we'd ask if anyone else has
taken a similar approach.   Thoughts?

Sincerely,

Cass Costello
www.stubhub.com


Re: How to Index IP address

2009-03-24 Thread Matthew Runo
I don't think that Solr is the best thing to use for searching a text
file. I'd use grep myself, if you're on a unix-like system.


To use Solr, you'd need to throw each network 'event' (GET, POST, etc.)
into an XML document, and post those into Solr so it could
generate the index. You could then do things like
ip:10.206.158.154 to find a specific IP address, or even
ip:10.206.158* to get a subnet.
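
A sketch of such a document in Solr's XML update format; field names are
hypothetical and would need matching entries in schema.xml:

  <add>
    <doc>
      <field name="id">event-0001</field>
      <field name="ip">10.206.158.154</field>
      <field name="method">GET</field>
      <field name="timestamp">2009-03-24T17:00:00Z</field>
    </doc>
  </add>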


Perhaps the thing that's building your text file could post to Solr  
instead?


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Mar 24, 2009, at 9:32 AM, nga pham wrote:


Hi All,

I have a txt file, that captured all of my network traffic.  How can  
I use

Solr to filter out a particular IP address?

Thank you,
Nga.




Re: How to Index IP address

2009-03-24 Thread nga pham
Do you think Lucene is better for filtering out a particular IP address
from a txt file?

Thank you Runo,
Nga

On Tue, Mar 24, 2009 at 10:21 AM, Matthew Runo mr...@zappos.com wrote:

 I don't think that Solr is the best thing to use for searching a text file.
 I'd use grep myself, if you're on a unix-like system.

 To use solr, you'd need to throw each network 'event' (GET, POST, etc etc)
 into an XML document, and post those into Solr so it could generate the
 index. You could then do things like
 ip:10.206.158.154 to find a specific IP address, or even ip:10.206.158* to
 get a subnet.

 Perhaps the thing that's building your text file could post to Solr
 instead?

 Thanks for your time!

 Matthew Runo
 Software Engineer, Zappos.com
 mr...@zappos.com - 702-943-7833


 On Mar 24, 2009, at 9:32 AM, nga pham wrote:

 Hi All,

 I have a txt file, that captured all of my network traffic.  How can I use
 Solr to filter out a particular IP address?

 Thank you,
 Nga.





Re: Update field values without re-extracting text?

2009-03-24 Thread Otis Gospodnetic

Hi Dan,

We should turn this into a FAQ. In the meantime, have a look at SOLR-139 and
the issue linked to that one.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Dan A. Dickey dan.dic...@savvis.net
 To: solr-user@lucene.apache.org
 Sent: Tuesday, March 24, 2009 11:43:35 AM
 Subject: Update field values without re-extracting text?
 
 I'd like to be able to index various documents and have the text extracted
 from them using the DataImportHandler.  I think I have this working just fine.
 
 However, I'd later like to be able to update a field value or several, without
 re-extracting the text all over again with the DIH.  Yes - and if possible, 
 only
 update one or a few of the field values and leave the rest as is.
 
 I haven't seen a way to do this - can it be done?
 What do I need to read yet to accomplish this?  Can someone point me in
 the right direction to do this?  Thanks!
 -Dan
 
 -- 
 Dan A. Dickey | Senior Software Engineer



Trivial question: request for id when indexing using CURL ExtractingRequestHandler

2009-03-24 Thread Chris Muktar
I'm performing this operation:

curl http://localhost:8983/solr/update/extract?ext.def.fl=text --data-binary
@ZOLA.doc -H 'Content-type:text/html'

in order to index word document ZOLA.doc into Solr using the example
schema.xml. It says I have not provided an 'id', which is a required field.
I'm not sure how (syntactically) to provide the id; should it be part of the
query string? And if so, how?

Any help much appreciated!
Thanks!
Chris.


Multi-select on more than one facet field

2009-03-24 Thread Nasseam Elkarra

Looking at the example here:
http://wiki.apache.org/solr/SimpleFacetParameters#head-4ba81c89b265c3b5992e3292718a0d100f7251ef

This being the query for selecting PDF:
q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt}doctype


How would you do the query for selecting (PDF OR Excel) AND author:Mike,
assuming there is another facet field named author?


Thank you,
Nasseam


Solr index deletion

2009-03-24 Thread Nasseam Elkarra
On a few occasions, our development server crashed and in the process  
solr deleted the index folder. We are suspecting another app on the  
server caused an OutOfMemoryException on Tomcat causing all apps  
including solr to crash.


So my question is why is solr deleting the index? We are not doing any  
updates to the index only reading from it so any insight would be  
appreciated.


Thank you,
Nasseam


Re: How to Index IP address

2009-03-24 Thread Matthew Runo
Well, I think you'll have the same problem. Lucene, and Solr (since  
it's built on Lucene) are both going to expect a structured document  
as input. Once you send in a bunch of documents, you can then query  
them for whatever you want to find.


A quick search of the internets found me this Apache Labs project -  
called Pinpoint. It's designed to take log data in, and build an index  
out of it. I'm not sure how developed it is, but it might be a good  
starting point for you. There are probably other projects out there  
along the same lines.. Here's Pinpoint: http://svn.apache.org/repos/asf/labs/pinpoint/trunk/


Why do you want to use Solr / Lucene to look through your files? If  
you have a huge dataset, some people are using Hadoop (a version of  
Google's MapReduce) to look through very large sets of logfiles: http://www.lexemetech.com/2008/01/hadoop-and-log-file-analysis.html


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833

On Mar 24, 2009, at 10:28 AM, nga pham wrote:

Do you think Lucene is better for filtering out a particular IP address
from a txt file?

Thank you Runo,
Nga

On Tue, Mar 24, 2009 at 10:21 AM, Matthew Runo mr...@zappos.com  
wrote:


I don't think that Solr is the best thing to use for searching a  
text file.

I'd use grep myself, if you're on a unix-like system.

To use solr, you'd need to throw each network 'event' (GET, POST,  
etc etc)
into an XML document, and post those into Solr so it could generate  
the

index. You could then do things like
ip:10.206.158.154 to find a specific IP address, or even ip: 
10.206.158* to

get a subnet.

Perhaps the thing that's building your text file could post to Solr
instead?

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
mr...@zappos.com - 702-943-7833


On Mar 24, 2009, at 9:32 AM, nga pham wrote:

Hi All,


I have a txt file, that captured all of my network traffic.  How  
can I use

Solr to filter out a particular IP address?

Thank you,
Nga.








Re: Solr index deletion

2009-03-24 Thread Otis Gospodnetic

Somehow that sounds very unlikely.  Have you looked at logs?  What have you 
found from Solr there?  I am not checking the sources, but I don't think there 
is any place in Solr where the index directory gets deleted.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Nasseam Elkarra nass...@bodukai.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, March 24, 2009 2:35:22 PM
 Subject: Solr index deletion
 
 On a few occasions, our development server crashed and in the process solr 
 deleted the index folder. We are suspecting another app on the server caused 
 an 
 OutOfMemoryException on Tomcat causing all apps including solr to crash.
 
 So my question is why is solr deleting the index? We are not doing any 
 updates 
 to the index only reading from it so any insight would be appreciated.
 
 Thank you,
 Nasseam



Re: Trivial question: request for id when indexing using CURL ExtractingRequestHandler

2009-03-24 Thread Chris Muktar
I've tried this too, still no luck:
curl http://localhost:8983/solr/update/extract?ext.def.fl=text -F id=123 -F
text=@ZOLA.doc


2009/3/24 Chris Muktar ch...@wikijob.co.uk

 I'm performing this operation:

 curl http://localhost:8983/solr/update/extract?ext.def.fl=text --data-binary
 @ZOLA.doc -H 'Content-type:text/html'

 in order to index word document ZOLA.doc into Solr using the example
 schema.xml. It says I have not provided an 'id', which is a required field.
 I'm not sure how (syntactically) to provide the id- should it be part of the
 query string? And if so, how?

 Any help much appreciated!
 Thanks!
 Chris.



Re: Problem with Facet Date Query

2009-03-24 Thread Chris Hostetter
: 
: This is my query: 
: 
q=productPublicationDate_product_dt:[*%20TO%20NOW]&facet=true&facet.field=productPublicationDate_product_dt:[*%20TO%20NOW]&qt=dismaxrequest

that specific error is happening because you are passing this string...

productPublicationDate_product_dt:[*%20TO%20NOW]

...to the facet.field param.  that parameter expects the name of a field, 
and it will then facet on all the indexed values.  what you are passing it 
isn't the name of a field, you are passing it a query string.  if you want 
the faceting count for a query string, use the facet.query param, which 
you already seem to be doing with a different range of dates by hardcoding 
it into your solrconfig...
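
in other words, a request along these lines (a sketch) returns a count for
that range:

  q=yourquery&facet=true&facet.query=productPublicationDate_product_dt:[* TO NOW]

while facet.field should be given a bare field name, e.g.
facet.field=productPublicationDate_product_dt.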

: I have entered this field in solrConfig.xml also in the below manner.
: 
: <lst name="invariants">
:   <str name="facet.field">cat</str>
:   <str name="facet.field">manu_exact</str>
:   <str name="facet.query">price:[* TO 500]</str>
:   <str name="facet.query">price:[500 TO *]</str>
:   <str name="facet.query">productPublicationDate_product_dt:[* TO NOW/DAY-1MONTH]^2.2</str>
: </lst>

I'm not entirely sure what it is you are trying to do, but you're also 
going to have problems because you are using the standard query syntax 
in your q param, but you have specified qt=dismax.

Please explain what your *goal* is and then people can help you explain 
how to achieve your goal ... what you've got here in your example makes no 
sense, and it's not clear what advice to give you to get it to make 
sense without knowing what it is you want to do.  This is similar to an XY 
Problem...

http://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an XY Problem ... that is: you are dealing
with X, you are assuming Y will help you, and you are asking about Y
without giving more details about the X so that we can understand the
full issue.  Perhaps the best solution doesn't involve Y at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341




-Hoss



Re: How to Index IP address

2009-03-24 Thread Alexandre Rafalovitch
Well,

A log file is theoretically structured. Every log record is a - very -
flat set of fields. So, every log file line would be a Lucene
document. Then, one could use Solr to search, filter and facet
records.

Of course, this requires parsing log file back into record components.
Most log files were created for output, not for re-input. But if you
can parse it back, you might be able to do custom data import. Or, if
you can intercept log file before it hits serialization, you might be
able to index the fields directly.

Or you could just buy Splunk ( http://www.splunk.com/ ) and be done
with it. Parsing and visualizing log files is exactly what they set
out to deal with. No (great) open source solution yet.

Regards,
Alex.
Personal blog: http://blog.outerthoughts.com/
Research group: http://www.clt.mq.edu.au/Research/
- I think age is a very high price to pay for maturity (Tom Stoppard)


On Tue, Mar 24, 2009 at 2:40 PM, Matthew Runo mr...@zappos.com wrote:
 Well, I think you'll have the same problem. Lucene, and Solr (since it's
 built on Lucene) are both going to expect a structured document as input.
 Once you send in a bunch of documents, you can then query them for whatever
 you want to find.


Re: Multi-select on more than one facet field

2009-03-24 Thread Yonik Seeley
On Tue, Mar 24, 2009 at 2:29 PM, Nasseam Elkarra nass...@bodukai.com wrote:
 Looking at the example here:
 http://wiki.apache.org/solr/SimpleFacetParameters#head-4ba81c89b265c3b5992e3292718a0d100f7251ef

 This being the query for selecting PDF:
 q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt}doctype

 How would you do the query for selecting (PDF OR Excel) AND author:Mike,
 assuming there is another facet field named author?

If author is not a multi-select facet (i.e. you already selected
author:Mike and hence wish to no longer get other counts for the
author field) then:

q=mainquery
fq=status:public
fq={!tag=dt}doctype:(PDF OR Excel)
fq=author:Mike
facet=onfacet.field={!ex=dt}doctype

If author *is* multi-select, then you wish to get facet counts for the
author field, ignoring the author:Mike restriction for the author
facet only:

q=mainquery
fq=status:public
fq={!tag=dt}doctype:(PDF OR Excel)
fq={!tag=auth}author:Mike
facet=onfacet.field={!ex=dt}doctype
facet.field={!ex=auth}author


-Yonik
http://www.lucidimagination.com


Re: Solr index deletion

2009-03-24 Thread Nasseam Elkarra
Correction: the index was not deleted. The folder is still there with the
index files in it, but a *:* query returns 0 results. Is there a tool
to check the health of an index?


Thanks,
Nasseam

On Mar 24, 2009, at 11:49 AM, Otis Gospodnetic wrote:



Somehow that sounds very unlikely.  Have you looked at logs?  What  
have you found from Solr there?  I am not checking the sources, but  
I don't think there is any place in Solr where the index directory  
gets deleted.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Nasseam Elkarra nass...@bodukai.com
To: solr-user@lucene.apache.org
Sent: Tuesday, March 24, 2009 2:35:22 PM
Subject: Solr index deletion

On a few occasions, our development server crashed and in the  
process solr
deleted the index folder. We are suspecting another app on the  
server caused an
OutOfMemoryException on Tomcat causing all apps including solr to  
crash.


So my question is why is solr deleting the index? We are not doing  
any updates
to the index only reading from it so any insight would be  
appreciated.


Thank you,
Nasseam






Re: Solr index deletion

2009-03-24 Thread Otis Gospodnetic

There is, it's called CheckIndex and it is a part of Lucene (and Lucene jars 
that come with Solr, I believe):

http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/index/CheckIndex.html
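
It has a main() you can run directly, e.g. (a sketch):

  java -cp lucene-core.jar org.apache.lucene.index.CheckIndex /path/to/index [-fix]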

 
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Nasseam Elkarra nass...@bodukai.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, March 24, 2009 4:21:50 PM
 Subject: Re: Solr index deletion
 
 Correction: index was not deleted. The folder is still there with the index 
 files in it but a *:* query returns 0 results. Is there a tool to check the 
 health of an index?
 
 Thanks,
 Nasseam
 
 On Mar 24, 2009, at 11:49 AM, Otis Gospodnetic wrote:
 
  
  Somehow that sounds very unlikely.  Have you looked at logs?  What have you 
 found from Solr there?  I am not checking the sources, but I don't think 
 there 
 is any place in Solr where the index directory gets deleted.
  
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
  
  
  
  - Original Message 
  From: Nasseam Elkarra 
  To: solr-user@lucene.apache.org
  Sent: Tuesday, March 24, 2009 2:35:22 PM
  Subject: Solr index deletion
  
  On a few occasions, our development server crashed and in the process solr
  deleted the index folder. We are suspecting another app on the server 
  caused 
 an
  OutOfMemoryException on Tomcat causing all apps including solr to crash.
  
  So my question is why is solr deleting the index? We are not doing any 
 updates
  to the index only reading from it so any insight would be appreciated.
  
  Thank you,
  Nasseam
  



Hardware Questions...

2009-03-24 Thread solr
We have three Solr servers (several two processor Dell PowerEdge
servers). I'd like to get three newer servers and I wanted to see what
we should be getting. I'm thinking the following...

 

Dell PowerEdge 2950 III 

2x2.33GHz/12M 1333MHz Quad Core 

16GB RAM 
6 x 146GB 15K RPM RAID-5 drives

 

How do people spec out servers, especially CPU, memory, and disk? Is this
all based on the number of docs, indexes, etc.?

 

Also, what are people using for benchmarking and monitoring Solr? Thanks
- Mike



Re: Field tokenizer question

2009-03-24 Thread Chris Hostetter

: as far as I know solr.StrField is not analyzed but is indexed as is
: (verbatim).

correct ... but there is definitely a bug here if the analysis.jsp 
is implying that an analyzer is being used...

https://issues.apache.org/jira/browse/SOLR-1086




-Hoss



Re: Delta import

2009-03-24 Thread AlexxelA

OK, I'm fine with the fact that Solr is going to do X requests to the
database for X updates, but when I try to run the delta-import command with
2 rows to update, is it normal that it's really slow (~1 document fetched/sec)?
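
For context, a sketch of how the deltaQuery/deltaImportQuery quoted below
sit on a DIH entity in data-config.xml, assuming the profil_view table from
the thread:

  <entity name="profil" pk="uid"
          query="select * from profil_view"
          deltaQuery="select uid from profil_view
                      where last_modified > '${dataimporter.last_index_time}'"
          deltaImportQuery="select * from profil_view
                            where uid='${dataimporter.delta.uid}'">
    ...
  </entity>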



Noble Paul നോബിള്‍  नोब्ळ् wrote:
 
 not possible really,
 
 that may not be useful to a lot of users because there may be too many
 changed ids and the 'IN' part can be really long.
 
 You can raise an issue anyway
 
 
 
 On Mon, Mar 23, 2009 at 9:30 PM, AlexxelA alexandre.boudrea...@canoe.ca
 wrote:

 I'm using the delta-import command.

 Here's the deltaQuery and deltaImportQuery I use:

 select uid from profil_view where last_modified >
 '${dataimporter.last_index_time}'
 select * from profil_view where uid='${dataimporter.delta.uid}'

 When I look at the delta-import status I see that the total requests to the
 datasource equal the number of modifications I had. Is it possible to make
 only one request to the database and fetch all modifications?

 select * from profil_view where uid in ('${dataimporter.delta.ALLuid}')
 (something like that).
 --
 View this message in context:
 http://www.nabble.com/Delta-import-tp22663196p22663196.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 -- 
 --Noble Paul
 
 

-- 
View this message in context: 
http://www.nabble.com/Delta-import-tp22663196p22689588.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Commit is taking very long time

2009-03-24 Thread Chris Hostetter

: My application is in prod and quite frequently getting NullPointerException.
...
: java.lang.NullPointerException
: at 
com.fm.search.incrementalindex.service.AuctionCollectionServiceImpl.indexData(AuctionCollectionServiceImpl.java:251)
: at 
com.fm.search.incrementalindex.service.AuctionCollectionServiceImpl.process(AuctionCollectionServiceImpl.java:135)
: at 
com.fm.search.job.SearchIndexingJob.executeInternal(SearchIndexingJob.java:68)
: at 
org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:86)
: at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
: at 
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:529)

that stack trace doesn't suggest anything remotely related to Solr.  none
of those classes are in the Solr code base -- without having any idea what
the code on line 251 of your AuctionCollectionServiceImpl class looks
like, no one could even begin to speculate what is causing the NPE.   Even
if we knew what line 251 looks like, understanding why some reference on
that line is null would probably require knowing a whole lot more about
your application.




-Hoss


RE: Exact Match

2009-03-24 Thread Chris Hostetter

: Depending on your needs, you might want to do some sort of minimal
: analysis on the field (ignore punctuation, lowercase,...) Here's the
: text_exact field that I use:

Dean's reply is a great example of how "exact" is a vague term.

with a TextField you can get an exact match using a simple phrase query
(ie: putting quotes around the input), assuming your meaning of exact is
that all the tokens appear together in sequence, and assuming your
analyzer doesn't change things in a way that makes a phrase search match
in a way that you don't consider exact enough.

if you want to ensure that the document contains exactly what the user
queried for, no more and no less, then using a copyField into a StrField is
really the best way to do that.
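
a sketch in schema.xml, with hypothetical field names:

  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="title_exact" type="string" indexed="true" stored="false"/>
  <copyField source="title" dest="title_exact"/>

query title for phrase matches and title_exact for whole-value matches.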




-Hoss



Re: Response schema for an update.

2009-03-24 Thread Chris Hostetter

: Subject: Response schema for an update.
: In-Reply-To: shivayigjyfbf88vtu21...@shiva.ceiindia.com
: References: 69de18140903230141t38dbcd28n40bbcc944ddb0...@mail.gmail.com
:  shivayigjyfbf88vtu21...@shiva.ceiindia.com

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/Thread_hijacking





-Hoss



Re: Not able to configure multicore

2009-03-24 Thread Chris Hostetter

: I am facing a problem related to multiple cores configuration. I have placed
: a solr.xml file in solr.home directory. eventhough when I am trying to
: access http://localhost:8983/solr/admin/cores it gives me tomcat error. 
: 
: Can anyone tell me what can be possible issue with this??

not without knowing exactly what the tomcat error message is, what your 
solr.xml file looks like, what log messages you see on startup, etc...




-Hoss



Re: Trivial question: request for id when indexing using CURL ExtractingRequestHandler

2009-03-24 Thread Chris Hostetter

Deja-Vu...

http://www.nabble.com/Missing-required-field%3A-id-Using-ExtractingRequestHandler-to22611039.html

: I'm performing this operation:
: 
: curl http://localhost:8983/solr/update/extract?ext.def.fl=text --data-binary
: @ZOLA.doc -H 'Content-type:text/html'
: 
: in order to index word document ZOLA.doc into Solr using the example
: schema.xml. It says I have not provided an 'id', which is a required field.
: I'm not sure how (syntactically) to provide the id- should it be part of the
: query string? And if so, how?


-Hoss



Re: lucene-java version mismatches

2009-03-24 Thread Paul Libbrecht




Le 24-mars-09 à 11:14, Shalin Shekhar Mangar a écrit :

On Tue, Mar 24, 2009 at 3:30 PM, Paul Libbrecht  
p...@activemath.org wrote:

Which Lucene version does solr-lucene-core-1.3.0 correspond to?


The lucene jars shipped with Solr 1.3.0 were 2.4-dev built from svn  
revision
r691741. You can check out the source from lucene's svn using that  
revision

number.


thanks,

that's useful,

could I suggest that the maven repositories be populated next time a
release of the Solr-specific Lucene jars is made?


paul



Re: Trivial question: request for id when indexing using CURL ExtractingRequestHandler

2009-03-24 Thread Chris Muktar
Fantastic thank you!

I'm executing this:
curl -F text=@ZHENG.doc -F 'commit=true'
http://localhost:8983/solr/update/extract?ext.def.fl=text\&ext.literal.id=2

however performing the query
http://localhost:8983/solr/select?q=id:2

produces the output but without a text field. I'm not sure if it's being
extracted & indexed correctly. The commit is going through, though. This is
using the example schema. Any thoughts? XML response follows...

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
    <lst name="params">
      <str name="q">id:2</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">2</str>
      <int name="popularity">0</int>
      <str name="sku">2</str>
      <date name="timestamp">2009-03-24T22:27:00.714Z</date>
    </doc>
  </result>
</response>


Re: delta-import commit=false doesn't seems to work

2009-03-24 Thread sunnyfr

Hi,
Sorry, I still don't know what I should do.
I can see in my log that something clearly optimizes somewhere, even though
my command is delta-import&optimize=false.
Is it a parameter to add to the commit, or to the snappuller, or ...?


Mar 24 23:02:44 search-01 jsvc.exec[22812]: Mar 24, 2009 11:02:44 PM
org.apache.solr.handler.dataimport.SolrWriter persistStartTime INFO: Wrote
last indexed time to dataimport.properties
Mar 24 23:02:44 search-01 jsvc.exec[22812]: Mar 24, 2009 11:02:44 PM
org.apache.solr.handler.dataimport.DocBuilder commit INFO: Full Import
completed successfully
Mar 24 23:02:44 search-01 jsvc.exec[22812]: Mar 24, 2009 11:02:44 PM
org.apache.solr.update.DirectUpdateHandler2 commit INFO: start
commit(optimize=true,waitFlush=false,waitSearcher=true)

thanks a lot for your help


sunnyfr wrote:
 
 As you can see, I did that, and I have no information in my DIH, but you
 can notice in my logs and even in my segments
 that an optimize is fired on its own automatically?
 
 
 Noble Paul നോബിള്‍  नोब्ळ् wrote:
 
 just hit the DIH without any command and you may be able to see the
 status of the last import. It can tell you whether a commit/optimize
 was performed
 
 On Fri, Mar 20, 2009 at 7:07 PM, sunnyfr johanna...@gmail.com wrote:

 Thanks I gave more information there :
 http://www.nabble.com/Problem-for-replication-%3A-segment-optimized-automaticly-td22601442.html

 thanks a lot Paul


 Noble Paul നോബിള്‍  नोब्ळ् wrote:

 sorry, the whole thing was commented out. I did not notice that. I'll
 look into that

 2009/3/20 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@gmail.com:
 you have set autoCommit every x minutes. It must have invoked commit
 automatically


 On Thu, Mar 19, 2009 at 4:17 PM, sunnyfr johanna...@gmail.com wrote:

 Hi,

 Even if I hit command=delta-import&commit=false&optimize=false
 I still have commit set in my logs and sometimes even optimize=true.

 About optimize, I wonder if it comes from commits being too close together
 while one is not yet done, but I really don't know.

 Any idea?

 Thanks a lot,
 --
 View this message in context:
 http://www.nabble.com/delta-import-commit%3Dfalse-doesn%27t-seems-to-work-tp22597630p22597630.html
 Sent from the Solr - User mailing list archive at Nabble.com.





 --
 --Noble Paul




 --
 --Noble Paul



 --
 View this message in context:
 http://www.nabble.com/Re%3A-delta-import-commit%3Dfalse-doesn%27t-seems-to-work-tp22614216p22620439.html
 Sent from the Solr - User mailing list archive at Nabble.com.


 
 
 
 -- 
 --Noble Paul
 
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Re%3A-delta-import-commit%3Dfalse-doesn%27t-seems-to-work-tp22614216p22691417.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Trivial question: request for id when indexing using CURL ExtractingRequestHandler

2009-03-24 Thread Erik Hatcher
If your text field is not stored, then it won't be available in  
results.  That's the likely explanation.  Seems like all is well.
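
In the example schema the catch-all "text" field is indexed but not
stored; to see the extracted content in results it would need
stored="true", e.g. (a sketch):

  <field name="text" type="text" indexed="true" stored="true" multiValued="true"/>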


Erik

On Mar 24, 2009, at 11:34 PM, Chris Muktar wrote:


Fantastic thank you!

I'm executing this:
curl -F text=@ZHENG.doc -F 'commit=true'
http://localhost:8983/solr/update/extract?ext.def.fl=text\&ext.literal.id=2


however performing the query
http://localhost:8983/solr/select?q=id:2

produces the output but without a text field. I'm not sure if it's being
extracted & indexed correctly. The commit is going through, though. This is

using the example schema. Any thoughts? XML response follows...

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
    <lst name="params">
      <str name="q">id:2</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="id">2</str>
      <int name="popularity">0</int>
      <str name="sku">2</str>
      <date name="timestamp">2009-03-24T22:27:00.714Z</date>
    </doc>
  </result>
</response>




Re: Hardware Questions...

2009-03-24 Thread Shashi Kant
Have you looked at http://wiki.apache.org/solr/SolrPerformanceData ?

On Tue, Mar 24, 2009 at 4:51 PM, solr s...@highbeam.com wrote:

 We have three Solr servers (several two processor Dell PowerEdge
 servers). I'd like to get three newer servers and I wanted to see what
 we should be getting. I'm thinking the following...



 Dell PowerEdge 2950 III

 2x2.33GHz/12M 1333MHz Quad Core

 16GB RAM
 6 x 146GB 15K RPM RAID-5 drives



 How do people spec out servers, especially CPU, memory and disk? Is this
 all based on the number of doc's, indexes, etc...



 Also, what are people using for benchmarking and monitoring Solr? Thanks
 - Mike




Re: Solr index deletion

2009-03-24 Thread Nasseam Elkarra
The tool says there are no problems. Solr is pointing to the right
directory, so I'm not sure what is preventing it from returning any
results. Any ideas? Here is the output:


Segments file=segments_2 numSegments=1 version=FORMAT_USER_DATA [Lucene 2.9]

  1 of 1: name=_0 docCount=18021
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=8.389
    has deletions [delFileName=_0_1.del]
    test: open reader.........OK [18 deleted docs]
    test: fields, norms.......OK [35 fields]
    test: terms, freq, prox...OK [60492 terms; 1157700 terms/docs pairs; 1224063 tokens]
    test: stored fields.......OK [386828 total field count; avg 21.487 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

No problems were detected with this index.

--

Thanks,
Nasseam


On Mar 24, 2009, at 1:34 PM, Otis Gospodnetic wrote:



There is, it's called CheckIndex and it is a part of Lucene (and  
Lucene jars that come with Solr, I believe):


http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/index/CheckIndex.html


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Nasseam Elkarra nass...@bodukai.com
To: solr-user@lucene.apache.org
Sent: Tuesday, March 24, 2009 4:21:50 PM
Subject: Re: Solr index deletion

Correction: index was not deleted. The folder is still there with  
the index
files in it but a *:* query returns 0 results. Is there a tool to  
check the

health of an index?

Thanks,
Nasseam

On Mar 24, 2009, at 11:49 AM, Otis Gospodnetic wrote:



Somehow that sounds very unlikely.  Have you looked at logs?  What  
have you
found from Solr there?  I am not checking the sources, but I don't  
think there

is any place in Solr where the index directory gets deleted.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Nasseam Elkarra
To: solr-user@lucene.apache.org
Sent: Tuesday, March 24, 2009 2:35:22 PM
Subject: Solr index deletion

On a few occasions, our development server crashed and in the  
process solr
deleted the index folder. We are suspecting another app on the  
server caused

an
OutOfMemoryException on Tomcat causing all apps including solr to  
crash.


So my question is why is solr deleting the index? We are not  
doing any

updates
to the index only reading from it so any insight would be  
appreciated.


Thank you,
Nasseam
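
For reference, CheckIndex can be run straight from the command line against
the index directory; a sketch, assuming the Lucene core jar shipped with Solr
and a made-up index path:

  # -ea enables Lucene's assertions, as the CheckIndex usage text recommends
  java -ea:org.apache.lucene... -cp lucene-core-2.4.1.jar \
    org.apache.lucene.index.CheckIndex /path/to/solr/data/index

Adding -fix rewrites the index to drop corrupt segments (losing their
documents), so it should only ever be run against a copy.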








get all facets

2009-03-24 Thread Ashish P

Can I get all the facets in a QueryResponse?
Thanks,
Ashish
-- 
View this message in context: 
http://www.nabble.com/get-all-facets-tp22693809p22693809.html
Sent from the Solr - User mailing list archive at Nabble.com.
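
In SolrJ, QueryResponse.getFacetFields() returns every facet field present in
the response. A minimal sketch, assuming a SolrJ 1.3-era client and a made-up
facet field name:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.FacetField;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class FacetDump {
    public static void main(String[] args) throws Exception {
      CommonsHttpSolrServer server =
          new CommonsHttpSolrServer("http://localhost:8983/solr");
      SolrQuery query = new SolrQuery("*:*");
      query.setFacet(true);
      query.addFacetField("cat");  // "cat" is an assumed field name
      QueryResponse rsp = server.query(query);
      // getFacetFields() lists all facet fields returned in this response
      for (FacetField ff : rsp.getFacetFields()) {
        System.out.println(ff.getName());
        if (ff.getValues() == null) continue;  // field had no constraints
        for (FacetField.Count c : ff.getValues()) {
          System.out.println("  " + c.getName() + " -> " + c.getCount());
        }
      }
    }
  }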



Re: autocommit and crashing tomcat

2009-03-24 Thread Jacob Singh
Hi Yonik,

Thanks for the response.  If I shut down tomcat cleanly, does it
commit all uncommitted documents?

Best,
Jacob


-- Forwarded message --
From: Yonik Seeley yo...@lucidimagination.com
Date: Tue, Mar 24, 2009 at 8:48 PM
Subject: Re: autocommit and crashing tomcat
To: solr-user@lucene.apache.org


On Tue, Mar 24, 2009 at 5:52 AM, Jacob Singh jacobsi...@gmail.com wrote:
 If I'm using autocommit, and I have a crash of tomcat (or the whole
 machine) while there are still docs pending, will I lose those
 documents in limbo?

Yep.

 If the answer is they go away: is there any way to ensure the integrity
 of an update?

You can only be sure that the docs are on the disk after you have done a commit.

An optional transaction log would be part of high availability for
writes; it's something we should eventually get to.

-Yonik
http://www.lucidimagination.com



-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com
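
Since only committed documents survive a crash, one option is to issue an
explicit commit before stopping the container; a sketch against the standard
XML update handler:

  # ask Solr to flush pending documents to disk and open a new searcher
  curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' \
       --data-binary '<commit waitFlush="true" waitSearcher="true"/>'

Whether a clean shutdown also flushes pending documents depends on how the
container closes the update handler, so an explicit commit is the safer habit.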


Re: Solr index deletion

2009-03-24 Thread Otis Gospodnetic

Hm, you are not saying much about what you've tried.  Could it be your Solr 
home is wrong and not even pointing to the index you just checked?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Nasseam Elkarra nass...@bodukai.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, March 24, 2009 7:47:08 PM
 Subject: Re: Solr index deletion
 
 The tool says there are no problems. Solr is pointing to the right directory
 so I'm not sure what is preventing it from returning any results. Any ideas?


Re: lucene-java version mismatches

2009-03-24 Thread Shalin Shekhar Mangar
On Wed, Mar 25, 2009 at 3:23 AM, Paul Libbrecht p...@activemath.org wrote:


 could I suggest that the Maven repositories are populated the next time a
 release of the Solr-specific Lucene jars is made?


They already are. They live under the org.apache.solr group, since those Lucene
jars are released by Solr -- http://repo2.maven.org/maven2/org/apache/solr/

-- 
Regards,
Shalin Shekhar Mangar.
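
For illustration, depending on one of those Solr-released Lucene jars would
look like this in a pom.xml (the artifact id and version are assumptions based
on the 1.3.0 release):

  <dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-lucene-core</artifactId>
    <version>1.3.0</version>
  </dependency>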


Re: Delta import

2009-03-24 Thread Shalin Shekhar Mangar
On Wed, Mar 25, 2009 at 2:25 AM, AlexxelA alexandre.boudrea...@canoe.ca wrote:


 OK, I accept that Solr will make X requests to the database for X
 updates, but when I run the delta-import command with 2 rows to
 update, is it normal that it's really slow (~1 document fetched/sec)?


Not really, I've seen 1000x faster. Try firing a few of those queries on the
database directly. Are they slow? Is the database remote?

-- 
Regards,
Shalin Shekhar Mangar.
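
For comparison, a typical delta setup in DataImportHandler's data-config.xml
looks roughly like this (table and column names are made up, and the
deltaImportQuery attribute for the per-row fetch is an assumption):

  <entity name="item" pk="id"
          query="SELECT * FROM item"
          deltaQuery="SELECT id FROM item
                      WHERE last_modified &gt; '${dataimporter.last_index_time}'"
          deltaImportQuery="SELECT * FROM item
                            WHERE id='${dataimporter.delta.id}'"/>

Since the delta runs one fetch per changed row, a database index on the
primary key (and on last_modified) usually makes the biggest difference.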


Re: Not able to configure multicore

2009-03-24 Thread mitulpatel



hossman wrote:
 
 
 : I am facing a problem related to multiple cores configuration. I have
 placed
 : a solr.xml file in solr.home directory. eventhough when I am trying to
 : access http://localhost:8983/solr/admin/cores it gives me tomcat error. 
 : 
 : Can anyone tell me what can be possible issue with this??
 
 not without knowing exactly what the tomcat error message is, what your 
 solr.xml file looks like, what log messages you see on startup, etc...
 
 -Hoss
 
 
Hello Hoss,

Thanks for reply.

Here is the error message shown in the browser:
HTTP Status 404 - /solr2/admin/cores
type Status report
message /solr2/admin/cores
description The requested resource (/solr2/admin/cores) is not available.

and here is the solr.xml file.
<solr persistent="true" sharedLib="lib">
 <cores adminPath="/admin/cores">
  <core name="core0" instanceDir="core0" />
  <core name="core1" instanceDir="core1" />
 </cores>
</solr>


-- 
View this message in context: 
http://www.nabble.com/Not-able-to-configure-multicore-tp22682691p22695098.html
Sent from the Solr - User mailing list archive at Nabble.com.
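
A 404 on the cores admin path often just means the webapp never found the solr
home that holds solr.xml. A sketch of a Tomcat context fragment that pins it
down explicitly (all paths are assumptions):

  <Context docBase="/opt/solr/apache-solr-1.3.0.war" debug="0" crossContext="true">
    <!-- point the webapp at the directory containing solr.xml -->
    <Environment name="solr/home" type="java.lang.String"
                 value="/opt/solr/home" override="true"/>
  </Context>

The startup log should then show solr home resolving to that directory before
the cores are loaded.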



Index time boost

2009-03-24 Thread Gargate, Siddharth
Hi all,
Can we specify the index-time boost value for a particular field in
schema.xml?
 
Thanks,
Siddharth


Snapinstaller + Overlapping onDeckSearchers Problems

2009-03-24 Thread Cloude Porteus
We have been running our Solr slaves without autowarming new searchers for a
long time, but that left 50-75 requests taking 20+ seconds after every update
on the slaves. I have turned on autowarming, and that has fixed our slow
response times, but I'm now running into occasional "Overlapping
onDeckSearchers" warnings.

We have replication setup and are using the snapinstaller script every 10
minutes:

/home/solr/bin/snappuller -M util01 -P 18984 -D /home/solr/write/data -S
/home/solr/logs -d /home/solr/read/data -u instruct;
/home/solr/bin/snapinstaller -M util01 -S /home/solr/write/logs -d
/home/solr/read/data -u instruct

Here's what a successful update/commit log looks like:

[14:13:02.510] start
commit(optimize=false,waitFlush=false,waitSearcher=true)
[14:13:02.522] Opening searc...@e9b4bb main
[14:13:02.524] end_commit_flush
[14:13:02.525] autowarming searc...@e9b4bb main from searc...@159e6e8 main
[14:13:02.525]
filterCache{lookups=1809739,hits=1766607,hitratio=0.97,inserts=43211,evictions=0,
size=43154,cumulative_lookups=1809739,cumulative_hits=1766607,cumulative_hitratio=0.97,cumulative_inserts=43211,cumulative_evictions=0}
--
[14:15:42.372] {commit=} 0 159964
[14:15:42.373] /update  0 159964

Here's what an unsuccessful update/commit log looks like, where the /update
took too long and we started another commit:

[21:03:03.829] start
commit(optimize=false,waitFlush=false,waitSearcher=true)
[21:03:03.836] Opening searc...@b2f2d6 main
[21:03:03.836] end_commit_flush
[21:03:03.836] autowarming searc...@b2f2d6 main from searc...@103c520 main
[21:03:03.836]
filterCache{lookups=1062196,hits=1062160,hitratio=0.99,inserts=49144,evictions=0,size=48353,cumulative_lookups=259485564,cumulative_hits=259426904,cumulative_hitratio=0.99,cumulative_inserts=68467,cumulative_evictions=0}
--
[21:23:04.794] start
commit(optimize=false,waitFlush=false,waitSearcher=true)
[21:23:04.794] PERFORMANCE WARNING: Overlapping onDeckSearchers=2
[21:23:04.802] Opening searc...@f11bc main
[21:23:04.802] end_commit_flush
--
[21:24:55.987] {commit=} 0 1312158
[21:24:55.987] /update  0 1312158


I don't understand why the gap between the commit start and the /update log
line is sometimes two minutes and sometimes 20 minutes. One of our caches
holds about 40,000 items, but I can't imagine it taking 20 minutes to
autowarm a searcher.

It would be super handy if the Snapinstaller script would wait until the
previous one was done before starting a new one, but I'm not sure how to
make that happen.

Thanks for any help with this.

best,
cloude

-- 
VP of Product Development
Instructables.com

http://www.instructables.com/member/lebowski
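
One way to keep installs from overlapping is to serialize the cron job with
flock(1); a sketch, reusing the commands above with a made-up lock file:

  #!/bin/sh
  # Take an exclusive lock; if the previous pull/install is still running,
  # skip this cycle instead of stacking another warming searcher on top.
  (
    flock -n 9 || { echo "previous snapinstaller still running; skipping"; exit 0; }
    /home/solr/bin/snappuller -M util01 -P 18984 -D /home/solr/write/data \
      -S /home/solr/logs -d /home/solr/read/data -u instruct &&
    /home/solr/bin/snapinstaller -M util01 -S /home/solr/write/logs \
      -d /home/solr/read/data -u instruct
  ) 9>/var/lock/snapinstaller.lock

On the Solr side, the maxWarmingSearchers setting in solrconfig.xml caps how
many searchers may warm concurrently, so an overlapping commit fails fast
instead of piling up.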


Re: Index time boost

2009-03-24 Thread Shalin Shekhar Mangar
On Wed, Mar 25, 2009 at 10:14 AM, Gargate, Siddharth sgarg...@ptc.com wrote:

 Hi all,
Can we specify the index-time boost value for a particular field in
 schema.xml?


No. You can specify it along with the document when you add it to Solr.

-- 
Regards,
Shalin Shekhar Mangar.
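
For illustration, both document-level and field-level index-time boosts can be
set in the XML update message; a sketch with made-up field names:

  <add>
    <!-- boost on <doc> scales every field's boost in this document -->
    <doc boost="2.5">
      <field name="id">doc1</field>
      <!-- boost on <field> applies to this field only -->
      <field name="title" boost="3.0">An important title</field>
    </doc>
  </add>

Note that index-time boosts are baked into the norms at indexing time, so
changing them requires reindexing the affected documents.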