Re: Delete solr data from disk space

2009-08-05 Thread Ashish Kumar Srivastava

Hi Toby,

Thanks, but I have tried this solution earlier. The problem with it is that
optimization takes too much disk space (more than twice the size of the
original index data).
Do you have any better solution, or any other option by which we can
optimize without using too much space?

Thanks 
Ashish




Toby Cole-2 wrote:
 
 Hi Ashish,
   Have you optimized your index?
 When you delete documents in lucene they are simply marked as  
 'deleted', they aren't physically removed from the disk.
 To get the disk space back you must run an optimize, which re-writes  
 the index out to disk without the deleted documents, then deletes the  
 original.
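 
 For reference, a minimal sketch of triggering an optimize over HTTP
 (assuming a Solr instance at localhost:8983 with the default /update
 handler):
 
 curl http://localhost:8983/solr/update -H 'Content-type:text/xml' \
   --data-binary '<optimize/>'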
 
 Toby
 
 On 4 Aug 2009, at 14:41, Ashish Kumar Srivastava wrote:
 

 Hi ,


 Sorry!! But this solution will not work because I deleted data by a
 certain query.
 How can I then know which files should be deleted? I can't delete the
 whole data directory.



 Markus Jelsma - Buyways B.V. wrote:

 Hello,


 A rigorous but quite effective method is manually deleting the
 files in your SOLR_HOME/data directory and reindexing the documents
 you want. This will surely free some disk space.


 Cheers,

 -
 Markus Jelsma         Buyways B.V.            Tel. 050-3118123
 Technisch Architect   Friesestraatweg 215c    Fax. 050-3118124
 http://www.buyways.nl 9743 AD Groningen       KvK  01074105


 On Tue, 2009-08-04 at 06:26 -0700, Ashish Kumar Srivastava wrote:

 I am facing a problem in deleting Solr data from disk space.
 I had 80Gb of Solr data. I deleted 30% of this data by using a query in
 the solr-php client and committed.
 Now the deleted data is not visible from the Solr UI but the used disk
 space is still 80Gb for the Solr data.
 Please reply if you have any solution to free the disk space after
 deleting some Solr data.

 Thanks in advance.



 -- 
 View this message in context:
 http://www.nabble.com/Delete-solr-data-from-disk-space-tp24808676p24808883.html
 Sent from the Solr - User mailing list archive at Nabble.com.

 
 
 --
 Toby Cole
 Software Engineer, Semantico Limited
 Registered in England and Wales no. 03841410, VAT no. GB-744614334.
 Registered office Lees House, 21-23 Dyke Road, Brighton BN1 3FE, UK.
 
 Check out all our latest news and thinking on the Discovery blog
 http://blogs.semantico.com/discovery-blog/
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Delete-solr-data-from-disk-space-tp24808676p24821241.html
Sent from the Solr - User mailing list archive at Nabble.com.




DataImportHandler: Partial Delete and Update (Hacking deleteQuery in SOLR 1.3?)

2009-08-05 Thread Chantal Ackermann

Hi all,

the database from which I populate the SOLR index is refreshed
partially. Subsets of the data are deleted and re-added for a certain
group identifier. Is it possible to do something similar in a (delta)
import with the DataImportHandler?


Example:
SOLR-Index:
groupID: 1, PK: 1, refreshDate: [before last_index_time]
groupID: 1, PK: 2, refreshDate: [before last_index_time]
groupID: 1, PK: 3, refreshDate: [before last_index_time]

Refreshed DB:
groupID: 1, PK: 1, refreshDate: [after last_index_time]
groupID: 1, PK: 5, refreshDate: [after last_index_time]
groupID: 1, PK: 30, refreshDate: [after last_index_time]
(PK 2 and 3 are not there, anymore. PK is unique across all groupIDs)

deleteQuery=groupID:1
(An attribute of the entity element that the DocBuilder (1.3) reads and
sends as query once, before the delta import, unchanged to the SOLR
writer to delete documents.)

After that, the delta import loads data with groupID=1 from the DB.

Could I plug into SOLR with maybe a custom processor to achieve
something in the direction of:

deleteInput=select FIELD_VALUE from TABLE where CHANGED_DATE >
'${dataimporter.last_index_time}' group by FIELD_VALUE
deleteQuery=field:${my_entity.FIELD_VALUE}

FIELD_VALUE is not the primary key, and the deleteInput query can
return multiple rows.


I am aware of SOLR-1060 and SOLR-1059 but I am not sure that those will
help me. In those cases it looks like the delete is run per entity. I
want the delete to run before the (delta)import, once.
If that impression is wrong, I'll happily switch to 1.4, of course.

Cheers!
Chantal


--
Chantal Ackermann




Re: DataImportHandler: Partial Delete and Update (Hacking deleteQuery in SOLR 1.3?)

2009-08-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
Did you explore the deletedPkQuery?
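
(For reference, a minimal sketch of deletedPkQuery on a DIH entity; the
table and column names here are hypothetical:

<entity name="item" pk="ID"
    query="select * from item"
    deletedPkQuery="select ID from item where DELETED_DATE > '${dataimporter.last_index_time}'">

During a delta import it returns the primary keys of documents that
should be removed from the index.)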

On Wed, Aug 5, 2009 at 11:46 AM, Chantal
Ackermannchantal.ackerm...@btelligent.de wrote:
 Hi all,

 the database from which I populate the SOLR index is refreshed
 partially. Subsets of the data is deleted and readded for a certain
 group identifier. Is it possible to do something alike in a (delta) import
 of the DataImportHandler?

 Example:
 SOLR-Index:
 groupID: 1, PK: 1, refreshDate: [before last_index_time]
 groupID: 1, PK: 2, refreshDate: [before last_index_time]
 groupID: 1, PK: 3, refreshDate: [before last_index_time]

 Refreshed DB:
 groupID: 1, PK: 1, refreshDate: [after last_index_time]
 groupID: 1, PK: 5, refreshDate: [after last_index_time]
 groupID: 1, PK: 30, refreshDate: [after last_index_time]
 (PK 2 and 3 are not there, anymore. PK is unique across all groupIDs)

 deleteQuery=groupID:1
 (An attribute of the entity element that the DocBuilder (1.3) reads and
 sends as query once, before the delta import, unchanged to the SOLR
 writer to delete documents.)

 After that, the delta import loads data with groupID=1 from the DB.

 Could I plug into SOLR with maybe a custom processor to achieve
 something in the direction of:

 deleteInput=select FIELD_VALUE from TABLE where CHANGED_DATE >
 '${dataimporter.last_index_time}' group by FIELD_VALUE
 deleteQuery=field:${my_entity.FIELD_VALUE}

 FIELD_VALUE is not the primary key, and the deleteInput query can
 return multiple rows.


 I am aware of SOLR-1060 and SOLR-1059 but I am not sure that those will
 help me. In those cases it looks like the delete is run per entity. I
 want the delete to run before the (delta)import, once.
 If that impression is wrong, I'll happily switch to 1.4, of course.

 Cheers!
 Chantal


 --
 Chantal Ackermann






-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: change sort order for MoreLikeThis

2009-08-05 Thread Renz Daluz
Thanks guys.
I tried to boost it instead (as sort looks like it is not supported) but it's not
taking effect. Here are the parameters that I'm using:

I want to boost by the time_published field and I enable mlt.boost:
bf=recip(rord(time_published),1,1000,165)^1500&qt=mlt&mlt.boost=true


Regards,
/Renz



2009/8/4 Avlesh Singh avl...@gmail.com

 
  You lost me.
 
 Absolutely sorry about that Bill :(

 How does boosting change the sort order?

 What I really meant here is that if you have more than one similarity
 field in your MLT query, you can boost the results found due to one over
 the other. It was not at all aimed to be an answer for sort. Actually, I
 was too prompt to respond!

 What about sorting on a field that is not the mlt field?
 
 Haven't tried this yet. It would be surprising if it does not work as
 expected.

 Cheers
 Avlesh

 On Tue, Aug 4, 2009 at 3:24 AM, Bill Au bill.w...@gmail.com wrote:

  Avlesh,
  You lost me.  How does boosting change the sort order?  What about
  sorting on a field that is not the mlt field?
 
  Bill
 
  On Mon, Aug 3, 2009 at 3:13 AM, Avlesh Singh avl...@gmail.com wrote:
 
   You can boost the similarity field matches, if you want. Look for
  mlt.boost
   at http://wiki.apache.org/solr/MoreLikeThis
  
   Cheers
   Avlesh
  
   On Mon, Aug 3, 2009 at 11:33 AM, Renz Daluz renz052...@gmail.com
  wrote:
  
Hi,
   
I'm looking at changing the result order when searching by MLT. I
 tried
   the
sort=field,order but it's not working. I check the wiki and can't
   find
anything. Is there a way to do this?
   
Thanks,
/Laurence
   
  
 



Re: change sort order for MoreLikeThis

2009-08-05 Thread Renz Daluz
Oh, and yes, I tried to sort on a field that is not an mlt field and it's not
taking effect. Here are the whole parameters that I'm using:

mlt.fl=text,title&tie=0.01&mlt.mintf=1&mlt.match.include=true&fl=tagged_bucket,tagged_entities&bf=recip(rord(time_published),1,1000,165)^1500&qt=mlt&mlt.minwl=3&mm=5&mlt.boost=true&qf=text^0.5+title^0.4++description^0.01+keywords^0.01+bestlink_keywords^0.1+authors_t^0.05&mlt.maxwl=20&mlt.maxntp=200&mlt.maxqt=10&mlt.interestingTerms=details&rows=200&mlt.mindf=3&pf=text^300+title^10+tagged_entities^200+inbound_text^1+bestlink_keywords^1&q=id:story|25584945&cps=1&sort=time_published+desc


Thanks,
Renz


2009/8/5 Renz Daluz renz052...@gmail.com

 Thanks guys.
 I tried to boost it instead (as sort looks like it is not supported) but it's not
 taking effect. Here are the parameters that I'm using:

 I want to boost by the time_published field and I enable mlt.boost:
 bf=recip(rord(time_published),1,1000,165)^1500&qt=mlt&mlt.boost=true


 Regards,
 /Renz



 2009/8/4 Avlesh Singh avl...@gmail.com

 
  You lost me.
 
 Absolutely sorry about that Bill :(

 How does boosting change the sort order?

 What I really meant here is that if you have more than one similarity
 field in your MLT query, you can boost the results found due to one over
 the other. It was not at all aimed to be an answer for sort. Actually, I
 was too prompt to respond!

 What about sorting on a field that is not the mlt field?
 
 Haven't tried this yet. It would be surprising if it does not work as
 expected.

 Cheers
 Avlesh

 On Tue, Aug 4, 2009 at 3:24 AM, Bill Au bill.w...@gmail.com wrote:

  Avlesh,
  You lost me.  How does boosting change the sort order?  What about
  sorting on a field that is not the mlt field?
 
  Bill
 
  On Mon, Aug 3, 2009 at 3:13 AM, Avlesh Singh avl...@gmail.com wrote:
 
   You can boost the similarity field matches, if you want. Look for
  mlt.boost
   at http://wiki.apache.org/solr/MoreLikeThis
  
   Cheers
   Avlesh
  
   On Mon, Aug 3, 2009 at 11:33 AM, Renz Daluz renz052...@gmail.com
  wrote:
  
Hi,
   
I'm looking at changing the result order when searching by MLT. I
 tried
   the
sort=field,order but it's not working. I check the wiki and
 can't
   find
anything. Is there a way to do this?
   
Thanks,
/Laurence
   
  
 





query matching issue

2009-08-05 Thread Radha C.
Hello list,
 
I have documents containing the words Richard Nass. I need to match the
Richard Nass documents for the query strings richard, nass, and rich.
The search works for the following queries:
 
http://localhost:8983/solr/select?q=author:Richard nass
http://localhost:8983/solr/select?q=author:Richard Nass
http://localhost:8983/solr/select?q=author:richard nass
 
But it does not work for q=author:Richard, q=author:nass, or q=author:rich...
 
I tried a wildcard search like q=author:rich* also.
 
Can anyone help me get the flexible search described above?
 
Thanks in advance..
 
Radha.C
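
(One approach not mentioned in the thread is to index the author field
with edge n-grams so that prefixes such as rich match at query time. A
minimal sketch, assuming Solr 1.4's EdgeNGramFilterFactory; the field and
type names are hypothetical:

<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="author_prefix" type="text_prefix" indexed="true" stored="false"/>
<copyField source="author" dest="author_prefix"/>

A query like q=author_prefix:rich would then match Richard.)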
 


Re: DataImportHandler: Partial Delete and Update (Hacking deleteQuery in SOLR 1.3?)

2009-08-05 Thread Chantal Ackermann

Hi Paul,

yes, I did and I just verified in the code. The deletedPkQuery is used 
to collect all primary keys of the root entity that shall be deleted 
from the index.


The deletion is done on the SOLR writer by unique ID:

  writer.deleteDoc(deletedKey.get(root.pk)); // DocBuilder

  delCmd.id = id.toString();   // SolrWriter deleteDoc()
  delCmd.fromPending = true;
  delCmd.fromCommitted = true;
  processor.processDelete(delCmd);

  // RunUpdateProcessorFactory
  @Override
  public void processDelete(DeleteUpdateCommand cmd) throws IOException {
    if (cmd.id != null) {
      updateHandler.delete(cmd);        // writer.deleteDoc() uses this
    } else {
      updateHandler.deleteByQuery(cmd); // I would like to use this
    }
    super.processDelete(cmd);
  }

My problem is that the ids I have to delete are those that do not exist 
in the database anymore. So, I have no means to return them by DB query. 
That is why I would like to use a different field that a group of 
documents has in common, and that would allow me to get hold of the 
outdated documents in the index. (But I have to find out the value of 
that other field by DB query.)


Cheers,
Chantal


Noble Paul നോബിള്‍ नोब्ळ् schrieb:

did you explore the deletedPkQuery ?

On Wed, Aug 5, 2009 at 11:46 AM, Chantal
Ackermannchantal.ackerm...@btelligent.de wrote:

Hi all,

the database from which I populate the SOLR index is refreshed
partially. Subsets of the data is deleted and readded for a certain
group identifier. Is it possible to do something alike in a (delta) import
of the DataImportHandler?

Example:
SOLR-Index:
groupID: 1, PK: 1, refreshDate: [before last_index_time]
groupID: 1, PK: 2, refreshDate: [before last_index_time]
groupID: 1, PK: 3, refreshDate: [before last_index_time]

Refreshed DB:
groupID: 1, PK: 1, refreshDate: [after last_index_time]
groupID: 1, PK: 5, refreshDate: [after last_index_time]
groupID: 1, PK: 30, refreshDate: [after last_index_time]
(PK 2 and 3 are not there, anymore. PK is unique across all groupIDs)

deleteQuery=groupID:1
(An attribute of the entity element that the DocBuilder (1.3) reads and
sends as query once, before the delta import, unchanged to the SOLR
writer to delete documents.)

After that, the delta import loads data with groupID=1 from the DB.

Could I plug into SOLR with maybe a custom processor to achieve
something in the direction of:

deleteInput=select FIELD_VALUE from TABLE where CHANGED_DATE >
'${dataimporter.last_index_time}' group by FIELD_VALUE
deleteQuery=field:${my_entity.FIELD_VALUE}

FIELD_VALUE is not the primary key, and the deleteInput query can
return multiple rows.


I am aware of SOLR-1060 and SOLR-1059 but I am not sure that those will
help me. In those cases it looks like the delete is run per entity. I
want the delete to run before the (delta)import, once.
If that impression is wrong, I'll happily switch to 1.4, of course.

Cheers!
Chantal


--
Chantal Ackermann







--
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: DataImportHandler: Partial Delete and Update (Hacking deleteQuery in SOLR 1.3?)

2009-08-05 Thread Chantal Ackermann

Thanks, Paul! :-)

The wiki doesn't mark $deleteDocByQuery (and the other special commands) 
as 1.4, as it usually does. Maybe it's worth correcting that?


Noble Paul നോബിള്‍ नोब्ळ् schrieb:

OK, writing an EntityProcessor/Transformer may help here. Use the special command described at
http://wiki.apache.org/solr/DataImportHandler#head-5e9ebf5a2aaa1dc54464102c395ed1bf7cdb98c3

$deleteDocByQuery is what you need.
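
(For reference, a minimal sketch of a Transformer emitting that special
command; the FIELD_VALUE column and field name follow the hypothetical
names used earlier in this thread, and the special command assumes
Solr 1.4's DataImportHandler:

import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.Transformer;

public class GroupDeleteTransformer extends Transformer {
  @Override
  public Object transformRow(Map<String, Object> row, Context context) {
    // FIELD_VALUE comes from the SQL query feeding this entity
    Object group = row.get("FIELD_VALUE");
    if (group != null) {
      // Ask DIH to delete every document matching this query
      row.put("$deleteDocByQuery", "field:" + group);
    }
    return row;
  }
}

It would be attached to the entity via transformer="GroupDeleteTransformer".)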





Re: ClassCastException from custom request handler

2009-08-05 Thread James Brady
OK, problem solved! Well, worked around.

I gave up on the new style plugin loading in a multicore Jetty setup, and
packaged up my plugin in a rebuilt solr.war.

I had tried this before, but only putting the class files in WEB-INF/lib. If
I put a jar file in there, it works.

2009/8/4 Chantal Ackermann chantal.ackerm...@btelligent.de



 James Brady schrieb:

 Yeah I was thinking T would be SolrRequestHandler too. Eclipse's debugger
 can't tell me...


 You could try disassembling. Or Eclipse opens classes in a very rudimentary
 format when there is no source code attached. Maybe it shows the actual
 return value there, instead of T.


 Lots of other handlers are created with no problem before my plugin falls
 over, so I don't think it's a problem with T not being what we expected.

 Do you know of any working examples of plugins I can download and build in
 my environment to see what happens?


 No sorry. I've only overwritten the EntityProcessor from DataImportHandler,
 and that is not configured in solrconfig.xml.




 2009/8/4 Chantal Ackermann chantal.ackerm...@btelligent.de

  Code is from AbstractPluginLoader in the solr plugin package, 1.3 (the
 regular stable release, no svn checkout).


  80-84

  @SuppressWarnings("unchecked")
  protected T create( ResourceLoader loader, String name, String
      className, Node node ) throws Exception
  {
    return (T) loader.newInstance( className, getDefaultPackages() );
  }



 --
 http://twitter.com/goodgravy
 512 300 4210
 http://webmynd.com/
 Sent from Bury, United Kingdom




-- 
http://twitter.com/goodgravy
512 300 4210
http://webmynd.com/
Sent from Bury, United Kingdom


mergeContiguous for multiple search terms

2009-08-05 Thread Hachmann, Bjoern
Hello,
 
we would like to use the highlighting component with the mergeContiguous
parameter set to true.
 
We have a field with the value: Ökonom Charles Goodhart.
 
If we search for all three words, they are found correctly: <em>Ökonom</em>
<em>Charles</em> <em>Goodhart</em>
 
But, as I set the mergeContiguous parameter to true, I expected: <em>Ökonom
Charles Goodhart</em>. Am I misunderstanding the behaviour of this parameter?
We are using the dismax query parser and solr-1.3.
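
(For reference, a minimal sketch of the request parameters this maps to;
the field name myfield is hypothetical:

...&q=Ökonom Charles Goodhart&hl=true&hl.fl=myfield&hl.mergeContiguous=true

hl.mergeContiguous=true collapses contiguous fragments into a single
fragment.)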
 
Thank you very much for your time.
Björn Hachmann
 
 
 


help getting started with spell check dictionary

2009-08-05 Thread Ian Connor
Hi,

I have downloaded a dictionary in plain-text format from
http://icon.shef.ac.uk/Moby/mwords.html and added it to my /mnt directory.

When I tried to add:

 <lst name="dictionary">
   <str name="name">external</str>
   <str name="type">org.apache.solr.spelling.FileBasedSpellChecker</str>
   <str name="sourceLocation">/mnt/dictionary.txt</str>
   <str name="fieldType">text</str>
 </lst>

within the <requestHandler name="spellchecker"
class="solr.SpellCheckerRequestHandler" startup="lazy"> block, I thought it
would be as easy as running a query like:

http://localhost:8983/solr/select/?q=cancr&spellcheck=true&spellcheck.build=true

to get it to work. Can anyone tell me what steps I am missing here? Thanks
for any help here.

I was trying to get the idea from the example here:
https://issues.apache.org/jira/browse/SOLR-572 after reading through
http://wiki.apache.org/solr/SpellCheckComponent
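
(Note: since the dictionary above is named "external", the spellcheck
request normally has to select it explicitly. A minimal sketch, assuming
the SpellCheckComponent is the component in play:

http://localhost:8983/solr/select/?q=cancr&spellcheck=true&spellcheck.build=true&spellcheck.dictionary=external
)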

-- 
Regards,

Ian Connor


SolrJ and ISO-8859-1

2009-08-05 Thread Schilperoort , René
Hello,

Is it possible to change the encoding of the SolrJ request and response?

Regards, Rene


Index rebuilding.

2009-08-05 Thread caezar

Hi All,

Am I right in saying that the Solr index is rebuilt when the 'commit' command
is sent?
Let's suppose yes. For instance, I have a Solr index with 1M documents, and
then I commit one more million documents. Here are some questions:
- will this (second) commit take longer than the first one? Much longer?
- Will it use some drive space for temporary data while rebuilding the index,
which will then be freed? How much?
- Is it possible to perform searches while this rebuilding is in progress?

Thanks!
-- 
View this message in context: 
http://www.nabble.com/Index-rebuilding.-tp24829220p24829220.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DisMax - fetching dynamic fields

2009-08-05 Thread Alexey Serba
My bad! Please disregard this post.

Alex

On Tue, Aug 4, 2009 at 9:21 PM, Alexey Serbaase...@gmail.com wrote:
 Solr 1.4 built from trunk revision 790594 ( 02 Jul 2009 )

 On Tue, Aug 4, 2009 at 9:19 PM, Alexey Serbaase...@gmail.com wrote:
 Hi everybody,

 I have a couple of dynamic fields in my schema, e.g. rating_* popularity_*

 The problem I have is that if I try to specify existing fields
 rating_1 popularity_1 in fl parameter - DisMax handler just
 ignores them whereas StandardRequestHandler works fine.

 Any clues what's wrong?

 Thanks in advance,
 Alex




Re: Index rebuilding.

2009-08-05 Thread Shalin Shekhar Mangar
On Wed, Aug 5, 2009 at 8:21 PM, caezar caeza...@gmail.com wrote:


 Hi All,

 Am I right in saying that the Solr index is rebuilt when the 'commit'
 command is sent?
 Let's suppose yes. For instance, I have a Solr index with 1M documents, and
 then I commit one more million documents. Here are some questions:
 - will this (second) commit take longer than the first one?


When you do the second commit, the auto-warming of caches and/or queries on
newSearcher may take longer. Also, during indexing segments may get merged
which may add some time.
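
(For reference, a commit is issued as a plain update message; a minimal
sketch, assuming the default /update handler:

curl http://localhost:8983/solr/update -H 'Content-type:text/xml' \
  --data-binary '<commit/>'
)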


 - Will it use some drive space for temporary data while rebuilding the
 index, which will then be freed? How much?


No. Commit should not need extra drive space. An optimize may need
additional space temporarily. But it is always good to have extra free space
on the disk.



 - Is it possible to perform searches while this rebuilding is in progress?


Yes.

-- 
Regards,
Shalin Shekhar Mangar.


Re: Wild card search does not return any result

2009-08-05 Thread Mohamed Parvez
Thanks Otis and Avlesh,

Below is the configuration I have

1] solrconfig.xml

.
  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="spellcheck.onlyMorePopular">false</str>
      <str name="spellcheck.extendedResults">false</str>
      <str name="spellcheck.count">1</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>
.
.
  <requestHandler name="/dataimport"
      class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-import.xml</str>
    </lst>
  </requestHandler>
..
..
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">SPELL</str>
      <str name="spellcheckIndexDir">./spellcheckerIndex</str>
      <str name="buildOnCommit">true</str>
      <str name="buildOnOptimize">true</str>
    </lst>
  </searchComponent>

2] data-import.xml

.
..
<document name="doc">
  <entity name="user" pk="ID"
      query="select * from user">
    <field column="ROLE" name="ROLE" />
    <field column="ID" name="ID" />
    <field column="BUS" name="BUS" />
.
.

3] schema.xml
..
..
<field name="ID" type="float" indexed="true" stored="true" />
<field name="BUS" type="text" indexed="true" stored="true"/>
<field name="ROLE" type="text" indexed="true" stored="true" />
..
..
<field name="ID" type="float" indexed="true" stored="true" />
<field name="BUS" type="text" indexed="true" stored="true"/>
<field name="ROLE" type="text" indexed="true" stored="true" />
<field name="SPELL" type="textSpell" indexed="true" stored="true"
    multiValued="true"/>
<copyField source="BUS" dest="SPELL" />
..
..
<fieldType name="text" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
        preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
        protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="0"
        catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
        preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
        protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>




To make it simple, I have only one record in the table:
ID=1
BUS=ICS
ROLE=SSE


Like I said before:
- I don't get any match if I search for q=ics*
- I get the match (the correct result) if I search for q=sse*

I have not done any query rewriting; I am just using the default
configuration that comes with Solr.

Otis, let me know if you need any more information.

Avlesh, the above setup is just a stripped-down version to figure out what
the issue is. In my real application, I have 100-odd columns in the table
that I use for building the search index. I don't think it's a good option to
copy over all the fields and create another 100-odd fields with just a
lower-case filter applied.
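
(For reference, the workaround does not have to mean one extra field per
column: a single catch-all field with a lowercase-only analyzer can serve
wildcard queries. A minimal sketch; the field and type names are
hypothetical:

<fieldType name="text_lc" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="all_lc" type="text_lc" indexed="true" stored="false" multiValued="true"/>
<copyField source="BUS" dest="all_lc"/>
<copyField source="ROLE" dest="all_lc"/>

A lowercased wildcard query such as q=all_lc:ics* would then match, since
wildcard terms are not analyzed at query time.)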


Parvez


From: Otis Gospodnetic otis_gospodne...@yahoo.com
Date: Tue, Aug 4, 2009 at 8:25 PM
Subject: Re: Wild card search does not return any result
To: solr-user@lucene.apache.org


Hi,

I doubt it's a bug.  It's probably working correctly based on the config,
etc., I just don't have enough details about the configuration, your request
handler, query rewriting, the data in your index, etc. to tell you what
exactly is happening.

 Otis


On Tue, Aug 4, 2009 at 11:13 PM, Avlesh Singh avl...@gmail.com wrote:

 You read it incorrectly, Parvez.
 The bug that Bill seems to have found is with the analysis tool and NOT
 the search handler itself. Results in your case are as expected. Wildcard
 queries are not analyzed, hence the inconsistency.
 A workaround is suggested, on the same thread, here -

 http://markmail.org/message/ts65a6jok3ii6nva#query:+page:1+mid:i5zxdbnvspgek2bp+state:results

 Cheers
 Avlesh

 On Wed, Aug 5, 2009 at 12:52 AM, Mohamed Parvez par...@gmail.com wrote:

  Thanks Otis, The thread suggests that this is bug
 
 
 
 http://markmail.org/message/ts65a6jok3ii6nva#query:+page:1+mid:qinymqdn6mkocv4k
 
  Both SSE and ICS are 3-letter words and both are not part of the English
  language.
  SEE* works 

Re: Wild card search does not return any result

2009-08-05 Thread Mohamed Parvez
Looks like the earlier schema.xml had a typo;
below is the correct schema.xml:

3] schema.xml
..
..
<field name="ID" type="float" indexed="true" stored="true" />
<field name="BUS" type="text" indexed="true" stored="true"/>
<field name="ROLE" type="text" indexed="true" stored="true" />
<field name="SPELL" type="textSpell" indexed="true" stored="true"
    multiValued="true"/>
<copyField source="BUS" dest="SPELL" />
..
..
<fieldType name="text" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
        preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
        protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
        ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="0"
        catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
        preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
        protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>



Thanks/Regards,
Parvez




Re: THIS WEEK: PNW Hadoop, HBase / Apache Cloud Stack Users' Meeting, Wed Jul 29th, Seattle

2009-08-05 Thread Bradford Stephens
A big thanks to everyone who came out despite the heat! Hope to see
you again the last week of August, probably at UW.

On Wed, Jul 29, 2009 at 4:52 PM, Bradford
Stephensbradfordsteph...@gmail.com wrote:
 Don't forget this is tonight! Excited to see everyone there.

 On Tue, Jul 28, 2009 at 11:25 AM, Bradford
 Stephensbradfordsteph...@gmail.com wrote:
 Hey everyone,

 SLIGHT change of plans.

 A few people have asked me to move to a place with Air Conditioning,
 since the temperature's in the 90's this week. So, here we go:

 Big Time Brewing Company
 4133 University Way NE
 Seattle, WA 98105

 Call me at 904-415-3009 if you have any questions.


 On Mon, Jul 27, 2009 at 12:16 PM, Bradford
 Stephensbradfordsteph...@gmail.com wrote:
 Hello again!

 Yes, I know some of us are still recovering from OSCON. It's time for
 another delicious meetup to chat about Hadoop, HBase, Solr, Lucene,
 and more!

 UW is quite a pain for us to access until August, so we're changing
 the venue to one pretty close:

 Piccolo's Pizza
 5301 Roosevelt Way NE
 (between 53rd St & 55th St)

 6:45pm - 8:30 (or when we get bored)!

 As usual, people are more than welcome to give talks, whether they're
 long-format or lightning. I'd also really like to start thinking about
 hackathons, perhaps we could have one next month?

 I'll be talking about HBase .20 and the possibility of low-latency
 HBase Analytics. I'd be very excited to hear what people are up to!

 Contact me if there's any questions: 904-415-3009

 Cheers,
 Bradford

 --
 http://www.roadtofailure.com -- The Fringes of Scalability, Social
 Media, and Computer Science




 --
 http://www.roadtofailure.com -- The Fringes of Scalability, Social
 Media, and Computer Science




 --
 http://www.roadtofailure.com -- The Fringes of Scalability, Social
 Media, and Computer Science




-- 
http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science


Limit of Index size per machine..

2009-08-05 Thread Silent Surfer

Hi ,

We are planning to use Solr for indexing the server log contents.
The expected processed log file size per day: 100 GB
We are expecting to retain these indexes for 30 days (100*30 ~ 3 TB).

Can anyone tell me what would be the optimal size of the index that I can
store on a single server without hampering the search performance, etc.?

We are planning to use OSX server with a configuration of 16 GB (Can go to 24 
GB).

We need to figure out how many servers are required to handle such amount of 
data..

Any help would be greatly appreciated.

Thanks
SilentSurfer


  



Re: solr 1.3: bug in phps response writer

2009-08-05 Thread Poohneat

Hey Otis,
I don't think this issue has been solved yet. I am working with the Solr 1.3
release and yet I get the same exception as the original post.
I have the Solr 1.3 release with the localsolr jars.

Any advice is helpful... for now I will use the JSON response writer and
work around this bug.

Thanks
--
take care


Otis Gospodnetic wrote:
 
 Hi Alok,
 
 I don't think it's a known issue and 2. a) sounds like the best and most
 appreciated approach! :)
 
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
 
 
 From: Alok Dhir ad...@symplicity.com
 To: solr-user@lucene.apache.org
 Sent: Monday, November 17, 2008 12:36:25 PM
 Subject: solr 1.3: bug in phps response writer
 
 Distributed queries:
 
 curl
 'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=php'
 
 curl
 'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=xml'
 
 curl
 'http://devxen0:8983/solr/core0/select?shards=search3:0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=json'
 
 All work fine, providing identical results in their respective formats
 (note the change in the wt param).
 
 curl
 'http://devxen0:8983/solr/core0/select?shards=search3:8983/solr/core0,search3:8983/solr/core2&version=2.2&start=0&rows=10&q=instance%3Arit%5C-csm.symplicity.com+AND+label%3ALogin&wt=phps'
 
 fails with:
 
 java.lang.IllegalArgumentException: Map size must not be negative
 at
 org.apache.solr.request.PHPSerializedWriter.writeMapOpener(PHPSerializedResponseWriter.java:195)
 at
 org.apache.solr.request.JSONWriter.writeSolrDocument(JSONResponseWriter.java:392)
 at
 org.apache.solr.request.JSONWriter.writeSolrDocumentList(JSONResponseWriter.java:547)
 at
 org.apache.solr.request.TextResponseWriter.writeVal(TextResponseWriter.java:147)
 at
 org.apache.solr.request.JSONWriter.writeNamedListAsMapMangled(JSONResponseWriter.java:150)
 at
 org.apache.solr.request.PHPSerializedWriter.writeNamedList(PHPSerializedResponseWriter.java:71)
 at
 org.apache.solr.request.PHPSerializedWriter.writeResponse(PHPSerializedResponseWriter.java:66)
 at
 org.apache.solr.request.PHPSerializedResponseWriter.write(PHPSerializedResponseWriter.java:47)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
 at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
 at
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
 at
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
 at
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
 at
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
 at
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
 at
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
 at
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
 at org.mortbay.jetty.Server.handle(Server.java:285)
 at
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
 at
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
 at
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
 at
 org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
 
 Questions:
 
 1) Is this known?  I didn't see it in the issue tracker.
 
 2) What's the better course of action: a) download source, fix, submit
 patch, wait for new release; b) drop phps and use json instead?
 
 Thanks
 

-- 
View this message in context: 
http://www.nabble.com/sole-1.3%3A-bug-in-phps-response-writer-tp20544146p24834570.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Limit of Index size per machine..

2009-08-05 Thread Ian Connor
I try to keep the index directory size less than the amount of RAM and rely
on the OS to cache as it needs. Linux does a pretty good job here and I am
sure OS X will do a good job also.

Distributed search here will be your friend so you can chunk it up to a
number of servers to keep your cost down (2GB RAM sticks are much cheaper
than 4GB RAM sticks, $20 < $100).
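
(For reference, a minimal sketch of such a distributed query; the host
names are hypothetical:

http://shard1:8983/solr/select?shards=shard1:8983/solr,shard2:8983/solr&q=error&rows=10
)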

Ian.

On Wed, Aug 5, 2009 at 1:44 PM, Silent Surfer silentsurfe...@yahoo.comwrote:


 Hi ,

 We are planning to use Solr for indexing the server log contents.
 The expected processed log file size per day: 100 GB
 We are expecting to retain these indexes for 30 days (100*30 ~ 3 TB).

 Can any one provide what would be the optimal size of the index that I can
 store on a single server, without hampering the search performance etc.

 We are planning to use OSX server with a configuration of 16 GB (Can go to
 24 GB).

 We need to figure out how many servers are required to handle such amount
 of data..

 Any help would be greatly appreciated.

 Thanks
 SilentSurfer







-- 
Regards,

Ian Connor
1 Leighton St #723
Cambridge, MA 02141
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Fax: +1(770) 818 5697
Skype: ian.connor


RE: 99.9% uptime requirement

2009-08-05 Thread Robert Petersen
Maintenance questions: In a two-slave, one-master setup where the two
slaves are behind load balancers, what happens if I have to restart Solr?
If I have to restart Solr, say for a schema update where I have added a
new field, then what is the recommended procedure?

If I can guarantee no commits or optimizes happen on the master during
the schema update, so no new snapshots become available, then can I safely
leave rsyncd enabled?  When I stop and start a slave server, should I
first pull it out of the load balancer's list, or will Solr gracefully
release connections as it shuts down so no searches are lost?

What do you guys do to push out updates?

Thanks for any thoughts,
Robi


-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Tuesday, August 04, 2009 8:57 AM
To: solr-user@lucene.apache.org
Subject: Re: 99.9% uptime requirement

Right. You don't get to 99.9% by assuming that an 8 hour outage is OK.  
Design for continuous uptime, with plans for how long it takes to  
patch around a single point of failure. For example, if your load  
balancer is a single point of failure, make sure that you can redirect  
the front end servers to a single Solr server in much less than 8 hours.

Also, think about your SLA. Can the search index be more than 8 hours  
stale? How quickly do you need to be able to replace a failed indexing  
server? You might be able to run indexing locally on each search  
server if they are lightly loaded.

wunder

On Aug 4, 2009, at 7:11 AM, Norberto Meijome wrote:

 On Mon, 3 Aug 2009 13:15:44 -0700
 Robert Petersen rober...@buy.com wrote:

 Thanks all, I figured there would be more talk about daemontools if  
 there
 were really a need.  I appreciate the input and for starters we'll  
 put two
 slaves behind a load balancer and grow it from there.


 Robert,
 not taking away from daemon tools, but daemon tools won't help you  
 if your
 whole server goes down.

 don't put all your eggs in one basket - several
 servers, load balancer (hardware load balancers x 2, haproxy, etc)

 and sure, use daemon tools to keep your services running within each  
 server...

 B
 _
 {Beto|Norberto|Numard} Meijome

 Why do you sit there looking like an envelope without any address  
 on it?
  Mark Twain

 I speak for myself, not my employer. Contents may be hot. Slippery  
 when wet.
 Reading disclaimers makes you go blind. Writing them is worse. You  
 have been
 Warned.




enablereplication does not work

2009-08-05 Thread solr jay
Hi,

http://localhost:8549/solr/replication?command=enablereplication

does not seem to work. After making the request, I run

http://localhost:8549/solr/replication?command=indexversion

and here is the response:


<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <long name="indexversion">0</long>
  <long name="generation">0</long>
</response>


Notice the indexversion is 0, which is the value after you disable
replication. On the other hand

http://localhost:8549/solr/replication?command=details

returns:


<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">7</int>
  </lst>
  <lst name="details">
    <str name="indexSize">692 bytes</str>
    <str name="indexPath">/tmp/solr/solrdata/index</str>
    <arr name="commits"/>
    <str name="isMaster">true</str>
    <str name="isSlave">false</str>
    <long name="indexVersion">1249517184279</long>
    <long name="generation">2</long>
    <lst name="master">
      <str name="replicateAfter">commit</str>
    </lst>
  </lst>
  <str name="WARNING">
    This response format is experimental.  It is likely to change in the future.
  </str>
</response>


Notice that the indexversion is 1249517184279.

thanks,

-- 
J


Re: Limit of Index size per machine..

2009-08-05 Thread Silent Surfer

Hi,

That means we need approximately 3000 GB (index size) / 24 GB (RAM) = 125
servers.

It would be very hard to convince my org to go for 125 servers for log
management of 3 terabytes of indexes.

Has anyone used Solr for processing and handling indexes on the order of
3 TB? If so, how many servers were used for indexing alone?

Thanks,
sS


--- On Wed, 8/5/09, Ian Connor ian.con...@gmail.com wrote:

 From: Ian Connor ian.con...@gmail.com
 Subject: Re: Limit of Index size per machine..
 To: solr-user@lucene.apache.org
 Date: Wednesday, August 5, 2009, 9:38 PM
 I try to keep the index directory
 size less than the amount of RAM and rely
 on the OS to cache as it needs. Linux does a pretty good
 job here and I am
 sure OS X will do a good job also.
 
 Distributed search here will be your friend so you can
 chunk it up to a
 number of servers to keep your cost down (2GB RAM sticks
 are much cheaper
 than 4GB RAM sticks $20  $100).
 
 Ian.
 
 On Wed, Aug 5, 2009 at 1:44 PM, Silent Surfer silentsurfe...@yahoo.comwrote:
 
 
  Hi ,
 
  We are planning to use Solr for indexing the server
 log contents.
  The expected processed log file size per day: 100 GB
  We are expecting to retain these indexes for 30 days
 (100*30 ~ 3 TB).
 
  Can any one provide what would be the optimal size of
 the index that I can
  store on a single server, without hampering the search
 performance etc.
 
  We are planning to use OSX server with a configuration
 of 16 GB (Can go to
  24 GB).
 
  We need to figure out how many servers are required to
 handle such amount
  of data..
 
  Any help would be greatly appreciated.
 
  Thanks
  SilentSurfer
 
 
 
 
 
 
 
 -- 
 Regards,
 
 Ian Connor
 1 Leighton St #723
 Cambridge, MA 02141
 Call Center Phone: +1 (714) 239 3875 (24 hrs)
 Fax: +1(770) 818 5697
 Skype: ian.connor
 


  



Re: Limit of Index size per machine..

2009-08-05 Thread Walter Underwood
That is why people don't use search engines to manage logs. Look at a  
Hadoop cluster.


wunder

On Aug 5, 2009, at 10:08 PM, Silent Surfer wrote:



Hi,

That means we need approximately 3000 GB (index size) / 24 GB (RAM) =
125 servers.

It would be very hard to convince my org to go for 125 servers for
log management of 3 terabytes of indexes.

Has anyone used Solr for processing and handling indexes on the order
of 3 TB? If so, how many servers were used for indexing alone?

Thanks,
sS


--- On Wed, 8/5/09, Ian Connor ian.con...@gmail.com wrote:


From: Ian Connor ian.con...@gmail.com
Subject: Re: Limit of Index size per machine..
To: solr-user@lucene.apache.org
Date: Wednesday, August 5, 2009, 9:38 PM
I try to keep the index directory
size less than the amount of RAM and rely
on the OS to cache as it needs. Linux does a pretty good
job here and I am
sure OS X will do a good job also.

Distributed search here will be your friend so you can
chunk it up to a
number of servers to keep your cost down (2GB RAM sticks
are much cheaper
than 4GB RAM sticks $20  $100).

Ian.

On Wed, Aug 5, 2009 at 1:44 PM, Silent Surfer silentsurfe...@yahoo.com 
wrote:




Hi ,

We are planning to use Solr for indexing the server

log contents.

The expected processed log file size per day: 100 GB
We are expecting to retain these indexes for 30 days

(100*30 ~ 3 TB).


Can any one provide what would be the optimal size of

the index that I can

store on a single server, without hampering the search

performance etc.


We are planning to use OSX server with a configuration

of 16 GB (Can go to

24 GB).

We need to figure out how many servers are required to

handle such amount

of data..

Any help would be greatly appreciated.

Thanks
SilentSurfer








--
Regards,

Ian Connor
1 Leighton St #723
Cambridge, MA 02141
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Fax: +1(770) 818 5697
Skype: ian.connor










Re: Limit of Index size per machine..

2009-08-05 Thread Silent Surfer

Hi,

We initially went down the Hadoop path, but as it is one more software-based
file system on top of the OS file system, we didn't get buy-in from our system
engineers, i.e. in case we run into any HDFS issues, the SEs won't be
supporting us :(

Regards,
sS

--- On Thu, 8/6/09, Walter Underwood wun...@wunderwood.org wrote:

 From: Walter Underwood wun...@wunderwood.org
 Subject: Re: Limit of Index size per machine..
 To: solr-user@lucene.apache.org
 Date: Thursday, August 6, 2009, 5:12 AM
 That is why people don't use search
 engines to manage logs. Look at a  
 Hadoop cluster.
 
 wunder
 
 On Aug 5, 2009, at 10:08 PM, Silent Surfer wrote:
 
 
  Hi,
 
  That means we need approximately 3000 GB (Index
 Size)/24 GB (RAM) =  
  125 servers.
 
  It would be very hard to convince my org to go for 125
 servers for  
  log management of 3 Terabytes of indexes.
 
  Has any one used, solr for processing and handling of
 the indexes of  
  the order of 3 TB ? If so how many servers were used
 for indexing  
  alone.
 
  Thanks,
  sS
 
 
  --- On Wed, 8/5/09, Ian Connor ian.con...@gmail.com
 wrote:
 
  From: Ian Connor ian.con...@gmail.com
  Subject: Re: Limit of Index size per machine..
  To: solr-user@lucene.apache.org
  Date: Wednesday, August 5, 2009, 9:38 PM
  I try to keep the index directory
  size less than the amount of RAM and rely
  on the OS to cache as it needs. Linux does a
 pretty good
  job here and I am
  sure OS X will do a good job also.
 
  Distributed search here will be your friend so you
 can
  chunk it up to a
  number of servers to keep your cost down (2GB RAM
 sticks
  are much cheaper
  than 4GB RAM sticks $20  $100).
 
  Ian.
 
  On Wed, Aug 5, 2009 at 1:44 PM, Silent Surfer
 silentsurfe...@yahoo.com
 
  wrote:
 
 
  Hi ,
 
  We are planning to use Solr for indexing the
 server
  log contents.
  The expected processed log file size per day:
 100 GB
  We are expecting to retain these indexes for
 30 days
  (100*30 ~ 3 TB).
 
  Can any one provide what would be the optimal
 size of
  the index that I can
  store on a single server, without hampering
 the search
  performance etc.
 
  We are planning to use OSX server with a
 configuration
  of 16 GB (Can go to
  24 GB).
 
  We need to figure out how many servers are
 required to
  handle such amount
  of data..
 
  Any help would be greatly appreciated.
 
  Thanks
  SilentSurfer
 
 
 
 
 
 
 
  -- 
  Regards,
 
  Ian Connor
  1 Leighton St #723
  Cambridge, MA 02141
  Call Center Phone: +1 (714) 239 3875 (24 hrs)
  Fax: +1(770) 818 5697
  Skype: ian.connor
 
 
 
 
 
 








Re: enablereplication does not work

2009-08-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
How is the ReplicationHandler configured? If there was no
commit/optimize then it would show the version as '0'.
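
(For reference, a minimal sketch of a typical master-side configuration;
the confFiles value is a hypothetical example, and replicateAfter=commit
matches the details output quoted below:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>
)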

On Thu, Aug 6, 2009 at 5:50 AM, solr jaysolr...@gmail.com wrote:
 Hi,

 http://localhost:8549/solr/replication?command=enablereplication

 does not seem to work. After making the request, I run

 http://localhost:8549/solr/replication?command=indexversion

 and here is the response:


 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">0</int>
   </lst>
   <long name="indexversion">0</long>
   <long name="generation">0</long>
 </response>


 Notice the indexversion is 0, which is the value after you disable
 replication. On the other hand

 http://localhost:8549/solr/replication?command=details

 returns:


 <response>
   <lst name="responseHeader">
     <int name="status">0</int>
     <int name="QTime">7</int>
   </lst>
   <lst name="details">
     <str name="indexSize">692 bytes</str>
     <str name="indexPath">/tmp/solr/solrdata/index</str>
     <arr name="commits"/>
     <str name="isMaster">true</str>
     <str name="isSlave">false</str>
     <long name="indexVersion">1249517184279</long>
     <long name="generation">2</long>
     <lst name="master">
       <str name="replicateAfter">commit</str>
     </lst>
   </lst>
   <str name="WARNING">
     This response format is experimental.  It is likely to change in the future.
   </str>
 </response>


 Notice that the indexversion is 1249517184279.

 thanks,

 --
 J




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com