Re: Upgrading Tika in Solr

2010-02-17 Thread Liam O'Boyle
I just copied in the newer .jars and got rid of the old ones and
everything seemed to work smoothly enough.

Liam

On Tue, 2010-02-16 at 13:11 -0500, Grant Ingersoll wrote:
> I've got a task open to upgrade to 0.6.  Will try to get to it this week.  
> Upgrading is usually pretty trivial.
> 
> 
> On Feb 14, 2010, at 12:37 AM, Liam O'Boyle wrote:
> 
> > Afternoon,
> > 
> > I've got a large collections of documents which I'm attempting to add to
> > a Solr index using Tika via the ExtractingRequestHandler, but there are
> > a large number that it has problems with (PDFs, PPTX and XLS documents
> > mainly).  
> > 
> > I've tried them with the most recent stand alone version of Tika and it
> > handles most of the failing documents correctly.  I tried using a recent
> > nightly build of Solr, but the same problems seem to occur.
> > 
> > Are there instructions somewhere on installing a more recent Tika build
> > into Solr?
> > 
> > Thanks,
> > Liam
> > 
> > 
> 
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem using Solr/Lucene: 
> http://www.lucidimagination.com/search
> 




Re: Tomcat vs Jetty: A Comparative Analysis?

2010-02-17 Thread Ron Chan

probably not 

If there is no need to embed or programmatically start and stop the server, then 
Tomcat would be the safe choice: it's probably easier to get going with, and 
you'll find a lot more information about it.

- Original Message - 
From: "Steve Radhouani"  
To: solr-user@lucene.apache.org 
Sent: Wednesday, 17 February, 2010 7:24:01 AM 
Subject: Re: Tomcat vs Jetty: A Comparative Analysis? 

Thanks Ron. Actually, I'm developing a Web search engine. Would that 
matter? 

Thanks. 

2010/2/16 Ron Chan  

> 
> I'd doubt if a performance benchmark would be very useful, it ultimately 
> depends on what you are trying to do and what you are comfortable with. 
> 
> We've had successful deployments on both. 
> 
> Any difference in performance is far outweighed by ease of setup/support 
> that you personally find in each. 
> 
> There is far more "knowledge" around Tomcat, but Jetty is more lightweight 
> and real easy to embed. 
> 
> Ron 
> 
> - Original Message - 
> From: "Steve Radhouani"  
> To: solr-user@lucene.apache.org 
> Sent: Tuesday, 16 February, 2010 12:38:04 PM 
> Subject: Tomcat vs Jetty: A Comparative Analysis? 
> 
> Hi there, 
> 
> Is there any analysis out there that may help to choose between Tomcat and 
> Jetty to deploy Solr? I wonder whether there's a significant difference 
> between them in terms of performance. 
> 
> Any advice would be much appreciated, 
> -Steve 
> 


Re: Tomcat vs Jetty: A Comparative Analysis?

2010-02-17 Thread Steve Radhouani
Thanks a lot Ron!

2010/2/17 Ron Chan 

>
> probably not
>
> if there is no need to embed or programmatically start and stop the server
> then Tomcat would be the safe choice, probably easier to get going with to
> start with and you'll find a lot more information about it
>
> - Original Message -
> From: "Steve Radhouani" 
> To: solr-user@lucene.apache.org
> Sent: Wednesday, 17 February, 2010 7:24:01 AM
> Subject: Re: Tomcat vs Jetty: A Comparative Analysis?
>
> Thanks Ron. Actually, I'm developing a Web search engine. Would that
> matter?
>
> Thanks.
>
> 2010/2/16 Ron Chan 
>
> >
> > I'd doubt if a performance benchmark would be very useful, it ultimately
> > depends on what you are trying to do and what you are comfortable with.
> >
> > We've had successful deployments on both.
> >
> > Any difference in performance is far outweighed by ease of setup/support
> > that you personally find in each.
> >
> > There is far more "knowledge" around Tomcat, but Jetty is more
> lightweight
> > and real easy to embed.
> >
> > Ron
> >
> > - Original Message -
> > From: "Steve Radhouani" 
> > To: solr-user@lucene.apache.org
> > Sent: Tuesday, 16 February, 2010 12:38:04 PM
> > Subject: Tomcat vs Jetty: A Comparative Analysis?
> >
> > Hi there,
> >
> > Is there any analysis out there that may help to choose between Tomcat
> and
> > Jetty to deploy Solr? I wonder whether there's a significant difference
> > between them in terms of performance.
> >
> > Any advice would be much appreciated,
> > -Steve
> >
>


Incremental Backup of Indexes

2010-02-17 Thread abhishes

Hello All,

If we have a very large index, how can I back it up incrementally (one full
backup followed by multiple incremental backups)?

How do I take compressed backups?


Do I have to roll out the backup infrastructure manually, or is there something
pre-built?

-- 
View this message in context: 
http://old.nabble.com/Incremental-Backup-of-Indexes-tp27621757p27621757.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: dataimporthandler and expungeDeletes=false

2010-02-17 Thread Jorg Heymans
Looking closer at the documentation, it appears that expungeDeletes in fact
has nothing to do with 'removing deleted documents from the index' as i
thought before:

http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22commit.22_and_.22optimize.22


expungeDeletes = "true" | "false" — default is false — merge segments with
deletes away.

Is this correct ?

FWIW I worked around the issue by adding a removed flag to my data and
sending <delete> and <commit> commands after the delta-import, but it would have
been so much nicer to be able to do this all from DIH.
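
Concretely, after the delta-import finishes I post roughly the following to
/update ('removed' being the flag field I added; adjust the query to your schema):

<delete><query>removed:true</query></delete>
<commit/>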

Has anybody been able to get deletedPkQuery to work for deleting documents
during delta import ?
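
For reference, the root entity in my data-config.xml looks roughly like this
(entity, table and column names simplified):

<entity name="item" pk="id"
        query="select id, title from item"
        deltaQuery="select id from item where last_modified > '${dataimporter.last_index_time}'"
        deletedPkQuery="select id from item where deleted = 1">
</entity>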

Jorg

On Tue, Feb 16, 2010 at 3:57 PM, Jorg Heymans wrote:

> Hi,
>
> Can anybody tell me if [1] still applies as of version trunk 03/02/2010 ? I
> am removing documents from my index using deletedPkQuery and a deltaimport.
> I can tell from the logs that the removal seems to be working:
>
> 16-Feb-2010 15:32:54 org.apache.solr.handler.dataimport.DocBuilder
> collectDelta
> INFO: Completed parentDeltaQuery for Entity: attachment
> 16-Feb-2010 15:32:54 org.apache.solr.handler.dataimport.DocBuilder
> deleteAll
> INFO: Deleting stale documents
> 16-Feb-2010 15:32:54 org.apache.solr.handler.dataimport.SolrWriter
> deleteDoc
> INFO: Deleting document: 33053
> 16-Feb-2010 15:32:54 org.apache.solr.core.SolrDeletionPolicy onInit
> INFO: SolrDeletionPolicy.onInit: commits:num=1
>
>  
> commit{dir=D:\lib\apache-solr-1.5-dev\example\solr\project\data\index,segFN=segments_1y,version=1265210107838,generation=70,filenames=[_2v.prx,
> _2v.fnm, _2v.tis, _2v.fdt, _2v.frq, segments_1y, _2v.fdx, _2v.tii]
> 16-Feb-2010 15:32:54 org.apache.solr.core.SolrDeletionPolicy updateCommits
> INFO: newest commit = 1265210107838
> 16-Feb-2010 15:32:54 org.apache.solr.handler.dataimport.DocBuilder doDelta
> INFO: Delta Import completed successfully
> 16-Feb-2010 15:32:54 org.apache.solr.handler.dataimport.DocBuilder finish
> INFO: Import completed successfully
> 16-Feb-2010 15:32:54 org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: start
> commit(optimize=true,waitFlush=false,waitSearcher=true,expungeDeletes=false)
> 16-Feb-2010 15:32:54 org.apache.solr.search.SolrIndexSearcher 
> INFO: Opening searc...@182c2d9 main
> 16-Feb-2010 15:32:54 org.apache.solr.update.DirectUpdateHandler2 commit
> INFO: end_commit_flush
>
> However when i search the index the removed data is still present,
> presumably because the DirectUpdateHandler2 does not automatically do
> expungeDeletes ? Can i configure this somewhere in solrconfig.xml (SOLR-1275
> was not very clear exactly what needs to be done to activate this behaviour)
> ?
>
> Thanks
> Jorg
>
> [1] http://marc.info/?l=solr-user&m=125962049425151&w=2
>


Re: Incremental Backup of Indexes

2010-02-17 Thread Jay Ess

abhishes wrote:

Hello All,

If we have very large index size, how can I back up incrementally. (one full
backup followed by multiple incremental backups).

How do I take compressed backups?
  

http://rsnapshot.org/


xml error when indexing

2010-02-17 Thread Jan Simon Winkelmann
Hi,

I'm having a strange problem when indexing data through our application. 
Whenever I post something to the update resource, I get

Unexpected character 'a' (code 97) in prolog; expected '<'  at [row,col 
{unknown-source}]: [1,1], 


Error 400 Unexpected character 'a' (code 97) in prolog; expected '<'
 at [row,col {unknown-source}]: [1,1]

HTTP ERROR 400
Problem accessing /solr/update. Reason:
Unexpected character 'a' (code 97) in prolog; expected '<'
 at [row,col {unknown-source}]: [1,1]Powered by 
Jetty://


However, when I post the same data from an xml file using curl it works.

The add command looks like this:

145405329411702010-02-16T15:30:02Z02010-02-16T15:30:02Z2019-12-31T00:00:00Z0145-4053294«Positives Gespräch» zwischen Bielefeld und DFL«Positives Gespräch» zwischen Bielefeld und 
DFLBielefeld (dpa) - Der finanziell 
angeschlagene Zweitligist Arminia Bielefeld hat der Deutschen Fußball Liga in 
Frankfurt/Main einen Maßnahmen-Katalog präsentiert. 

Bielefeld (dpa) - Der finanziell angeschlagene Zweitligist Arminia Bielefeld hat der Deutschen Fußball Liga in Frankfurt/Main einen Maßnahmen-Katalog präsentiert.

«Daran arbeiten wir derzeit mit Hochdruck», teilte Arminia-Geschäftsführer Heinz Anders mit. Die Arminia-Delegation, zu der noch Manager Detlev Dammeier, Aufsichtsratschef Norbert Leopoldseder und Finanz-Prokurist Henrik Wiehl gehörten, habe die Lage vor den DFL-Vertretern laut Anders «offen und transparent» analysiert. Es sei ein «sehr positives Gespräch gewesen». Die nicht näher erläuterten Maßnahmen müssten nun umgesetzt und bei der DFL entsprechend nachgewiesen werden.

Die DFL kommentierte das Zusammentreffen in ihrer Frankfurter Zentrale nicht. «Zu solchen Dinge äußern wir uns nicht», erklärte ein Sprecher auf Anfrage der Deutschen Presse-Agentur dpa.

Der frühere Erstligist Bielefeld hat Verbindlichkeiten und Schulden von rund 15,5 Millionen Euro. Im operativen Geschäft dieser Saison gibt es eine Finanzierungslücke von 2,5 Millionen Euro. Der Club hat sich vor allem mit dem Ausbau und der Modernisierung der SchücoArena übernommen. Zudem ist die Entwicklung bei den Zuschauer-Zahlen und den Sponsorzuwendungen nach dem Bundesliga-Abstieg unerfreulich. Allein für das Stadion sind noch 13 Millionen Euro zu tilgen. Der Verein denkt sogar an einen Verkauf der SchücoArena.

The system we run on is Solr 1.4 with Jetty Hightide 7.0.1. Am I missing something here? I would be glad for any help.

Best,
Jan

Need feedback on solr security

2010-02-17 Thread Vijayant Kumar
Hi Group,

I need some feedback on  solr security.

To make my Solr admin password protected,
I used the Path Based Authentication from
http://wiki.apache.org/solr/SolrSecurity.
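
The relevant fragment in the servlet container's web.xml looks roughly like this
(the URL patterns and role name here are just examples):

<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr admin and update</web-resource-name>
    <url-pattern>/admin/*</url-pattern>
    <url-pattern>/update/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>solr-admin</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr</realm-name>
</login-config>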

In this way my admin area, search, delete, and add-to-index are protected. But now
that I have made Solr authenticated, every update/delete from the front
end is blocked without authentication.

I do not need this authentication from the front end, so I simply pass the
username and password to Solr in my front end scripts and it is
working fine. I have done it in the way below.

http://username:passw...@localhost:8983/solr/admin/update
I need your suggestions and feedback on the above method. Is it a feasible
and secure method? To overcome this issue, is there any alternate
method?




-- 

Thank you,
Vijayant Kumar
Software Engineer
Website Toolbox Inc.
http://www.websitetoolbox.com
1-800-921-7803 x211



solr word frequency

2010-02-17 Thread michaelnazaruk

Hi all! How can I get the frequency of a word in the index?
-- 
View this message in context: 
http://old.nabble.com/solr-word-frequency-tp27622615p27622615.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr word frequency

2010-02-17 Thread Steve Radhouani
Using the "Schema Browser" of the Solr interface or Luke you can get the
frequency of a word in a specific field, but I don't know how to get it in
the entire index. A "dirty" solution would be to create a new field and copy
in it all your existing fields (), and then search the frequency of a given word in the
new field.

That being said, the frequency is available in the _i.frq file under your
index directory; perhaps you find a way to read it (I didn't it).
-Steve


2010/2/17 michaelnazaruk 

>
> hi all! How I can get the frequency for word in index?
> --
> View this message in context:
> http://old.nabble.com/solr-word-frequency-tp27622615p27622615.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


scores are the same for many diferent documents

2010-02-17 Thread Marc Sturlese

Hey there,
I see that when Solr gives me back the scores in the response, they are the
same for many different documents.

I have built a simple index for testing purposes with just documents with
one field indexed with the standard analyzer and containing pieces of text.
I have done the same with a self coded simple lucene indexer.

Querying the Solr index with qt=standard&q=title:laptop will give me back
documents, some of them with exactly the same score.
Querying the Lucene index (with a simple self-coded search app) with
title:laptop will give me back no equal scores.

When building the Solr index I have tried both omitNorms=true and
omitNorms=false. It will give me different scores, but in both cases there
are some equal scores.

I am testing this because I have a Solr component with a
FieldComparatorSource which uses the scores and other external factors for
the sorting. Having the same score for different documents, combined with
external factors, may give me back results in an unexpected, undesired order.
-- 
View this message in context: 
http://old.nabble.com/scores-are-the-same-for-many-diferent-documents-tp27623039p27623039.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Need feedback on solr security

2010-02-17 Thread Xavier Schepler

Vijayant Kumar wrote:

Hi Group,

I need some feedback on  solr security.

For Making by solr admin password protected,
 I had used the Path Based Authentication form
http://wiki.apache.org/solr/SolrSecurity.

In this way my admin area,search,delete,add to index is protected.But Now 
when I make solr authenticated then for every update/delete from the fornt

end is blocked without authentication.

I do not need this authentication from the front end so I simply pass the
username and password to the solr in my fornt end scripts and it is
working fine. I had done it in the below way.

http://username:passw...@localhost:8983/solr/admin/update
I need your suggestion and feed back on the above method.Is it fessiable
method and secure? TO over come from this issue is there any alternate
method?

Hey,

there is at least one other solution: you can set a firewall rule that
allows connections to Solr's port only from trusted IPs.




Re: solr word frequency

2010-02-17 Thread michaelnazaruk

The Schema Browser and Luke don't fit, because I need to get the frequency for a
selected word in my code. Luke displays only the first 10 words. I tried to change
some configs in solrconfig and in the schema, but it didn't help me. Maybe there is
another way to get the frequency for a word?
-- 
View this message in context: 
http://old.nabble.com/solr-word-frequency-tp27622615p27623246.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Need feedback on solr security

2010-02-17 Thread Vijayant Kumar
Hi Xavier,

Thanks for your feedback
The firewall rule for trusted IPs is not feasible for us because the
application is open to the public, so we cannot work through IP banning.
> Vijayant Kumar wrote:
>> Hi Group,
>>
>> I need some feedback on  solr security.
>>
>> For Making by solr admin password protected,
>>  I had used the Path Based Authentication form
>> http://wiki.apache.org/solr/SolrSecurity.
>>
>> In this way my admin area,search,delete,add to index is protected.But
>> Now
>> when I make solr authenticated then for every update/delete from the
>> fornt
>> end is blocked without authentication.
>>
>> I do not need this authentication from the front end so I simply pass
>> the
>> username and password to the solr in my fornt end scripts and it is
>> working fine. I had done it in the below way.
>>
>> http://username:passw...@localhost:8983/solr/admin/update
>> I need your suggestion and feed back on the above method.Is it fessiable
>> method and secure? TO over come from this issue is there any alternate
>> method?
> Hey,
>
> there is at least another solution. You can set a firewall rule that
> allow  connections to the Solr's port only from trusted IPs.
>


-- 

Thank you,
Vijayant Kumar
Software Engineer
Website Toolbox Inc.
http://www.websitetoolbox.com
1-800-921-7803 x211



Re: solr word frequency

2010-02-17 Thread Steve Radhouani
In the Schema Browser, you can specify the top X terms you want to display.
Here's what you see in the browser: Docs, Distinct, and Top Terms.
Thus, you can get the frequency of a given word, even though it's not the
most elegant solution.

2010/2/17 michaelnazaruk 

>
> Schema browser and Luke don't fit! Because I need get frequency for
> selected
> word in my code! In Luke display only first 10 words! I try to change some
> configs in solrconfig and in schema but it don't help me! Maybe there are
> another way to get frequency for word?
> --
> View this message in context:
> http://old.nabble.com/solr-word-frequency-tp27622615p27623246.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Need feedback on solr security

2010-02-17 Thread Xavier Schepler

Vijayant Kumar wrote:

Hi Xavier,

Thanks for your feedback
the firewall rule for the trusted IP is not fessiable for us because the
application is open for public so we can not work through IP banning.
  

Vijayant Kumar wrote:


Hi Group,

I need some feedback on  solr security.

For Making by solr admin password protected,
 I had used the Path Based Authentication form
http://wiki.apache.org/solr/SolrSecurity.

In this way my admin area,search,delete,add to index is protected.But
Now
when I make solr authenticated then for every update/delete from the
fornt
end is blocked without authentication.

I do not need this authentication from the front end so I simply pass
the
username and password to the solr in my fornt end scripts and it is
working fine. I had done it in the below way.

http://username:passw...@localhost:8983/solr/admin/update
I need your suggestion and feed back on the above method.Is it fessiable
method and secure? TO over come from this issue is there any alternate
method?
  

Hey,

there is at least another solution. You can set a firewall rule that
allow  connections to the Solr's port only from trusted IPs.





  

Do your users connect directly to Solr?
I mean, the firewall rule is for the Solr client, i.e. the computer that
hosts the application that connects to Solr.


Re: solr word frequency

2010-02-17 Thread michaelnazaruk

I found a more interesting way:
http://localhost:8983/solr/select?q=bongo&terms=true&terms.fl=id&terms.prefix=bongo&indent=true
In terms.prefix we set the value which we want to find :)
I hope this example helps other people...
Thanks to all who helped me :)
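
In case it helps anyone setting this up: the TermsComponent has to be registered
in solrconfig.xml. The stock example config has something like this (and terms.fl
in the query should of course be the field you want counts for):

<searchComponent name="termsComponent" class="solr.TermsComponent"/>

<requestHandler name="/terms" class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>termsComponent</str>
  </arr>
</requestHandler>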
-- 
View this message in context: 
http://old.nabble.com/solr-word-frequency-tp27622615p27623784.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: xml error when indexing

2010-02-17 Thread Erick Erickson
The file looks good to me, but as I remember, the xml must
be UTF-8 (but check). Is there a chance that somewhere in
the chain it's not?
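
A couple of things worth double-checking (just a sketch, with the document trimmed):
the body should start with an XML prolog that declares the encoding, and the HTTP
request should carry a matching header such as Content-Type: text/xml; charset=utf-8.
It's also worth confirming the application sends the raw XML as the POST body rather
than wrapping it in a form parameter.

<?xml version="1.0" encoding="UTF-8"?>
<add>
  <doc>
    <field name="name">«Positives Gespräch» zwischen Bielefeld und DFL</field>
  </doc>
</add>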

HTH
Erick

2010/2/17 Jan Simon Winkelmann 

> Hi,
>
> I'm having a strange problem when indexing data through our application.
> Whenever I post something to the update resource, I get
>
> Unexpected character 'a' (code 97) in prolog; expected '<'  at [row,col
> {unknown-source}]: [1,1], 
> 
> 
> Error 400 Unexpected character 'a' (code 97) in prolog; expected
> '<'
>  at [row,col {unknown-source}]: [1,1]
> 
> HTTP ERROR 400
> Problem accessing /solr/update. Reason:
> Unexpected character 'a' (code 97) in prolog; expected '<'
>  at [row,col {unknown-source}]: [1,1]Powered by
> Jetty://
>
>
> However, when I post the same data from an xml file using curl it works.
>
> The add command looks like this:
>
>  overwriteCommitted="true">145 name="basic_module_id">4053294 name="category">1170 name="moddate">2010-02-16T15:30:02Z name="archive">0 name="valid_from">2010-02-16T15:30:02Z name="valid_till">2019-12-31T00:00:00Z name="staging">0145-4053294 name="name">«Positives Gespräch» zwischen Bielefeld und DFL name="description">«Positives Gespräch» zwischen Bielefeld und
> DFLBielefeld (dpa) - Der
> finanziell angeschlagene Zweitligist Arminia Bielefeld hat der Deutschen
> Fußball Liga in Frankfurt/Main einen Maßnahmen-Katalog präsentiert.
> 

Bielefeld (dpa) - Der > finanziell angeschlagene Zweitligist Arminia Bielefeld hat der Deutschen > Fußball Liga in Frankfurt/Main einen Maßnahmen-Katalog präsentiert. >

«Daran arbeiten wir derzeit mit Hochdruck», teilte > Arminia-Geschäftsführer Heinz Anders mit. Die Arminia-Delegation, zu der > noch Manager Detlev Dammeier, Aufsichtsratschef Norbert Leopoldseder und > Finanz-Prokurist Henrik Wiehl gehörten, habe die Lage vor den DFL-Vertretern > laut Anders «offen und transparent» analysiert. Es sei ein «sehr positives > Gespräch gewesen». Die nicht näher erläuterten Maßnahmen müssten nun > umgesetzt und bei der DFL entsprechend nachgewiesen > werden.

Die DFL kommentierte das Zusammentreffen in ihrer > Frankfurter Zentrale nicht. «Zu solchen Dinge äußern wir uns nicht», > erklärte ein Sprecher auf Anfrage der Deutschen Presse-Agentur > dpa.

Der frühere Erstligist Bielefeld hat > Verbindlichkeiten und Schulden von rund 15,5 Millionen Euro. Im operativen > Geschäft dieser Saison gibt es eine Finanzierungslücke von 2,5 Millionen > Euro. Der Club hat sich vor allem mit dem Ausbau und der Modernisierung der > SchücoArena übernommen. Zudem ist die Entwicklung bei den Zuschauer-Zahlen > und den Sponsorzuwendungen nach dem Bundesliga-Abstieg unerfreulich. Allein > für das Stadion sind noch 13 Millionen Euro zu tilgen. Der Verein denkt > sogar an einen Verkauf der SchücoArena.

> > The System we run on is Solr 1.4 with Jetty Hightide 7.0.1. > > Am I missing something here? Would be glad for any help. > > Best > Jan >

Re: Need feedback on solr security

2010-02-17 Thread Xavier Schepler

Xavier Schepler wrote:

Vijayant Kumar wrote:

Hi Xavier,

Thanks for your feedback
the firewall rule for the trusted IP is not fessiable for us because the
application is open for public so we can not work through IP banning.
 

Vijayant Kumar wrote:
   

Hi Group,

I need some feedback on  solr security.

For Making by solr admin password protected,
 I had used the Path Based Authentication form
http://wiki.apache.org/solr/SolrSecurity.

In this way my admin area,search,delete,add to index is protected.But
Now
when I make solr authenticated then for every update/delete from the
fornt
end is blocked without authentication.

I do not need this authentication from the front end so I simply pass
the
username and password to the solr in my fornt end scripts and it is
working fine. I had done it in the below way.

http://username:passw...@localhost:8983/solr/admin/update
I need your suggestion and feed back on the above method.Is it 
fessiable

method and secure? TO over come from this issue is there any alternate
method?
  

Hey,

there is at least another solution. You can set a firewall rule that
allow  connections to the Solr's port only from trusted IPs.





  

Do your users connect directly to Solr ?
I mean, the firewall rule is for the solr client, i.e. the computer 
that host the application that connect to Solr.





You could set a firewall that forbids any connection to your Solr server's
port from everyone except the computer that hosts the application that
connects to Solr.

So, only your application will be able to connect to Solr.

This idea comes from the book Solr 1.4 Enterprise Search Server.


Re: scores are the same for many diferent documents

2010-02-17 Thread Erick Erickson
OmitNorms=false is probably what you want. Did you re-create your
index for each test?

Also, what does debugQuery=true show?
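
For example, something like
http://localhost:8983/solr/select?q=title:laptop&fl=*,score&debugQuery=true
will include the score explanations, so you can see whether the tied documents
share the same tf/idf and fieldNorm values.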

You could get a copy of Luke (google Lucene Luke) and use that to
examine your index to see how things score, which would give you
some clue whether your index (and Lucene) were scoring things
identically or whether there was a different issue...

HTH
Erick

On Wed, Feb 17, 2010 at 7:35 AM, Marc Sturlese wrote:

>
> Hey there,
> I see that when solr gives me back the scores in the response it are the
> same for many different documents.
>
> I have build a simple index for testing purposes with just documents with
> one field indexed with standard analyzer and containing pices of text.
> I have done the same with a self coded simple lucene indexer.
>
> Quering to the solr index with qt=standard&q=title:laptop will give me back
> documents, some of them exactly with the same score.
> Quering to the lucene index (with a simple self coded search app) with
> title:laptop will give me back no equal scores.
>
> When building solr index I have tryied both omitNorms=true and
> omitNorms=false. It will give me different scores but in both cases there
> are some equal scores.
>
> I am testing this because I have a Solr component with a
> FieldComparatorSource wich uses the scores and other external factors for
> the sorting. Having same score for different documents combined with
> external factors may give me back results in unexpected undesired order
> --
> View this message in context:
> http://old.nabble.com/scores-are-the-same-for-many-diferent-documents-tp27623039p27623039.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


long warmup duration

2010-02-17 Thread Stefan Neumann
Hi all,

we are facing extremely increasing warmup times over the last 15 days, which
we are not able to explain, since the number of documents and their size
is stable. Before the increase we could commit our changes in nearly 20
minutes; now it takes about 2 hours.

We were able to identify the warmup of the caches (queryresultCache and
filterCache) as the reason. We tried to decrease the number of warmup
elements from 3 to 1 without any impact.

What influences the runtime during the warmup? Is there any possibility
to boost the warmup?

I attach some more information and statistics.

Thanks a lot for your help.

Stefan


Solr:   1.3
Documents:  4.000.000
-Xmx12G
index size/disc 4.7G

config:

100
200

No queries configured for warming.

CACHES:
===

name:   queryResultCache
class:  org.apache.solr.search.LRUCache
version:1.0
description:LRU Cache(maxSize=20,
  initialSize=3,
  autowarmCount=1,
regenerator=org.apache.solr.search.solrindexsearche...@36eb7331)
stats:

lookups:15958
hits :  9589
hitratio:   0.60
inserts:16211
evictions:  0
size:   16169
warmupTime :1960239
cumulative_lookups: 436250
cumulative_hits:260678
cumulative_hitratio:0.59
cumulative_inserts: 174066
cumulative_evictions:   0


name:   filterCache
class:  org.apache.solr.search.LRUCache
version:1.0
description:LRU Cache(maxSize=20,
  initialSize=3,
  autowarmCount=3,  
regenerator=org.apache.solr.search.solrindexsearche...@9818f80)
stats:  
lookups:6313622
hits:   6304004
hitratio: 0.99
inserts: 42266
evictions: 0
size: 40827
warmupTime: 1268074
cumulative_lookups: 118887830
cumulative_hits: 118605224
cumulative_hitratio: 0.99
cumulative_inserts: 296134
cumulative_evictions: 0





RE: Need feedback on solr security

2010-02-17 Thread Fuad Efendi
> You could set a firewall that forbid any connection to your Solr's
> server port to everyone, except the computer that host your application
> that connect to Solr.
> So, only your application will be able to connect to Solr.


I believe firewalling is the only possible solution since SOLR doesn't use
cookies/sessionIDs

However, 'firewall' can be implemented as an Apache HTTPD Server (or any
other front-end configured to authenticate users). (you can even configure
CISCO PIX (etc.) Firewall to authenticate users.)

HTTPD is easiest, but I haven't tried.

But again, if your use case is "many users, many IPs" you need a good
front-end (web application); if that is not the case, just restrict access to
a specific IP.


-Fuad
http://www.tokenizer.ca





RE: Need feedback on solr security

2010-02-17 Thread Fuad Efendi
 For Making by solr admin password protected,
  I had used the Path Based Authentication form
 http://wiki.apache.org/solr/SolrSecurity.
 In this way my admin area,search,delete,add to index is protected.But
 Now
 when I make solr authenticated then for every update/delete from the
 fornt
 end is blocked without authentication.


Correct, SOLR doesn't use HTTP Session (Session Cookies, Session IDs); and
it shouldn't do that.

If you have such a use case (authenticated sessions) you will need a front-end
web application.




Re: Need feedback on solr security

2010-02-17 Thread Gora Mohanty
On Wed, 17 Feb 2010 10:13:46 -0400
"Fuad Efendi"  wrote:

> > You could set a firewall that forbid any connection to your
> > Solr's server port to everyone, except the computer that host
> > your application that connect to Solr.
> > So, only your application will be able to connect to Solr.
> 
> 
> I believe firewalling is the only possible solution since SOLR
> doesn't use cookies/sessionIDs
> 
> However, 'firewall' can be implemented as an Apache HTTPD Server
> (or any other front-end configured to authenticate users). (you
> can even configure CISCO PIX (etc.) Firewall to authenticate
> users.)
[...]

If you are on Linux, or another system that supports it, iptables
rules are quite easy to set up to restrict access only to the
desired Solr client(s).

Regards,
Gora


Re: persistent cache

2010-02-17 Thread Toke Eskildsen
On Tue, 2010-02-16 at 10:35 +0100, Tim Terlegård wrote:
> I actually tried SSD yesterday. Queries which need to go to disk are
> much faster now. I did expect that warmup for sort fields would be
> much quicker as well, but that seems to be cpu bound.

That and bulk I/O. The sorter imports the Terms into RAM by iterating,
which means that the IO-access for this is sequential. Most modern SSDs
are faster than conventional harddisks for this, but not by much.

> It still takes a minute to cache the six sort fields of the 40 million 
> document index.

I am not aware of any solutions to this, besides beefing up hardware bulk
reads and processor speed (the sorter is not threaded as far as I
remember). It is technically possible to move this step to the indexer,
but the only win would be for setups with few builders and many
searchers.

> Are there any differences among SSD disks. Why is Intel X25-M your favourite?

A soft reason is that I have faith in support from Intel: There have been
problems with earlier versions of the drive (nuking content in some
edge-cases and performance degradation (which hits all SSDs)) and Intel
has responded well by acknowledging the problems and resolving them.
That's very subjective though and I'm sure that some would turn that
around and say that Intel delivered crap in the first place.

On the harder side, the Intel drive is surprisingly cheap and provides
random IO performance ahead of most competitors. Especially for random
writes, which is normally the weak point for SSDs. Some graphs can be
found at Anandtech: 
http://anandtech.com/storage/showdoc.aspx?i=3631&p=22
Anandtech is BTW a very fine starting point on SSD's as they go into
details that too many reviewers skip over.

To be truthful here, standard index building and searching with Lucene
requires three things from the IO-system: Bulk writes, bulk reads
(mainly for sorting) and random reads. The Intel drive is not stellar
for bulk writes and being superior for random writes does not make a
difference for Lucene/SOLR if we're only talking search: pick whatever
SSD you can get your hands on: They are all fine for random reads and
the CPU will probably be the bottleneck.

However, random write speed is a bonus that might show indirectly:
Untarring a million small files, updating a database and similar is
often part of the workflow with search.


Back in 2007 we were fortunate enough to get a test-machine with 2 types
of SSD, 2 10,000 RPM harddisks and 2 15,000 RPM harddisks. Some quick
notes can be found at http://wiki.statsbiblioteket.dk/summa/Hardware

The world has moved on since then, but that has only widened the gap
between SSDs and harddisks.

Regards,
Toke Eskildsen



Re: ConstantScoreQuery and wildcards

2010-02-17 Thread TCK
Thanks, this is very helpful!
-TCK



On Tue, Feb 16, 2010 at 8:16 PM, Ahmet Arslan  wrote:

> > It seems that when I do a search with a wildcard (eg,
> > +text:abc*) the Solr
> > standard SearchHandler will construct a ConstantScoreQuery
> > passing in a
> > Filter, so all the documents in the result set are scored
> > the same. Is there
> > a way to make Solr construct a BooleanQuery instead so that
> > scoring based on
> > term frequencies, etc are used?
>
> Somehow yes. http://old.nabble.com/Boost-with-wildcard.-td25959382.html
>
> > Moreover, in my application
> > I'm building a
> > Query using the Lucene api, calling toString on it and
> > passing it to Solr
> > via solrj and I would like Solr to recover the same Lucene
> > query on its
> > end... is this possible?
>
> There were a discussion about this titled "Lucene Query to Solr query"
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg22034.html
>
>
>
>


Re: Preventing mass index delete via DataImportHandler full-import

2010-02-17 Thread Daniel Shane
That's what I thought. I think I'll take the time to add something to the DIH to
prevent such things. Maybe a parameter that will cause the import to bail out
if the documents to index are less than X% of the total number of documents
already in the index.

There would also be a parameter to override this manually.

I think it would be a good safety precaution.

Daniel Shane

- Original Message -
From: "Noble Paul നോബിള്‍ नोब्ळ्" 
To: solr-user@lucene.apache.org
Sent: Wednesday, February 17, 2010 12:36:52 AM
Subject: Re: Preventing mass index delete via DataImportHandler full-import

On Wed, Feb 17, 2010 at 8:03 AM, Chris Hostetter
 wrote:
>
> : I have a small worry though. When I call the full-import functions, can
> : I configure Solr (via the XML files) to make sure there are rows to
> : index before wiping everything? What worries me is if, for some unknown
> : reason, we have an empty database, then the full-import will just wipe
> : the live index and the search will be broken.
>
> I believe if you set clear=false when doing the full-import, DIH won't
it is clean=false

or use command=import instead of command=full-import
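(e.g. something like http://localhost:8983/solr/dataimport?command=full-import&clean=false,
assuming the handler is registered at /dataimport)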
> delete the entire index before it starts.  it probably makes the
> full-import slower (most of the adds wind up being deletes followed by
> adds) but it should prevent you from having an empty index if something
> goes wrong with your DB.
>
> the big catch is you now have to be responsible for managing deletes
> (using the XmlUpdateRequestHandler) yourself ... this bug looks like its
> goal is to make this easier to deal with (but it's not really clear to
> me what "deletedPkQuery" is ... it doesn't seem to be documented.
>
> https://issues.apache.org/jira/browse/SOLR-1168
>
>
>
> -Hoss
>
>



-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


Re: Merge several queries into one result?

2010-02-17 Thread Daniel Shane
Yup, that's also what I was thinking.

However, I do think that many real world examples cannot simply use one flat 
index. If you have a big index with big documents, you may want to have a 
separate, small index, for things that update frequently etc.. You would need 
to cross reference that index with the main one to produce the final result.

In Java it would be easy to just do 2 queries, one to get the main hits, and
the other to get the smaller index. In fact, that controller could just cache
those entries in the second index.

I don't know if it would be easy to include in Solr. It would certainly require
much thought though, as some may want to cross-index another core for each hit,
while others would just want to retrieve a bunch of documents statically.

Daniel Shane

I'll see what could be done, but I don't think anything easy 
- Original Message -
From: "Erick Erickson" 
To: solr-user@lucene.apache.org
Sent: Tuesday, February 16, 2010 10:20:50 PM
Subject: Re: Merge several queries into one result?

It's generally a bad idea to try to think of
various SOLR/Lucene indexes in a database-like
way, Lucene isn't built to do RDBMS-like stuff. The
first suggestion is usually to consider flattening
your data. That would be something like
adding NY and "New York" in each document.

If that's not possible, the thread titled "Collating results from multiple
indexes" might be useful, although my very quick
read of that is that you have to do some custom work...

HTH
Erick


On Tue, Feb 16, 2010 at 4:54 PM, Daniel Shane wrote:

> Hi all!
>
> I'm trying to join 2 indexes together to produce a final result using only
> Solr + Velocity Response Writer.
>
> The problem is that each "hit" of the main index contains references to
> some common documents located in another index. For example, the hit could
> have a field that describes in what state its located. This field would have
> a value of "NY" for New York etc...
>
> Now what if, in velocity, I want to show this information in full detail.
> Instead of the NY, I would like to show "New York"? This information has not
> been indexed in the main index, but rather in a second one.
>
> Is it possible to coalesce or join these results together so that I can
> pass a simple Velocity template to generate the final HTML?
>
> Or do I have to write a webapp in java to cache all these global variables
> (the state codes, the country codes etc...)?
>
> Daniel Shane
>


Re: Tomcat vs Jetty: A Comparative Analysis?

2010-02-17 Thread gary


http://www.webtide.com/choose/jetty.jsp

>> > - Original Message -
>> > From: "Steve Radhouani" 
>> > To: solr-user@lucene.apache.org
>> > Sent: Tuesday, 16 February, 2010 12:38:04 PM
>> > Subject: Tomcat vs Jetty: A Comparative Analysis?
>> >
>> > Hi there,
>> >
>> > Is there any analysis out there that may help to choose between Tomcat
>> and
>> > Jetty to deploy Solr? I wonder whether there's a significant difference
>> > between them in terms of performance.
>> >
>> > Any advice would be much appreciated,
>> > -Steve
>> >
>>


Catching slow shards

2010-02-17 Thread Otis Gospodnetic
Hello,

Does Solr have any hooks that allow one to watch out for any slaves not 
responding to a query request in the context of distributed search?  That is, 
if a query is sent to shards A, B, and C, and if B doesn't actually respond 
(within N milliseconds), I'd like to know about it, and I'm wondering what the 
best way to get to this information is.


Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



Re: create requesthandler with default shard parameter for different query parser, stock solr 1.4

2010-02-17 Thread Jason Venner
Anyone come up with an answer for this?

I am using the Blacklight ruby app and it seems to require multiple handlers for
different styles of queries.

In particular, what I am noticing is that the facet query using q=*:* seems to
produce a single-shard answer.

This query produces 1 result and facets for the single result:
http://host:8983/solr/select?rows=10&q=*:*&facet.field=field1&facet.field=field2&spellcheck.q=*:*&wt=standard&qt=search&sort=
While
http://host:8983/solr/select?rows=10&q=*:*&facet.field=field1&facet.field=field2&spellcheck.q=*:*&wt=standard&qt=standard&sort=
Produces the faceting across the full shard space.

There is a requesthandler for "search" and for "standard"
"search" is defType=dismax, and has a shard parameter set that is identical to 
"standard".

Searches for actual terms seem to work correctly across both "standard" and 
"search".



On 1/21/10 12:05 PM, "Joe Calderon"  wrote:

thx much, i see now, having request handlers with the same name as the
query parsers was confusing me, i do however have an additional
problem, if i use defType it does indeed use the right query parser
but is there a way to not send all the query parameters in the url
(qf, pf, bf etc), its the main reason im creating the new request
handler, or do i put them all as defaults under my new request handler
and let the query parser use whichever ones it supports?

On Thu, Jan 21, 2010 at 11:45 AM, Yonik Seeley
 wrote:
> On Thu, Jan 21, 2010 at 2:39 PM, Joe Calderon  wrote:
>> hello *, what is the best way to create a requesthandler for
>> distributed search with a default shards parameter but that can use
>> different query parsers
>>
>> thus far i have
>>
>> <requestHandler name="/ds" class="solr.SearchHandler">
>>   <lst name="defaults">
>>     <str name="fl">*,score</str>
>>     <str name="wt">json</str>
>>     <str name="shards">host0:8080/solr/core0,host1:8080/solr/core1,host2:8080/solr/core2,localhost:8080/solr/core3</str>
>>   </lst>
>>   <arr name="components">
>>     <str>query</str>
>>     <str>facet</str>
>>     <str>spellcheck</str>
>>     <str>debug</str>
>>   </arr>
>> </requestHandler>
>>
>>
>> which works as long as qt=standard, if i change it to dismax it doesn't
>> use the shards parameter anymore...
>
> Legacy terminology causing some confusion I think... qt does stand for
> "query type", but it actually picks the request handler.
> "defType" defines the default query parser to use, so you probably
> don't want to be using "qt" at all.
>
> So try something like:
> http://localhost:8983/solr/ds?defType=dismax&qf=text&q=foo
>
> -Yonik
> http://www.lucidimagination.com
>



Re: Copying dynamic fields into default text field messing up fieldNorm?

2010-02-17 Thread Yu-Shan Fung
I'll take a stab. IMHO, it doesn't make much sense to propagate the boost,
and here's why:

For the typical use case, copyField is used to add other "searchable" fields
into the default "text" field for Standard queries. Say we are copying the
ModelNumber field into the text field, and we have a boost of 5.0 for the
ModelNumber field. Now, that means any document with a ModelNumber value
would have the extra boost of 5.0 multiplied into the boost of the "text"
field, for ALL terms in "text"; whereas documents with no ModelNumber would
get no such benefit, completely skewing the results.
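
Concretely, the scenario is something like this (values made up): with

  <copyField source="ModelNumber" dest="text"/>

in schema.xml, a document indexed as

  <add>
    <doc>
      <field name="ModelNumber" boost="5.0">MN-1234</field>
    </doc>
  </add>

ends up with the 5.0 folded into the norm of its "text" field, so every term in
"text" for that document is boosted, not just the model number.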

This would only make sense if boosts are per field instance and not per
field, but we know that's not the case.

Am I making sense?
Yu-Shan


On Tue, Feb 16, 2010 at 10:54 PM, Chris Hostetter
wrote:

>
> : > I belive Koji was mistaken. looking at DocumentBuilder.toDocument, the
> : > boosts have been propogated to copyField destinations since that method
> was
> : > added in 2007 (initially it didn't deal with copyfields at all, but
> once
> : > that was fixed it copied the boosts as well.)
>...
> : Hmm, I didn't know it. Thanks for correcting me.
> : But is it (propagating boost) good idea? What is use case for?
>
> No clue, to either question ... i have no opinion on whether or not it
> makes sense, i'm just telling you what i see in the code.
>
>
> -Hoss
>
>


-- 
“When nothing seems to help, I go look at a stonecutter hammering away at
his rock perhaps a hundred times without as much as a crack showing in it.
Yet at the hundred and first blow it will split in two, and I know it was
not that blow that did it, but all that had gone before.” — Jacob Riis


AW: Performance-Issues and raising numbers of "cumulative inserts"

2010-02-17 Thread Bohnsack, Sven
Sorry for the chaos posts, if anyone minds :)

My colleague posted more info here:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201002.mbox/%3c4b7bf56e.3080...@freiheit.com%3e

I would be very pleased if you could respond to his post with any ideas.

Regards,
Sven

-----Original Message-----
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Wednesday, 17 February 2010 06:30
To: solr-user@lucene.apache.org
Subject: Re: Performance-Issues and raising numbers of "cumulative inserts"

These are some very large numbers. 700k ms is 700 seconds (almost 12 minutes),
and 4M ms is 4k seconds or 66 minutes. No Solr installation should take this
long to warm up.

There is something very wrong here. Have you optimized lately? What
queries do you run to warm it up? And, the basics: how many documents,
how much data per document, how much disk space is the index?

On Tue, Feb 16, 2010 at 3:02 AM, Bohnsack, Sven
 wrote:
> Hi Shalin!
>
>
>
> Thanks for quick response. Sadly it tells me, that i have to look elsewhere 
> to fix the problem.
>
>
>
> Anyone an idea what could cause the increasing warmup-Times? If required I 
> can post some stats.
>
>
>
> Thanking you in anticipation!
>
>
>
> Regards,
>
> Sven
>
>
>
> Feed: Solr-Mailing-List
> Provided on: Tuesday, 16 February 2010 09:05
> Author: Shalin Shekhar Mangar
> Subject: Re: Performance-Issues and raising numbers of "cumulative inserts"
>
>
>
> 
>
> On Tue, Feb 16, 2010 at 1:06 PM, Bohnsack, Sven wrote:
> > Hey IT-Crowd!
> >
> > I'm dealing with some performance issues during warmup of the
> > queryResultCache. Normally it takes about 11 minutes (~700,000 ms), but
> > now it takes about 4 MILLION and more ms. All I can see in the solr.log
> > is that the number of cumulative_inserts ascends from ~250,000 to
> > ~670,000.
> >
> > I asked Google about the cumulative_inserts, but did not get an answer.
> > Can anyone tell me what "cumulative inserts" are and what they stand
> > for? What does it mean, if the number of such inserts raises?
>
> cumulative_inserts are the total number of inserts into the cache since Solr
> started up. The "inserts" shows the number of inserts since the last commit.
>
> --
> Regards, Shalin Shekhar Mangar.
>
>
> View article...
> 
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: long warmup duration

2010-02-17 Thread Antonio Lobato
Drop those cache numbers.  Way down.  I warm up 30 million documents in about 2 
minutes with the following configuration:

  

  

  

  

Mind you, I also use Solr 1.4.  Also, set up a decent warming query or two, as so:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">date:[NOW-2DAYS TO NOW]</str>
      <str name="start">0</str>
      <str name="rows">100</str>
      <str name="sort">date desc</str>
    </lst>
  </arr>
</listener>

Don't warm facets that have a large number of terms or you will kill your warm-up
time.

Hope this helps!

On Feb 17, 2010, at 8:55 AM, Stefan Neumann wrote:

> Hi all,
> 
> we are facing extremly increasing warmup times the last 15 days, which
> we are not able to explain, since the number of documents and their size
> is stable. Before the increase we can commit our changes in nearly 20
> minutes, now it is about 2 hours.
> 
> We were able to identify the warmup of the caches (queryresultCache and
> filterCache) as the reason. We tried to decrease the number of warmup
> elements from 3 to 1 without any impact.
> 
> What influences the runtime during the warmup? Is there any possibility
> to boost the warmup?
> 
> I attach some more information and statistics.
> 
> Thanks a lot for your help.
> 
> Stefan
> 
> 
> Solr: 1.3
> Documents:4.000.000
> -Xmx  12G
> index size/disc 4.7G
> 
> config:
> 
> 100
> 200
> 
> No queries configured for warming.
> 
> CACHES:
> ===
> 
> name:   queryResultCache
> class:  org.apache.solr.search.LRUCache
> version:1.0
> description:LRU Cache(maxSize=20,
>  initialSize=3,
> autowarmCount=1,
>   regenerator=org.apache.solr.search.solrindexsearche...@36eb7331)
> stats:
> 
> lookups:15958
> hits :  9589
> hitratio:   0.60
> inserts:16211
> evictions:  0
> size:   16169
> warmupTime :1960239
> cumulative_lookups: 436250
> cumulative_hits:260678
> cumulative_hitratio:0.59
> cumulative_inserts: 174066
> cumulative_evictions:   0
> 
> 
> name: filterCache
> class:org.apache.solr.search.LRUCache
> version:  1.0
> description:  LRU Cache(maxSize=20,
> initialSize=3,
>  autowarmCount=3, 
>   regenerator=org.apache.solr.search.solrindexsearche...@9818f80)
> stats:
> lookups:  6313622
> hits:   6304004
> hitratio: 0.99
> inserts: 42266
> evictions: 0
> size: 40827
> warmupTime: 1268074
> cumulative_lookups: 118887830
> cumulative_hits: 118605224
> cumulative_hitratio: 0.99
> cumulative_inserts: 296134
> cumulative_evictions: 0
> 
> 
> 



Re: Tomcat vs Jetty: A Comparative Analysis?

2010-02-17 Thread Andy
This read more like a PR release or product brochure for jetty than anything 
else.

Then I poked around the website and realized why: it was written by the creator 
of Jetty, and is hosted on the website of a company with the slogan "The Java 
Experts behind Jetty"

--- On Wed, 2/17/10, g...@littlebunch.com  wrote:

From: g...@littlebunch.com 
Subject: Re: Tomcat vs Jetty: A Comparative Analysis?
To: solr-user@lucene.apache.org
Date: Wednesday, February 17, 2010, 11:27 AM



http://www.webtide.com/choose/jetty.jsp

>> > - Original Message -
>> > From: "Steve Radhouani" 
>> > To: solr-user@lucene.apache.org
>> > Sent: Tuesday, 16 February, 2010 12:38:04 PM
>> > Subject: Tomcat vs Jetty: A Comparative Analysis?
>> >
>> > Hi there,
>> >
>> > Is there any analysis out there that may help to choose between Tomcat
>> and
>> > Jetty to deploy Solr? I wonder whether there's a significant difference
>> > between them in terms of performance.
>> >
>> > Any advice would be much appreciated,
>> > -Steve
>> >
>>



  

Reindex after changing defaultSearchField?

2010-02-17 Thread Frederico Azeiteiro
Hi,

 

If i change the "defaultSearchField" in the core schema, do I need to
recreate the index?

 

Thanks,

Frederico

 



Re: Reindex after changing defaultSearchField?

2010-02-17 Thread Joe Calderon
No, you're just changing how you're querying the index, not the actual
index. You will need to restart the servlet container or reload the core
for the config change to take effect, though.
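
For example, if you are running multicore, something like
http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0
reloads a core without restarting the container (the core name here is just an
example).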

On 02/17/2010 10:04 AM, Frederico Azeiteiro wrote:

Hi,



If i change the "defaultSearchField" in the core schema, do I need to
recreate the index?



Thanks,

Frederico




   




Re: xml error when indexing

2010-02-17 Thread Chris Hostetter
: I'm having a strange problem when indexing data through our application. 
: Whenever I post something to the update resource, I get
: 
: Unexpected character 'a' (code 97) in prolog; expected '<'  at [row,col 
{unknown-source}]: [1,1], 
... 
: However, when I post the same data from an xml file using curl it works.

...that's pretty much a dead giveaway that your application isn't posting 
the exact same XML as the curl command.  You might try using a packet 
sniffer, or an HTTP Proxy that logs all the details of the requests to see 
what exactly your application is sending over the wire and how it differs 
from curl.



-Hoss



Re: Preventing mass index delete via DataImportHandler full-import

2010-02-17 Thread Chris Hostetter

: Thats what I thought. I think I'll take the time to add something to the 
: DIH to prevent such things. Maybe a parameter that will cause the import 
: to bail out if the documents to index are less than X % of the total 
: number of documents already in the index.

the devil's in the details though ... to do an efficient "full-import" DIH
deletes the index before it starts indexing anything, and for an
arbitrary datasource with an arbitrary set of entities and sub-entities
and various layers of logic it seems like it would be infeasible to know
how many rows you are going to get before you actually start.

I think this sort of thing would pretty much have to be done post-import
(w/o doing the initial delete), counting the number of docs added, and
deleting all of the ones older than that (using a deleteQuery based on a
timestamp field) if the number is above a percentage threshold.
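
e.g. something posted to /update along these lines, where the field name and the
cutoff (the time the import started) are placeholders:

<delete>
  <query>index_timestamp:[* TO 2010-02-17T00:00:00Z]</query>
</delete>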

Of course: none of this helps you with the possibility that you have 
plenty of docs, but they all contain useless data (maybe some nested 
entity query failed so you have no searchable text) ... logic for sanity 
checking an index tends to be fairly domain specific.



-Hoss



Re: Merge several queries into one result?

2010-02-17 Thread Erick Erickson
Certainly if you come up with a general solution, the whole community will
be *very* interested.

On Wed, Feb 17, 2010 at 11:14 AM, Daniel Shane wrote:

> Yup, thats also what I was thinking.
>
> However, I do think that many real world examples cannot simply use one
> flat index. If you have a big index with big documents, you may want to have
> a separate, small index, for things that update frequently etc.. You would
> need to cross reference that index with the main one to produce the final
> result.
>
> It java it would be easy to just do 2 queries, one to get the main hits,
> and the other to get the smaller index. In fact, that controller could just
> cache those entries in the second index.
>
> I don't know if it would be easy to include in Solr. It would certainly
> require much thought tough as some may want to cross index another core for
> each hit, while others would just want to retrive a bunch of documents
> statically.
>
> Daniel Shane
>
> I'll see what could be done, but I don't think anything easy
> - Original Message -
> From: "Erick Erickson" 
> To: solr-user@lucene.apache.org
> Sent: Tuesday, February 16, 2010 10:20:50 PM
> Subject: Re: Merge several queries into one result?
>
> It's generally a bad idea to try to think of
> various SOLR/Lucene indexes in a database-like
> way, Lucene isn't built to do RDBMS-like stuff. The
> first suggestion is usually to consider flattening
> your data. That would be something like
> adding NY and "New York" in each document.
>
> If that's not possible, the thread titled "Collating results from multiple
> indexes" might be useful, although my very quick
> read of that is that you have to do some custom work...
>
> HTH
> Erick
>
>
> On Tue, Feb 16, 2010 at 4:54 PM, Daniel Shane  >wrote:
>
> > Hi all!
> >
> > I'm trying to join 2 indexes together to produce a final result using
> only
> > Solr + Velocity Response Writer.
> >
> > The problem is that each "hit" of the main index contains references to
> > some common documents located in another index. For example, the hit
> could
> > have a field that describes in what state its located. This field would
> have
> > a value of "NY" for New York etc...
> >
> > Now what if, in velocity, I want to show this information in full detail.
> > Instead of the NY, I would like to show "New York"? This information has
> not
> > been indexed in the main index, but rather in a second one.
> >
> > Is it possible to coalesce or join these results together so that I can
> > pass a simple Velocity template to generate the final HTML?
> >
> > Or do I have to write a webapp in java to cache all these global
> variables
> > (the state codes, the country codes etc...)?
> >
> > Daniel Shane
> >
>


Re: parsing strings into phrase queries

2010-02-17 Thread Chris Hostetter

: take a look at PositionFilter

Right, there was another thread recently where almost the exact same issue 
was discussed...

http://old.nabble.com/Re%3A-Tokenizer-question-p27120836.html

..except that i was ignorant of the existence of PositionFilter when i 
wrote that message.



-Hoss



Re: How to reindex data without restarting server

2010-02-17 Thread Chris Hostetter

: How do I SWAP the old_core with the new_core. Is it to be done manually or
: does solr provide with a command for doing so. What if I don't make a new

you use the SWAP command, as described in the URL that was mentioned...

: > : http://wiki.apache.org/solr/CoreAdmin
: >
: > For making a schema change, the steps would be:
: >  - create a "new_core" with the new schema
: >  - reindex all the docs into "new_core"
: >  - "SWAP" "old_core" and "new_core" so all the old URLs now point at the
: > new core with the new schema.



-Hoss



Re: getting unexpected statscomponent values

2010-02-17 Thread Grant Ingersoll
Can you share the full output from the StatsComponent? 
On Feb 15, 2010, at 3:07 PM, solr-user wrote:

> 
> Has anyone encountered the following issue?
> 
> I wanted to understand the statscomponent better, so I setup a simple test
> index with a few thousand docs.  In my schema I have:
> - an indexed multivalue sint field (StatsFacetField) that can contain 
> values
> 0 thru 5 that I want to use as my stats.facet field.
> - an indexed single value sint field (ValueOfOneField) that will always
> contain the value 1 and that I want stats on for this test
> 
> When I execute the following query:
> 
> http://localhost:8080/solr/select?q=*:*&stats=true&stats.field=ValueOfOneField&stats.facet=StatsFacetField&rows=0&facet=on&facet.limit=10&facet.field=StatsFacetField
> 
> For this situation (*:*) I was expecting that the statscomponent Count/Sum
> values for each possible value in StatsFacetField to match the facet values
> for StatsFacetField.  They don’t.  Some are close (ie 204 vs 214) while
> others are way off (ie 230 vs 8000)
> 
> Shouldn’t the values match up?  If not, why?
> 
> I am using a recent copy of 1.5.0-dev solr ($Id: CHANGES.txt 906924
> 2010-02-05 12:43:11Z noble $)
> -- 
> View this message in context: 
> http://old.nabble.com/getting-unexpected-statscomponent-values-tp27599248p27599248.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: 
http://www.lucidimagination.com/search



Re: VelocityResponseWriter: Image References

2010-02-17 Thread Erik Hatcher
Unfortunately the file request handler does not support binary file
types (yet).


Lance's suggestion of hosting static content in another servlet  
container context is the best solution for now.


Erik

On Feb 15, 2010, at 8:47 AM, Chantal Ackermann wrote:


Hi all,

Google didn't come up with any helpful hits, so I'm wondering whether
this is either too simple for me to grok, or I've got some obvious
mistake in my code.


Problem:

Images that I want to load in the velocity templates (including those
referenced in CSS/JS files) for the VelocityResponseWriter do not show
up. (CSS/JS files are loaded!)

I am using the following URL (the same as for CSS/JS files (which work
fine)):


http://server:port/solr/core/admin/file?file=[path to
image]&contentType=image/png



When I try that URL in my browser (Firefox or Safari on Windows)  
they do

not return the image correctly. Firefox states that something is wrong
with the image, Safari simply displays the [?] icon.
When I download the file (removing the parameter contentType to get  
the

download dialog), something is downloaded (> 0KB) but it's a different
format (my image viewer fails to load it).

Has anyone managed to load images that are stored in the SOLR config
directory? Or do I need to move those resources to the webapps solr
folder (I'd rather avoid that)?

Thanks!
Chantal






Re: Deleting spelll checker index

2010-02-17 Thread darniz

Please bear with me and my limited understanding.
I deleted all documents and rebuilt my spell checker using the command
spellcheck=true&spellcheck.build=true&spellcheck.dictionary=default

After this I went to the schema browser and saw that mySpellText still has
around 2000 values.
How can I make sure that field gets cleaned up?
We had the same issue with facets too: even though we delete all the
documents, if we facet on make we still see the old facet values, but we can
filter them out by saying facet.mincount>0.

Again, coming back to my question: how can I make the mySpellText field get rid
of all previously indexed terms?

Thanks a lot
darniz



hossman wrote:
> 
> : But still i cant stop thinking about this.
> : i deleted my entire index and now i have 0 documents.
> : 
> : Now if i make a query with accrd i still get a suggestion of accord even
> : though there are no document returned since i deleted my entire index. i
> : hope it also clear the spell check index field.
> 
> there are two Lucene indexes when you use spell checking.
> 
> there is the "main" index which is goverend by your schema.xml and is what 
> you add your own documents to, and what searches are run agains for the 
> result section of solr responses.  
> 
> There is also the "spell" index which has only two fields and in 
> which each "document" corrisponds to a "word" that might be returend as a 
> spelling suggestion, and the other fields contain various start/end/middle 
> ngrams that represent possible misspellings.
> 
> When you use the spellchecker component it builds the "spell" index 
> makinga document out of every word it finds in whatever field name you 
> configure it to use.
> 
> deleting your entire "main" index won't automaticly delete the "spell" 
> index (allthough you should be able rebuild the "spell" index using the 
> *empty* "main" index, that should work).
> 
> : i am copying both fields to a field called 
> : 
> : 
> 
> ..at this point your "main" index has a field named mySpellText, and for 
> every document it contains a copy of make and model.
> 
> : 
> : default
> : mySpellText
> : true
> : true
> 
> ...so whenever you commit or optimize your "main" index it will take every 
> word from the mySpellText and use them all as individual documents in the 
> "spell" index.
> 
> In your previous email you said you changed the copyField declaration, and 
> then triggered a commit -- that rebuilt your "spell" index, but the data 
> was still all there in the mySpellText field of the "main" index, so the 
> rebuilt "spell" index was exactly the same.
> 
> : i have buildOnOPtmize and buildOnCommit as true so when i index new
> document
> : i want my dictionary to be created but how can i make sure i remove the
> : preivious indexed terms. 
> 
> every time the spellchecker component "builds" it will create a completely 
> new "spell" index .. but if the old data is still in the "main" index then 
> it will also be in the "spell" index.
> 
> The only reason i can think of why you'd be seeing words in your "spell" 
> index after deleting documents from your "main" index is that even if you 
> delete documents, the Terms are still there in the underlying index until 
> the segments are merged ... so if you do an optimize that will force them 
> to be expunged --- but i honestly have no idea if that is what's causing 
> your problem, because quite frankly i really don't understand what your 
> problem is ... you have to provide specifics: reproducible steps anyone 
> can take using a clean install of solr to see the behavior you are 
> seeing that seems incorrect.  (ie: modifications to the example schema, 
> and commands to execute against the demo port to see the bug)
> 
> if you can provide details like that then it's possible to understand what 
> is going wrong for you -- which is a prereq to providing useful help.
> 
> 
> 
> -Hoss
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Deleting-spelll-checker-index-tp27376823p27629740.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: parsing strings into phrase queries

2010-02-17 Thread Robert Muir
I think we can improve the docs/wiki to show this example use case. I
noticed the wiki explanation for this filter gives a more complex shingles
example, which is interesting, but this seems to be a common problem and
maybe we should add this use case as well.

On Wed, Feb 17, 2010 at 1:54 PM, Chris Hostetter
wrote:

>
> : take a look at PositionFilter
>
> Right, there was another thread recently where almost the exact same issue
> was discussed...
>
> http://old.nabble.com/Re%3A-Tokenizer-question-p27120836.html
>
> ..except that i was ignorant of the existence of PositionFilter when i
> wrote that message.
>
>
>
> -Hoss
>
>


-- 
Robert Muir
rcm...@gmail.com


Re: Site search upsells & boosting by content type

2010-02-17 Thread Chris Hostetter


: 54 results with that particular event on top.  However, if I try to 
: boost another term, such as "+(old 97's) || granada^100" - I get over 
: 300 results because it adds in all of the matches for the word 

In Solr/Lucene, the keywords of "AND" and "OR" are really just syntactic 
sugar for making two clauses mandatory or optional -- which means that 
something like this...

+FOO || BAR

...causes a "FOO" clause to be created which is mandatory, and then a 
BAR clause is created and both the FOO and BAR clause are set to optional. 
(because of the binary OR specificed by "||")

You can see all of this if you look at the parsedquery in the 
debugQuery=true output.

The sucky part of overriding the "default operator" is that when you set 
it to "AND" there is no syntax to force a clause to be "optional" .. which 
is why i recommend *never* changing the default operator, and using "+" 
to denote when you want to make things mandatory.

: "granada".  This is not what I want.  Instead of AND or OR, I want AND 
: MAYBE.

In Lucene/Solr there is (really) no "AND" or "OR" or "AND MAYBE" .. there 
are just "MANDATORY", "PROHIBITED" and "OPTIONAL" ... in the expression 
"+FOO BAR", FOO becomes MANDATORY and BAR becomes optional, which is 
equivalent to "AND MAYBE" in other parsers.

nine times out of ten, when people are asking questions like this, the 
best answer is:

  1) use the dismax parser
  2) put the input from your user in the q param
  3) set the mm param to 100%
  4) put the boost query you want to use in the bq param
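
For example, a request along those lines (the field names here are made up
for illustration) might look like:

http://localhost:8983/solr/select?defType=dismax&qf=name+description&q=old+97%27s&mm=100%25&bq=venue:granada^100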


-Hoss



Re: optimize is taking too much time

2010-02-17 Thread Chris Hostetter

: in my solr u have 1,42,45,223 records having some 50GB .
: Now when iam loading a new record and when its trying optimize the docs its
: taking 2 much memory and time 

: can any body please tell do we have any property in solr to get rid of this.

Solr isn't going to optimize the index unless you tell it to -- how are 
you indexing your docs? are you sure you don't have something programmed 
to send an optimize command?


-Hoss



Re: getting unexpected statscomponent values

2010-02-17 Thread solr-user


Grant Ingersoll-6 wrote:
> 
> Can you share the full output from the StatsComponent? 
> 

Sure.  This is what I get.

   
- 
- 
  0 
  62 
- 
  on 
  *:* 
  true 
  ValueOfOne 
  10 
  StatsFacetField 
  StatsFacetField 
  0 
  
  
   
- 
   
- 
- 
  1619 
  7433 
   
  3984 
  233 
  41 
  
  
   
  
- 
- 
- 
  1.0 
  1.0 
  8627.0 
  8627 
  0 
  8627.0 
  1.0 
  0.0 
- 
- 
- 
  1.0 
  1.0 
  3758.0 
  3758 
  0 
  3758.0 
  1.0 
  0.0 
  
- 
  1.0 
  1.0 
  3915.0 
  3915 
  0 
  3915.0 
  1.0 
  0.0 
  
- 
  1.0 
  1.0 
  265.0 
  265 
  0 
  265.0 
  1.0 
  0.0 
  
- 
  1.0 
  1.0 
  37.0 
  37 
  0 
  37.0 
  1.0 
  0.0 
  
- 
  1.0 
  1.0 
  41.0 
  41 
  0 
  41.0 
  1.0 
  0.0 
  
- 
  1.0 
  1.0 
  201.0 
  201 
  0 
  201.0 
  1.0 
  0.0 
  
  
  
  
  
  
  
-- 
View this message in context: 
http://old.nabble.com/getting-unexpected-statscomponent-values-tp27599248p27631121.html
Sent from the Solr - User mailing list archive at Nabble.com.



What is largest reasonable setting for ramBufferSizeMB?

2010-02-17 Thread Burton-West, Tom
Hello all,

At some point we will need to re-build an index that totals about 2 terabytes 
in size (split over 10 shards).  At our current indexing speed we estimate that 
this will take about 3 weeks.  We would like to reduce that time.  It appears 
that our main bottleneck is disk I/O.
 We currently have ramBufferSizeMB set to 32 and our merge factor is 10.  If we 
increase ramBufferSizeMB to 320, we avoid a merge and the 9 disk writes and 
reads to merge 9+1 32MB segments into a 320MB segment.
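
For reference, a sketch of where these settings live in solrconfig.xml (the
values shown are the proposed ones):

  <indexDefaults>
    <ramBufferSizeMB>320</ramBufferSizeMB>
    <mergeFactor>10</mergeFactor>
  </indexDefaults>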

 Assuming we allocate enough memory to the JVM, would it make sense to increase 
ramBufferSize to 3200MB?   What are people's experiences with very large 
ramBufferSizeMB sizes?

Tom Burton-West
University of Michigan Library
www.hathitrust.org



Re: What is largest reasonable setting for ramBufferSizeMB?

2010-02-17 Thread Mark Miller
Burton-West, Tom wrote:
> Hello all,
>
> At some point we will need to re-build an index that totals about 2 
> terrabytes in size (split over 10 shards).  At our current indexing speed we 
> estimate that this will take about 3 weeks.  We would like to reduce that 
> time.  It appears that our main bottleneck is disk I/O.
>  We currently have ramBufferSizeMB set to 32 and our merge factor is 10.  If 
> we increase ramBufferSizeMB to 320, we avoid a merge and the 9 disk writes 
> and reads to merge 9+1 32MB segments into a 320MB segment.
>
>  Assuming we allocate enough memory to the JVM, would it make sense to 
> increase ramBufferSize to 3200MB?   What are people's experiences with very 
> large ramBufferSizeMB sizes?
>
> Tom Burton-West
> University of Michigan Library
> www.hathitrust.org
>
>
>   
There is a hard limit just under about 2 gigs. There appear to be diminishing
returns as you go over a hundred to a few hundred MB. IE, you probably
picked a good number with 320. If you plan to go big anyway ( > 1 gig ),
you really have to give a lot of RAM to the JVM to avoid some nasty
paging / GC effects. I think someone who tested this had to give over 6
gigabytes to go over 1 gig without those effects? That's from memory
though. If you look at the gain eked out at that point, it's not
really worth it. I'd stick to the lower hundreds max.

-- 
- Mark

http://www.lucidimagination.com





Re: getting unexpected statscomponent values

2010-02-17 Thread Chris Hostetter

: Sure.  This is what I get.

That does look really weird, and definitely seems like a bug.

Can you open an issue in Jira? ... ideally with a TestCase (even if it's 
not a JUnit test case, just having some sample docs that can be indexed 
against the example schema and a URL showing the problem would be helpful)


:
: - 
: - 
:   0 
:   62 
: - 
:   on 
:   *:* 
:   true 
:   ValueOfOne 
:   10 
:   StatsFacetField 
:   StatsFacetField 
:   0 
:   
:   
:
: - 
:
: - 
: - 
:   1619 
:   7433 
:    
:   3984 
:   233 
:   41 
:   
:   
:
:   
: - 
: - 
: - 
:   1.0 
:   1.0 
:   8627.0 
:   8627 
:   0 
:   8627.0 
:   1.0 
:   0.0 
: - 
: - 
: - 
:   1.0 
:   1.0 
:   3758.0 
:   3758 
:   0 
:   3758.0 
:   1.0 
:   0.0 
:   
: - 
:   1.0 
:   1.0 
:   3915.0 
:   3915 
:   0 
:   3915.0 
:   1.0 
:   0.0 
:   
: - 
:   1.0 
:   1.0 
:   265.0 
:   265 
:   0 
:   265.0 
:   1.0 
:   0.0 
:   
: - 
:   1.0 
:   1.0 
:   37.0 
:   37 
:   0 
:   37.0 
:   1.0 
:   0.0 
:   
: - 
:   1.0 
:   1.0 
:   41.0 
:   41 
:   0 
:   41.0 
:   1.0 
:   0.0 
:   
: - 
:   1.0 
:   1.0 
:   201.0 
:   201 
:   0 
:   201.0 
:   1.0 
:   0.0 
:   
:   
:   
:   
:   
:   
:   
: -- 
: View this message in context: 
http://old.nabble.com/getting-unexpected-statscomponent-values-tp27599248p27631121.html
: Sent from the Solr - User mailing list archive at Nabble.com.
: 



-Hoss



Re: Getting max/min dates from solr index

2010-02-17 Thread Chris Hostetter

: Is it possible to do date faceting on multiple solr shards?

Distributed search doesn't currently support date faceting...

http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations
https://issues.apache.org/jira/browse/SOLR-1709


-Hoss



Re: getting unexpected statscomponent values

2010-02-17 Thread solr-user


hossman wrote:
> 
> 
> That does look really weird, and definitely seems like a bug.
> 
> Can you open an issue in Jira? ... ideally with a TestCase (even if it's 
> not a JUnit test case, just having some sample docs that can be indexed 
> against the example schema and a URL showing the problem would be helpful)
> 
> 

Hossman, what do you mean by including a "TestCase"?  

Will create issue in Jira asap; I will include the URL, schema and some code
to generate sample data
-- 
View this message in context: 
http://old.nabble.com/getting-unexpected-statscomponent-values-tp27599248p27631633.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Realtime search and facets with very frequent commits

2010-02-17 Thread Jan Høydahl / Cominvent
Hi,

Have you tried playing with mergeFactor or even mergePolicy?

--
Jan Høydahl  - search architect
Cominvent AS - www.cominvent.com

On 16. feb. 2010, at 08.26, Janne Majaranta wrote:

> Hey Dipti,
> 
> Basically query optimizations + setting cache sizes to a very high level.
> Other than that, the config is about the same as the out-of-the-box config
> that comes with the Solr download.
> 
> I haven't found a magic switch to get very fast query responses + facet
> counts with the frequency of commits I'm having using one single SOLR
> instance.
> Adding some TOP queries for a certain type of user to static warming queries
> just moved the time of autowarming the caches to the time it took to warm
> the caches with static queries.
> I've been staging a setup where there's a small solr instance receiving all
> the updates and a large instance which doesn't receive the live feed of
> updates.
> The small index will be merged with the large index periodically (once a
> week or once a month).
> The two instances are seen by the client app as one instance using the
> sharding features of SOLR.
> The instances are running on the same server inside their own JVM / jetty.
> 
> In this setup the caches are very HOT for the large index and queries are
> extremely fast, and the small index is small enough to get extremely fast
> queries without having to warm up the caches too much.
> 
> Basically I'm able to have a commit frequency of 10 seconds in a 40M docs
> index while counting TOP5 facets over 14 fields in 200ms.
> In reality the commit frequency of 10 seconds comes from the fact that the
> updates are going into a 1M - 2M documents index, and the fast facet counts
> from the fact that the 38M documents index has hot caches and doesn't
> receive any updates.
> 
> Also, not running updates to the large index means that the SOLR instance
> reading the large index uses about half the memory it used before when
> running the updates to the large index. At least it does so on Win2k3.
> 
> -Janne
> 
> 
> 2010/2/15 dipti khullar 
> 
>> Hey Janne
>> 
>> Can you please let me know what other optimizations are you talking about
>> here. Because in our application we are committing in about 5 mins but
>> still
>> the response time is very low and at times there are some connection time
>> outs also.
>> 
>> Just wanted to confirm if you have done some major configuration changes
>> which have proved beneficial.
>> 
>> Thanks
>> Dipti
>> 
>> 



Re: Discovering Slaves

2010-02-17 Thread Jan Høydahl / Cominvent
After ZooKeeper is integrated (1.5?) there will be a way to get info about all 
nodes in your cluster including their roles, status etc. Perhaps you want to 
coordinate your dashboard effort with this version, although still very early 
in development? See http://wiki.apache.org/solr/SolrCloud

--
Jan Høydahl  - search architect
Cominvent AS - www.cominvent.com

On 15. feb. 2010, at 23.53, wojtekpia wrote:

> 
> Is there a way to 'discover' slaves using ReplicationHandler? I'm writing a
> quick dashboard, and don't have access to a list of slaves, but would like
> to show some stats about their health.
> -- 
> View this message in context: 
> http://old.nabble.com/Discovering-Slaves-tp27601334p27601334.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 



Re: Collating results from multiple indexes

2010-02-17 Thread Jan Høydahl / Cominvent
Thanks for your clarification and link, Will.

Back to Aaron's question. There is some ongoing work to try to support updating 
single fields within documents (http://issues.apache.org/jira/browse/SOLR-139 
and http://issues.apache.org/jira/browse/SOLR-828) which could perhaps be part 
of a future solution.

Is it an option for you to write a smart "join" component which can live on top 
of multiple cores and do multiple sub queries in an efficient way and 
transparently return the final result? Forking the shards query code could be a 
starting point? Donating this component back to Solr may free you of 
maintenance burden, as I'm sure it will be useful to a larger audience?

--
Jan Høydahl  - search architect
Cominvent AS - www.cominvent.com

On 17. feb. 2010, at 03.27, Will Johnson wrote:

> Jan Hoydal / Otis,
> 
> 
> 
> First off, Thanks for mentioning us.  We do use some utility functions from
> SOLR but our index engine is built on top of Lucene only, there are no Solr
> cores involved.  We do have a JOIN operator that allows us to perform
> relational searches while still acting like a search engine in terms of
> performance, ranking, faceting, etc.  Our CTO wrote a blog article about it
> a month ago that does a pretty good job of explaining how it’s used:
> http://www.attivio.com/blog/55-industry-insights/507-can-a-search-engine-replace-a-relational-database.html
> 
> 
> 
> The join functionality and most of our other higher level features use
> separate data structures and don’t really have much to do with Lucene/SOLR
> except where they integrate with the query execution.  If you want to learn
> more feel free to check out www.attivio.com.
> 
> 
> 
> -  w...@attivio.com
> 
> 
> On Fri, Feb 12, 2010 at 10:35 AM, Jan Høydahl / Cominvent <
> jan@cominvent.com> wrote:
> 
>> Really? The last time I looked at AIE, I am pretty sure there was Solr core
>> msgs in the logs, so I assumed it used EmbeddedSolr or something. But I may
>> be mistaken. Anyone from Attivio here who can elaborate? Is the join stuff
>> at Lucene level or on top of multiple Solr cores or what?
>> 
>> --
>> Jan Høydahl  - search architect
>> Cominvent AS - www.cominvent.com
>> 
>> On 11. feb. 2010, at 23.02, Otis Gospodnetic wrote:
>> 
>>> Minor correction re Attivio - their stuff runs on top of Lucene, not
>> Solr.  I *think* they are trying to patent this.
>>> 
>>> Otis
>>> 
>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>> Hadoop ecosystem search :: http://search-hadoop.com/
>>> 
>>> 
>>> 
>>> - Original Message 
 From: Jan Høydahl / Cominvent 
 To: solr-user@lucene.apache.org
 Sent: Mon, February 8, 2010 3:33:41 PM
 Subject: Re: Collating results from multiple indexes
 
 Hi,
 
 There is no JOIN functionality in Solr. The common solution is either to
>> accept
 the high volume update churn, or to add client side code to build a
>> "join" layer
 on top of the two indices. I know that Attivio (www.attivio.com) have
>> built some
 kind of JOIN functionality on top of Solr in their AIE product, but do
>> not know
 the details or the actual performance.
 
 Why not open a JIRA issue, if there is no such already, to request this
>> as a
 feature?
 
 --
 Jan Høydahl  - search architect
 Cominvent AS - www.cominvent.com
 
 On 25. jan. 2010, at 22.01, Aaron McKee wrote:
 
> 
> Is there any somewhat convenient way to collate/integrate fields from
>> separate
 indices during result writing, if the indices use the same unique keys?
 Basically, some sort of cross-index JOIN?
> 
> As a bit of background, I have a rather heavyweight dataset of every US
 business (~25m records, an on-disk index footprint of ~30g, and 5-10
>> hours to
 fully index on a decent box). Given the size and relatively stability of
>> the
 dataset, I generally only update this monthly. However, I have separate
 advertising-related datasets that need to be updated either hourly or
>> daily
 (e.g. today's coupon, click revenue remaining, etc.) . These advertiser
>> feeds
 reference the same keyspace that I use in the main index, but are
>> otherwise
 significantly lighter weight. Importing and indexing them discretely
>> only takes
 a couple minutes. Given that Solr/Lucene doesn't support field updating,
>> without
 having to drop and re-add an entire document, it doesn't seem practical
>> to
 integrate this data into the main index (the system would be under a
>> constant
 state of churn, if we did document re-inserts, and the performance
>> impact would
 probably be debilitating). It may be nice if this data could participate
>> in
 filtering (e.g. only show advertisers), but it doesn't need to
>> participate in
 scoring/ranking.
> 
> I'm guessing that someone else has had a similar need, at some point?
>> I can
 have our front-end query the smaller indic

labeling facets and highlighting question

2010-02-17 Thread adeelmahmood

Simple question: I want to give a label to my facet queries instead of the
name of the facet field. I found documentation on the Solr site saying I can do
that by specifying the key local param, with syntax something like
facet.field={!ex=dt%20key='By%20Owner'}owner

I am just not sure what the ex=dt part does. If I take it out, it throws
an error, so it seems important, but what is it for?

Also, I tried turning on highlighting and I can see that it adds the
highlighting list at the end of the XML, but it only points out the
ids of all the matching results. It doesn't actually show the text data
that it is matching against, so I am getting something like this back:

 
  
  
...

instead of the actual text that is being matched. Isn't it supposed to do
that and wrap the search terms in an em tag? How come it's not doing that in
my case?

here is my schema
 
 
 
 

-- 
View this message in context: 
http://old.nabble.com/labeling-facets-and-highlighting-question-tp27632747p27632747.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: xml error when indexing

2010-02-17 Thread Lance Norskog
What type are you posting with? Is it expecting a multipart upload?
What is the curl command and what is its mime-type for uploaded data?



On Wed, Feb 17, 2010 at 10:19 AM, Chris Hostetter
 wrote:
> : I'm having a strange problem when indexing data through our application.
> : Whenever I post something to the update resource, I get
> :
> : Unexpected character 'a' (code 97) in prolog; expected '<'  at [row,col 
> {unknown-source}]: [1,1], 
>        ...
> : However, when I post the same data from an xml file using curl it works.
>
> ...that's pretty much a dead giveaway that your application isn't posting
> the exact same XML as the curl command.  You might try using a packet
> sniffer, or an HTTP Proxy that logs all the details of the requests to see
> what exactly your application is sending over the wire and how it differs
> from curl.
>
>
>
> -Hoss
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: xml error when indexing

2010-02-17 Thread Lance Norskog
I mean: what MIME type does the POST command use?

On Wed, Feb 17, 2010 at 7:09 PM, Lance Norskog  wrote:
> What type are you posting with? Is it expecting a multipart upload?
> What is the curl command and what is its mime-type for uploaded data?
>
>
>
> On Wed, Feb 17, 2010 at 10:19 AM, Chris Hostetter
>  wrote:
>> : I'm having a strange problem when indexing data through our application.
>> : Whenever I post something to the update resource, I get
>> :
>> : Unexpected character 'a' (code 97) in prolog; expected '<'  at [row,col 
>> {unknown-source}]: [1,1], 
>>        ...
>> : However, when I post the same data from an xml file using curl it works.
>>
>> ...that's pretty much a dead giveaway that your application isn't posting
>> the exact same XML as the curl command.  You might try using a packet
>> sniffer, or an HTTP Proxy that logs all the details of the requests to see
>> what exactly your application is sending over the wire and how it differs
>> from curl.
>>
>>
>>
>> -Hoss
>>
>>
>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com


Re: Deleting spelll checker index

2010-02-17 Thread Lance Norskog
This is a quirk of Lucene - when you delete a document, the indexed
terms for the document are not deleted. That is, if 2 documents have
the word 'frampton' in an indexed field, the term dictionary contains
the entry 'frampton' and pointers to those two documents. When you
delete those two documents, the index contains the entry 'frampton'
with an empty list of pointers. So, the terms are still there even
when you delete all of the documents.

Facets and the spellchecking dictionary build from this term
dictionary, not from the text string that are 'stored' and returned
when you search for the documents.

The <optimize/> command throws away these remnant terms.

http://www.lucidimagination.com/blog/2009/03/18/exploring-lucenes-indexing-code-part-2/
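
As a sketch (assuming the example port and the 'default' dictionary name used
earlier in this thread), purging the remnant terms and rebuilding the
dictionary would look something like:

curl http://localhost:8983/solr/update -H 'Content-type:text/xml' --data-binary '<optimize/>'
http://localhost:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true&spellcheck.dictionary=default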

On Wed, Feb 17, 2010 at 12:24 PM, darniz  wrote:
>
> Please bear with me on the limitted understanding.
> i deleted all documents and i made a rebuild of my spell checker  using the
> command
> spellcheck=true&spellcheck.build=true&spellcheck.dictionary=default
>
> After this i went to the schema browser and i saw that mySpellText still has
> around 2000 values.
> How can i make sure that i clean up that field.
> We had the same issue with facets too, even though we delete all the
> documents, and if we do a facet on make we still see facets but we can
> filter out facets by saying facet.mincount>0.
>
> Again coming back to my question how can i make mySpellText fields get rid
> of all previous terms
>
> Thanks a lot
> darniz
>
>
>
> hossman wrote:
>>
>> : But still i cant stop thinking about this.
>> : i deleted my entire index and now i have 0 documents.
>> :
>> : Now if i make a query with accrd i still get a suggestion of accord even
>> : though there are no document returned since i deleted my entire index. i
>> : hope it also clear the spell check index field.
>>
>> there are two Lucene indexes when you use spell checking.
>>
>> there is the "main" index which is goverend by your schema.xml and is what
>> you add your own documents to, and what searches are run agains for the
>> result section of solr responses.
>>
>> There is also the "spell" index which has only two fields and in
>> which each "document" corrisponds to a "word" that might be returend as a
>> spelling suggestion, and the other fields contain various start/end/middle
>> ngrams that represent possible misspellings.
>>
>> When you use the spellchecker component it builds the "spell" index
>> makinga document out of every word it finds in whatever field name you
>> configure it to use.
>>
>> deleting your entire "main" index won't automaticly delete the "spell"
>> index (allthough you should be able rebuild the "spell" index using the
>> *empty* "main" index, that should work).
>>
>> : i am copying both fields to a field called
>> : 
>> : 
>>
>> ..at this point your "main" index has a field named mySpellText, and for
>> ever document it contains a copy of make and model.
>>
>> :         
>> :             default
>> :             mySpellText
>> :             true
>> :             true
>>
>> ...so whenever you commit or optimize your "main" index it will take every
>> word from the mySpellText and use them all as individual documents in the
>> "spell" index.
>>
>> In your previous email you said you changed hte copyField declaration, and
>> then triggered a commit -- that rebuilt your "spell" index, but the data
>> was still all there in the mySpellText field of the "main" index, so the
>> rebuilt "spell" index was exactly the same.
>>
>> : i have buildOnOPtmize and buildOnCommit as true so when i index new
>> document
>> : i want my dictionary to be created but how can i make sure i remove the
>> : preivious indexed terms.
>>
>> everytime the spellchecker component "builds" it will create a completley
>> new "spell" index .. but if the old data is still in the "main" index then
>> it will also be in the "spell" index.
>>
>> The only reason i can think of why you'd be seeing words in your "spell"
>> index after deleting documents from your "main" index is that even if you
>> delete documents, the Terms are still there in the underlying index untill
>> the segments are merged ... so if you do an optimize that will force them
>> to be expunged --- but i honestly have no idea if that is what's causing
>> your problem, because quite frankly i really don't understand what your
>> problem is ... you have to provide specifics: reproducible steps anyone
>> can take using a clean install of solr to see the the behavior you are
>> seeing that seems incorrect.  (ie: modifications to the example schema,
>> and commands to execute against hte demo port to see the bug)
>>
>> if you can provide details like that then it's possible to understand what
>> is going wrong for you -- which is a prereq to providing useful help.
>>
>>
>>
>> -Hoss
>>
>>
>>
>
> --
> View this message in context: 
> http://old.nabble.com/Deleting-spelll-checker-index-tp27376823p27629740.html
> Sent from the Solr - User mailing list archive at Nab

Re: parsing strings into phrase queries

2010-02-17 Thread Lance Norskog
That would be great. After reading this and the PositionFilter class I
still don't know how to use it.
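
For what it's worth, a minimal sketch of where the filter sits in a query-time
analyzer chain (the field type name and the surrounding filters are made up
for illustration; this assumes the solr.PositionFilterFactory shipped with
recent Solr builds):

  <fieldType name="text_flat" class="solr.TextField">
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- stacks every token the analyzer emits onto a single position -->
      <filter class="solr.PositionFilterFactory"/>
    </analyzer>
  </fieldType>

By default the filter gives every token after the first a position increment
of 0, so the query parser no longer sees a multi-position sequence coming out
of a single query term.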

On Wed, Feb 17, 2010 at 12:38 PM, Robert Muir  wrote:
> i think we can improve the docs/wiki to show this example use case, i
> noticed the wiki explanation for this filter gives a more complex shingles
> example, which is interesting, but this seems to be a common problem and
> maybe we should add this use case.
>
> On Wed, Feb 17, 2010 at 1:54 PM, Chris Hostetter
> wrote:
>
>>
>> : take a look at PositionFilter
>>
>> Right, there was another thread recently where almost the exact same issue
>> was discussed...
>>
>> http://old.nabble.com/Re%3A-Tokenizer-question-p27120836.html
>>
>> ..except that i was ignorant of the existence of PositionFilter when i
>> wrote that message.
>>
>>
>>
>> -Hoss
>>
>>
>
>
> --
> Robert Muir
> rcm...@gmail.com
>



-- 
Lance Norskog
goks...@gmail.com


Re: labeling facets and highlighting question

2010-02-17 Thread Lance Norskog
Here's the problem: the wiki page is confusing:

http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters

The line:
q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt}doctype

is standalone, but the later line:

facet.field={!ex=dt key=mylabel}doctype

means 'change {!ex=dt}doctype in the long query to {!ex=dt key=mylabel}doctype'.

'tag=dt' attaches a tag (name) to a filter query, and 'ex=dt' means
'exclude the filter tagged dt when computing this facet'.
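
(If no filter exclusion is involved, the label can still be applied on its
own -- note the leading '!' inside the braces -- e.g.
facet.field={!key=mylabel}doctype.)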

On Wed, Feb 17, 2010 at 4:30 PM, adeelmahmood  wrote:
>
> simple question: I want to give a label to my facet queries instead of the
> name of facet field .. i found the documentation at solr site that I can do
> that by specifying the key local param .. syntax something like
> facet.field={!ex=dt%20key='By%20Owner'}owner
>
> I am just not sure what the ex=dt part does .. if i take it out .. it throws
> an error so it seems its important but what for ???
>
> also I tried turning on the highlighting and i can see that it adds the
> highlighting items list in the xml at the end .. but it only points out the
> ids of all the matching results .. it doesnt actually shows the text data
> thats its making a match with // so i am getting something like this back
>
> 
>  
>  
> ...
>
> instead of the actual text thats being matched .. isnt it supposed to do
> that and wrap the search terms in em tag .. how come its not doing that in
> my case
>
> here is my schema
>  />
> 
> 
> 
>
> --
> View this message in context: 
> http://old.nabble.com/labeling-facets-and-highlighting-question-tp27632747p27632747.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: labeling facets and highlighting question

2010-02-17 Thread adeelmahmood

Okay, so if I don't want to do any excludes then I am assuming I should just
put in {key=label}field .. I tried that and it doesn't work .. it says
undefined field {key=label}field


Lance Norskog-2 wrote:
> 
> Here's the problem: the wiki page is confusing:
> 
> http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters
> 
> The line:
> q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt}doctype
> 
> is standalone, but the later line:
> 
> facet.field={!ex=dt key=mylabel}doctype
> 
> mean 'change the long query from {!ex=dt}docType to {!ex=dt
> key=mylabel}docType'
> 
> 'tag=dt' creates a tag (name) for a filter query, and 'ex=dt' means
> 'exclude this filter query'.
> 
> On Wed, Feb 17, 2010 at 4:30 PM, adeelmahmood 
> wrote:
>>
>> simple question: I want to give a label to my facet queries instead of
>> the
>> name of facet field .. i found the documentation at solr site that I can
>> do
>> that by specifying the key local param .. syntax something like
>> facet.field={!ex=dt%20key='By%20Owner'}owner
>>
>> I am just not sure what the ex=dt part does .. if i take it out .. it
>> throws
>> an error so it seems its important but what for ???
>>
>> also I tried turning on the highlighting and i can see that it adds the
>> highlighting items list in the xml at the end .. but it only points out
>> the
>> ids of all the matching results .. it doesnt actually shows the text data
>> thats its making a match with // so i am getting something like this back
>>
>> 
>>  
>>  
>> ...
>>
>> instead of the actual text thats being matched .. isnt it supposed to do
>> that and wrap the search terms in em tag .. how come its not doing that
>> in
>> my case
>>
>> here is my schema
>> > required="true"
>> />
>> 
>> 
>> 
>>
>> --
>> View this message in context:
>> http://old.nabble.com/labeling-facets-and-highlighting-question-tp27632747p27632747.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> Lance Norskog
> goks...@gmail.com
> 
> 

-- 
View this message in context: 
http://old.nabble.com/labeling-facets-and-highlighting-question-tp27632747p27634177.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Has anyone prepared a general purpose synonyms.txt for search engines

2010-02-17 Thread Lance Norskog
OpenThesaurus seems to cover European languages, not including English :)
WordNet is a venerable thesaurus project:

http://wordnet.princeton.edu/

and lucene-contrib includes a set of tools for using it.

http://www.lucidimagination.com/search/?q=wordnet

On Fri, Feb 12, 2010 at 11:51 AM, Julian Hille  wrote:
> Hi,
>
> Your welcome. Thats something google came up with some weeks ago :)
>
>
> Am 12.02.2010 um 20:42 schrieb Emad Mushtaq:
>
>> Wow thanks!! You all are awesome! :D :D
>>
>> On Sat, Feb 13, 2010 at 12:32 AM, Julian Hille  wrote:
>>
>>> Hi,
>>>
>>> at openthesaurus.org or .com you can find a mysql version of synonyms you
>>> just have to join it to fit the synonym schema of solr yourself.
>>>
>>>
>>> Am 12.02.2010 um 20:03 schrieb Emad Mushtaq:
>>>
 Hi,

 I was wondering if anyone has prepared a synonyms.txt for general purpose
 search engines,  that can be shared. If not could you refer me to places
 where such a synonym list or thesaurus can be found. Synonyms for search
 engines are different from the regular thesaurus. Any help would be
>>> highly
 appreciated. Thanks.

 --
 Muhammad Emad Mushtaq
 http://www.emadmushtaq.com/
>>>
>>> Mit freundlichen Grüßen,
>>> Julian Hille
>>>
>>>
>>>
>>
>>
>> --
>> Muhammad Emad Mushtaq
>> http://www.emadmushtaq.com/
>
> Mit freundlichen Grüßen,
> Julian Hille
>
>
> ---
> NetImpact KG
> Altonaer Straße 8
> 20357 Hamburg
>
> Tel: 040 / 6738363 2
> Mail: jul...@netimpact.de
>
> Geschäftsführer: Tarek Müller
>
>



-- 
Lance Norskog
goks...@gmail.com


Re: optimize is taking too much time

2010-02-17 Thread mklprasad



hossman wrote:
> 
> 
> : in my solr u have 1,42,45,223 records having some 50GB .
> : Now when iam loading a new record and when its trying optimize the docs
> its
> : taking 2 much memory and time 
> 
> : can any body please tell do we have any property in solr to get rid of
> this.
> 
> Solr isn't going to optimize the index unless you tell it to -- how are 
> you indexing your docs? are you sure you don't have something programmed 
> to send an optimize command?
> 
> 
> -Hoss
> 
> Yes,
> from my code, for every load I am calling the server.optimize() method
> (now I am planning to remove this from the code).
> At the config level I have 'mergeFactor=10'.
> I have a doubt: will the mergeFactor only do a merge, or will it
> also perform the optimization?
> If not, do I need to call optimize explicitly? In that case, for my 50GB index, will it take less time?
> 
> 
> Please clarify.
> Thanks in advance
> 
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/optimize-is-taking-too-much-time-tp27561570p27634994.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Realtime search and facets with very frequent commits

2010-02-17 Thread Janne Majaranta
Hi,

Yes, I did play with mergeFactor.
I didn't play with mergePolicy.

Wouldn't that affect indexing speed and possibly memory usage ?
I don't have any problems with indexing speed ( 1000 - 2000 docs / sec via
the standard HTTP API ).

My problem is that I need very warm caches to get fast faceting, and the
autowarming of the caches takes too long compared to the frequency of
commits I'm having.
So a commit every minute means less than a minute time to warm the caches.

To give you an idea of what kind of queries need to be autowarmed in my app,
the log events indexed as documents have timestamps with different
granularities used for faceting.
For example, to get the count of log events for every hour using faceting,
there's a timestamp field with the format yyyymmddhh (for example: 2010021808,
meaning 2010-02-18 8am).
One use case is to get hourly counts over the whole index. A non-cached
query counting the hourly counts over the 40M document index takes a
while..
And to my understanding, autowarming means that this kind of
query would basically be re-executed against a cold cache. Probably not
exactly how it works, but it "feels" like it would.
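
For reference, a static warming entry for that kind of facet query would look
roughly like this in solrconfig.xml (the field name here is made up):

  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">*:*</str>
        <str name="rows">0</str>
        <str name="facet">true</str>
        <str name="facet.field">timestamp_hour</str>
      </lst>
    </arr>
  </listener>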

Moving the commits to a smaller index while using sharding to have a
transparent view to the index from the client app seems to solve my problem.

I'm not sure if the (upcoming?) NRT features would keep the caches more
persistent; probably not in an environment where docs get frequent updates /
deletes.

Also, I'm closely following the Ocean Realtime Search project AND it's SOLR
integration. It sounds like it has the "dream features" to enable realtime
updates to the index.

-Janne


2010/2/18 Jan Høydahl / Cominvent 

> Hi,
>
> Have you tried playing with mergeFactor or even mergePolicy?
>
> --
> Jan Høydahl  - search architect
> Cominvent AS - www.cominvent.com
>
> On 16. feb. 2010, at 08.26, Janne Majaranta wrote:
>
> > Hey Dipti,
> >
> > Basically query optimizations + setting cache sizes to a very high level.
> > Other than that, the config is about the same as the out-of-the-box
> config
> > that comes with the Solr download.
> >
> > I haven't found a magic switch to get very fast query responses + facet
> > counts with the frequency of commits I'm having using one single SOLR
> > instance.
> > Adding some TOP queries for a certain type of user to static warming
> queries
> > just moved the time of autowarming the caches to the time it took to warm
> > the caches with static queries.
> > I've been staging a setup where there's a small solr instance receiving
> all
> > the updates and a large instance which doesn't receive the live feed of
> > updates.
> > The small index will be merged with the large index periodically (once a
> > week or once a month).
> > The two instances are seen by the client app as one instance using the
> > sharding features of SOLR.
> > The instances are running on the same server inside their own JVM /
> jetty.
> >
> > In this setup the caches are very HOT for the large index and queries are
> > extremely fast, and the small index is small enough to get extremely fast
> > queries without having to warm up the caches too much.
> >
> > Basically I'm able to have a commit frequency of 10 seconds in a 40M docs
> > index while counting TOP5 facets over 14 fields in 200ms.
> > In reality the commit frequency of 10 seconds comes from the fact that
> the
> > updates are going into a 1M - 2M documents index, and the fast facet
> counts
> > from the fact that the 38M documents index has hot caches and doesn't
> > receive any updates.
> >
> > Also, not running updates to the large index means that the SOLR instance
> > reading the large index uses about half the memory it used before when
> > running the updates to the large index. At least it does so on Win2k3.
> >
> > -Janne
> >
> >
> > 2010/2/15 dipti khullar 
> >
> >> Hey Janne
> >>
> >> Can you please let me know what other optimizations are you talking
> about
> >> here. Because in our application we are committing in about 5 mins but
> >> still
> >> the response time is very low and at times there are some connection
> time
> >> outs also.
> >>
> >> Just wanted to confirm if you have done some major configuration changes
> >> which have proved beneficial.
> >>
> >> Thanks
> >> Dipti
> >>
> >>
>
>


Re: Tomcat vs Jetty: A Comparative Analysis?

2010-02-17 Thread Steve Radhouani
Totally agreed!

2010/2/17 Andy 

> This read more like a press release or product brochure for Jetty than
> anything else.
>
> Then I poked around the website and realized why: it was written by the
> creator of Jetty, and is hosted on the website of a company with the slogan
> "The Java Experts behind Jetty"
>
> --- On Wed, 2/17/10, g...@littlebunch.com  wrote:
>
> From: g...@littlebunch.com 
> Subject: Re: Tomcat vs Jetty: A Comparative Analysis?
> To: solr-user@lucene.apache.org
> Date: Wednesday, February 17, 2010, 11:27 AM
>
>
>
> http://www.webtide.com/choose/jetty.jsp
>
> >> > - Original Message -
> >> > From: "Steve Radhouani" 
> >> > To: solr-user@lucene.apache.org
> >> > Sent: Tuesday, 16 February, 2010 12:38:04 PM
> >> > Subject: Tomcat vs Jetty: A Comparative Analysis?
> >> >
> >> > Hi there,
> >> >
> >> > Is there any analysis out there that may help to choose between Tomcat
> >> and
> >> > Jetty to deploy Solr? I wonder wether there's a significant difference
> >> > between them in terms of performance.
> >> >
> >> > Any advice would be much appreciated,
> >> > -Steve
> >> >
> >>
>
>
>
>
>