I'm SURE there is a cleaner way, but in the past, we read the segments
file (manually :( ), and any file which wasn't listed in there was
considered to be a redundant file.
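
A rough sketch of that idea, in Python rather than Lucene.NET, purely as an illustration. It assumes you have already obtained the set of live segment names from the segments file by some other means (parsing that file is version-specific and not shown), that segment files are named "<segment>.<extension>" (for example "_a.cfs"), and that the bookkeeping files are called "segments" and "deletable" as in this thread. The function name and its parameters are made up for the example. It only reports candidates and never deletes anything.

```python
import os
import tempfile


def find_unreferenced_files(index_dir, live_segments,
                            always_keep=("segments", "deletable")):
    """Return the sorted names of files in index_dir that belong to no live segment.

    live_segments: segment names taken from the segments file (assumed input).
    always_keep:   bookkeeping files that are never reported (assumed names;
                   adjust for index formats that use other file names).
    """
    live = set(live_segments)
    keep = set(always_keep)
    unreferenced = []
    for name in sorted(os.listdir(index_dir)):
        if name in keep:
            continue
        # Assumption: everything before the first dot is the segment name.
        segment = name.split(".", 1)[0]
        if segment not in live:
            unreferenced.append(name)
    return unreferenced


if __name__ == "__main__":
    # Build a throwaway directory that mimics an index with one stale segment.
    with tempfile.TemporaryDirectory() as index_dir:
        for name in ("segments", "deletable", "_a.cfs", "_b.cfs", "_b.fdt"):
            open(os.path.join(index_dir, name), "w").close()
        for name in find_unreferenced_files(index_dir, {"_a"}):
            print("candidate for removal:", name)
```

Review the reported list by hand, on a copy of the index, before removing anything; a file that falls outside the naming assumptions above (a lock file, say) would be reported as well.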

Worked for us. There may be a way to ask an IndexReader which files it's
using, and then extrapolate from there, but we were using Lucene.net
1.something, which didn't have one.

I think that's what Luke does: opens the index, asks Lucene what it's
using, kills everything else.

-----Original Message-----
From: Nitin Shiralkar [mailto:nit...@coreobjects.com] 
Sent: 13 January 2009 11:26
To: lucene-net-user@incubator.apache.org
Subject: RE: Lucene Scalability Options

Hi All,

I have started this thread for Lucene scalability aspect. I have an
index with 80 GB size. However it looks like many of the segment files
are either redundant or unused. Even if I delete them and just retain
CFS, segments and deletable files, the index seems to be working fine.
However, I would like to know a cleaner approach to identify such
redundant/unused files through APIs. I am able to see these unused files
in Luke as "Deletable". However I am not sure how Luke is able to
identify unused files. I am using Lucene.NET 2.0 version.

Can you please suggest some way?



-----Original Message-----
From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com]
Sent: Tuesday, January 13, 2009 1:01 AM
To: lucene-net-user@incubator.apache.org
Subject: RE: Lucene Scalability Options


Floyd, you will need to provide more details about the specific problems
you are encountering.

I made a quick check, and have no difficulty opening and inspecting an
index I created a few minutes ago with Lucene.NET v2.3.1 using Luke
v0.9.1.

-- Neal


-----Original Message-----
From: Floyd Wu [mailto:floyd...@gmail.com]
Sent: Friday, January 09, 2009 8:18 PM
To: lucene-net-user@incubator.apache.org
Subject: Re: Lucene Scalability Options

Hi all,
It seems the new version of Luke is not compatible with Lucene.net, and I've
emailed the creator of Luke. Below is his feedback:

"Yes, there have been many changes, but Lucene 2.4 can still open indexes
built with earlier versions of Lucene/Java.
This is the second report I've got about the possible incompatibility with
Lucene.Net - I suggest to raise up this issue on the Lucene mailing list
(java-...@lucene.apache.org), and provide more details, eg. Lucene.Net
revision, stack trace, a small sample index if you can."

My original report was as below:
"The situation is that Luke-0.9 cannot open index files which were built by
Lucene.Net-2.3.1.
I tried older versions of Luke and confirmed that Luke-0.8 and Luke-0.8.1
can open and read the index files fine.
I wonder if there is any change between Java Lucene 2.3 and 2.4.
Please help on this."

Floyd



2009/1/9 George Aroush <geo...@aroush.net>

> Hi Nitin,
>
> Any optimization that Luke can do on an index is also doable by making API
> calls from Lucene.Net.  If not, then there is either a bug in Lucene.Net or
> in your use of the API.  Can you share with us your API calls as well as the
> Lucene.Net version you are using?
>
> Thanks.
>
> -- George
>
> > -----Original Message-----
> > From: Nitin Shiralkar [mailto:nit...@coreobjects.com]
> > Sent: Friday, January 09, 2009 6:27 AM
> > To: lucene-net-user@incubator.apache.org
> > Subject: RE: Lucene Scalability Options
> >
> > Thanks Hugh. Yes, I tried using Luke for index optimization.
> > Surprisingly, it has brought down the index size to ~20 GB
> > with only one CFS and segment files left behind. I used
> > compound optimization option. But I use the similar
> > "SetUseCompoundFile" property on "IndexModifier" object in my
> > Lucene.NET code, but it has no effect on size or files after
> > optimization. Any suggestions??
> >
> >
> > -----Original Message-----
> > From: Hugh Spiller [mailto:hugh.spil...@renishaw.com]
> > Sent: Friday, January 09, 2009 3:35 PM
> > To: lucene-net-user@incubator.apache.org
> > Subject: RE: Lucene Scalability Options
> >
> > Hi Nitin,
> >
> > I've found the easiest way to get rid of redundant files in
> > an index is to use Luke. As soon as you use it to open the
> > index, it tidies up all the cruft.
> >
> > It's at http://www.getopt.org/luke/ .
> >
> > ________________________________
> >
> > Hugh Spiller
> >
> >
> > -----Original Message-----
> > From: Nitin Shiralkar [mailto:nit...@coreobjects.com]
> > Sent: 09 January 2009 08:48
> > To: lucene-net-user@incubator.apache.org
> > Subject: RE: Lucene Scalability Options
> >
> > -- snip --
> >
> >
> > Any inputs on junk/redundant files in above list?
> >
> >
> >