What version of Lucene.Net are you observing this on?

I know I had difficulty doing a clean port in this area of the code for 2.0
and even 2.1 (it's still the same for 2.3.1).

-- George 
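
For readers following the thread: the manual approach Nic describes below (anything in the index directory that the segments file does not list is a candidate for removal) can be sketched as a read-only check. This is a minimal sketch in Python, not a Lucene.Net call. `find_unreferenced_files`, `ALWAYS_KEEP` and the parameter names are hypothetical, and the sketch assumes you already have the set of file names the index references, for example from reading the segments file or from Luke. It only reports candidates and never deletes anything, in line with the warning below about removing index files by hand.

```python
from pathlib import Path

# Assumption taken from the thread: "segments" and "deletable" are
# bookkeeping files that must always be retained. Adjust for your version.
ALWAYS_KEEP = {"segments", "deletable"}


def find_unreferenced_files(index_dir, referenced_names, always_keep=ALWAYS_KEEP):
    """Return the sorted names of files in index_dir that are neither
    referenced by the index nor in the always-keep set.

    referenced_names is supplied by the caller; this function does not
    parse the segments file itself.
    """
    keep = set(referenced_names) | set(always_keep)
    return sorted(
        entry.name
        for entry in Path(index_dir).iterdir()
        if entry.is_file() and entry.name not in keep
    )
```

A call such as `find_unreferenced_files("/path/to/index", {"_a.cfs"})` returns the leftover file names for review. Run it only while no writer is modifying the index, since a merge in progress creates files that are not yet referenced.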

> -----Original Message-----
> From: Nitin Shiralkar [mailto:nit...@coreobjects.com] 
> Sent: Tuesday, January 13, 2009 8:46 AM
> To: lucene-net-user@incubator.apache.org
> Subject: RE: Lucene Scalability Options
> 
> Hi George,
> 
> Thanks. But the junk files come from optimize itself.
> When the compound index flag is set to true to produce a single
> segment file, Lucene tries to merge all segments and then deletes
> the older ones. However, if the older ones are being accessed in
> parallel, the delete operation fails. Lucene tracks these files
> through "deletable" and should clean them up the next time the
> index is opened. In some cases, however, the files remain on disk,
> unused and no longer referenced by Lucene.
> 
> This is a rare scenario, and the files accumulated over a period
> of two years.
> 
> -----Original Message-----
> From: George Aroush [mailto:geo...@aroush.net]
> Sent: Tuesday, January 13, 2009 6:54 PM
> To: lucene-net-user@incubator.apache.org
> Subject: RE: Lucene Scalability Options
> 
> There is.  Call the Optimize() function on the index.
> 
> You should never delete index files manually unless you know
> what you are doing; otherwise you can corrupt or destroy
> your index.
> 
> -- George
> 
> > -----Original Message-----
> > From: Nic Wise [mailto:nic.w...@bbc.com]
> > Sent: Tuesday, January 13, 2009 6:36 AM
> > To: lucene-net-user@incubator.apache.org
> > Subject: RE: Lucene Scalability Options
> >
> > I'm SURE there is a cleaner way, but in the past, we read the
> > segments file (manually :( ), and any file which wasn't listed in
> > there was considered to be a redundant file.
> >
> > Worked for us. There may be a way to ask an IndexReader which files
> > it's using, and then extrapolate from there, but we were using
> > Lucene.net 1.something, which didn't.
> >
> > I think that's what Luke does. Opens the index, asks Lucene what
> > it's using, kills everything else.
> >
> > -----Original Message-----
> > From: Nitin Shiralkar [mailto:nit...@coreobjects.com]
> > Sent: 13 January 2009 11:26
> > To: lucene-net-user@incubator.apache.org
> > Subject: RE: Lucene Scalability Options
> >
> > Hi All,
> >
> > I started this thread to discuss Lucene scalability. I have an
> > index that is 80 GB in size. However, it looks like many of the
> > segment files are either redundant or unused. Even if I delete them
> > and retain just the CFS, segments and deletable files, the index
> > seems to work fine.
> > However, I want to know a cleaner approach to identify such
> > redundant/unused files through APIs. I am able to see these unused
> > files in Luke as "Deletable". However, I am not sure how Luke is able
> > to identify unused files. I am using Lucene.NET 2.0.
> >
> > Can you please suggest some way?
> >
> >
> >
> > -----Original Message-----
> > From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com]
> > Sent: Tuesday, January 13, 2009 1:01 AM
> > To: lucene-net-user@incubator.apache.org
> > Subject: RE: Lucene Scalability Options
> >
> >
> > Floyd, you will need to provide more details about the specific 
> > problems you are encountering.
> >
> > I made a quick check, and have no difficulty opening and inspecting
> > an index I created a few minutes ago with Lucene.NET v2.3.1 using
> > Luke v0.9.1.
> >
> > -- Neal
> >
> >
> > -----Original Message-----
> > From: Floyd Wu [mailto:floyd...@gmail.com]
> > Sent: Friday, January 09, 2009 8:18 PM
> > To: lucene-net-user@incubator.apache.org
> > Subject: Re: Lucene Scalability Options
> >
> > Hi all,
> > It seems the new version of Luke is not compatible with Lucene.net,
> > and I've emailed the creator of Luke. Below is his feedback:
> >
> > "Yes, there have been many changes,
> > but Lucene 2.4 can still open indexes built with earlier versions of
> > Lucene/Java.
> > This is the second report I've got about the possible incompatibility
> > with Lucene.Net - I suggest to raise up this issue on the Lucene
> > mailing list ( java-...@lucene.apache.org), and provide more details,
> > eg. Lucene.Net revision, stack trace, a small sample index if you can."
> >
> > My original report is below:
> > "The situation is that Luke-0.9 cannot open the index files built by
> > Lucene.Net-2.3.1.
> > I tried older versions of Luke and confirmed that Luke-0.8 and
> > Luke-0.8.1 can open and read the index files fine.
> > I wonder if there is any change between Java Lucene 2.3 and 2.4.
> > Please help on this."
> >
> > Floyd
> >
> >
> >
> > 2009/1/9 George Aroush <geo...@aroush.net>
> >
> > > Hi Nitin,
> > >
> > > Any optimization that Luke can do on an index is also doable by
> > > making API calls from Lucene.Net.  If not, then there is either a
> > > bug in Lucene.Net or in your use of the API.  Can you share with us
> > > your API calls as well as the Lucene.Net version you are using?
> > >
> > > Thanks.
> > >
> > > -- George
> > >
> > > > -----Original Message-----
> > > > From: Nitin Shiralkar [mailto:nit...@coreobjects.com]
> > > > Sent: Friday, January 09, 2009 6:27 AM
> > > > To: lucene-net-user@incubator.apache.org
> > > > Subject: RE: Lucene Scalability Options
> > > >
> > > > Thanks Hugh. Yes, I tried using Luke for index optimization.
> > > > Surprisingly, it brought the index size down to ~20 GB, with only
> > > > one CFS and segment files left behind. I used the compound
> > > > optimization option. I set the similar "SetUseCompoundFile"
> > > > property on the "IndexModifier" object in my Lucene.NET code, but
> > > > it has no effect on size or files after optimization. Any
> > > > suggestions?
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Hugh Spiller [mailto:hugh.spil...@renishaw.com]
> > > > Sent: Friday, January 09, 2009 3:35 PM
> > > > To: lucene-net-user@incubator.apache.org
> > > > Subject: RE: Lucene Scalability Options
> > > >
> > > > Hi Nitin,
> > > >
> > > > I've found the easiest way to get rid of redundant files in an
> > > > index is to use Luke. As soon as you use it to open the index, it
> > > > tidies up all the cruft.
> > > >
> > > > It's at http://www.getopt.org/luke/ .
> > > >
> > > > ________________________________
> > > >
> > > > Hugh Spiller
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Nitin Shiralkar [mailto:nit...@coreobjects.com]
> > > > Sent: 09 January 2009 08:48
> > > > To: lucene-net-user@incubator.apache.org
> > > > Subject: RE: Lucene Scalability Options
> > > >
> > > > -- snip --
> > > >
> > > >
> > > > Any inputs on junk/redundant files in above list?
> > > >
> > > >
> > > >
> > >
> > >
> >
> 
