I think, I found the bug. Here is the dump of the original index:
NUMDOCS: 3
MAXDOCS: 7
DELETED(0): True
DELETED(1): True
DELETED(2): False
DELETED(3): True
DELETED(4): True
DELETED(5): False
DELETED(6): False
TERM(0): _l_activationdatetime:552877632000000000
TERM(1): _l_author:admin
TERM(2): _l_bookmarkcount:0
TERM(3): _l_clix:0
TERM(4): _l_clix:1
TERM(5): _l_creationdatetime:633427319866778624
TERM(6): _l_creationdatetime:633427324812559872
TERM(7): _l_creationdatetime:633760609388437504
TERM(8): _l_deactivationdatetime:155377824000000000
TERM(9): _l_deactivationdatetime:155378687999969792
TERM(10): _l_document_class:1
TERM(11): _l_document_class:98305
TERM(12): _l_folder:163841
TERM(13): _l_folder:163843
TERM(14): _l_hidden:aaa
TERM(15): _l_last_modified_datetime:633427319866778624
TERM(16): _l_last_modified_datetime:633427324812559872
TERM(17): _l_last_modified_datetime:633760609388437504
TERM(18): _l_meta:abc
TERM(19): _l_meta:abc.ppt
TERM(20): _l_meta:ddx
TERM(21): _l_meta:doc
TERM(22): _l_meta:xyz
TERM(23): _l_meta:名
TERM(24): _l_meta:問
TERM(25): _l_meta:有
TERM(26): _l_meta:檔
TERM(27): _l_meta:測
TERM(28): _l_meta:看
TERM(29): _l_meta:試
TERM(30): _l_meta:還
TERM(31): _l_meta:題
TERM(32): _l_parentdocument:196609
TERM(33): _l_parentdocument:327681
TERM(34): _l_parentdocument:557057
TERM(35): _l_ratingavg:0
TERM(36): _l_ratingmedian:0
TERM(37): _l_ratingstdev:0
TERM(38): _l_ratingsum:0
TERM(39): _l_read_permission:admin
TERM(40): _l_rootdocument:196609
TERM(41): _l_rootdocument:327681
TERM(42): _l_rootdocument:557057
TERM(43): _l_state:0
TERM(44): _l_state:2
TERM(45): _l_summary:2123456789
TERM(46): _l_summary:abc
TERM(47): _l_summary:abc.ppt
TERM(48): _l_summary:ddx
TERM(49): _l_summary:doc
TERM(50): _l_summary:xyz
TERM(51): _l_summary:有
TERM(52): _l_summary:還
TERM(53): _l_title:123
TERM(54): _l_title:class
TERM(55): _l_title:default
TERM(56): _l_title:document
TERM(57): _l_title:名
TERM(58): _l_title:問
TERM(59): _l_title:檔
TERM(60): _l_title:測
TERM(61): _l_title:看
TERM(62): _l_title:試
TERM(63): _l_title:題
TERM(64): _l_unique_key:196609
TERM(65): _l_unique_key:327681
TERM(66): _l_unique_key:557057
TERM(67): _l_version:1
TERM(68): 作者:123
TERM(69): 摘要:2123456789
TERM(70): 摘要:abc
TERM(71): 摘要:abc.ppt
TERM(72): 摘要:ddx
TERM(73): 摘要:doc
TERM(74): 摘要:xyz
TERM(75): 摘要:有
TERM(76): 摘要:還
TERM(77): 標題:123
TERM(78): 標題:class
TERM(79): 標題:default
TERM(80): 標題:document
TERM(81): 標題:名
TERM(82): 標題:問
TERM(83): 標題:檔
TERM(84): 標題:測
TERM(85): 標題:看
TERM(86): 標題:試
TERM(87): 標題:題
TERM(88): 關鍵詞:123
And here is a sample code: read docs from original index and then write to an
new one.
void CreateNewIndex(string OrgIndex)
{
IndexReader reader = IndexReader.Open(OrgIndex);
IndexWriter writer = new IndexWriter("Floyd", new
Lucene.Net.Analysis.WhitespaceAnalyzer(),true);
for (int i = 0; i < reader.MaxDoc(); i++)
{
if (reader.IsDeleted(i) == true) continue;
Lucene.Net.Documents.Document orgDoc = reader.Document(i);
System.Collections.IList fields = orgDoc.GetFields();
Lucene.Net.Documents.Document newDoc = new Document();
foreach (Lucene.Net.Documents.Field field in fields)
{
Lucene.Net.Documents.Field newField = new Field(
System.Convert.ToBase64String(
System.Text.Encoding.UTF8.GetBytes(field.Name())), //ç
//field.Name(), //ç
field.StringValue(),
field.IsStored() ? Lucene.Net.Documents.Field.Store.YES
: Lucene.Net.Documents.Field.Store.NO,
field.IsTokenized() ?
Lucene.Net.Documents.Field.Index.TOKENIZED :
Lucene.Net.Documents.Field.Index.UN_TOKENIZED);
newDoc.Add(newField);
}
writer.AddDocument(newDoc);
}
writer.Close();
reader.Close();
}
If some field names are chinese, then Luke returns “read past EOF”. But if
those field names are replaced with non-chinese names, then it works.
DIGY
-----Original Message-----
From: Granroth, Neal V. [mailto:[email protected]]
Sent: Friday, April 24, 2009 8:53 PM
To: [email protected]
Subject: Luke-0.9.x cannot open index files
Digy,
Some additional information from the discussion on the lucene-net-user list
with Floyd Wu.
I ran some further tests using Java Lucene 2.3.2 and JDK 1.5.
The Java equivalents of the two small test applications I use to inspect an
index and compact it, function identically to the .NET versions (that were
built with VS2005 and Lucene.NET 2.3.1).
That Luke cannot open the index appears to be a problem within Luke.
Even if Floyd's index contains some odd entries, Java Lucene 2.3.2 does not
flag the index as corrupt; and both the Java and .NET versions report the same
index content before and after the optimize operation.
-- Neal
**************************************************************
Neal Granroth
Software Engineer, Molecular Spectroscopy
Thermo Fisher Scientific
5225 Verona Road, Madison, WI 53711
[email protected]
Tel: 608-276-5645
Fax: 608-276-6328
www.thermofisher.com
WORLDWIDE CONFIDENTIALITY NOTE: Dissemination, distribution or copying of this
e-mail or the information herein by anyone other than the intended recipient,
or an employee or agent of a system responsible for delivering the message to
the intended recipient, is prohibited. If you are not the intended recipient,
please inform the sender and delete all copies.
-----Original Message-----
From: Digy (JIRA) [mailto:[email protected]]
Sent: Wednesday, April 08, 2009 6:28 PM
To: [email protected]
Subject: [jira] Commented: (LUCENENET-169) Changes to make Lucene.NET
compatible with ASP.NET Medium Trust Level, in hosting environments (like
GoDaddy...)
[
https://issues.apache.org/jira/browse/LUCENENET-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697335#action_12697335
]
Digy commented on LUCENENET-169:
--------------------------------
Although you can overcome all of them somehow;
* controlling the the lifetime of IndexWriter/IndexReader in a naturally manner,
* reopening the IndexReader only when needed using (for ex) FileSystemWatcher,
* providing a separation between data & bussiness layer,
* providing other apps an interface that may want to write its own user
interface,
* accessing a single search service from different web apps/from load balanced
web servers
* controlling the lifetime of searching/indexing code (without being effected
by the restart of the IIS processes automatically when some memory limit is
exceeded (for ex.) )
* Ability to access some system resources that can be restricted by IIS
etc.
make me think a separete search service is a better idea.But at last, it is a
design decision of you.
(Think, A WebApp+Solr in Java world)
DIGY
> Changes to make Lucene.NET compatible with ASP.NET Medium Trust Level, in
> hosting environments (like GoDaddy...)
> -----------------------------------------------------------------------------------------------------------------
>
> Key: LUCENENET-169
> URL: https://issues.apache.org/jira/browse/LUCENENET-169
> Project: Lucene.Net
> Issue Type: Improvement
> Environment: ASP.NET
> Reporter: Corey Trager
> Attachments: FSDirectory.patch
>
>
> Microsoft has a configuration file for shared hosting for what they call
> "Medium Trust". There are a couple places in FSDirectory.cs that violate
> the restrictions of Medium Trust, but I coded workarounds, shown below.
> #1)
> // Corey Trager, Oct 2008: Commented call to GetTempPath to workaround
> permission restrictions at shared host.
> // LOCK_DIR isn't used anyway.
> public static readonly System.String LOCK_DIR = null; //
> SupportClass.AppSettings.Get("Lucene.Net.lockDir",
> System.IO.Path.GetTempPath());
> #2)
> /// <summary>Returns an array of strings, one for each Lucene
> index file in the directory. </summary>
> public override System.String[] List()
> {
> /* Changes by Corey Trager, Oct 2008, to workaround permission restrictions
> at shared host */
> System.IO.DirectoryInfo dir = new
> System.IO.DirectoryInfo(directory.FullName);
> System.IO.FileInfo[] files = dir.GetFiles();
> string[] list = new string[files.Length];
> for (int i = 0; i < files.Length; i++)
> {
> list[i] = files[i].Name;
> }
> return list;
> /* end of changes */
> // System.String[] files =
> SupportClass.FileSupport.GetLuceneIndexFiles(directory.FullName,
> IndexFileNameFilter.GetFilter());
> // for (int i = 0; i < files.Length; i++)
> // {
> // System.IO.FileInfo fi = new System.IO.FileInfo(files[i]);
> // files[i] = fi.Name;
> // }
> // return files;
> }
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.