Issue with Japanese User Dictionary

2022-01-12 Thread Marc D'Mello
Hi,

I had a question about the Japanese user dictionary. We have a user
dictionary that used to work but after attempting to upgrade Lucene, it
fails with the following error:

Caused by: java.lang.RuntimeException: Illegal user dictionary entry レコーダー
- the concatenated segmentation (レコーダー) does not match the surface form
(レコーダー)
at org.apache.lucene.analysis.ja.dict.UserDictionary.<init>(UserDictionary.java:123)

The specific commit causing this error is here
.
The only thing that seems to differ is that the characters are full-width
vs half-width, so I was wondering if this is intended behavior or a bug/too
restrictive. Any suggestions for fixing this would be greatly appreciated!
Thanks!
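
For illustration, the full-width vs half-width mismatch described above can be reproduced with the JDK alone. This is a minimal sketch, not Lucene's code: the class and method names are hypothetical, the half-width entry is an invented example, and it assumes the check simply joins the space-separated segmentation and compares it to the surface form. Normalizing both columns to NFKC before building the dictionary is one possible workaround, under that assumption.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Lucene: models the "concatenated
// segmentation must equal the surface form" check from the error message.
final class UserDictionaryEntryCheck {

    /** Joins the space-separated segments back into one string. */
    static String concatSegments(String segmentation) {
        return segmentation.replace(" ", "");
    }

    /** True when the joined segmentation is exactly the surface form. */
    static boolean segmentationMatchesSurface(String surface, String segmentation) {
        return concatSegments(segmentation).equals(surface);
    }

    /** NFKC folds half-width katakana into their full-width equivalents. */
    static String normalizeWidth(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // Full-width katakana for "recorder".
        String surface = "\u30EC\u30B3\u30FC\u30C0\u30FC";
        // The same word in half-width katakana (assumed example entry).
        String segmentation = "\uFF9A\uFF7A\uFF70\uFF80\uFF9E\uFF70";

        // Raw comparison fails because the code points differ.
        System.out.println(segmentationMatchesSurface(surface, segmentation)); // false

        // After NFKC normalization both columns are identical.
        System.out.println(segmentationMatchesSurface(
                normalizeWidth(surface), normalizeWidth(segmentation))); // true
    }
}
```

If the analysis chain also normalizes width at query time, the dictionary file would need to be normalized the same way so that entries still match the incoming text.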


migration from lucene 5 to 8

2022-01-12 Thread Sascha Janz
Hello,

the body from my previous mail was filtered out...


we need to migrate our lucene 5.5 indexes to version 8.11.1. fortunately i 
found the IndexUpgrader class, which i didn't know about before.

i tried to migrate from major version to major version.

so i did

java -cp lucene-core-6.6.6.jar;lucene-backward-codecs-6.6.6.jar 
org.apache.lucene.index.IndexUpgrader -delete-prior-commits -verbose 
"V:\\LuceneMigration\\5"

next step

java -cp lucene-core-7.7.3.jar;lucene-backward-codecs-7.7.3.jar 
org.apache.lucene.index.IndexUpgrader -delete-prior-commits -verbose 
"V:\\LuceneMigration\\5"

and then

java -cp lucene-core-8.11.1.jar;lucene-backward-codecs-8.11.1.jar 
org.apache.lucene.index.IndexUpgrader -delete-prior-commits -verbose 
"V:\\LuceneMigration\\5"

the first two seem to work well.

but with the last i get

MS 0 [2022-01-12T14:04:24.248Z; main]: initDynamicDefaults spins=true 
maxThreadCount=1 maxMergeCount=6
IW 0 [2022-01-12T14:04:24.275Z; main]: init: hit exception on init; releasing 
write lock
Exception in thread "main" org.apache.lucene.index.IndexFormatTooOldException: 
Format version is not supported (resource 
BufferedChecksumIndexInput(MMapIndexInput(path="V:\LuceneMigration\5\segments_91"))):
 This index was initially created with Lucene 6.x while the current version is 
8.11.1 and Lucene only supports reading the current and previous major 
versions.. This version of Lucene only supports indexes created with release 
7.0 and later.
at 
org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:322)
at 
org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:291)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1037)
at org.apache.lucene.index.IndexUpgrader.upgrade(IndexUpgrader.java:167)
at org.apache.lucene.index.IndexUpgrader.main(IndexUpgrader.java:78)
Suppressed: org.apache.lucene.index.CorruptIndexException: checksum 
passed (7268b2f2). possibly transient resource issue, or a Lucene or JVM bug 
(resource=BufferedChecksumIndexInput(MMapIndexInput(path="V:\LuceneMigration\5\segments_91")))
at 
org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:466)
at 
org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:434)
... 4 more

did i do anything wrong?



thanks for help.

regards

Sascha

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Migration from Lucene 5.5 to 8.11.1

2022-01-12 Thread Adrien Grand
The log says what the problem is: version 8.11.1 cannot read indices
created by Lucene 5.5. You will need to reindex your data.
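
The rule quoted in the exception ("Lucene only supports reading the current and previous major versions") can be written down as a toy model. This is a sketch for illustration only: the class and method are hypothetical and are not Lucene code. It assumes the creation major recorded in the commit stays unchanged when segments are rewritten by IndexUpgrader, which is what the error text suggests.

```java
// Hypothetical toy model, not Lucene code: encodes the compatibility rule
// stated in the IndexFormatTooOldException message.
final class IndexVersionModel {

    /**
     * A release can open an index whose recorded creation major is its own
     * major version or the one immediately before it.
     */
    static boolean canOpen(int createdMajor, int runningMajor) {
        return createdMajor >= runningMajor - 1 && createdMajor <= runningMajor;
    }

    public static void main(String[] args) {
        // The exception reports the index as "initially created with Lucene 6.x".
        int createdMajor = 6;

        System.out.println(canOpen(createdMajor, 7)); // true: 7.7.3 accepts it
        System.out.println(canOpen(createdMajor, 8)); // false: 8.11.1 rejects it
    }
}
```

Under this model, rewriting segments one major version at a time does not help, because the check is against the version the index was created with, not the version that last wrote it.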

On Wed, Jan 12, 2022 at 3:41 PM  wrote:



-- 
Adrien




Migration from Lucene 5.5 to 8.11.1

2022-01-12 Thread Sascha . Janz
C:\workspaces\workspaceecm5\LuceneIndexUpgrade\lb>java -cp 
lucene-core-6.6.6.jar;lucene-backward-codecs-6.6.6.jar 
org.apache.lucene.index.IndexUpgrader -delete-prior-commits -verbose 
"V:\\LuceneMigration\\5"
IFD 0 [2022-01-12T14:02:35.479Z; main]: init: current segments file is 
"segments_8z"; 
deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@6833ce2c
IFD 0 [2022-01-12T14:02:35.491Z; main]: init: load commit "segments_8z"
IFD 0 [2022-01-12T14:02:35.494Z; main]: delete []
IFD 0 [2022-01-12T14:02:35.495Z; main]: now checkpoint "_gg(5.5.5):C98089 
_gu(5.5.5):C52875 _h4(5.5.5):c11834" [3 segments ; isCommit = false]
IFD 0 [2022-01-12T14:02:35.495Z; main]: delete []
IFD 0 [2022-01-12T14:02:35.496Z; main]: 1 msec to checkpoint
IW 0 [2022-01-12T14:02:35.496Z; main]: init: create=false
IW 0 [2022-01-12T14:02:35.498Z; main]:
dir=MMapDirectory@V:\LuceneMigration\5 
lockFactory=org.apache.lucene.store.NativeFSLockFactory@7d417077
index=_gg(5.5.5):C98089 _gu(5.5.5):C52875 _h4(5.5.5):c11834
version=6.6.6
analyzer=null
ramBufferSizeMB=16.0
maxBufferedDocs=-1
maxBufferedDeleteTerms=-1
mergedSegmentWarmer=null
delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy
commit=null
openMode=CREATE_OR_APPEND
similarity=org.apache.lucene.search.similarities.BM25Similarity
mergeScheduler=ConcurrentMergeScheduler: maxThreadCount=-1, maxMergeCount=-1, 
ioThrottle=true
codec=Lucene62
infoStream=org.apache.lucene.util.PrintStreamInfoStream
mergePolicy=UpgradeIndexMergePolicy([TieredMergePolicy: maxMergeAtOnce=10, 
maxMergeAtOnceExplicit=30, maxMergedSegmentMB=5120.0, floorSegmentMB=2.0, 
forceMergeDeletesPctAllowed=10.0, segmentsPerTier=10.0, 
maxCFSSegmentSizeMB=8.796093022207999E12, noCFSRatio=0.1)
indexerThreadPool=org.apache.lucene.index.DocumentsWriterPerThreadPool@7dc36524
readerPooling=false
perThreadHardLimitMB=1945
useCompoundFile=true
commitOnClose=true
indexSort=null
writer=org.apache.lucene.index.IndexWriter@35bbe5e8

IW 0 [2022-01-12T14:02:35.499Z; main]: MMapDirectory.UNMAP_SUPPORTED=true
IndexUpgrader 0 [2022-01-12T14:02:35.499Z; main]: Upgrading all pre-6.6.6 
segments of index directory 'MMapDirectory@V:\LuceneMigration\5 
lockFactory=org.apache.lucene.store.NativeFSLockFactory@7d417077' to version 
6.6.6...
IW 0 [2022-01-12T14:02:35.500Z; main]: forceMerge: index now _gg(5.5.5):C98089 
_gu(5.5.5):C52875 _h4(5.5.5):c11834
IW 0 [2022-01-12T14:02:35.501Z; main]: now flush at forceMerge
IW 0 [2022-01-12T14:02:35.501Z; main]:   start flush: applyAllDeletes=true
IW 0 [2022-01-12T14:02:35.502Z; main]:   index before flush _gg(5.5.5):C98089 
_gu(5.5.5):C52875 _h4(5.5.5):c11834
DW 0 [2022-01-12T14:02:35.502Z; main]: startFullFlush
DW 0 [2022-01-12T14:02:35.503Z; main]: main finishFullFlush success=true
IW 0 [2022-01-12T14:02:35.503Z; main]: apply all deletes during flush
IW 0 [2022-01-12T14:02:35.503Z; main]: now apply all deletes for all segments 
maxDoc=162798
BD 0 [2022-01-12T14:02:35.509Z; main]: applyDeletes: open segment readers took 
0 msec
BD 0 [2022-01-12T14:02:35.510Z; main]: applyDeletes: no segments; skipping
BD 0 [2022-01-12T14:02:35.510Z; main]: prune sis=segments_8z: _gg(5.5.5):C98089 
_gu(5.5.5):C52875 _h4(5.5.5):c11834 minGen=0 packetCount=0
UPGMP 0 [2022-01-12T14:02:35.511Z; main]: findForcedMerges: 
segmentsToUpgrade={_gg(5.5.5):C98089=true, _gu(5.5.5):C52875=true, 
_h4(5.5.5):c11834=true}
TMP 0 [2022-01-12T14:02:35.511Z; main]: findForcedMerges maxSegmentCount=1 
infos=_gg(5.5.5):C98089 _gu(5.5.5):C52875 _h4(5.5.5):c11834 
segmentsToMerge={_gg(5.5.5):C98089=true, _gu(5.5.5):C52875=true, 
_h4(5.5.5):c11834=true}
TMP 0 [2022-01-12T14:02:35.514Z; main]: eligible=[_gg(5.5.5):C98089, 
_gu(5.5.5):C52875, _h4(5.5.5):c11834]
TMP 0 [2022-01-12T14:02:35.514Z; main]: forceMergeRunning=false
TMP 0 [2022-01-12T14:02:35.516Z; main]: add final merge=_gg(5.5.5):C98089 
_gu(5.5.5):C52875 _h4(5.5.5):c11834
IW 0 [2022-01-12T14:02:35.517Z; main]: add merge to pendingMerges: 
_gg(5.5.5):C98089 _gu(5.5.5):C52875 _h4(5.5.5):c11834 [total 1 pending]
IW 0 [2022-01-12T14:02:35.517Z; main]: registerMerge merging= []
IW 0 [2022-01-12T14:02:35.518Z; main]: registerMerge info=_gg(5.5.5):C98089
IW 0 [2022-01-12T14:02:35.518Z; main]: registerMerge info=_gu(5.5.5):C52875
IW 0 [2022-01-12T14:02:35.519Z; main]: registerMerge info=_h4(5.5.5):c11834
MS 0 [2022-01-12T14:02:35.520Z; main]: initDynamicDefaults spins=true 
maxThreadCount=1 maxMergeCount=6
MS 0 [2022-01-12T14:02:35.522Z; main]: now merge
MS 0 [2022-01-12T14:02:35.522Z; main]:   index: _gg(5.5.5):C98089 
_gu(5.5.5):C52875 _h4(5.5.5):c11834
MS 0 [2022-01-12T14:02:35.522Z; main]:   consider merge _gg(5.5.5):C98089 
_gu(5.5.5):C52875 _h4(5.5.5):c11834
MS 0 [2022-01-12T14:02:35.523Z; main]: launch new thread [Lucene Merge 
Thread #0]
MS 0 [2022-01-12T14:02:35.524Z; Lucene Merge Thread #0]:   merge thread: start
IW 0 [2022-01-12T14:02:35.524Z; Lucene Merge Thread #0]: now apply deletes for 
3 merging segments
BD 0 [2022-01-12T14:02:35.524Z;