Ishan Sri created LUCENE-8830:
---------------------------------

             Summary: DefaultIndexingChain.getOrAddField method ignores omitNorms from FieldType
                 Key: LUCENE-8830
                 URL: https://issues.apache.org/jira/browse/LUCENE-8830
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/index
    Affects Versions: 6.6.1
            Reporter: Ishan Sri

Norms are computed and written even when *omitNorms* is set to *true* in the fieldType.

I traced the issue and found that the method *getOrAddField* creates a *FieldInfo* object the first time a field is seen in a segment. By default this object has omitNorms set to false. The method sets the *indexOptions* as specified in the fieldType on this newly created object, but doesn't do the same for *omitNorms*. This effectively overrides the flag from the fieldType, which causes problems down the line.

Here's the code snippet for the method with the *fieldInfos.getOrAdd* call:

{code:java}
private PerField getOrAddField(String name, IndexableFieldType fieldType, boolean invert) {

  // Make sure we have a PerField allocated
  final int hashPos = name.hashCode() & hashMask;
  PerField fp = fieldHash[hashPos];
  while (fp != null && !fp.fieldInfo.name.equals(name)) {
    fp = fp.next;
  }

  if (fp == null) {
    // First time we are seeing this field in this segment

    FieldInfo fi = fieldInfos.getOrAdd(name);
    // Messy: must set this here because e.g. FreqProxTermsWriterPerField looks at the
    // initial IndexOptions to decide what arrays it must create).  Then, we also must
    // set it in PerField.invert to allow for later downgrading of the index options:
    fi.setIndexOptions(fieldType.indexOptions());

    fp = new PerField(fi, invert);
    ...
{code}

The *getOrAdd* method below instantiates a new *FieldInfo* with omitNorms set to false (the 4th constructor parameter):

{code:java}
/** Create a new field, or return existing one. */
public FieldInfo getOrAdd(String name) {
  FieldInfo fi = fieldInfo(name);
  if (fi == null) {
    // This field wasn't yet added to this in-RAM
    // segment's FieldInfo, so now we get a global
    // number for this field.  If the field was seen
    // before then we'll get the same name and number,
    // else we'll allocate a new one:
    final int fieldNumber = globalFieldNumbers.addOrGet(name, -1, DocValuesType.NONE, 0, 0);
    fi = new FieldInfo(name, fieldNumber, false, false, false, IndexOptions.NONE, DocValuesType.NONE, -1, new HashMap<>(), 0, 0);
    assert !byName.containsKey(fi.name);
    globalFieldNumbers.verifyConsistent(Integer.valueOf(fi.number), fi.name, DocValuesType.NONE);
    byName.put(fi.name, fi);
  }

  return fi;
}
{code}

As a result, norms are always computed, which not only produces incorrect scores but also increases disk usage when many documents contain multiple fields for which this flag is set to true but ignored.
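
To make the reported pattern easier to reason about in isolation, here is a minimal sketch of it outside of Lucene. Every name in it (OmitNormsSketch, SketchFieldType, SketchFieldInfo, SketchFieldInfos, addFieldAsReported, addFieldCopyingBoth) is a hypothetical stand-in invented for illustration. It is not Lucene's API and it is not a proposed patch; it only shows how a get-or-add helper that creates an object with defaults can leave a flag at its default when the caller copies just one setting.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are NOT the Lucene classes.
class OmitNormsSketch {

    /** Simplified stand-in for the settings the caller requests for a field. */
    static final class SketchFieldType {
        final boolean indexed;
        final boolean omitNorms;

        SketchFieldType(boolean indexed, boolean omitNorms) {
            this.indexed = indexed;
            this.omitNorms = omitNorms;
        }
    }

    /** Simplified stand-in for the per-segment record of a field. */
    static final class SketchFieldInfo {
        final String name;
        boolean indexed = false;   // default, analogous to IndexOptions.NONE
        boolean omitNorms = false; // default, analogous to the 4th constructor argument

        SketchFieldInfo(String name) {
            this.name = name;
        }
    }

    /** Simplified registry: returns the existing record or creates one with defaults. */
    static final class SketchFieldInfos {
        private final Map<String, SketchFieldInfo> byName = new HashMap<>();

        SketchFieldInfo getOrAdd(String name) {
            return byName.computeIfAbsent(name, SketchFieldInfo::new);
        }
    }

    /** The pattern described in the report: only one setting is copied over. */
    static SketchFieldInfo addFieldAsReported(SketchFieldInfos infos, String name, SketchFieldType type) {
        SketchFieldInfo fi = infos.getOrAdd(name);
        fi.indexed = type.indexed;
        // omitNorms is not copied, so it keeps its default of false
        return fi;
    }

    /** For contrast: both settings are copied from the requested type. */
    static SketchFieldInfo addFieldCopyingBoth(SketchFieldInfos infos, String name, SketchFieldType type) {
        SketchFieldInfo fi = infos.getOrAdd(name);
        fi.indexed = type.indexed;
        fi.omitNorms = type.omitNorms;
        return fi;
    }

    public static void main(String[] args) {
        SketchFieldType type = new SketchFieldType(true, true);
        SketchFieldInfo reported = addFieldAsReported(new SketchFieldInfos(), "body", type);
        SketchFieldInfo both = addFieldCopyingBoth(new SketchFieldInfos(), "body", type);
        System.out.println("as reported:  omitNorms=" + reported.omitNorms);
        System.out.println("copying both: omitNorms=" + both.omitNorms);
    }
}
```

Running main prints omitNorms=false for the first variant and omitNorms=true for the second, even though the same SketchFieldType (with omitNorms requested) is passed to both. Whether the real indexing chain applies the flag at some later point is outside what this sketch models.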