Shalin,
OK!
I got myself a JIRA account, opened SOLR-1000, and followed the
wiki instructions on creating a patch, which I have now uploaded! The only
problem is that, while the fix seems fine, the test case I added to
TestFileListEntityProcessor.java fails. I need somebody who knows
what they are doing to point out what I am doing wrong and/or how
to debug test failures.
It would also be nice if I knew how to run or debug a single JUnit
test rather than all of them, which takes almost 8 minutes.
  @Test
  public void testRECURSION() throws IOException {
    long time = System.currentTimeMillis();
    File childdir = new File("." + time + "/child");
    childdir.mkdirs();
    childdir.deleteOnExit();
    createFile(childdir, "a.xml", "a.xml".getBytes(), true);
    createFile(childdir, "b.xml", "b.xml".getBytes(), true);
    createFile(childdir, "c.props", "c.props".getBytes(), true);
    Map attrs = AbstractDataImportHandlerTest.createMap(
            FileListEntityProcessor.FILE_NAME, "^.*\\.xml$",
            FileListEntityProcessor.BASE_DIR, childdir.getAbsolutePath(),
            FileListEntityProcessor.RECURSIVE, "true");
    Context c = AbstractDataImportHandlerTest.getContext(null,
            new VariableResolverImpl(), null, 0, Collections.EMPTY_LIST, attrs);
    FileListEntityProcessor fileListEntityProcessor = new FileListEntityProcessor();
    fileListEntityProcessor.init(c);
    List<String> fList = new ArrayList<String>();
    while (true) {
      // add the documents to the index
      Map<String, Object> f = fileListEntityProcessor.nextRow();
      if (f == null)
        break;
      fList.add((String) f.get(FileListEntityProcessor.ABSOLUTE_FILE));
    }
    System.out.println("List of files indexed -- " + fList);
    Assert.assertEquals(3, fList.size());
  }
Regards Fergus.
On Mon, Feb 2, 2009 at 2:36 AM, Fergus McMenemie fer...@twig.me.uk wrote:
Hello
I have been trying to find out why DIH in FileListEntityProcessor
mode did not appear to be recursing into subdirectories. Going through
FileListEntityProcessor.java I eventually tumbled to the fact that my
filename filter setting from data-config.xml also applied to directory
names.
Hmm, not good.
<entity name="jc"
        processor="FileListEntityProcessor"
        fileName=".*\.xml"
        newerThan="'NOW-1000DAYS'"
        recursive="true"
        rootEntity="false"
        dataSource="null"
        baseDir="/Volumes/spare/ts/stuff/ford">
Now, I feel that the fileName filter should be applied to files fed
into the parser; it should not be applied to the directory names we are
recursing through. I bodged the code as follows to adjust the behavior
so that the fileName and excludes attributes of the entity only
apply to filenames and not directory names.
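
The patch itself is not reproduced in this message. As a rough illustration only, the idea can be sketched in plain Java as below. This is not the FileListEntityProcessor code: the class name RecursiveFileLister and the method collect are hypothetical, and the use of Matcher.find() for the name test is an assumption. The point it shows is that the pattern is checked against regular files only, while directories are always entered when recursion is on.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; not the DataImportHandler implementation.
public class RecursiveFileLister {

    // Adds to 'out' the absolute path of every file under 'dir' whose name
    // matches 'fileName'. The pattern is tested against regular files only;
    // directories are entered whenever 'recursive' is true, whatever their names.
    static void collect(File dir, Pattern fileName, boolean recursive, List<String> out) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // not a directory, or not readable
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                if (recursive) {
                    collect(entry, fileName, recursive, out);
                }
            } else if (fileName.matcher(entry.getName()).find()) {
                out.add(entry.getAbsolutePath());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small tree: <tmp>/child/{a.xml, b.xml, c.props}
        File base = Files.createTempDirectory("fle-sketch").toFile();
        File child = new File(base, "child");
        child.mkdirs();
        new File(child, "a.xml").createNewFile();
        new File(child, "b.xml").createNewFile();
        new File(child, "c.props").createNewFile();

        List<String> found = new ArrayList<String>();
        collect(base, Pattern.compile(".*\\.xml"), true, found);
        Collections.sort(found);
        System.out.println(found);
    }
}
```

With the filter applied to directory names as well, the directory "child" would be rejected by .*\.xml and nothing beneath it would be listed, which matches the symptom described above.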
I agree with you.
Perhaps we can have separate filters for directories and files but let's
hold on till the need comes up.
It now recurses through my directory tree, indexing only the appropriate
files! I think the new behavior is more standard.
Is this a valid change?
Absolutely. Can you please create an issue and attach the patch? Thanks!
--
Regards,
Shalin Shekhar Mangar.
--
===
Fergus McMenemie Email:fer...@twig.me.uk
Techmore Ltd Phone:(UK) 07721 376021
Unix/Mac/Intranets Analyst Programmer
===