Re: DIH FileListEntityProcessor recursion and fileName clash

2009-02-02 Thread Fergus McMenemie
Shalin,

OK!

I got myself a JIRA account, opened SOLR-1000, and followed the
wiki instructions on creating a patch, which I have now uploaded! The
only problem is that, while the fix seems fine, the test case I added to
TestFileListEntityProcessor.java fails. I need somebody who knows
what they are doing to point out what I am doing wrong and/or how
to debug test failures.

It would also be nice if I knew how to run or debug a single JUnit
test rather than all of them, which takes almost 8 minutes.



  @Test
  public void testRECURSION() throws IOException {
    long time = System.currentTimeMillis();
    File childdir = new File("." + time + "/child");
    childdir.mkdirs();
    childdir.deleteOnExit();
    createFile(childdir, "a.xml", "a.xml".getBytes(), true);
    createFile(childdir, "b.xml", "b.xml".getBytes(), true);
    createFile(childdir, "c.props", "c.props".getBytes(), true);
    Map attrs = AbstractDataImportHandlerTest.createMap(
        FileListEntityProcessor.FILE_NAME, "^.*\\.xml$",
        FileListEntityProcessor.BASE_DIR, childdir.getAbsolutePath(),
        FileListEntityProcessor.RECURSIVE, "true");
    Context c = AbstractDataImportHandlerTest.getContext(null,
        new VariableResolverImpl(), null, 0, Collections.EMPTY_LIST, attrs);
    FileListEntityProcessor fileListEntityProcessor = new FileListEntityProcessor();
    fileListEntityProcessor.init(c);
    List<String> fList = new ArrayList<String>();
    while (true) {
      // add the documents to the index
      Map<String, Object> f = fileListEntityProcessor.nextRow();
      if (f == null)
        break;
      fList.add((String) f.get(FileListEntityProcessor.ABSOLUTE_FILE));
    }
    System.out.println("List of files indexed -- " + fList);
    Assert.assertEquals(3, fList.size());
  }

Regards Fergus.

On Mon, Feb 2, 2009 at 2:36 AM, Fergus McMenemie fer...@twig.me.uk wrote:

 Hello

 I have been trying to find out why DIH in FileListEntityProcessor
 mode did not appear to be recursing into subdirectories. Going through
 FileListEntityProcessor.java I eventually tumbled to the fact that my
 filename filter setting from data-config.xml also applied to directory
 names.


Hmm, not good.




 <entity name="jc"
    processor="FileListEntityProcessor"
    fileName=".*\.xml"
    newerThan="'NOW-1000DAYS'"
    recursive="true"
    rootEntity="false"
    dataSource="null"
    baseDir="/Volumes/spare/ts/stuff/ford">

 Now, I feel that the fileName filter should be applied to the files fed
 into the parser; it should not be applied to the names of the directories
 we are recursing through. I bodged the code as follows to adjust the
 behavior so that the fileName and excludes attributes of the entity apply
 only to file names and not to directory names.


I agree with you.

Perhaps we can have separate filters for directories and files but let's
hold on till the need comes up.
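
For readers following the thread, the behavior agreed on above can be
sketched in standalone Java. This is only an illustration under stated
assumptions, not the SOLR-1000 patch: the class RecursiveFileLister and
its collect method are hypothetical names, and the sketch uses plain
java.io.File and java.util.regex rather than any DIH class. The name
pattern is tested against files only, while directories are always
descended into when recursion is on.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration; not part of Solr or DIH.
public class RecursiveFileLister {

    // Collects files under dir whose names match fileName.
    // Directory names are never tested against the pattern: a directory is
    // skipped only when recursive is false.
    public static void collect(File dir, Pattern fileName, boolean recursive,
                               List<File> out) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // dir is not a directory or could not be read
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                if (recursive) {
                    collect(entry, fileName, recursive, out);
                }
            } else if (fileName.matcher(entry.getName()).find()) {
                out.add(entry);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small tree: base/top.xml, base/child/nested.xml,
        // base/child/notes.txt. The directory name "child" does not match
        // the pattern, yet it is still searched.
        File base = Files.createTempDirectory("dih-sketch").toFile();
        File child = new File(base, "child");
        child.mkdirs();
        new File(base, "top.xml").createNewFile();
        new File(child, "nested.xml").createNewFile();
        new File(child, "notes.txt").createNewFile();

        List<File> found = new ArrayList<>();
        collect(base, Pattern.compile(".*\\.xml"), true, found);
        System.out.println(found.size() + " matching files: " + found);
    }
}
```

With the filter applied to directory names as well, "child" would be
rejected before its contents were ever examined, which matches the
symptom described above.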



 It now recurses through my directory tree, indexing only the appropriate
 files! I think the new behavior is more standard.

 Is this a valid change?


Absolutely. Can you please create an issue and attach the patch? Thanks!

-- 
Regards,
Shalin Shekhar Mangar.

-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: DIH FileListEntityProcessor recursion and fileName clash

2009-02-02 Thread Shalin Shekhar Mangar
On Mon, Feb 2, 2009 at 10:08 PM, Fergus McMenemie fer...@twig.me.uk wrote:

 Shalin,

 OK!

 I got myself a JIRA account, opened SOLR-1000, and followed the
 wiki instructions on creating a patch, which I have now uploaded! The
 only problem is that, while the fix seems fine, the test case I added to
 TestFileListEntityProcessor.java fails. I need somebody who knows
 what they are doing to point out what I am doing wrong and/or how
 to debug test failures.


Thanks!

I'll take a look at the test.
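
One thing that may be worth checking first, offered as a guess rather
than a diagnosis: the test creates a.xml, b.xml and c.props but filters
on ^.*\.xml$, so at most two of the three names can pass the filter,
while the assertion expects 3. The standalone sketch below checks this
with plain java.util.regex only; the class FileNameFilterCheck and its
countMatches helper are hypothetical and not part of DIH, and the use of
find() is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Solr or DIH.
public class FileNameFilterCheck {

    // Counts how many of the given file names the regex accepts.
    public static int countMatches(String regex, List<String> names) {
        Pattern pattern = Pattern.compile(regex);
        int count = 0;
        for (String name : names) {
            if (pattern.matcher(name).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Same three names the test creates in its child directory.
        List<String> names = Arrays.asList("a.xml", "b.xml", "c.props");
        System.out.println(countMatches("^.*\\.xml$", names)); // prints 2
    }
}
```

If that is the cause, the expected count in the assertion would need to
be 2, though other causes cannot be ruled out from the message alone.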



 It would also be nice if I knew how to run or debug a single JUnit
 test rather than all of them, which takes almost 8 minutes.


The following command can run a single test:
ant -Dtestcase=TestFileListEntityProcessor test

Also, since DIH is a contrib module inside Solr, you can execute the
test-contrib ant target to run only the tests included in the contrib
projects.

PS: Congratulations on being the lucky one to create the 1000th issue in
Solr :-)

-- 
Regards,
Shalin Shekhar Mangar.


Re: DIH FileListEntityProcessor recursion and fileName clash

2009-02-01 Thread Shalin Shekhar Mangar
On Mon, Feb 2, 2009 at 2:36 AM, Fergus McMenemie fer...@twig.me.uk wrote:

 Hello

 I have been trying to find out why DIH in FileListEntityProcessor
 mode did not appear to be recursing into subdirectories. Going through
 FileListEntityProcessor.java I eventually tumbled to the fact that my
 filename filter setting from data-config.xml also applied to directory
 names.


Hmm, not good.




 <entity name="jc"
    processor="FileListEntityProcessor"
    fileName=".*\.xml"
    newerThan="'NOW-1000DAYS'"
    recursive="true"
    rootEntity="false"
    dataSource="null"
    baseDir="/Volumes/spare/ts/stuff/ford">

 Now, I feel that the fileName filter should be applied to the files fed
 into the parser; it should not be applied to the names of the directories
 we are recursing through. I bodged the code as follows to adjust the
 behavior so that the fileName and excludes attributes of the entity apply
 only to file names and not to directory names.


I agree with you.

Perhaps we can have separate filters for directories and files but let's
hold on till the need comes up.



 It now recurses through my directory tree, indexing only the appropriate
 files! I think the new behavior is more standard.

 Is this a valid change?


Absolutely. Can you please create an issue and attach the patch? Thanks!

-- 
Regards,
Shalin Shekhar Mangar.