parse-mp3 plugin concatenating previous tags for text field
-----------------------------------------------------------
Key: NUTCH-414
URL: http://issues.apache.org/jira/browse/NUTCH-414
Project: Nutch
Issue Type: Bug
Components: fetcher
Affects Versions: 0.9.0
Environment: -
Reporter: Brian Whitman
The parse-mp3 plugin appears to keep the text content of previous parses as
state. For every new mp3 file parsed, it puts the contents of all the
previously parsed files' text fields into the plain text field for that file.
You can see this by fetching a set of mp3s in one segment, then viewing their
plain text in the nutch webapp. The plain text will include the contents of all
files fetched in that round, which makes searching fruitless.
I made a tiny band-aid change to MP3Parser.java and MetadataCollector.java
against the nightly. It seems to fix the problem.
--- MP3Parser.java 2006-12-10 09:43:26.000000000 -0500
+++ MP3Parser.java.new 2006-12-10 16:37:03.000000000 -0500
@@ -67,7 +67,7 @@
fos.write(raw);
fos.close();
MP3File mp3 = new MP3File(tmp);
-
+ metadataCollector.clearText();
if (mp3.hasID3v2Tag()) {
parse = getID3v2Parse(mp3, content.getMetadata());
} else if (mp3.hasID3v1Tag()) {
--- MetadataCollector.java 2006-12-10 09:43:26.000000000 -0500
+++ MetadataCollector.java.new 2006-12-10 16:37:28.000000000 -0500
@@ -42,6 +42,10 @@
this.conf = conf;
}
+ public void clearText() {
+ text = "";
+ }
+
public void notifyProperty(String name, String value) throws
MalformedURLException {
if (name.equals("TIT2-Text"))
setTitle(value);
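For illustration only, here is a minimal Java sketch of the kind of shared-state
accumulation the patch above guards against. It is not the plugin's code: the
Collector class, its append-on-every-property behaviour, and parseAll are
hypothetical stand-ins. Only the clearText() name and the text field come from
the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StaleTextSketch {

    /** Hypothetical stand-in for a collector that is reused across parses. */
    static class Collector {
        private String text = "";

        // Assumption: each tag value is appended to a running text buffer.
        void notifyProperty(String name, String value) {
            text += value + "\n";
        }

        // Same idea as the clearText() added in the patch: reset the buffer.
        void clearText() {
            text = "";
        }

        String getText() {
            return text;
        }
    }

    /**
     * "Parses" each file (modelled here as a list of tag values) with one
     * shared collector and returns the plain text recorded for each file.
     */
    static List<String> parseAll(List<List<String>> files, boolean clearBeforeEachParse) {
        Collector collector = new Collector();
        List<String> texts = new ArrayList<String>();
        for (List<String> tags : files) {
            if (clearBeforeEachParse) {
                collector.clearText();
            }
            for (String tag : tags) {
                collector.notifyProperty("tag", tag);
            }
            texts.add(collector.getText());
        }
        return texts;
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("Song A", "Artist A"),
                Arrays.asList("Song B", "Artist B"));
        // Without the reset, the second entry also carries the first file's tags.
        System.out.println(parseAll(files, false));
        // With the reset, each entry holds only its own file's tags.
        System.out.println(parseAll(files, true));
    }
}
```

Resetting the buffer at the start of each parse is the smallest change that
stops the carry-over. Creating a fresh collector per file would be another
option, but that is not what the patch does.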