[ https://issues.apache.org/jira/browse/NUTCH-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368484#comment-14368484 ]
ASF GitHub Bot commented on NUTCH-1968:
---------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/nutch/pull/14

> File Name too long issue of DumpFileUtil.java file
> --------------------------------------------------
>
>                 Key: NUTCH-1968
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1968
>             Project: Nutch
>          Issue Type: Bug
>          Components: tool
>    Affects Versions: 1.10
>         Environment: Nutch 1.10 Revision 1667458
>            Reporter: Xin Zhang
>            Assignee: Chris A. Mattmann
>              Labels: dumper, filename
>             Fix For: 1.10
>
>         Attachments: EXTENSION_TOO_LONG.patch
>
>
> With the helpful patch that Renxia posted in
> https://issues.apache.org/jira/browse/NUTCH-1957, I figured out that we need
> to solve the file name collision, otherwise we will lose data. However, when
> I use this patch to execute bin/nutch dump, I get a "File name too long"
> error as follows:
> zhangxin0804@zhangxin0804-VirtualBox:~/Desktop/Nutch/nutch/runtime/local$
> bin/nutch dump -outputDir outputDir -segment TestCrawl2/segments
> java.io.FileNotFoundException:/home/zhangxin0804/Desktop/Nutch/nutch/runtime/local/outputDir/86/fc/830433456bfbcff5f7b53661cc24d9d4_maps.php?submitted=true&year=2014&month=6&imgs%5b%5d=nationaltavgrank&imgs%5b%5d=nationaltmaxrank&imgs%5b%5d=nationaltminrank&imgs%5b%5d=nationalpcpnrank&imgs%5b%5d=regionaltavgrank&imgs%5b%5d=regionaltmaxrank&imgs%5b%5d=regionaltminrank&imgs%5b%5d=regionalpcpnrank&imgs%5b%5d=statewidetavgrank&imgs%5b%5d=statewidetmaxrank&imgs%5b%5d=statewidetminrank&imgs%5b%5d=statewidepcpnrank&imgs%5b%5d=divisionaltavgrank&imgs%5b%5d=divisionaltmaxrank&imgs%5b%5d=divisionaltminrank&imgs%5b%5d=divisionalpcpnrank&ts=3
> (File name too long)
>       at java.io.FileOutputStream.open(Native Method)
>       at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
>       at java.io.FileOutputStream.<init>(FileOutputStream.java:171)
>       at org.apache.nutch.tools.FileDumper.dump(FileDumper.java:221)
>       at org.apache.nutch.tools.FileDumper.main(FileDumper.java:309)
> I dug into this patch and found that it only checks the length of
> fileBaseName in
> /nutch/trunk/src/java/org/apache/nutch/util/DumpFileUtil.java. Therefore, if
> the <extension> is too long, the final outputFullPath is still too long,
> which means FileDumper.java will throw an exception. Probably not everyone
> will hit this issue and it may be a minor bug; correct me if I am wrong.
> Meanwhile, is it OK to truncate the fileExtension name, as we did for the
> fileBaseName, to solve this problem?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
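
For readers following the report above, here is a minimal sketch of the proposed fix: truncating the extension as well as the base name before building the output file name. It is not the code from EXTENSION_TOO_LONG.patch or from DumpFileUtil.java. The class name, method names, and both length limits are assumptions chosen for illustration only.

```java
// Hypothetical illustration only; not the actual Nutch DumpFileUtil code.
final class FileNameTruncation {
    // Assumed limits for this sketch; the real patch may use different values.
    static final int MAX_BASE_LENGTH = 32;
    static final int MAX_EXTENSION_LENGTH = 5;

    // Returns s unchanged if it fits, otherwise its first max characters.
    static String truncate(String s, int max) {
        return s.length() > max ? s.substring(0, max) : s;
    }

    // Builds "<md5>_<base>.<extension>", truncating both the base name and
    // the extension so a long query string in either part cannot push the
    // file name past the filesystem limit.
    static String buildFileName(String md5, String baseName, String extension) {
        String base = truncate(baseName, MAX_BASE_LENGTH);
        String ext = truncate(extension, MAX_EXTENSION_LENGTH);
        if (ext.isEmpty()) {
            return md5 + "_" + base;
        }
        return md5 + "_" + base + "." + ext;
    }

    public static void main(String[] args) {
        // The extension here mimics the one in the stack trace: everything
        // after the first dot of "maps.php?submitted=true&year=2014...".
        String name = buildFileName(
                "830433456bfbcff5f7b53661cc24d9d4",
                "maps",
                "php?submitted=true&year=2014&month=6");
        System.out.println(name);
    }
}
```

With these assumed limits the generated name is at most 32 + 1 + 32 + 1 + 5 = 71 characters for a 32-character MD5 prefix, well under the 255-byte file name limit common on Linux filesystems. Since the MD5 prefix already distinguishes the URLs, cutting the extension should not reintroduce the collision problem from NUTCH-1957, though that is worth confirming against the real patch.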