[ https://issues.apache.org/jira/browse/NUTCH-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583080#comment-13583080 ]
Roland commented on NUTCH-1534:
-------------------------------

That does not seem to be the case. My breakpoints at :557 were triggered, but nothing happened afterwards while I stepped through the code. Later on I hit another breakpoint I had set at :580:
{code}
FetcherThread25[1] print this.protocolFactory.getProtocol(fit.url).getProtocolOutput(fit.url, fit.page).getStatus()
 this.protocolFactory.getProtocol(fit.url).getProtocolOutput(fit.url, fit.page).getStatus() = "org.apache.nutch.storage.ProtocolStatus@783f {
"code":"1"
"args":"[]"
"lastModified":"0"
}"
{code}
So we are in the switch, at case ProtocolStatusCodes.SUCCESS (:533), and from there we go directly into output(). I then checked what output() receives:
{code}
FetcherThread25[1] print this.protocolFactory.getProtocol(fit.url).getProtocolOutput(fit.url, fit.page).getContent()
 this.protocolFactory.getProtocol(fit.url).getProtocolOutput(fit.url, fit.page).getContent() = "Version: 0
url: http://kleinanzeigen.ebay.de/anzeigen/s-anzeige/grosse-3-zimmerwohnung-in-zentraler-lage-in-kitzingen/102053924-203-6766
base: http://kleinanzeigen.ebay.de/anzeigen/s-anzeige/grosse-3-zimmerwohnung-in-zentraler-lage-in-kitzingen/102053924-203-6766
contentType: text/html
metadata: Content-Language=de-DE Age=0 Content-Length=13143 Set-Cookie=up=%7B%22ln%22%3A%22974347744%22%2C%22r%22%3A%224%22%7D; Expires=Tue, 20-Aug-2013 09:57:49 GMT; Path=/anzeigen/ Connection=keep-alive Server=Apache X-Varnish=3688196070 Vary=User-Agent,Accept-Encoding Date=Thu, 21 Feb 2013 09:57:49 GMT Content-Encoding=gzip Via=1.1 varnish Content-Type=text/html;charset=UTF-8 Accept-Ranges=bytes
Content:
<!DOCTYPE html><!--[if IE 7 ]> <html class="ie ie7" lang="de"> <![endif]--> <!--[if IE 8 ]> <html class="ie ie8" lang="de"> <![endif]--> <!--[if IE 9 ]> <html class="ie ie9" lang="de"> <![endif]--> <!--[if gt IE 9 ]> <html class="ie gtie9" lang="de"> <![endif]--> <!--[if lt IE 7 ]> <html class="ie ltie7" lang="de"> <![endif]--> <!--[if !IE]><!--> <html lang="de"> <!--<![endif]--> <head> <meta charset="UTF-8"> <meta name="gaVirtualUrl" content="/PVIP_immonet/195_Immobilien/203_Wohnung_mieten/wohnung_mieten.qm_i_130_wohnung_mieten.zimmer_i_3/B2C/OFFER"/> <title>Große 3-Zimmerwohnung in zentraler Lage in Kitzingen in Bayern - Kitzingen | eBay Kleinanzeigen</title> <link rel="SHORTCUT ICON" href="/git.REL-262.0.0/img/favicon.ico" type="image/vnd.microsoft.icon" /> <meta name="description" content="Große 3-Zimmerwohnung in zentraler Lage in Kitzingen Objektbeschreibung: Große 3-Zimmerwohung in...,Große 3-Zimmerwohnung in zentraler Lage in Kitzingen in Bayern - Kitzingen" /><link rel="canonical" href="http://kleinanzeigen.ebay.de/anzeigen/s-anzeige/grosse-3-zimmerwohnung-in-zentraler-lage-in-kitzingen/102053924-203-6766" /><meta name="robots" content="noindex, follow"/><link rel="alternate" media="only screen and (max-width: 640px)" href="http://m.kleinanzeigen.ebay.de/s-anzeige/grosse-3-zimmerwohnung-in-zentraler-lage-in-kitzingen/102053924" > [...]"
{code}
That looks fine to me.

I changed the end of output() to:
{code}
try {
  context.write(key, fit.page);
} catch (final Throwable t) {
  // Catch anything thrown by the Gora/Cassandra write and log it with the URL being stored
  LOG.error("Unexpected error for " + fit.url, t);
}
{code}
and set a breakpoint on the LOG line. We will see...
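The catch block will show that the write fails, but not which field carries the empty column name. The stack trace runs through CassandraStore.addOrUpdateField and CassandraClient.addSubColumn, which suggests (an assumption, not confirmed in this thread) that a map-valued field is written as sub-columns named after its keys, so a single empty key would be enough to trigger "column name must not be empty". Below is a minimal standalone sketch of a check that could run just before context.write(). The class EmptyColumnNameCheck and its method findEmptyKeys are hypothetical (not part of Nutch or Gora); the caller would have to fill the input map from the page's map-valued fields (headers, metadata, markers, inlinks, outlinks in the WebPage dump).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical pre-write check. Assumption: each entry of a map-valued field
 * becomes a Cassandra sub-column named after the entry's key, so an empty or
 * null key would become an empty column name.
 */
final class EmptyColumnNameCheck {

    /** Returns the names of all fields whose map contains a null or empty key. */
    static List<String> findEmptyKeys(Map<String, ? extends Map<?, ?>> mapFields) {
        List<String> offenders = new ArrayList<>();
        for (Map.Entry<String, ? extends Map<?, ?>> field : mapFields.entrySet()) {
            Map<?, ?> value = field.getValue();
            if (value == null) {
                continue; // a missing map has no keys to check
            }
            for (Object key : value.keySet()) {
                // Keys are checked via toString(), so any key type with a sensible toString() works
                if (key == null || key.toString().isEmpty()) {
                    offenders.add(field.getKey());
                    break; // one report per field is enough to locate the problem
                }
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> fields = new LinkedHashMap<>();
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", "text/html");
        headers.put("", "value with no header name"); // hypothetical bad entry
        fields.put("headers", headers);
        fields.put("metadata", new LinkedHashMap<>());
        System.out.println(findEmptyKeys(fields)); // prints [headers]
    }
}
```

If the check never reports anything for a URL that still fails, that would point away from map keys as the source of the empty column name.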
> cassandra/hector exception: InvalidRequestException(why:column name must not be empty)
> --------------------------------------------------------------------------------------
>
> Key: NUTCH-1534
> URL: https://issues.apache.org/jira/browse/NUTCH-1534
> Project: Nutch
> Issue Type: Bug
> Components: fetcher, parser
> Affects Versions: 2.1
> Environment: nutch 2.1 / cassandra 1.2.1 / gora-cassandra 0.2 / gora-core 0.2.1
> running fetch with parse=true
> Reporter: Roland
> Fix For: 2.2
>
>
> During bigger fetches (100k+ URLs), these errors sometimes occur:
> {code}
> 2013-02-19 09:32:09,639 WARN fetcher.FetcherJob - Attempting to finish item from unknown queue: FetchItem [queueID=http://www.wer-kennt-wen.de, url=http://www.wer-kennt-wen.de/gallery/imageshow/mmfqq4y02q09, u=http://www.wer-kennt-wen.de/gallery/imageshow/mmfqq4y02q09, page=org.apache.nutch.storage.WebPage@7b1ab444 {
> "baseUrl":"null"
> "status":"34"
> "fetchTime":"1361262537305"
> "prevFetchTime":"1361257503835"
> "fetchInterval":"0"
> "retriesSinceFetch":"0"
> "modifiedTime":"0"
> "protocolStatus":"org.apache.nutch.storage.ProtocolStatus@40b98 {
> "code":"16"
> "args":"[Http code=403, url=http://www.wer-kennt-wen.de/gallery/imageshow/mmfqq4y02q09]"
> "lastModified":"0"
> }"
> "content":"null"
> "contentType":"null"
> "prevSignature":"null"
> "signature":"null"
> "title":"null"
> "text":"null"
> "parseStatus":"null"
> "score":"0.0"
> "reprUrl":"null"
> "headers":"{Set-Cookie=WKWSESSID=9d968aeef3a709bc4bba9bb955b93e1e; path=/; domain=.wer-kennt-wen.de, Connection=close, Content-Type=text/html, Cache-Control=no-store, no-cache, must-revalidate, post-check=0, pre-check=0, Date=Tue, 19 Feb 2013 08:28:57 GMT, P3P=CP="CAO OUR", Expires=Thu, 19 Nov 1981 08:52:00 GMT, Server=Apache, Pragma=no-cache}"
> "outlinks":"{}"
> "inlinks":"{}"
> "markers":"{dist=0, _injmrk_=y, _ftcmrk_=1361257998-2045033576, _gnmrk_=1361257998-2045033576}"
> "metadata":"{}"
> }]
> 2013-02-19 09:32:09,640 ERROR fetcher.FetcherJob - Unexpected error for http://www.wer-kennt-wen.de/gallery/imageshow/mmfqq4y02q09
> me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:column name must not be empty)
> 	at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:52)
> 	at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:97)
> 	at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:90)
> 	at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
> 	at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:233)
> 	at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
> 	at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:102)
> 	at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:108)
> 	at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:248)
> 	at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:245)
> 	at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
> 	at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
> 	at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:245)
> 	at me.prettyprint.cassandra.model.MutatorImpl.insert(MutatorImpl.java:79)
> 	at org.apache.gora.cassandra.store.CassandraClient.addSubColumn(CassandraClient.java:172)
> 	at org.apache.gora.cassandra.store.CassandraStore.addOrUpdateField(CassandraStore.java:360)
> 	at org.apache.gora.cassandra.store.CassandraStore.flush(CassandraStore.java:212)
> 	at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:65)
> 	at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:587)
> 	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
> 	at org.apache.nutch.fetcher.FetcherReducer$FetcherThread.output(FetcherReducer.java:663)
> 	at org.apache.nutch.fetcher.FetcherReducer$FetcherThread.run(FetcherReducer.java:557)
> Caused by: InvalidRequestException(why:column name must not be empty)
> 	at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19479)
> 	at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035)
> 	at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009)
> 	at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
> 	... 20 more
> {code}