[ https://issues.apache.org/jira/browse/NUTCH-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583080#comment-13583080 ]

Roland commented on NUTCH-1534:
-------------------------------

That does not seem to be the case: my breakpoint at :557 was triggered, but nothing happened afterwards while I stepped through the code.
Later on I hit another breakpoint I had set at :580:
{code}
FetcherThread25[1] print this.protocolFactory.getProtocol(fit.url).getProtocolOutput(fit.url, fit.page).getStatus()
 this.protocolFactory.getProtocol(fit.url).getProtocolOutput(fit.url, fit.page).getStatus() = "org.apache.nutch.storage.ProtocolStatus@783f {
  "code":"1"
  "args":"[]"
  "lastModified":"0"
}"
{code}

So we are in the switch, at case ProtocolStatusCodes.SUCCESS (:533), and from there we go directly into output().
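The `"code":"1"` printed above is what routes the thread into the SUCCESS branch, while the original report below shows `"code":"16"` for a response whose args read `Http code=403`. A small sketch for decoding such dumps; the class and method names are hypothetical, and the numeric values are assumed to mirror Nutch 2.x `ProtocolStatusCodes` (only three codes are listed):

```java
import java.util.Map;

/**
 * Hypothetical helper for reading ProtocolStatus dumps.
 * Assumption: these values mirror Nutch's ProtocolStatusCodes; only a subset is listed.
 */
public class ProtocolCodeNames {
  static final Map<Integer, String> NAMES = Map.of(
      1, "SUCCESS",
      2, "FAILED",
      16, "EXCEPTION");

  /** Returns the symbolic name for a numeric status code, or UNKNOWN(n). */
  public static String name(int code) {
    return NAMES.getOrDefault(code, "UNKNOWN(" + code + ")");
  }

  public static void main(String[] args) {
    System.out.println(name(1));  // SUCCESS: the status seen at :580 in this comment
    System.out.println(name(16)); // EXCEPTION: the status in the original report (Http code=403)
  }
}
```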

So I checked what output() receives:
{code}
FetcherThread25[1] print this.protocolFactory.getProtocol(fit.url).getProtocolOutput(fit.url, fit.page).getContent()

this.protocolFactory.getProtocol(fit.url).getProtocolOutput(fit.url, fit.page).getContent() = "Version: 0
url: http://kleinanzeigen.ebay.de/anzeigen/s-anzeige/grosse-3-zimmerwohnung-in-zentraler-lage-in-kitzingen/102053924-203-6766
base: http://kleinanzeigen.ebay.de/anzeigen/s-anzeige/grosse-3-zimmerwohnung-in-zentraler-lage-in-kitzingen/102053924-203-6766
contentType: text/html
metadata: Content-Language=de-DE Age=0 Content-Length=13143 Set-Cookie=up=%7B%22ln%22%3A%22974347744%22%2C%22r%22%3A%224%22%7D; Expires=Tue, 20-Aug-2013 09:57:49 GMT; Path=/anzeigen/ Connection=keep-alive Server=Apache X-Varnish=3688196070 Vary=User-Agent,Accept-Encoding Date=Thu, 21 Feb 2013 09:57:49 GMT Content-Encoding=gzip Via=1.1 varnish Content-Type=text/html;charset=UTF-8 Accept-Ranges=bytes
Content:
<!DOCTYPE html><!--[if IE 7 ]>    <html class="ie ie7" lang="de"> <![endif]-->
<!--[if IE 8 ]>    <html class="ie ie8" lang="de"> <![endif]-->
<!--[if IE 9 ]>    <html class="ie ie9" lang="de"> <![endif]-->
<!--[if gt IE 9 ]>    <html class="ie gtie9" lang="de"> <![endif]-->
<!--[if lt IE 7 ]>    <html class="ie ltie7" lang="de"> <![endif]-->
<!--[if !IE]><!--> <html lang="de">         <!--<![endif]-->
<head>
    <meta charset="UTF-8">
        <meta name="gaVirtualUrl" content="/PVIP_immonet/195_Immobilien/203_Wohnung_mieten/wohnung_mieten.qm_i_130_wohnung_mieten.zimmer_i_3/B2C/OFFER"/>
        <title>Große 3-Zimmerwohnung in zentraler Lage in Kitzingen in Bayern - Kitzingen | eBay Kleinanzeigen</title>
        <link rel="SHORTCUT ICON" href="/git.REL-262.0.0/img/favicon.ico" type="image/vnd.microsoft.icon" />

        <meta name="description" content="Große 3-Zimmerwohnung in zentraler Lage in Kitzingen

Objektbeschreibung:
Große 3-Zimmerwohung in...,Große 3-Zimmerwohnung in zentraler Lage in Kitzingen in Bayern - Kitzingen" /><link rel="canonical" href="http://kleinanzeigen.ebay.de/anzeigen/s-anzeige/grosse-3-zimmerwohnung-in-zentraler-lage-in-kitzingen/102053924-203-6766" /><meta name="robots" content="noindex, follow"/><link rel="alternate" media="only screen and (max-width: 640px)" href="http://m.kleinanzeigen.ebay.de/s-anzeige/grosse-3-zimmerwohnung-in-zentraler-lage-in-kitzingen/102053924" >
[...]"
{code}

That looks fine to me.

I changed the end of output() to:
{code}
      try {
        // write the fetched page through the Gora store; any failure is logged with its URL
        context.write(key, fit.page);
      } catch (final Throwable t) {
        LOG.error("Unexpected error for " + fit.url, t);
      }

and set a breakpoint on the LOG line. We will see...
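The stack trace in the report fails inside Gora's `CassandraClient.addSubColumn`, and Cassandra's message is "column name must not be empty". One possibility consistent with that (an assumption, not something established in this thread) is that a map-valued field of the page carries an empty key. A minimal, hypothetical debugging aid that flags such fields; the real `WebPage` maps use different key/value types, so plain `String` maps stand in here:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical debugging aid (not part of Nutch or Gora).
 * Assumption: map keys become Cassandra sub-column names, so an empty key
 * would be rejected with "column name must not be empty".
 */
public class EmptyKeyCheck {
  /** Returns the names of fields whose map contains a null or empty key. */
  public static List<String> fieldsWithEmptyKeys(Map<String, Map<String, String>> fields) {
    List<String> offenders = new ArrayList<>();
    for (Map.Entry<String, Map<String, String>> field : fields.entrySet()) {
      for (String key : field.getValue().keySet()) {
        if (key == null || key.isEmpty()) {
          offenders.add(field.getKey());
          break; // one empty key is enough to flag this field
        }
      }
    }
    return offenders;
  }

  public static void main(String[] args) {
    Map<String, String> headers = new LinkedHashMap<>();
    headers.put("Server", "Apache");
    headers.put("", "stray value"); // simulated bad entry
    Map<String, String> markers = new LinkedHashMap<>();
    markers.put("dist", "0");

    Map<String, Map<String, String>> fields = new LinkedHashMap<>();
    fields.put("headers", headers);
    fields.put("markers", markers);

    System.out.println(fieldsWithEmptyKeys(fields)); // prints [headers]
  }
}
```

Calling a check like this on the page's map-valued fields just before `context.write(key, fit.page)` would point at the field to inspect when the breakpoint fires.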

                
> cassandra/hector exception: InvalidRequestException(why:column name must not be empty)
> --------------------------------------------------------------------------------------
>
>                 Key: NUTCH-1534
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1534
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher, parser
>    Affects Versions: 2.1
>         Environment: nutch 2.1 / cassandra 1.2.1 / gora-cassandra 0.2 / gora-core 0.2.1
> running fetch with parse=true
>            Reporter: Roland
>             Fix For: 2.2
>
>
> During bigger fetches (100k+ URLs), these errors sometimes occur:
> {code}
> 2013-02-19 09:32:09,639 WARN  fetcher.FetcherJob - Attempting to finish item from unknown queue: FetchItem [queueID=http://www.wer-kennt-wen.de, url=http://www.wer-kennt-wen.de/gallery/imageshow/mmfqq4y02q09, u=http://www.wer-kennt-wen.de/gallery/imageshow/mmfqq4y02q09, page=org.apache.nutch.storage.WebPage@7b1ab444 {
>   "baseUrl":"null"
>   "status":"34"
>   "fetchTime":"1361262537305"
>   "prevFetchTime":"1361257503835"
>   "fetchInterval":"0"
>   "retriesSinceFetch":"0"
>   "modifiedTime":"0"
>   "protocolStatus":"org.apache.nutch.storage.ProtocolStatus@40b98 {
>   "code":"16"
>   "args":"[Http code=403, url=http://www.wer-kennt-wen.de/gallery/imageshow/mmfqq4y02q09]"
>   "lastModified":"0"
> }"
>   "content":"null"
>   "contentType":"null"
>   "prevSignature":"null"
>   "signature":"null"
>   "title":"null"
>   "text":"null"
>   "parseStatus":"null"
>   "score":"0.0"
>   "reprUrl":"null"
>   "headers":"{Set-Cookie=WKWSESSID=9d968aeef3a709bc4bba9bb955b93e1e; path=/; domain=.wer-kennt-wen.de, Connection=close, Content-Type=text/html, Cache-Control=no-store, no-cache, must-revalidate, post-check=0, pre-check=0, Date=Tue, 19 Feb 2013 08:28:57 GMT, P3P=CP="CAO OUR", Expires=Thu, 19 Nov 1981 08:52:00 GMT, Server=Apache, Pragma=no-cache}"
>   "outlinks":"{}"
>   "inlinks":"{}"
>   "markers":"{dist=0, _injmrk_=y, _ftcmrk_=1361257998-2045033576, _gnmrk_=1361257998-2045033576}"
>   "metadata":"{}"
> }]
> 2013-02-19 09:32:09,640 ERROR fetcher.FetcherJob - Unexpected error for http://www.wer-kennt-wen.de/gallery/imageshow/mmfqq4y02q09
> me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:column name must not be empty)
>         at me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:52)
>         at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:97)
>         at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:90)
>         at me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:101)
>         at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:233)
>         at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
>         at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:102)
>         at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:108)
>         at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:248)
>         at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:245)
>         at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
>         at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
>         at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:245)
>         at me.prettyprint.cassandra.model.MutatorImpl.insert(MutatorImpl.java:79)
>         at org.apache.gora.cassandra.store.CassandraClient.addSubColumn(CassandraClient.java:172)
>         at org.apache.gora.cassandra.store.CassandraStore.addOrUpdateField(CassandraStore.java:360)
>         at org.apache.gora.cassandra.store.CassandraStore.flush(CassandraStore.java:212)
>         at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:65)
>         at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:587)
>         at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>         at org.apache.nutch.fetcher.FetcherReducer$FetcherThread.output(FetcherReducer.java:663)
>         at org.apache.nutch.fetcher.FetcherReducer$FetcherThread.run(FetcherReducer.java:557)
> Caused by: InvalidRequestException(why:column name must not be empty)
>         at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19479)
>         at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035)
>         at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009)
>         at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95)
>         ... 20 more
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
