OK, so I got these Gora snapshot jars:
gora-core-0.3-20130401.060419-325.jar
gora-hbase-0.3-20130401.065448-305.jar

When I ran generate, it finished without any exception, but the log file was full of lines like this (one for every URL in the webpage table):

INFO mapreduce.GoraRecordWriter - Exception at GoraRecordWriter.class while writing to datastore.HBase mapping for field [org.apache.nutch.storage.WebPage#batchId] not found. Wrong gora-hbase-mapping.xml?


When I checked gora-hbase-mapping.xml, there was no field for batchId,

so I copied this line from gora-cassandra-mapping.xml:

<field name="batchId" family="f" qualifier="bid"/>
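For anyone who wants to check their own mapping file before running a job, here is a minimal sketch in Python. It only assumes the mapping is well-formed XML with `<class>` elements containing `<field name="...">` children; the sample document, the `baseUrl` entry, and the helper names `mapped_fields` and `missing_fields` are made up for illustration and are not part of Nutch or Gora.

```python
import xml.etree.ElementTree as ET

# Class name taken from the log message above.
WEBPAGE_CLASS = "org.apache.nutch.storage.WebPage"


def mapped_fields(mapping_xml, class_name=WEBPAGE_CLASS):
    """Return the set of field names mapped for class_name (hypothetical helper)."""
    root = ET.fromstring(mapping_xml)
    names = set()
    for cls in root.iter("class"):
        if cls.get("name") == class_name:
            for field in cls.iter("field"):
                names.add(field.get("name"))
    return names


def missing_fields(mapping_xml, required, class_name=WEBPAGE_CLASS):
    """Return the required field names that have no mapping entry."""
    present = mapped_fields(mapping_xml, class_name)
    return [name for name in required if name not in present]


# Abbreviated, illustrative mapping; a real gora-hbase-mapping.xml has many more entries.
SAMPLE = """<gora-orm>
  <class name="org.apache.nutch.storage.WebPage" keyClass="java.lang.String" table="webpage">
    <field name="baseUrl" family="f" qualifier="bas"/>
  </class>
</gora-orm>"""

# Same mapping with the line copied from gora-cassandra-mapping.xml added.
PATCHED = SAMPLE.replace(
    "</class>",
    '  <field name="batchId" family="f" qualifier="bid"/>\n  </class>',
)

if __name__ == "__main__":
    print(missing_fields(SAMPLE, ["batchId"]))
    print(missing_fields(PATCHED, ["batchId"]))
```

To use it on a real file, read the file contents into a string and pass that to `missing_fields` together with the field names the log complains about.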

After adding that line, everything (and by that I mean generate, fetch, and updatedb) worked fine. So now here are my questions:

1- As I said, that line is missing from gora-hbase-mapping.xml. Does this need a JIRA issue, or can you guys just add it and commit without going through all the hoops?

2- Is the trunk version supposed to be compiled against Gora trunk? I ask because the current HEAD is not working with 0.2.1.

P.S. By the way, this worked the same with and without the NUTCH-1551 patch.



On 04/01/2013 03:28 PM, Lewis John Mcgibbney wrote:
You're right, this is a dev issue for sure.


On Mon, Apr 1, 2013 at 2:45 PM, kaveh minooie <ka...@plutoz.com> wrote:

    The patch NUTCH-1551 didn't solve my issue. I am still getting the
    exact same error when I try to run generate (this was run in local
    mode):


NUTCH-1551 is not supposed to fix this problem entirely. It merely
attempts to make the WebTableReader tool backwards compatible and
permits you to check whether accessor methods WebPage.getBatchID() and
WebPage.getPrevModifiedTime() actually work for your use case. If you
are able to check and provide feedback of the webtable dump for the URL
causing the NPE it would be very valuable indeed.


    Now, the variable most likely to be null seems to be 'mapkey', which is
    probably the result of a malformed URL (though I can't say that for
    sure).

    now the put function is being called from here

    This is from Gora 0.2.1:

    gora/blob/0.2.1/gora-core/src/main/java/org/apache/gora/mapreduce/GoraRecordWriter.java:

    ...


    the same function in gora trunk is like this:
    ...

    which, it seems to me, would allow the code to recover from this
    kind of error. Now, I get Gora through Ivy and I don't know how, or
    whether, I can have Ivy fetch the trunk, but regardless I still think
    the question remains whether this is a Nutch issue or a Gora issue.

So it appears that some issues have been addressed and improved within
Gora trunk (which is nice). You can pull a Gora SNAPSHOT from here [0]
and place it on your classpath, then try it out. Feedback would be
greatly appreciated.

The underlying problem here is that not everyone using and developing
Gora is using and developing Nutch. We have been making good progress
towards building diversity over in Gora so that it is not so heavily
reliant upon Nutch users. This means the project can stand on its own
two feet. The downside of this is that *some* bugs arising from *some*
use cases are not discovered until a little later than we would like.
Your feedback is really really helpful.

It should be noted that you can also patch your local copy of 2.x HEAD
to not contain the two offending issues we've previously discussed.

[0]
https://repository.apache.org/content/repositories/snapshots/org/apache/gora/

--
Kaveh Minooie
