Re: nutch 2.0 (trunk)

2010-09-07 Thread Andrzej Bialecki

On 2010-09-07 14:50, Faruk Berksöz wrote:

> Dear all,
>
> When I try to fetch a web page (e.g.
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html) with the MySQL
> storage definition, I see the following error in my Hadoop logs (there is
> no error with HBase):
>
> java.io.IOException: java.sql.BatchUpdateException: Data truncation: Data too long for column 'content' at row 1
>  at org.gora.sql.store.SqlStore.flush(SqlStore.java:316)
>  at org.gora.sql.store.SqlStore.close(SqlStore.java:163)
>  at org.gora.mapreduce.GoraOutputFormat$1.close(GoraOutputFormat.java:72)
>  at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:567)
>  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>  at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
>
> The type of the column 'content' is BLOB.
> This may be important for future Gora development.
> Should I file this in the Nutch JIRA, on GitHub (gora), or not at all?
>
> Environment: Ubuntu 10.04
> JVM: 1.6.0_20
> Nutch 2.0 (trunk)
> MySQL / HBase (0.20.6) / Hadoop (0.20.2), pseudo-distributed


Yes, please create a JIRA issue. Thanks!



--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: nutch 2.0 (trunk)

2010-09-07 Thread Julien Nioche
Hi Faruk,

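You can either set a lower value for the parameter http.content.limit, so that
the stored content fits into the existing BLOB column, or modify the mapping
and set

<field name="content" column="content" jdbc-type="MEDIUMBLOB"/>

which should work for MySQL (a BLOB column holds at most 65,535 bytes, a
MEDIUMBLOB up to 16,777,215 bytes).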
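The reasoning behind both suggestions can be sketched in Java. The class below (`BlobFit`, a hypothetical name, not part of Nutch or Gora) compares a content length against the documented maximum sizes of MySQL's BLOB types, and models a lower http.content.limit as a plain prefix truncation, treating a negative limit as "no truncation" as nutch-default.xml describes for that property. It is only an illustration under those assumptions: it does not connect to MySQL or call any Gora API, and the 100,000-byte page in main() is made-up data.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Nutch or Gora code) of why a fetched page can
 * overflow a MySQL BLOB column, and of the two workarounds from the thread.
 */
public class BlobFit {

    /** MySQL binary column types with their maximum sizes in bytes. */
    enum MySqlBlob {
        TINYBLOB(255L),
        BLOB(65_535L),
        MEDIUMBLOB(16_777_215L),
        LONGBLOB(4_294_967_295L);

        final long maxBytes;

        MySqlBlob(long maxBytes) {
            this.maxBytes = maxBytes;
        }

        /** True if a value of {@code length} bytes fits without truncation. */
        boolean fits(long length) {
            return length <= maxBytes;
        }
    }

    /** Workaround 2: smallest blob type able to hold {@code length} bytes, or null if none can. */
    static MySqlBlob smallestFitting(long length) {
        // values() returns constants in declaration order, i.e. smallest first.
        for (MySqlBlob type : MySqlBlob.values()) {
            if (type.fits(length)) {
                return type;
            }
        }
        return null;
    }

    /**
     * Workaround 1 (assumed model of a lower http.content.limit): keep at most
     * {@code limit} bytes; a negative limit means no truncation at all.
     */
    static byte[] truncate(byte[] content, int limit) {
        if (limit < 0 || content.length <= limit) {
            return content;
        }
        return Arrays.copyOf(content, limit);
    }

    public static void main(String[] args) {
        byte[] page = new byte[100_000]; // made-up page body, larger than a BLOB
        System.out.println("BLOB fits page? " + MySqlBlob.BLOB.fits(page.length));
        System.out.println("Smallest fitting type: " + smallestFitting(page.length));

        byte[] truncated = truncate(page, 65_535);
        System.out.println("Truncated length: " + truncated.length
                + ", fits BLOB? " + MySqlBlob.BLOB.fits(truncated.length));
    }
}
```

Running it should report that a BLOB cannot hold the 100,000-byte page, that MEDIUMBLOB is the smallest type that can, and that the truncated 65,535-byte copy fits a BLOB. The trade-off mirrors the two options above: widening the column keeps the whole page, while lowering the limit keeps the schema but discards the tail of large pages.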

See the discussion on http://github.com/enis/gora/issues/closed#issue/48

HTH

Julien

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com


