Re: Error importing hbase table on new system

2015-09-27 Thread Håvard Wahl Kongsgård
But isn't that tool for reading regions, not export dumps?

-Håvard

On Sun, Sep 27, 2015 at 6:23 PM, Ted Yu  wrote:
> Have you used HFile tool ?
>
> The tool would print out that information as part of metadata.
>
> Cheers
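
For reference, the HFile tool can be pointed at a single store file to
print its metadata; a rough sketch (the path is only an example of a
store file under /hbase, and the tool reads HFiles, not the
SequenceFiles produced by Export):

hbase org.apache.hadoop.hbase.io.hfile.HFile -v -m -f /hbase/crawler/<region>/<family>/<storefile>
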
>
> On Sun, Sep 27, 2015 at 9:19 AM, Håvard Wahl Kongsgård
>  wrote:
>>
>> Yes, I have tried to read them on another system as well. It worked
>> there. But I don't know if they are HFilev1 or HFilev2 format (any way
>> to check?)
>>
>> These are the first lines from one of the files
>>
>> SEQ (SequenceFile header: key class org.apache.hadoop.hbase.io.ImmutableBytesWritable,
>> value class org.apache.hadoop.hbase.client.Result, compression codec
>> org.apache.hadoop.io.compress.DefaultCodec; the rest of the file is
>> compressed binary record data, omitted here)
>> -Håvard
>>
>> On Sun, Sep 27, 2015 at 4:06 PM, Ted Yu  wrote:
>> > Have you verified that the files to be imported are in HFilev2 format ?
>> >
>> > http://hbase.apache.org/book.html#_hfile_tool
>> >
>> > Cheers
>> >
>> > On Sun, Sep 27, 2015 at 4:47 AM, Håvard Wahl Kongsgård
>> >  wrote:
>> >>
>> >> >Is the single node system secure ?
>> >>
>> >> No, I have not activated it, just the defaults
>> >>
>> >> the mapred conf.
>> >>
>> >> <?xml version="1.0"?>
>> >> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>> >>
>> >> <configuration>
>> >>
>> >>   <property>
>> >>     <name>mapred.job.tracker</name>
>> >>     <value>rack3:8021</value>
>> >>   </property>
>> >>
>> >>   <property>
>> >>     <name>mapred.jobtracker.plugins</name>
>> >>     <value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
>> >>     <description>Comma-separated list of jobtracker plug-ins to be
>> >>     activated.</description>
>> >>   </property>
>> >>
>> >>   <property>
>> >>     <name>jobtracker.thrift.address</name>
>> >>     <value>0.0.0.0:9290</value>
>> >>   </property>
>> >>
>> >> </configuration>
>> >>
>> >>
>> >> >>Have you checked hdfs healthiness ?
>> >>
>> >>
>> >> sudo -u hdfs hdfs dfsadmin -report
>> >>
>> >> Configured Capacity: 2876708585472 (2.62 TB)
>> >>
>> >> Present Capacity: 1991514849280 (1.81 TB)
>> >>
>> >> DFS Remaining: 1648230617088 (1.50 TB)
>> >>
>> >> DFS Used: 343284232192 (319.71 GB)
>> >>
>> >> DFS Used%: 17.24%
>> >>
>> >> Under replicated blocks: 52
>> >>
>> >> Blocks with corrupt replicas: 0
>> >>
>> >> Missing blocks: 0
>> >>
>> >>
>> >> -
>> >>
>> >> Datanodes available: 1 (1 total, 0 dead)
>> >>
>> >>
>> >> Live datanodes:
>> >>
>> >> Name: 127.0.0.1:50010 (localhost)
>> >>
>> >> Hostname: rack3
>> >>
>> >> Decommission Status : Normal
>> >>
>> >> Configured Capacity: 2876708585472 (2.62 TB)
>> >>
>> >> DFS Used: 343284232192 (319.71 GB)
>> >>
>> >> Non DFS Used: 885193736192 (824.40 GB)
>> >>
>> >> DFS Remaining: 1648230617088 (1.50 TB)
>> >>
>> >> DFS Used%: 11.93%
>> >>
>> >> DFS Remaining%: 57.30%
>> >>
>> >> Last contact: Sun Sep 27 13:44:45 CEST 2015
>> >>
>> >>
>> >> >>To which release of hbase were you importing ?
>> >>
>> >> HBase 0.94 (CDH 4), the new one is CDH 5.4

Re: Error importing hbase table on new system

2015-09-27 Thread Håvard Wahl Kongsgård
Yes, I have tried to read them on another system as well. It worked
there. But I don't know if they are HFilev1 or HFilev2 format (any way
to check?)

These are the first lines from one of the files

SEQ (SequenceFile header: key class org.apache.hadoop.hbase.io.ImmutableBytesWritable,
value class org.apache.hadoop.hbase.client.Result, compression codec
org.apache.hadoop.io.compress.DefaultCodec; the rest of the file is
compressed binary record data, omitted here)
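
A quick way to peek at a header like this without dumping the whole file
(the part file name is only an example of one of the exported files):

hadoop fs -cat /crawler_hbase/crawler/part-m-00000 | head -c 200

A SequenceFile written by Export starts with the SEQ magic and the
key/value class names, as above; an HFile does not.
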
On Sun, Sep 27, 2015 at 4:06 PM, Ted Yu wrote:
> Have you verified that the files to be imported are in HFilev2 format ?
>
> http://hbase.apache.org/book.html#_hfile_tool
>
> Cheers
>
> On Sun, Sep 27, 2015 at 4:47 AM, Håvard Wahl Kongsgård
>  wrote:
>>
>> >Is the single node system secure ?
>>
>> No, I have not activated it, just the defaults
>>
>> the mapred conf.
>>
>> <?xml version="1.0"?>
>> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>
>> <configuration>
>>
>>   <property>
>>     <name>mapred.job.tracker</name>
>>     <value>rack3:8021</value>
>>   </property>
>>
>>   <property>
>>     <name>mapred.jobtracker.plugins</name>
>>     <value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
>>     <description>Comma-separated list of jobtracker plug-ins to be
>>     activated.</description>
>>   </property>
>>
>>   <property>
>>     <name>jobtracker.thrift.address</name>
>>     <value>0.0.0.0:9290</value>
>>   </property>
>>
>> </configuration>
>>
>>
>> >>Have you checked hdfs healthiness ?
>>
>>
>> sudo -u hdfs hdfs dfsadmin -report
>>
>> Configured Capacity: 2876708585472 (2.62 TB)
>>
>> Present Capacity: 1991514849280 (1.81 TB)
>>
>> DFS Remaining: 1648230617088 (1.50 TB)
>>
>> DFS Used: 343284232192 (319.71 GB)
>>
>> DFS Used%: 17.24%
>>
>> Under replicated blocks: 52
>>
>> Blocks with corrupt replicas: 0
>>
>> Missing blocks: 0
>>
>>
>> -
>>
>> Datanodes available: 1 (1 total, 0 dead)
>>
>>
>> Live datanodes:
>>
>> Name: 127.0.0.1:50010 (localhost)
>>
>> Hostname: rack3
>>
>> Decommission Status : Normal
>>
>> Configured Capacity: 2876708585472 (2.62 TB)
>>
>> DFS Used: 343284232192 (319.71 GB)
>>
>> Non DFS Used: 885193736192 (824.40 GB)
>>
>> DFS Remaining: 1648230617088 (1.50 TB)
>>
>> DFS Used%: 11.93%
>>
>> DFS Remaining%: 57.30%
>>
>> Last contact: Sun Sep 27 13:44:45 CEST 2015
>>
>>
>> >>To which release of hbase were you importing ?
>>
>> HBase 0.94 (CDH 4)
>>
>> the new one is CDH 5.4
>>
>> On Sun, Sep 27, 2015 at 1:32 PM, Ted Yu  wrote:
>> > Is the single node system secure ?
>> > Have you checked hdfs healthiness ?
>> > To which release of hbase were you importing ?
>> >
>> > Thanks
>> >
>> >> On Sep 27, 2015, at 3:06 AM, Håvard Wahl Kongsgård
>> >>  wrote:
>> >>
>> >> Hi, I am trying to import an old backup to a new smaller system (just
>> >> single node, to get the data out)
>> >>
>> >> when I use
>> >>
>> >> sudo -u hbase hbase -Dhbase.import.version=0.94
>> >> org.apache.hadoop.hbase.mapreduce.Import crawler
>> >> /crawler_hbase/crawler
>> >>
>> >> I get this error in the tasks. Is this a permission problem?
>> >>
>> >>
>> >> 2015-09-26 23:56:32,995 ERROR
>> >> org.apache.hadoop.security.UserGroupInformation:
>> >> PriviledgedActionException as:mapred (auth:SIMPLE)
>> >> cause:java.io.IOException: keyvalues=NONE read 4096 bytes, should read
>> >> 14279
>> >> 2015-09-26 23:56:32,996 WARN org.apache.hadoop.mapred.Child: Error
>> >> running child
>> >> java.io.IOException: keyvalues=NONE read 4096 bytes, should read 14279
>> >> at
>> >> org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2221)
>> >>

Re: Error importing hbase table on new system

2015-09-27 Thread Håvard Wahl Kongsgård
>Is the single node system secure ?

No, I have not activated it, just the defaults

the mapred conf.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

  <property>
    <name>mapred.job.tracker</name>
    <value>rack3:8021</value>
  </property>

  <property>
    <name>mapred.jobtracker.plugins</name>
    <value>org.apache.hadoop.thriftfs.ThriftJobTrackerPlugin</value>
    <description>Comma-separated list of jobtracker plug-ins to be activated.</description>
  </property>

  <property>
    <name>jobtracker.thrift.address</name>
    <value>0.0.0.0:9290</value>
  </property>

</configuration>


>>Have you checked hdfs healthiness ?


sudo -u hdfs hdfs dfsadmin -report

Configured Capacity: 2876708585472 (2.62 TB)

Present Capacity: 1991514849280 (1.81 TB)

DFS Remaining: 1648230617088 (1.50 TB)

DFS Used: 343284232192 (319.71 GB)

DFS Used%: 17.24%

Under replicated blocks: 52

Blocks with corrupt replicas: 0

Missing blocks: 0


-

Datanodes available: 1 (1 total, 0 dead)


Live datanodes:

Name: 127.0.0.1:50010 (localhost)

Hostname: rack3

Decommission Status : Normal

Configured Capacity: 2876708585472 (2.62 TB)

DFS Used: 343284232192 (319.71 GB)

Non DFS Used: 885193736192 (824.40 GB)

DFS Remaining: 1648230617088 (1.50 TB)

DFS Used%: 11.93%

DFS Remaining%: 57.30%

Last contact: Sun Sep 27 13:44:45 CEST 2015


>>To which release of hbase were you importing ?

HBase 0.94 (CDH 4)

the new one is CDH 5.4

On Sun, Sep 27, 2015 at 1:32 PM, Ted Yu  wrote:
> Is the single node system secure ?
> Have you checked hdfs healthiness ?
> To which release of hbase were you importing ?
>
> Thanks
>
>> On Sep 27, 2015, at 3:06 AM, Håvard Wahl Kongsgård 
>>  wrote:
>>
>> Hi, I am trying to import an old backup to a new smaller system (just
>> single node, to get the data out)
>>
>> when I use
>>
>> sudo -u hbase hbase -Dhbase.import.version=0.94
>> org.apache.hadoop.hbase.mapreduce.Import crawler
>> /crawler_hbase/crawler
>>
>> I get this error in the tasks. Is this a permission problem?
>>
>>
>> 2015-09-26 23:56:32,995 ERROR
>> org.apache.hadoop.security.UserGroupInformation:
>> PriviledgedActionException as:mapred (auth:SIMPLE)
>> cause:java.io.IOException: keyvalues=NONE read 4096 bytes, should read
>> 14279
>> 2015-09-26 23:56:32,996 WARN org.apache.hadoop.mapred.Child: Error running 
>> child
>> java.io.IOException: keyvalues=NONE read 4096 bytes, should read 14279
>> at 
>> org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2221)
>> at 
>> org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.nextKeyValue(SequenceFileRecordReader.java:74)
>> at 
>> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:483)
>> at 
>> org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:76)
>> at 
>> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:85)
>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:139)
>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>> at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>> at 
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>> at org.apache.hadoop.mapred.Child.main(Child.java:262)
>> 2015-09-26 23:56:33,002 INFO org.apache.hadoop.mapred.Task: Runnning
>> cleanup for the task
>>
>>
>>
>> --
>> Håvard Wahl Kongsgård
>> Data Scientist



-- 
Håvard Wahl Kongsgård
Data Scientist


Error importing hbase table on new system

2015-09-27 Thread Håvard Wahl Kongsgård
Hi, I am trying to import an old backup to a new smaller system (just
single node, to get the data out)

when I use

sudo -u hbase hbase -Dhbase.import.version=0.94
org.apache.hadoop.hbase.mapreduce.Import crawler
/crawler_hbase/crawler

I get this error in the tasks. Is this a permission problem?


2015-09-26 23:56:32,995 ERROR
org.apache.hadoop.security.UserGroupInformation:
PriviledgedActionException as:mapred (auth:SIMPLE)
cause:java.io.IOException: keyvalues=NONE read 4096 bytes, should read
14279
2015-09-26 23:56:32,996 WARN org.apache.hadoop.mapred.Child: Error running child
java.io.IOException: keyvalues=NONE read 4096 bytes, should read 14279
at 
org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2221)
at 
org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.nextKeyValue(SequenceFileRecordReader.java:74)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:483)
at 
org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:76)
at 
org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:85)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:139)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
2015-09-26 23:56:33,002 INFO org.apache.hadoop.mapred.Task: Runnning
cleanup for the task



-- 
Håvard Wahl Kongsgård
Data Scientist


Re: How can I add a new hard disk in an existing HDFS cluster?

2013-05-03 Thread Håvard Wahl Kongsgård
go for ext3 or ext4
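
A rough sketch of the steps (device, mount point, owner and paths are
only examples; adjust to your layout):

sudo mkfs.ext4 /dev/vdb                  # format the new disk
sudo mkdir -p /mnt/hdfs2
sudo mount /dev/vdb /mnt/hdfs2           # and add it to /etc/fstab
sudo mkdir -p /mnt/hdfs2/dfs/data
sudo chown -R hduser:hadoop /mnt/hdfs2/dfs/data   # owner = user running the datanode

Then add the new directory to dfs.data.dir in hdfs-site.xml as a
comma-separated list on each node that gets a new disk, e.g. something
like
/usr/local/hadoop-1.0.4/hadoop-datastore/dfs/data,/mnt/hdfs2/dfs/data
and restart that datanode.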


On Fri, May 3, 2013 at 8:32 AM, Joarder KAMAL  wrote:

> Hi,
>
>  I have a running HDFS cluster (Hadoop/HBase) consists of 4 nodes and the
> initial hard disk (/dev/vda1) size is 10G only. Now I have a second hard
> drive /dev/vdb of 60GB size and want to add it into my existing HDFS
> cluster. How can I format the new hard disk (and in which format? XFS?) and
> mount it to work with HDFS
>
> Default HDFS directory is situated in
> /usr/local/hadoop-1.0.4/hadoop-datastore
> And I followed this link for installation.
>
> http://ankitasblogger.blogspot.com.au/2011/01/hadoop-cluster-setup.html
>
> Many thanks in advance :)
>
>
> Regards,
> Joarder Kamal
>



-- 
Håvard Wahl Kongsgård
Data Scientist
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.dbkeeping.com/


Re: R environment with Hadoop

2013-04-13 Thread Håvard Wahl Kongsgård
Hi, simpler is always better ...

...for example if you use hadoop with java http://www.rforge.net/rJava/

... if you use hadoop with python(pydoop,dumbo)
http://rpy.sourceforge.net/rpy2/doc-2.0/html/index.html

On Wed, Apr 10, 2013 at 8:49 PM, Shah, Rahul1  wrote:
> Hi,
>
>
>
> I have to find out whether there is R environment that can be run on Hadoop.
> I see several packages of R and Hadoop. Any pointer which is good one to
> use. How can I learn R and start on with it.
>
>
>
> -Rahul
>
>



-- 
Håvard Wahl Kongsgård
Data Scientist
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.dbkeeping.com/


Re: Which hadoop installation should I use on ubuntu server?

2013-03-29 Thread Håvard Wahl Kongsgård
I recommend cloudera's CDH4 on ubuntu 12.04 LTS


On Thu, Mar 28, 2013 at 7:07 AM, David Parks  wrote:

> I’m moving off AWS MapReduce to our own cluster, I’m installing Hadoop on
> Ubuntu Server 12.10.
>
> I see a .deb installer and installed that, but it seems like files are all
> over the place `/usr/share/Hadoop`, `/etc/hadoop`, `/usr/bin/hadoop`. And
> the documentation is a bit harder to follow:
>
> http://hadoop.apache.org/docs/r1.1.2/cluster_setup.html
>
> So I just wonder if this installer is the best approach, or if it’ll be
> easier/better to just install the basic build in /opt/hadoop and perhaps
> the docs become easier to follow. Thoughts?
>
> Thanks,
>
> Dave
>



-- 
Håvard Wahl Kongsgård
Data Scientist
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.dbkeeping.com/


Re: Hadoop cluster hangs on big hive job

2013-03-08 Thread Håvard Wahl Kongsgård
Dude, I'm not going to read all your log files,

but try to run this as a normal MapReduce job; it could be memory
related, something wrong with some of the zip files, a wrong config,
etc.

-Håvard

On Thu, Mar 7, 2013 at 8:53 PM, Daning Wang  wrote:
> We have hive query processing zipped csv files. the query was scanning for
> 10 days(partitioned by date). data for each day around 130G. The problem is
> not consistent since if you run it again, it might go through. but the
> problem has never happened on the smaller jobs(like processing only one days
> data).
>
> We don't have space issue.
>
> I have attached log file when problem happening. it is stuck like
> following(just search "19706 of 49964")
>
> 2013-03-05 15:13:51,587 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_19_0 0.131468% reduce > copy (19706 of 49964
> at 0.00 MB/s) >
> 2013-03-05 15:13:51,811 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_39_0 0.131468% reduce > copy (19706 of 49964
> at 0.00 MB/s) >
> 2013-03-05 15:13:52,551 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_32_0 0.131468% reduce > copy (19706 of 49964
> at 0.00 MB/s) >
> 2013-03-05 15:13:52,760 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_00_0 0.131468% reduce > copy (19706 of 49964
> at 0.00 MB/s) >
> 2013-03-05 15:13:52,946 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_24_0 0.131468% reduce > copy (19706 of 49964
> at 0.00 MB/s) >
> 2013-03-05 15:13:54,742 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_000008_0 0.131468% reduce > copy (19706 of 49964
> at 0.00 MB/s) >
>
> Thanks,
>
> Daning
>
>
> On Thu, Mar 7, 2013 at 12:21 AM, Håvard Wahl Kongsgård
>  wrote:
>>
>> hadoop logs?
>>
>> On 6 March 2013 21:04, "Daning Wang"  wrote:
>>>
>>> We have a 5 node cluster (Hadoop 1.0.4). It hung a couple of times while
>>> running big jobs. Basically all the nodes are dead; from that tasktracker's
>>> log it looks like it went into some kind of loop forever.
>>>
>>> All the log entries like this when problem happened.
>>>
>>> Any idea how to debug the issue?
>>>
>>> Thanks in advance.
>>>
>>>
>>> 2013-03-05 15:13:19,526 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_12_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:19,552 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_28_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:20,858 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_36_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:21,141 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_16_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:21,486 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_19_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:21,692 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_39_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:22,448 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_32_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:22,643 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_00_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:22,840 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_24_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:24,628 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_08_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:24,723 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_39_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >
>>> 2013-03-05 15:13:25,336 INFO org.apache.hadoop.mapred.TaskTracker:
>>> attempt_201302270947_0010_r_04_0 0.131468% reduce > copy (19706 of 49964
>>> at 0.00 MB/s) >

Re: Hadoop cluster hangs on big hive job

2013-03-07 Thread Håvard Wahl Kongsgård
hadoop logs?
On 6 March 2013 21:04, "Daning Wang"  wrote:

> We have a 5 node cluster (Hadoop 1.0.4). It hung a couple of times while
> running big jobs. Basically all the nodes are dead; from that
> tasktracker's log it looks like it went into some kind of loop forever.
>
> All the log entries like this when problem happened.
>
> Any idea how to debug the issue?
>
> Thanks in advance.
>
>
> 2013-03-05 15:13:19,526 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_12_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:19,552 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_28_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:20,858 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_36_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:21,141 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_16_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:21,486 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_19_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:21,692 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_39_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:22,448 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_32_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:22,643 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_00_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:22,840 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_24_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:24,628 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_08_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:24,723 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_39_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:25,336 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_04_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:25,539 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_43_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:25,545 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_12_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:25,569 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_28_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:25,855 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_24_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:26,876 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_36_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:27,159 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_16_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:27,505 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_19_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:28,464 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_32_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:28,553 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_43_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:28,561 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_12_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:28,659 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_00_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:30,519 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_19_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:30,644 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_08_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:30,741 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_39_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:31,369 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_04_0 0.131468% reduce > copy (19706 of
> 49964 at 0.00 MB/s) >
> 2013-03-05 15:13:31,675 INFO org.apache.hadoop.mapred.TaskTracker:
> attempt_201302270947_0010_r_00_0 0.131468% reduce > copy 

Re: Skipping entire task

2013-01-06 Thread Håvard Wahl Kongsgård
Thanks, I was unaware of mapred.max.map.failures.percent

-Håvard

On Sun, Jan 6, 2013 at 3:46 PM, Harsh J  wrote:
> You can use the mapred.max.map.failures.percent and
> mapred.max.reduce.failures.percent features to control the percentage
> of allowed failures of tasks in a single job (despite which the job is
> marked successful).
>
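
For a streaming (or pipes) job these can be passed as generic options on
the command line; a sketch, with the jar path and job arguments as
placeholders:

hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
  -D mapred.max.map.failures.percent=5 \
  -D mapred.max.reduce.failures.percent=5 \
  -mapper mapper.py -reducer reducer.py \
  -input /in -output /out

i.e. up to 5% of the map or reduce tasks may fail (after their retries)
without failing the whole job.
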
> On Sun, Jan 6, 2013 at 8:04 PM, Håvard Wahl Kongsgård
>  wrote:
>>> Are tasks being executed multiple times due to failures? Sorry, it was not
>>> very clear from your question.
>>
>> yes, and I simply want to skip them if they fail more than x
>> times(after all this is big data :) ).
>>
>> -Håvard
>>
>> On Sun, Jan 6, 2013 at 3:01 PM, Hemanth Yamijala
>>  wrote:
>>> Hi,
>>>
>>> Are tasks being executed multiple times due to failures? Sorry, it was not
>>> very clear from your question.
>>>
>>> Thanks
>>> hemanth
>>>
>>>
>>> On Sat, Jan 5, 2013 at 7:44 PM, David Parks  wrote:
>>>>
>>>> Thinking here... if you submitted the task programmatically you should be
>>>> able to capture the failure of the task and gracefully move past it to
>>>> your
>>>> next tasks.
>>>>
>>>> To say it in a long-winded way:  Let's say you submit a job to Hadoop, a
>>>> java jar, and your main class implements Tool. That code has the
>>>> responsibility to submit a series of jobs to hadoop, something like this:
>>>>
>>>> try{
>>>>   Job myJob = new MyJob(getConf());
>>>>   myJob.submitAndWait();
>>>> }catch(Exception uhhohh){
>>>>   //Deal with the issue and move on
>>>> }
>>>> Job myNextJob = new MyNextJob(getConf());
>>>> myNextJob.submit();
>>>>
>>>> Just pseudo code there to demonstrate my thought.
>>>>
>>>> David
>>>>
>>>>
>>>>
>>>> -Original Message-
>>>> From: Håvard Wahl Kongsgård [mailto:haavard.kongsga...@gmail.com]
>>>> Sent: Saturday, January 05, 2013 4:54 PM
>>>> To: user
>>>> Subject: Skipping entire task
>>>>
>>>> Hi, hadoop can skip bad records
>>>>
>>>> http://devblog.factual.com/practical-hadoop-streaming-dealing-with-brittle-code.
>>>> But it is also possible to skip entire tasks?
>>>>
>>>> -Håvard
>>>>
>>>> --
>>>> Håvard Wahl Kongsgård
>>>> Faculty of Medicine &
>>>> Department of Mathematical Sciences
>>>> NTNU
>>>>
>>>> http://havard.security-review.net/
>>>>
>>>
>>
>>
>>
>> --
>> Håvard Wahl Kongsgård
>> Faculty of Medicine &
>> Department of Mathematical Sciences
>> NTNU
>>
>> http://havard.security-review.net/
>
>
>
> --
> Harsh J



-- 
Håvard Wahl Kongsgård
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/


Re: Skipping entire task

2013-01-06 Thread Håvard Wahl Kongsgård
> Are tasks being executed multiple times due to failures? Sorry, it was not
> very clear from your question.

yes, and I simply want to skip them if they fail more than x
times (after all, this is big data :) ).

-Håvard

On Sun, Jan 6, 2013 at 3:01 PM, Hemanth Yamijala
 wrote:
> Hi,
>
> Are tasks being executed multiple times due to failures? Sorry, it was not
> very clear from your question.
>
> Thanks
> hemanth
>
>
> On Sat, Jan 5, 2013 at 7:44 PM, David Parks  wrote:
>>
>> Thinking here... if you submitted the task programmatically you should be
>> able to capture the failure of the task and gracefully move past it to
>> your
>> next tasks.
>>
>> To say it in a long-winded way:  Let's say you submit a job to Hadoop, a
>> java jar, and your main class implements Tool. That code has the
>> responsibility to submit a series of jobs to hadoop, something like this:
>>
>> try{
>>   Job myJob = new MyJob(getConf());
>>   myJob.submitAndWait();
>> }catch(Exception uhhohh){
>>   //Deal with the issue and move on
>> }
>> Job myNextJob = new MyNextJob(getConf());
>> myNextJob.submit();
>>
>> Just pseudo code there to demonstrate my thought.
>>
>> David
>>
>>
>>
>> -Original Message-
>> From: Håvard Wahl Kongsgård [mailto:haavard.kongsga...@gmail.com]
>> Sent: Saturday, January 05, 2013 4:54 PM
>> To: user
>> Subject: Skipping entire task
>>
>> Hi, hadoop can skip bad records
>>
>> http://devblog.factual.com/practical-hadoop-streaming-dealing-with-brittle-code.
>> But it is also possible to skip entire tasks?
>>
>> -Håvard
>>
>> --
>> Håvard Wahl Kongsgård
>> Faculty of Medicine &
>> Department of Mathematical Sciences
>> NTNU
>>
>> http://havard.security-review.net/
>>
>



-- 
Håvard Wahl Kongsgård
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/


Re: Skipping entire task

2013-01-05 Thread Håvard Wahl Kongsgård
yes, but I use pydoop, not the native Java library. The problem is that
the same task fails multiple times, so a solution is not that
straightforward. And Pydoop does not seem to have any methods to inform
the task how many times it has failed. So if there is no native method
in hadoop, I could use a database or something for that purpose. Any
other ideas?

-Håvard

On Sat, Jan 5, 2013 at 3:14 PM, David Parks  wrote:
> Thinking here... if you submitted the task programmatically you should be
> able to capture the failure of the task and gracefully move past it to your
> next tasks.
>
> To say it in a long-winded way:  Let's say you submit a job to Hadoop, a
> java jar, and your main class implements Tool. That code has the
> responsibility to submit a series of jobs to hadoop, something like this:
>
> try{
>   Job myJob = new MyJob(getConf());
>   myJob.submitAndWait();
> }catch(Exception uhhohh){
>   //Deal with the issue and move on
> }
> Job myNextJob = new MyNextJob(getConf());
> myNextJob.submit();
>
> Just pseudo code there to demonstrate my thought.
>
> David
>
>
>
> -Original Message-
> From: Håvard Wahl Kongsgård [mailto:haavard.kongsga...@gmail.com]
> Sent: Saturday, January 05, 2013 4:54 PM
> To: user
> Subject: Skipping entire task
>
> Hi, hadoop can skip bad records
> http://devblog.factual.com/practical-hadoop-streaming-dealing-with-brittle-code.
> But it is also possible to skip entire tasks?
>
> -Håvard
>
> --
> Håvard Wahl Kongsgård
> Faculty of Medicine &
> Department of Mathematical Sciences
> NTNU
>
> http://havard.security-review.net/
>



-- 
Håvard Wahl Kongsgård
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/


Skipping entire task

2013-01-05 Thread Håvard Wahl Kongsgård
Hi, hadoop can skip bad records
http://devblog.factual.com/practical-hadoop-streaming-dealing-with-brittle-code.
But it is also possible to skip entire tasks?

-Håvard

-- 
Håvard Wahl Kongsgård
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/


Re: Hadoop in Pseudo-Distributed mode on Mac OS X 10.8

2012-08-31 Thread Håvard Wahl Kongsgård
Pseudo-distributed mode is good for developing and testing hadoop
code. But instead of experimenting with hadoop on your mac, I would go
for hadoop on EC2. With starcluster http://web.mit.edu/star/cluster/
it takes just a single command to start hadoop.  You also get a fixed
environment.
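
Roughly, assuming a cluster template with the hadoop plugin enabled in
~/.starcluster/config ("mycluster" is just a tag name):

starcluster start mycluster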

-Håvard


On Mon, Aug 13, 2012 at 6:21 AM, Subho Banerjee  wrote:
> Hello,
>
> I am running hadoop v1.0.3 in Mac OS X 10.8 with Java_1.6.0_33-b03-424
>
>
> When running hadoop on pseudo-distributed mode, the map seems to work, but
> it cannot compute the reduce.
>
> 12/08/13 08:58:12 INFO mapred.JobClient: Running job: job_201208130857_0001
> 12/08/13 08:58:13 INFO mapred.JobClient: map 0% reduce 0%
> 12/08/13 08:58:27 INFO mapred.JobClient: map 20% reduce 0%
> 12/08/13 08:58:33 INFO mapred.JobClient: map 30% reduce 0%
> 12/08/13 08:58:36 INFO mapred.JobClient: map 40% reduce 0%
> 12/08/13 08:58:39 INFO mapred.JobClient: map 50% reduce 0%
> 12/08/13 08:58:42 INFO mapred.JobClient: map 60% reduce 0%
> 12/08/13 08:58:45 INFO mapred.JobClient: map 70% reduce 0%
> 12/08/13 08:58:48 INFO mapred.JobClient: map 80% reduce 0%
> 12/08/13 08:58:51 INFO mapred.JobClient: map 90% reduce 0%
> 12/08/13 08:58:54 INFO mapred.JobClient: map 100% reduce 0%
> 12/08/13 08:59:14 INFO mapred.JobClient: Task Id :
> attempt_201208130857_0001_m_00_0, Status : FAILED
> Too many fetch-failures
> 12/08/13 08:59:14 WARN mapred.JobClient: Error reading task outputServer
> returned HTTP response code: 403 for URL:
> http://10.1.66.17:50060/tasklog?plaintext=true&attemptid=attempt_201208130857_0001_m_00_0&filter=stdout
> 12/08/13 08:59:14 WARN mapred.JobClient: Error reading task outputServer
> returned HTTP response code: 403 for URL:
> http://10.1.66.17:50060/tasklog?plaintext=true&attemptid=attempt_201208130857_0001_m_00_0&filter=stderr
> 12/08/13 08:59:18 INFO mapred.JobClient: map 89% reduce 0%
> 12/08/13 08:59:21 INFO mapred.JobClient: map 100% reduce 0%
> 12/08/13 09:00:14 INFO mapred.JobClient: Task Id :
> attempt_201208130857_0001_m_01_0, Status : FAILED
> Too many fetch-failures
>
> Here is what I get when I try to see the tasklog using the links given in
> the output
>
> http://10.1.66.17:50060/tasklog?plaintext=true&attemptid=attempt_201208130857_0001_m_00_0&filter=stderr
> --->
> 2012-08-13 08:58:39.189 java[74092:1203] Unable to load realm info from
> SCDynamicStore
>
> http://10.1.66.17:50060/tasklog?plaintext=true&attemptid=attempt_201208130857_0001_m_00_0&filter=stdout
> --->
>
>> > I have changed my hadoop-env.sh according to Mathew Buckett in
> https://issues.apache.org/jira/browse/HADOOP-7489
>
> Also this error of Unable to load realm info from SCDynamicStore does not
> show up when I do 'hadoop namenode -format' or 'start-all.sh'
>
> I am also attaching a zipped copy of my logs
>
>
> Cheers,
>
> Subho.



-- 
Håvard Wahl Kongsgård
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/


Re: example usage of s3 file system

2012-08-29 Thread Håvard Wahl Kongsgård
see also

http://wiki.apache.org/hadoop/AmazonS3
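
A minimal sketch with the s3n:// scheme (bucket name and keys are
placeholders; the credentials can also go into core-site.xml as
fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey):

hadoop fs -D fs.s3n.awsAccessKeyId=YOUR_KEY \
  -D fs.s3n.awsSecretAccessKey=YOUR_SECRET \
  -ls s3n://your-bucket/some/dir/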

On Tue, Aug 28, 2012 at 9:14 AM, Chris Collins
 wrote:
> Hi I am trying to use the Hadoop filesystem abstraction with S3 but in my 
> tinkering I am not having a great deal of success.  I am particularly 
> interested in the ability to mimic a directory structure (since s3 native 
> doesn't do it).
>
> Can anyone point me to some good example usage of Hadoop FileSystem with s3?
>
> I created a few directories using transit and AWS S3 console for test.  Doing 
> a liststatus of the bucket returns a FileStatus object of the directory 
> created but if I try to do a liststatus of that path I am getting a 404:
>
> org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: 
> Request Error. HEAD '/' on Host 
>
> Probably not the best list to look for help, any clues appreciated.
>
> C



-- 
Håvard Wahl Kongsgård
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/


Re: Using Multiple Input format with Hadoop Streaming

2012-08-25 Thread Håvard Wahl Kongsgård
Dude, preprocess the data!

-Håvard

On Sat, Aug 25, 2012 at 7:48 AM, Siddharth Tiwari
 wrote:
>
> How shall I give multiple inputs and keep multiple mappers in Streaming? How
> shall I map each output to a specific mapper in Streaming?
>
> **
> Cheers !!!
> Siddharth Tiwari
> Have a refreshing day !!!
> "Every duty is holy, and devotion to duty is the highest form of worship of
> God.”
> "Maybe other people will try to limit me but I don't limit myself"



-- 
Håvard Wahl Kongsgård
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/


Re: secondary namenode storage location

2012-08-24 Thread Håvard Wahl Kongsgård
https://ccp.cloudera.com/display/CDHDOC/CDH3+Deployment+on+a+Cluster#CDH3DeploymentonaCluster-ConfiguringtheSecondaryNameNode

fs.checkpoint.dir        /cache/hadoop/dfs
fs.checkpoint.edits.dir  /cache/hadoop/dfs
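
A sketch of how these would typically be set (which file they go in
varies a bit by version/distro, often core-site.xml or hdfs-site.xml; in
Hadoop 1.x/CDH3 the default, if unset, is ${hadoop.tmp.dir}/dfs/namesecondary):

<property>
  <name>fs.checkpoint.dir</name>
  <value>/cache/hadoop/dfs</value>
</property>

<property>
  <name>fs.checkpoint.edits.dir</name>
  <value>/cache/hadoop/dfs</value>
</property>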


-Håvard



On Fri, Aug 24, 2012 at 4:20 PM, Abhay Ratnaparkhi
 wrote:
>  Hello Everyone,
>
> I have specified the secondary namenode is masters file.
> Which property is to be used to show a path used to store HDFS data by
> secondary node?
> What happened if I don't specify that property (or What is the default
> location secondary namenode uses)?
>
> Regards,
> Abhay
>
>



-- 
Håvard Wahl Kongsgård
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/


Re: namenode not starting

2012-08-24 Thread Håvard Wahl Kongsgård
You should start with a reboot of the system.

A lesson to everyone: this is exactly why you should have a secondary
name node
(http://wiki.apache.org/hadoop/FAQ#What_is_the_purpose_of_the_secondary_name-node.3F)
and run the namenode on a mirrored RAID-5/10 disk.


-Håvard



On Fri, Aug 24, 2012 at 9:40 AM, Abhay Ratnaparkhi
 wrote:
> Hello,
>
> I was using cluster for long time and not formatted the namenode.
> I ran bin/stop-all.sh and bin/start-all.sh scripts only.
>
> I am using NFS for dfs.name.dir.
> hadoop.tmp.dir is a /tmp directory. I've not restarted the OS.  Any way to
> recover the data?
>
> Thanks,
> Abhay
>
>
> On Fri, Aug 24, 2012 at 1:01 PM, Bejoy KS  wrote:
>>
>> Hi Abhay
>>
>> What is the value for hadoop.tmp.dir or dfs.name.dir . If it was set to
>> /tmp the contents would be deleted on a OS restart. You need to change this
>> location before you start your NN.
>> Regards
>> Bejoy KS
>>
>> Sent from handheld, please excuse typos.
>> 
>> From: Abhay Ratnaparkhi 
>> Date: Fri, 24 Aug 2012 12:58:41 +0530
>> To: 
>> ReplyTo: user@hadoop.apache.org
>> Subject: namenode not starting
>>
>> Hello,
>>
>> I had a running hadoop cluster.
>> I restarted it and after that namenode is unable to start. I am getting
>> error saying that it's not formatted. :(
>> Is it possible to recover the data on HDFS?
>>
>> 2012-08-24 03:17:55,378 ERROR
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
>> initialization failed.
>> java.io.IOException: NameNode is not formatted.
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:434)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:110)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:291)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:270)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:271)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:303)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:433)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:421)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1359)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1368)
>> 2012-08-24 03:17:55,380 ERROR
>> org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException:
>> NameNode is not formatted.
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:434)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:110)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:291)
>> at
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:270)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:271)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:303)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:433)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:421)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1359)
>> at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1368)
>>
>> Regards,
>> Abhay
>>
>>
>



-- 
Håvard Wahl Kongsgård
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/


Re: Install Hive and Pig

2012-08-24 Thread Håvard Wahl Kongsgård
you also have CDH3,

https://ccp.cloudera.com/display/CDHDOC/Pig+Installation

https://ccp.cloudera.com/display/CDHDOC/Hive+Installation
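
On an Ubuntu/Debian node with the CDH3 repository configured, the
install is roughly (package names are CDH3-specific and changed in later
releases):

sudo apt-get install hadoop-pig hadoop-hive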

-Håvard

On Thu, Aug 23, 2012 at 5:57 PM, rajesh bathala
 wrote:
> Hi Friends,
>
> I am new to Hadoop. Can you please let us know how to install Hive and Pig?
>
> Thank you in advance.
>
> Thanks
> Rajesh
>



-- 
Håvard Wahl Kongsgård
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/


Re: Hadoop on EC2 Managing Internal/External IPs

2012-08-24 Thread Håvard Wahl Kongsgård
Hi, a VPN or simply first uploading the files to an EC2 node is the best option,

but an alternative is to use the external interface/IP instead of the
internal one in the hadoop config; I assume this will be slower and more
costly...

-Håvard

On Fri, Aug 24, 2012 at 4:54 AM, igor Finkelshteyn  wrote:
> I've seen a bunch of people with this exact same question all over Google 
> with no answers. I know people have successful non-temporary clusters in EC2. 
> Is there really no one that's needed to deal with having EC2 expose external 
> addresses instead of internal addresses before? This seems like it should be 
> a common thing.
>
> On Aug 23, 2012, at 12:34 PM, igor Finkelshteyn wrote:
>
>> Hi,
>> I'm currently setting up a Hadoop cluster on EC2, and everything works just 
>> fine when accessing the cluster from inside EC2, but as soon as I try to do 
>> something like upload a file from an external client, I get timeout errors 
>> like:
>>
>> 12/08/23 12:06:16 ERROR hdfs.DFSClient: Failed to close file 
>> /user/some_file._COPYING_
>> java.net.SocketTimeoutException: 65000 millis timeout while waiting for 
>> channel to be ready for connect. ch : 
>> java.nio.channels.SocketChannel[connection-pending remote=/10.123.x.x:50010]
>>
>> What's clearly happening is my NameNode is resolving my DataNode's IPs to 
>> their internal EC2 values instead of their external values, and then sending 
>> along the internal IP to my external client, which is obviously unable to 
>> reach those. I'm thinking this must be a common problem. How do other people 
>> deal with it? Is there a way to just force my name node to send along my 
>> DataNode's hostname instead of IP, so that the hostname can be resolved 
>> properly from whatever box will be sending files?
>>
>> Eli
>



-- 
Håvard Wahl Kongsgård
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/


Re: Reading multiple lines from a microsoft doc in hadoop

2012-08-24 Thread Håvard Wahl Kongsgård
Hi, maybe you should check out the old nutch project
http://nutch.apache.org/ (hadoop was developed for nutch).
It's a web crawler and indexer, but the mailing lists hold much info on
doc/pdf parsing which also relates to hadoop.

I have never parsed many docx or doc files, but it should be
straightforward. But generally, for text analysis, preprocessing is the
KEY! For example, replacing double newlines \r\n\r\n (or \n\n) with a
record separator is a simple trick.
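
A rough sketch of that kind of preprocessing (the tab separator is just
an example):

perl -0777 -pe 's/\r?\n\r?\n/\t/g' input.txt > records.txt

Each blank-line-separated entry then becomes a single line that a plain
TextInputFormat mapper can consume.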


-Håvard

On Fri, Aug 24, 2012 at 9:30 AM, Siddharth Tiwari
 wrote:
> Hi,
> Thank you for the suggestion. Actually I was using poi to extract text, but
> since now I have so many documents I thought I will use hadoop directly
> to parse as well. Average size of each document is around 120 kb. Also I
> want to read multiple lines from the text until I find a blank line. I do
> not have any idea how to design a custom input format and record reader.
> Please help with some tutorial, code or resource around it. I am
> struggling with the issue. I will be highly grateful. Thank you so much once
> again
>
>> Date: Fri, 24 Aug 2012 08:07:39 +0200
>> Subject: Re: Reading multiple lines from a microsoft doc in hadoop
>> From: haavard.kongsga...@gmail.com
>> To: user@hadoop.apache.org
>
>>
>> It's much easier if you convert the documents to text first
>>
>> use
>> http://tika.apache.org/
>>
>> or some other doc parser
>>
>>
>> -Håvard
>>
>> On Fri, Aug 24, 2012 at 7:52 AM, Siddharth Tiwari
>>  wrote:
>> > hi,
>> > I have doc files in msword doc and docx format. These have entries which
>> > are
>> > separated by an empty line. Is it possible for me to read
>> > these lines separated by empty lines at a time? Also, which input format
>> > shall I use to read doc/docx? Please help
>> >
>> > **
>> > Cheers !!!
>> > Siddharth Tiwari
>> > Have a refreshing day !!!
>> > "Every duty is holy, and devotion to duty is the highest form of worship
>> > of
>> > God.”
>> > "Maybe other people will try to limit me but I don't limit myself"
>>
>>
>>
>> --
>> Håvard Wahl Kongsgård
>> Faculty of Medicine &
>> Department of Mathematical Sciences
>> NTNU
>>
>> http://havard.security-review.net/



-- 
Håvard Wahl Kongsgård
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/


Re: Reading multiple lines from a microsoft doc in hadoop

2012-08-23 Thread Håvard Wahl Kongsgård
It's much easier if you convert the documents to text first

use
http://tika.apache.org/

or some other doc parser
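
For example, with the standalone tika-app jar (version number is only an
example):

java -jar tika-app-1.2.jar --text report.docx > report.txt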


-Håvard

On Fri, Aug 24, 2012 at 7:52 AM, Siddharth Tiwari
 wrote:
> hi,
> I have doc files in msword doc and docx format. These have entries which are
> separated by an empty line. Is it possible for me to read
> these lines separated by empty lines at a time? Also, which input format
> shall I use to read doc/docx? Please help
>
> **
> Cheers !!!
> Siddharth Tiwari
> Have a refreshing day !!!
> "Every duty is holy, and devotion to duty is the highest form of worship of
> God.”
> "Maybe other people will try to limit me but I don't limit myself"



-- 
Håvard Wahl Kongsgård
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/


Re: pipes(pydoop) and hbase classpath

2012-08-15 Thread Håvard Wahl Kongsgård
however, when I run hadoop pipes -conf myconf_job.conf -input
name_of_table -output /tmp/out

I don't get any error; hadoop just stalls with:

12/08/15 11:27:54 INFO zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.3.5-cdh3u4--1, built on 05/07/2012
21:08 GMT
12/08/15 11:27:54 INFO zookeeper.ZooKeeper: Client environment:host.name=kongs1
12/08/15 11:27:54 INFO zookeeper.ZooKeeper: Client
environment:java.version=1.6.0_31
12/08/15 11:27:54 INFO zookeeper.ZooKeeper: Client
environment:java.vendor=Sun Microsystems Inc.
12/08/15 11:27:54 INFO zookeeper.ZooKeeper: Client
environment:java.home=/usr/lib/jvm/java-6-sun-1.6.0.31/jre
12/08/15 11:27:54 INFO zookeeper.ZooKeeper: Client
environment:java.class.path=/usr/lib/hadoop-0.20/conf:/usr/lib/jvm/java-6-sun//lib/tools.jar:/usr/lib/hadoop-0.20:/usr/lib/hadoop-0.20/hadoop-core-0.20.2-cdh3u4.jar:/usr/lib/hadoop-0.20/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop-0.20/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop-0.20/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar:/usr/lib/hadoop-0.20/lib/commons-codec-1.4.jar:/usr/lib/hadoop-0.20/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop-0.20/lib/commons-el-1.0.jar:/usr/lib/hadoop-0.20/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop-0.20/lib/commons-lang-2.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop-0.20/lib/commons-net-3.1.jar:/usr/lib/hadoop-0.20/lib/core-3.1.1.jar:/usr/lib/hadoop-0.20/lib/guava-r09-jarjar.jar:/usr/lib/hadoop-0.20/lib/hadoop-fairscheduler-0.20.2-cdh3u4.jar:/usr/lib/hadoop-0.20/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop-0.20/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop-0.20/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop-0.20/lib/jets3t-0.6.1.jar:/usr/lib/hadoop-0.20/lib/jetty-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-servlet-tester-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jetty-util-6.1.26.cloudera.1.jar:/usr/lib/hadoop-0.20/lib/jsch-0.1.42.jar:/usr/lib/hadoop-0.20/lib/junit-4.5.jar:/usr/lib/hadoop-0.20/lib/kfs-0.2.2.jar:/usr/lib/hadoop-0.20/lib/log4j-1.2.15.jar:/usr/lib/hadoop-0.20/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop-0.20/lib/oro-2.0.8.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop-0.20/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop-0.20/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop-0.20/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop-0.20/lib/xmlenc-0.52.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-2.1.jar:/usr/lib/hadoop-0.20/lib/jsp-2.1/jsp-api-2.1.jar:/usr/lib/hbase/hbase-0.90.6-cdh3u4.jar:/usr/lib/zookeeper/zookeeper-3.3.5-cdh3u4.jar
12/08/15 11:27:54 INFO zookeeper.ZooKeeper: Client
environment:java.library.path=/usr/lib/hadoop-0.20/lib/native/Linux-amd64-64
12/08/15 11:27:54 INFO zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=/tmp
12/08/15 11:27:54 INFO zookeeper.ZooKeeper: Client
environment:java.compiler=
12/08/15 11:27:54 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
12/08/15 11:27:54 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
12/08/15 11:27:54 INFO zookeeper.ZooKeeper: Client
environment:os.version=2.6.32-41-server
12/08/15 11:27:54 INFO zookeeper.ZooKeeper: Client environment:user.name=hdfs
12/08/15 11:27:54 INFO zookeeper.ZooKeeper: Client
environment:user.home=/usr/lib/hadoop-0.20
12/08/15 11:27:54 INFO zookeeper.ZooKeeper: Client
environment:user.dir=/home/havard/d/graph
12/08/15 11:27:54 INFO zookeeper.ZooKeeper: Initiating client
connection, connectString=localhost:2181 sessionTimeout=18
watcher=hconnection
12/08/15 11:27:54 INFO zookeeper.ClientCnxn: Opening socket connection
to server localhost/127.0.0.1:2181
12/08/15 11:27:54 INFO zookeeper.ClientCnxn: Socket connection
established to localhost/127.0.0.1:2181, initiating session
12/08/15 11:27:54 INFO zookeeper.ClientCnxn: Session establishment
complete on server localhost/127.0.0.1:2181, sessionid =
0x139266be8b90004, negotiated timeout = 4


-Håvard


On Wed, Aug 15, 2012 at 10:01 AM, Håvard Wahl Kongsgård
 wrote:
> Hi, needed to add this as well
>
>
> <property>
>   <name>hbase.mapred.tablecolumns</name>
>   <value>col_fam:name</value>
> </property>
>
> -Håvard
>
>
> On Wed, Aug 15, 2012 at 9:42 AM, Håvard Wahl Kongsgård
>  wrote:
>> Hi, my job config is
>>
>> <property>
>>   <name>mapred.input.format.class</name>
>>   <value>org.apache.hadoop.hbase.mapred.TableInputFormat</value>
>> </property>
>>
>> <property>
>>   <name>hadoop.pipes.java.recordreader</name>
>>   <value>true</value>
>> </property>
>>
>>
>> Exception in thread "main" java.lang.RuntimeException: Error in
>> configuring object
>> at 
>> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>> at 
>> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>> at 

Re: pipes(pydoop) and hbase classpath

2012-08-15 Thread Håvard Wahl Kongsgård
Hi, needed to add this as well



<property>
  <name>hbase.mapred.tablecolumns</name>
  <value>col_fam:name</value>
</property>


-Håvard


On Wed, Aug 15, 2012 at 9:42 AM, Håvard Wahl Kongsgård
 wrote:
> Hi, my job config is
>
> <property>
>   <name>mapred.input.format.class</name>
>   <value>org.apache.hadoop.hbase.mapred.TableInputFormat</value>
> </property>
>
> <property>
>   <name>hadoop.pipes.java.recordreader</name>
>   <value>true</value>
> </property>
>
>
> Exception in thread "main" java.lang.RuntimeException: Error in
> configuring object
> at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:596)
> at 
> org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:977)
> at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:969)
> at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:880)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
> at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1248)
> at org.apache.hadoop.mapred.pipes.Submitter.runJob(Submitter.java:248)
> at org.apache.hadoop.mapred.pipes.Submitter.run(Submitter.java:479)
> at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:494)
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
> ... 17 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.mapred.TableInputFormat.configure(TableInputFormat.java:51)
>
>
> should I include the col names? According to the API it's deprecated?
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapred/TableInputFormat.html
>
>
> -Håvard
>
>
> On Tue, Aug 14, 2012 at 11:17 PM, Harsh J  wrote:
>> Hi,
>>
>> Per:
>>
>>> org.apache.hadoop.hbase.mapreduce.TableInputFormat not
>> org.apache.hadoop.mapred.InputFormat
>>
>> Pydoop seems to be expecting you to pass it an old API class for
>> InputFormat/etc. but you've passed in the newer class. I am unsure
>> what part of your code exactly may be at fault since I do not have
>> access to it, but you probably want to use the deprecated
>> org.apache.hadoop.hbase.mapred.* package classes such as
>> org.apache.hadoop.hbase.mapred.TableInputFormat, and not the
>> org.apache.hadoop.hbase.mapreduce.* classes, as you are using at the
>> moment.
>>
>> HTH!
>>
>> On Wed, Aug 15, 2012 at 2:39 AM, Håvard Wahl Kongsgård
>>  wrote:
>>> Hi, I'm trying to read hbase key-values with pipes (pydoop). As hadoop
>>> is unable to find the hbase jar files, I get
>>>
>>> Exception in thread "main" java.lang.RuntimeException:
>>> java.lang.RuntimeException: class
>>> org.apache.hadoop.hbase.mapreduce.TableInputFormat not
>>> org.apache.hadoop.mapred.InputFormat
>>>
>>> have added export
>>> HADOOP_CLASSPATH=/usr/lib/hbase/hbase-0.90.6-cdh3u4.jar to my
>>> hadoop-env.sh
>>>
>>> According to the doc from cloudera,
>>> https://ccp.cloudera.com/display/CDHDOC/HBase+Installation#HBaseInstallation-UsingMapReducewithHBase
>>> TableMapReduceUtil.addDependencyJars(job); can be used as an
>>> alternative. But is that possible with pipes?
>>>
>>> -Håvard
>>
>>
>>
>> --
>> Harsh J
>
>
>
> --
> Håvard Wahl Kongsgård
> Faculty of Medicine &
> Department of Mathematical Sciences
> NTNU
>
> http://havard.security-review.net/


Re: pipes(pydoop) and hbase classpath

2012-08-15 Thread Håvard Wahl Kongsgård
Hi, my job config is


<property>
  <name>mapred.input.format.class</name>
  <value>org.apache.hadoop.hbase.mapred.TableInputFormat</value>
</property>

<property>
  <name>hadoop.pipes.java.recordreader</name>
  <value>true</value>
</property>


Exception in thread "main" java.lang.RuntimeException: Error in
configuring object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:596)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:977)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:969)
at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:880)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1248)
at org.apache.hadoop.mapred.pipes.Submitter.runJob(Submitter.java:248)
at org.apache.hadoop.mapred.pipes.Submitter.run(Submitter.java:479)
at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:494)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 17 more
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hbase.mapred.TableInputFormat.configure(TableInputFormat.java:51)


should I include the col names? According to the API it's deprecated?
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapred/TableInputFormat.html


-Håvard


On Tue, Aug 14, 2012 at 11:17 PM, Harsh J  wrote:
> Hi,
>
> Per:
>
>> org.apache.hadoop.hbase.mapreduce.TableInputFormat not
> org.apache.hadoop.mapred.InputFormat
>
> Pydoop seems to be expecting you to pass it an old API class for
> InputFormat/etc. but you've passed in the newer class. I am unsure
> what part of your code exactly may be at fault since I do not have
> access to it, but you probably want to use the deprecated
> org.apache.hadoop.hbase.mapred.* package classes such as
> org.apache.hadoop.hbase.mapred.TableInputFormat, and not the
> org.apache.hadoop.hbase.mapreduce.* classes, as you are using at the
> moment.
>
> HTH!
>
> On Wed, Aug 15, 2012 at 2:39 AM, Håvard Wahl Kongsgård
>  wrote:
>> Hi, I'm trying to read hbase key-values with pipes (pydoop). As hadoop
>> is unable to find the hbase jar files, I get
>>
>> Exception in thread "main" java.lang.RuntimeException:
>> java.lang.RuntimeException: class
>> org.apache.hadoop.hbase.mapreduce.TableInputFormat not
>> org.apache.hadoop.mapred.InputFormat
>>
>> have added export
>> HADOOP_CLASSPATH=/usr/lib/hbase/hbase-0.90.6-cdh3u4.jar to my
>> hadoop-env.sh
>>
>> According to the doc from cloudera,
>> https://ccp.cloudera.com/display/CDHDOC/HBase+Installation#HBaseInstallation-UsingMapReducewithHBase
>> TableMapReduceUtil.addDependencyJars(job); can be used as an
>> alternative. But is that possible with pipes?
>>
>> -Håvard
>
>
>
> --
> Harsh J



-- 
Håvard Wahl Kongsgård
Faculty of Medicine &
Department of Mathematical Sciences
NTNU

http://havard.security-review.net/


pipes(pydoop) and hbase classpath

2012-08-14 Thread Håvard Wahl Kongsgård
Hi, I'm trying to read hbase key-values with pipes (pydoop). As hadoop
is unable to find the hbase jar files, I get

Exception in thread "main" java.lang.RuntimeException:
java.lang.RuntimeException: class
org.apache.hadoop.hbase.mapreduce.TableInputFormat not
org.apache.hadoop.mapred.InputFormat

I have added export
HADOOP_CLASSPATH=/usr/lib/hbase/hbase-0.90.6-cdh3u4.jar to my
hadoop-env.sh
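
For the classpath itself, the hbase jar alone is usually not enough; its
dependencies and config need to be visible too. A sketch using the CDH3
paths seen here (the /etc/hbase/conf entry is an assumption):

export HADOOP_CLASSPATH=/usr/lib/hbase/hbase-0.90.6-cdh3u4.jar:/usr/lib/zookeeper/zookeeper-3.3.5-cdh3u4.jar:/etc/hbase/conf:$HADOOP_CLASSPATH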

According to the doc from cloudera,
https://ccp.cloudera.com/display/CDHDOC/HBase+Installation#HBaseInstallation-UsingMapReducewithHBase
TableMapReduceUtil.addDependencyJars(job); can be used as an
alternative. But is that possible with pipes?

-Håvard