Datanode error

2012-07-20 Thread Pablo Musa
Hey guys,
I have a cluster with 11 nodes (1 NN and 10 DNs) which is up and working.
However, my datanodes keep hitting the same errors over and over.

I googled the problem and tried different flags (e.g. -XX:MaxDirectMemorySize=2G)
and different configs (xceivers=8192), but could not solve it.
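
For reference, those two settings were applied along these lines (a sketch; the JVM flag goes into hadoop-env.sh and the xceiver limit into hdfs-site.xml, values illustrative):

    # hadoop-env.sh
    export HADOOP_DATANODE_OPTS="-XX:MaxDirectMemorySize=2G $HADOOP_DATANODE_OPTS"

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>8192</value>
    </property>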

Does anyone know what the problem is and how I can solve it? (The stack traces
are at the end.)

I am running:
Java 1.7
Hadoop 0.20.2
Hbase 0.90.6
Zoo 3.3.5

% top    -> shows a low load average (around 6% most of the time, up to 60%), already accounting for the number of CPUs
% vmstat -> shows no swapping at all
% sar    -> shows 75% idle CPU in the worst case

Hope you guys can help me.
Thanks in advance,
Pablo

2012-07-20 00:03:44,455 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /DN01:50010, dest: /DN01:43516, bytes: 396288, op: HDFS_READ, cliID: DFSClient_hb_rs_DN01,60020,1342734302945_1342734303427, offset: 54956544, srvID: DS-798921853-DN01-50010-1328651609047, blockid: blk_914960691839012728_14061688, duration: 480061254006
2012-07-20 00:03:44,455 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, ipcPort=50020):Got exception while serving blk_914960691839012728_14061688 to /DN01:
java.net.SocketTimeoutException: 48 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/DN01:50010 remote=/DN01:43516]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)

2012-07-20 00:03:44,455 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 48 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/DN01:50010 remote=/DN01:43516]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:397)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:493)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:279)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:175)

2012-07-20 00:12:11,949 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_4602445008578088178_5707787
2012-07-20 00:12:11,962 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-8916344806514717841_14081066 received exception java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/DN01:36634 remote=/DN03:50010]
2012-07-20 00:12:11,962 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(DN01:50010, storageID=DS-798921853-DN01-50010-1328651609047, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 63000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/DN01:36634 remote=/DN03:50010]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:116)
        at java.io.FilterInputStream.read(FilterInputStream.java:83)
        at java.io.DataInputStream.readShort(DataInputStream.java:312)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:447)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:183)


2012-07-20 00:12:20,670 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_7238561256016868237_3555939
2012-07-20 00:12:22,541 INFO org.apache.hadoop.hdfs.server.datanode.Dat

Re: Datanode error

2012-07-20 Thread anil gupta
Hi Pablo,

Are you sure that Hadoop 0.20.2 is supported on Java 1.7? (AFAIK it's Java
1.6)

Thanks,
Anil

On Fri, Jul 20, 2012 at 6:07 AM, Pablo Musa  wrote:

> Hey guys,
> I have a cluster with 11 nodes (1 NN and 10 DNs) which is running and
> working.
> However my datanodes keep having the same errors, over and over.
>
> I googled the problems and tried different flags (ex:
> -XX:MaxDirectMemorySize=2G)
> and different configs (xceivers=8192) but could not solve it.
>
> Does anyone know what is the problem and how can I solve it? (the
> stacktrace is at the end)
>
> I am running:
> Java 1.7
> Hadoop 0.20.2
> Hbase 0.90.6
> Zoo 3.3.5
>
> % top -> shows low load average (6% most of the time up to 60%), already
> considering the number of cpus
> % vmstat -> shows no swap at all
> % sar -> shows 75% idle cpu in the worst case
>
> Hope you guys can help me.
> Thanks in advance,
> Pablo

Re: Datanode error

2012-07-20 Thread Harsh J
Pablo,

These all seem to be timeouts from clients when they wish to read a
block, and drops from clients when they try to write a block. I
wouldn't think of them as critical errors. Aside from being worried that
a DN is logging these, are you noticing any usability issue in your
cluster? If not, I'd simply blame this on things like speculative
tasks, region servers, general HDFS client misbehavior, etc.

Please do also share if you're seeing an issue that you think is
related to these log messages.
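
If the messages ever do line up with real client stalls, the datanode socket
timeouts themselves are tunable in hdfs-site.xml; a minimal sketch with
illustrative values (both settings are in milliseconds):

    <property>
      <name>dfs.datanode.socket.write.timeout</name>
      <value>600000</value>
    </property>
    <property>
      <name>dfs.socket.timeout</name>
      <value>600000</value>
    </property>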

On Fri, Jul 20, 2012 at 6:37 PM, Pablo Musa  wrote:
> Hey guys,
> I have a cluster with 11 nodes (1 NN and 10 DNs) which is running and working.
> However my datanodes keep having the same errors, over and over.
>
> I googled the problems and tried different flags (ex: 
> -XX:MaxDirectMemorySize=2G)
> and different configs (xceivers=8192) but could not solve it.
>
> Does anyone know what is the problem and how can I solve it? (the stacktrace 
> is at the end)
>
> I am running:
> Java 1.7
> Hadoop 0.20.2
> Hbase 0.90.6
> Zoo 3.3.5
>
> % top -> shows low load average (6% most of the time up to 60%), already 
> considering the number of cpus
> % vmstat -> shows no swap at all
> % sar -> shows 75% idle cpu in the worst case
>
> Hope you guys can help me.
> Thanks in advance,
> Pablo

Re: Datanode error

2012-07-20 Thread Raj Vishwanathan
This could also be due to network issues, or the number of sockets or the number
of threads available could be too low.
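
A quick way to check the relevant OS limits on a datanode host (illustrative; run as the user the datanode runs as):

    ulimit -n                   # max open files/sockets for this user
    ulimit -u                   # max user processes/threads
    cat /proc/sys/fs/file-max   # system-wide open-file ceiling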

Raj



>
> From: Harsh J 
>To: common-user@hadoop.apache.org 
>Sent: Friday, July 20, 2012 9:06 AM
>Subject: Re: Datanode error
> 
>Pablo,
>
>These all seem to be timeouts from clients when they wish to read a
>block and drops from clients when they try to write a block. I
>wouldn't think of them as critical errors. Aside of being worried that
>a DN is logging these, are you noticing any usability issue in your
>cluster? If not, I'd simply blame this on stuff like speculative
>tasks, region servers, general HDFS client misbehavior, etc.
>
>Please do also share if you're seeing an issue that you think is
>related to these log messages.
>
>On Fri, Jul 20, 2012 at 6:37 PM, Pablo Musa  wrote:
>> Hey guys,
>> I have a cluster with 11 nodes (1 NN and 10 DNs) which is running and 
>> working.
>> However my datanodes keep having the same errors, over and over.
>>
>> I googled the problems and tried different flags (ex: 
>> -XX:MaxDirectMemorySize=2G)
>> and different configs (xceivers=8192) but could not solve it.
>>
>> Does anyone know what is the problem and how can I solve it? (the stacktrace 
>> is at the end)
>>
>> I am running:
>> Java 1.7
>> Hadoop 0.20.2
>> Hbase 0.90.6
>> Zoo 3.3.5
>>
>> % top -> shows low load average (6% most of the time up to 60%), already 
>> considering the number of cpus
>> % vmstat -> shows no swap at all
>> % sar -> shows 75% idle cpu in the worst case
>>
>> Hope you guys can help me.
>> Thanks in advance,
>> Pablo

fail and kill all tasks without killing job.

2012-07-20 Thread jay vyas
Hi guys: I want my tasks to end/fail, but I don't want to kill my
entire hadoop job.

I have a hadoop job that runs 5 hadoop jobs in a row.
I'm on the last of those sub-jobs, and want to fail all tasks so that the
task tracker stops delegating them, and the hadoop main job can naturally
come to a close.

However, when I run "hadoop job kill-attempt / fail-attempt", the
jobtracker seems to simply relaunch the same tasks with new ids.

How can I tell the jobtracker to give up on redelegating?


hdfs inode count

2012-07-20 Thread Stan Rosenberg
Hi,

The current FileSystem API does not expose a children/inode count. Why?
FileSystem.listStatus(root) incurs too much overhead if all one needs is a
count of the number of children. INodeDirectory already has the list of
children, so in theory it is trivial to return the count.
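
For context, the workaround today looks roughly like this (a sketch; the path is hypothetical, and it pays for a full FileStatus per child just to take the array length):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Materializes a FileStatus for every child even though only the count is needed.
    FileStatus[] children = fs.listStatus(new Path("/user/stan/data"));
    int childCount = children.length;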

Thoughts/suggestions are welcome.

Thanks,

stan


Re: Avro vs Protocol Buffer

2012-07-20 Thread Edward Capriolo
We just open sourced our protobuf support for Hive. We built it out
because in our line of work protobuf is very common and it gave us the
ability to log protobufs directly to files and then query them.

https://github.com/edwardcapriolo/hive-protobuf

I did not do any heavy benchmarking vs. Avro. However, I did a few
comparisons; sorry that I do not have exact numbers here.

A compressed SequenceFile of Text versus a SequenceFile of protobufs
is maybe 5-10 percent smaller depending on the data. That is pretty
good compression, so space-wise you are not hurting there.
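
For anyone wanting to reproduce that kind of size comparison, the Text side can be written as a block-compressed SequenceFile roughly like this (a sketch; codec and path are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.GzipCodec;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Block compression groups many records per compressed block, which is what
    // makes the size comparison against a protobuf SequenceFile meaningful.
    SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf,
        new Path("/tmp/text-sample.seq"),
        NullWritable.class, Text.class,
        SequenceFile.CompressionType.BLOCK, new GzipCodec());
    writer.append(NullWritable.get(), new Text("one record"));
    writer.close();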

Speed-wise I have to do some more analysis. Our input format is doing
reflection, so that will have its cost (although we tried to cache
things where possible); protobuf has some DynamicObject components
which I need to explore to possibly avoid reflection. Also, you have to
consider that protobufs do more (than TextInputFormat), like validating
data, so if you're comparing raw speed you have to watch out for
apples-to-oranges comparisons.

I never put our ProtoBuf format head to head with the AvroFormat.
Generally I hate those types of benchmarks, but I would be curious to
know.

Overall, if you have no global (company-wide) serialization format, you
have to look at what tools you have and what they support. E.g. Hive
has Avro and protobuf support, but maybe Pig only has one or the other.
Are you using Sqoop, and can it output files in the format that you
want? Are you using a language like Ruby, and what support do you have
there?

In my mind speed is important, but compatibility is more so. For
example, even if reading Avro were 2 times slower than reading Thrift
(which it is not), your jobs might be doing some very complex logic
with a long shuffle, sort, and reduce phase. Then the performance of
physically reading the file is not as important as it may seem.

On Thu, Jul 19, 2012 at 12:34 PM, Harsh J  wrote:
> +1 to what Bruno's pointed you at. I personally like Avro for its data
> files (schema's stored on file, and a good, splittable container for
> typed data records). I think speed for serde is on-par with Thrift, if
> not faster today. Thrift offers no optimized data container format
> AFAIK.
>
> On Thu, Jul 19, 2012 at 1:57 PM, Bruno Freudensprung
>  wrote:
>> Once new results will be available, you might be interested in:
>> https://github.com/eishay/jvm-serializers/wiki/
>> https://github.com/eishay/jvm-serializers/wiki/Staging-Results
>>
>> My2cts,
>>
>> Bruno.
>>
>> Le 16/07/2012 22:49, Mike S a écrit :
>>
>>> Strictly from speed and performance perspective, is Avro as fast as
>>> protocol buffer?
>>>
>>
>
>
>
> --
> Harsh J


UNSUBCRIBE

2012-07-20 Thread Rork, Michael




-Original Message-
From: Edward Capriolo 
Reply-To: "common-user@hadoop.apache.org" 
Date: Friday, July 20, 2012 6:03 PM
To: "common-user@hadoop.apache.org" 
Subject: Re: Avro vs Protocol Buffer




Re: fail and kill all tasks without killing job.

2012-07-20 Thread Bejoy KS
Hi Jay

Did you try
hadoop job -kill-task  ? And is that not working as desired? 

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-Original Message-
From: jay vyas 
Date: Fri, 20 Jul 2012 17:17:58 
To: common-user@hadoop.apache.org
Reply-To: common-user@hadoop.apache.org
Subject: fail and kill all tasks without killing job.

Hi guys : I want my tasks to end/fail, but I don't want to kill my 
entire hadoop job.

I have a hadoop job that runs 5 hadoop jobs in a row.
Im on the last of those sub-jobs, and want to fail all tasks so that the 
task tracker stops delegating them,
and the hadoop main job can naturally come to a close.

However, when I run "hadoop job kill-attempt / fail-attempt ", the 
jobtracker seems to simply relaunch
the same tasks with new ids.

How can I tell the jobtracker to give up on redelegating?



web-tracker question

2012-07-20 Thread Keith Wiley
I'm curious about the relationship between the namenode/job/task trackers and 
the machine's web server.  Do the former require the latter?  Does a successful 
connection to the trackers imply that the machine has a web server up and 
running?  I realize the ports are totally different (the web is generally port 80), 
but the tracker addresses start with the "http" URI scheme, so I'm a little unsure if 
that means anything w.r.t. my question.  Can you run and access the trackers on a 
machine that doesn't have a web server installed or running?

I know, it's a weird question.  Thanks for any quick response.

Thanks.


Keith Wiley     kwi...@keithwiley.com     keithwiley.com     music.keithwiley.com

"The easy confidence with which I know another man's religion is folly teaches
me to suspect that my own is also."
   --  Mark Twain




Re: web-tracker question

2012-07-20 Thread Al Thompson
Hi Keith:

On Fri, Jul 20, 2012 at 3:28 PM, Keith Wiley  wrote:

> I'm curious about the relationship between the namenode/job/task trackers
> and the machine's web server?  Do the former require the latter?


The Hadoop daemons embed a Jetty instance to serve their user interfaces
over HTTP. You will see the Jetty instance come online in your logs with lines
like these:

2012-07-20 22:55:14,519 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 50060
2012-07-20 22:55:14,519 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50060 webServer.getConnectors()[0].getLocalPort() returned 50060
2012-07-20 22:55:14,519 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50060

> Does successful connection to the trackers imply that the machine has a web
> server up and running?


A tasktracker that is "up" should have its web interface served by its
embedded Jetty instance.


> I realize the ports are totally different (web is generally port 80), but
> the trackers are headed with the "http" URI, so I'm a little unsure if that
> means anything w.r.t. my question.  Can you run and access the trackers on
> a machine that doesn't have a webserver installed or running?
>
>
The hadoop daemons do not require a web server to be running on any grid
nodes.
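
For completeness, the ports those embedded Jetty listeners bind to are ordinary
Hadoop configuration properties, so you can move or firewall them like anything
else; a sketch of the usual defaults for this release line:

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.http.address</name>                 <!-- NameNode web UI -->
      <value>0.0.0.0:50070</value>
    </property>
    <property>
      <name>dfs.datanode.http.address</name>        <!-- DataNode web UI -->
      <value>0.0.0.0:50075</value>
    </property>

    <!-- mapred-site.xml -->
    <property>
      <name>mapred.job.tracker.http.address</name>  <!-- JobTracker web UI -->
      <value>0.0.0.0:50030</value>
    </property>
    <property>
      <name>mapred.task.tracker.http.address</name> <!-- TaskTracker web UI -->
      <value>0.0.0.0:50060</value>
    </property>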


> I know, it's a weird question.  Thanks for any quick response.
>
>
HTH

Regards,
Al


Re: fail and kill all tasks without killing job.

2012-07-20 Thread JAX
I believe that kill-task simply kills the task, but then the same process (i.e. 
"task") starts again with a new id.

Jay Vyas 
MMSB
UCHC

On Jul 20, 2012, at 6:23 PM, "Bejoy KS"  wrote:

> Hi Jay
> 
> Did you try
> hadoop job -kill-task  ? And is that not working as desired? 
> 
> Regards
> Bejoy KS
> 
> Sent from handheld, please excuse typos.
> 
> -Original Message-
> From: jay vyas 
> Date: Fri, 20 Jul 2012 17:17:58 
> To: common-user@hadoop.apache.org
> Reply-To: common-user@hadoop.apache.org
> Subject: fail and kill all tasks without killing job.
> 
> Hi guys : I want my tasks to end/fail, but I don't want to kill my 
> entire hadoop job.
> 
> I have a hadoop job that runs 5 hadoop jobs in a row.
> Im on the last of those sub-jobs, and want to fail all tasks so that the 
> task tracker stops delegating them,
> and the hadoop main job can naturally come to a close.
> 
> However, when I run "hadoop job kill-attempt / fail-attempt ", the 
> jobtracker seems to simply relaunch
> the same tasks with new ids.
> 
> How can I tell the jobtracker to give up on redelegating?
> 


Re: fail and kill all tasks without killing job.

2012-07-20 Thread Harsh J
Hi Jay,

Fail a single task four times (the default), and the job will be marked as
failed. Is that what you're looking for?

Or, if you want your job to succeed even when not all of its tasks
succeed, tweak the "mapred.max.map/reduce.failures.percent" properties
in your job (by default they expect 0% failures, so set the percentage
of task failures you are willing to tolerate).

To then avoid having to fail a single task four times, lower
"mapred.map/reduce.max.attempts" down from its default of 4.

Does this answer your question?

On Sat, Jul 21, 2012 at 2:47 AM, jay vyas  wrote:
> Hi guys : I want my tasks to end/fail, but I don't want to kill my entire
> hadoop job.
>
> I have a hadoop job that runs 5 hadoop jobs in a row.
> Im on the last of those sub-jobs, and want to fail all tasks so that the
> task tracker stops delegating them,
> and the hadoop main job can naturally come to a close.
>
> However, when I run "hadoop job kill-attempt / fail-attempt ", the
> jobtracker seems to simply relaunch
> the same tasks with new ids.
>
> How can I tell the jobtracker to give up on redelegating?



-- 
Harsh J


Re: IOException: too many length or distance symbols

2012-07-20 Thread Harsh J
Prashant,

Can you add in some context on how these files were written, etc.?
Perhaps open a JIRA with a sample file and test-case to reproduce
this? Other env stuff with info on version of hadoop, etc. would help
too.
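
One quick sanity check for corruption, assuming gzip-compressed text inputs
(paths are hypothetical):

    hadoop fs -get /data/suspect/part-00000.gz /tmp/part-00000.gz
    gzip -t /tmp/part-00000.gz   # exits non-zero and complains if the stream is corrupt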

On Sat, Jul 21, 2012 at 2:05 AM, Prashant Kommireddi
 wrote:
> I am seeing these exceptions; does anyone know what might be causing them?
> A case of a corrupt file?
>
> java.io.IOException: too many length or distance symbols
>         at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.inflateBytesDirect(Native Method)
>         at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.decompress(ZlibDecompressor.java:221)
>         at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:80)
>         at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
>         at java.io.InputStream.read(InputStream.java:85)
>         at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
>         at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordReader.java:97)
>         at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:109)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
>         at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>         at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
>
> Thanks,
> Prashant



-- 
Harsh J