Custom FileInputFormat.class

2014-12-01 Thread 胡斐
Hi,

I want to customize FileInputFormat.class. To determine which host a
specific part of a file belongs to, I need to open the file in HDFS and
read some information. It takes nearly 500 ms to open a file and get the
information I need, and I have thousands of files to process, so handling
all of them this way would take a long time.

Is there a better way to reduce this time when the number of files is
large?

Thanks in advance!
Fei


Re: Custom FileInputFormat.class

2014-12-01 Thread Pradeep Gollakota
Can you expand on your use case a little bit please? It may be that you're
duplicating functionality.

You can take a look at CombineFileInputFormat for inspiration. If this
is indeed taking a long time, one cheap-to-implement option is to
parallelize the calls that get block locations.
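A minimal sketch of that parallelization, using only java.util.concurrent. The lookupHost method is a hypothetical placeholder for the roughly 500 ms per-file step described in the question (opening the file and reading the host information). It just sleeps and returns a made-up host name, so the thread-pool pattern can run without a cluster. In a real InputFormat the lookup would go through the Hadoop FileSystem API instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelHostLookup {

    // Hypothetical placeholder for the expensive per-file step
    // (open the file, read the metadata that identifies its host).
    public static String lookupHost(String path) {
        try {
            Thread.sleep(50); // simulated I/O latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return "host-" + Math.floorMod(path.hashCode(), 3);
    }

    // Runs lookupHost for every path on a fixed-size thread pool and
    // returns the hosts in the same order as the input paths.
    public static List<String> lookupAll(List<String> paths, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String p : paths) {
                futures.add(pool.submit(() -> lookupHost(p)));
            }
            List<String> hosts = new ArrayList<>(paths.size());
            for (Future<String> f : futures) {
                hosts.add(f.get()); // blocks until that lookup finishes
            }
            return hosts;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("lookup failed", e.getCause());
        } finally {
            pool.shutdownNow(); // release worker threads
        }
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 40; i++) {
            paths.add("/data/part-" + i);
        }
        long start = System.nanoTime();
        List<String> hosts = lookupAll(paths, 8);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Serially this would take about 40 * 50 ms = 2000 ms.
        System.out.println(hosts.size() + " lookups in " + elapsedMs + " ms");
    }
}
```

Because the lookups are I/O-bound, a pool larger than the number of CPU cores is usually reasonable; the right size depends on how much concurrent load the NameNode and DataNodes can absorb. Results are collected in submission order, so each host still lines up with its input path.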

Another question to ask yourself is whether this portion is worth
optimizing. In many use cases (certainly mine), the bottleneck is the
running job itself, so the launch overhead is comparatively small.
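A rough way to weigh this, using the figures from the original question (about t = 0.5 s per file, with N in the thousands): the serial cost grows linearly with N, while P concurrent lookups divide it by roughly P, assuming HDFS itself does not become the bottleneck. N = 1000 and P = 20 below are illustrative values, not measurements.

```latex
T_{\text{serial}} \approx N \cdot t, \qquad
T_{\text{parallel}} \approx \left\lceil \frac{N}{P} \right\rceil \cdot t

% Illustrative values: N = 1000 files, t = 0.5 s, P = 20 concurrent lookups
T_{\text{serial}} \approx 1000 \cdot 0.5\,\text{s} = 500\,\text{s} \approx 8.3\,\text{min}
T_{\text{parallel}} \approx \lceil 1000/20 \rceil \cdot 0.5\,\text{s} = 25\,\text{s}
```

Whether saving those minutes matters depends on how long the job itself runs.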

Hope this helps.
Pradeep

On Mon Dec 01 2014 at 8:38:30 AM 胡斐 hufe...@gmail.com wrote:

 Hi,

 I want to customize FileInputFormat.class. To determine which host a
 specific part of a file belongs to, I need to open the file in HDFS and
 read some information. It takes nearly 500 ms to open a file and get the
 information I need, and I have thousands of files to process, so handling
 all of them this way would take a long time.

 Is there a better way to reduce this time when the number of files
 is large?

 Thanks in advance!
 Fei




Errors with Checkpoint NameNode

2014-12-01 Thread Long Jin
Hi,

I want to set up a 2-node HDFS cluster: a master NameNode and a Checkpoint
NameNode. My configuration is pretty standard (mostly the defaults).

//on the 1st machine
#hdfs namenode

//on the 2nd machine
#hdfs namenode -checkpoint

After startup, the Checkpoint NameNode repeatedly prints the following
error (I've gotten thousands of copies of it):

14/12/01 17:21:59 ERROR namenode.FSNamesystem: Swallowing exception in NameNodeEditLogRoller:
java.lang.IllegalStateException: Bad state: BETWEEN_LOG_SEGMENTS
	at com.google.common.base.Preconditions.checkState(Preconditions.java:172)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.getCurSegmentTxId(FSEditLog.java:493)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$NameNodeEditLogRoller.run(FSNamesystem.java:4366)
	at java.lang.Thread.run(Thread.java:745)

Could anyone tell me what I did wrong, or point me in some direction to
look?

Thanks a lot!
Long


Re: Namenode HA failover time

2014-12-01 Thread Lixiang Ao
I am curious about this, too.

On Sat, Nov 29, 2014 at 2:35 PM, Alice 6900848...@gmail.com wrote:

 Hi all,

 NameNode HA (NFS, QJM) is available in Hadoop 2.x (HDFS-1623). It provides
 fast failover for the NameNode, but I can't find any description of how
 long it takes to recover from a failure.

 Could anyone tell me?

 Thanks.