Re: Does HDFS read blocks simultaneously in multi-threaded way?

2019-06-26 Thread Jeff Hubbs

I'm not sure I see the point of doing so, though.

With replication set to the default of three, your only-1-GB file will
get cut up into a mere 8 blocks of 128 MB each (24 block replicas in
all) spread among some number of your worker nodes. The "multithreaded"
part comes in when the various worker nodes are reading these blocks at
once. A disk's I/O is only going to be so fast no matter how many
threads on a machine are trying to read from it; Hadoop gets you beyond
that by parallelizing that I/O across multiple entire machines.
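
To make the block arithmetic concrete, here is a minimal sketch in
Python. The helper name block_layout is hypothetical (it is not part of
any Hadoop API); it only reproduces the numbers from this thread: a
1-GB file with a 128-MB block size and replication 3.

```python
def block_layout(file_size: int, block_size: int, replication: int):
    """Return (logical_blocks, stored_replicas) for a single file.

    Hypothetical helper for illustration only, not a Hadoop call.
    """
    # Ceiling division: a trailing partial block still counts as one block.
    logical_blocks = -(-file_size // block_size)
    return logical_blocks, logical_blocks * replication


MB = 1024 * 1024
blocks, replicas = block_layout(1024 * MB, 128 * MB, 3)
print(blocks, replicas)
```

With these inputs the file has 8 logical blocks, and the cluster stores
24 block replicas in total.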


That being said, if all you're trafficking in are 1-GB data files, why 
are you even messing with Hadoop?


On 6/26/19 1:44 PM, Arpit Agarwal wrote:

HDFS reads blocks sequentially. We can implement a multi-threaded block reader 
in theory.



On Jun 26, 2019, at 5:05 AM, Daegyu Han  wrote:

Hi all,

Assuming HDFS has a 1GB file input.dat and a block size of 128MB.

Can a user read the input.dat file with multiple threads?

In other words, rather than reading the blocks sequentially, can
multiple blocks be read at the same time?

If not, is it difficult to implement a multi-threaded block read?

Best Regards,
Daegyu

-
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org






Re: Does HDFS read blocks simultaneously in multi-threaded way?

2019-06-26 Thread Arpit Agarwal
Correct. The blocks will be read sequentially.


> On Jun 26, 2019, at 10:51 AM, Daegyu Han  wrote:
> 
> Thank you for your response.
> 
> Assuming the HDFS blocks (blk1 to blk8) of input.dat are all on the
> local DataNode, does the map task read them sequentially?
> 
> 
> On Thu, Jun 27, 2019 at 02:45, Arpit Agarwal wrote:
> HDFS reads blocks sequentially. We can implement a multi-threaded block 
> reader in theory.
> 
> 
> > On Jun 26, 2019, at 5:05 AM, Daegyu Han wrote:
> > 
> > Hi all,
> > 
> > Assuming HDFS has a 1GB file input.dat and a block size of 128MB.
> > 
> > Can a user read the input.dat file with multiple threads?
> > 
> > In other words, rather than reading the blocks sequentially, can
> > multiple blocks be read at the same time?
> > 
> > If not, is it difficult to implement a multi-threaded block read?
> > 
> > Best Regards,
> > Daegyu
> > 
> > 
> 



Re: Does HDFS read blocks simultaneously in multi-threaded way?

2019-06-26 Thread Daegyu Han
Thank you for your response.

Assuming the HDFS blocks (blk1 to blk8) of input.dat are all on the
local DataNode, does the map task read them sequentially?


On Thu, Jun 27, 2019 at 02:45, Arpit Agarwal wrote:

> HDFS reads blocks sequentially. We can implement a multi-threaded block
> reader in theory.
>
>
> > On Jun 26, 2019, at 5:05 AM, Daegyu Han  wrote:
> >
> > Hi all,
> >
> > Assuming HDFS has a 1GB file input.dat and a block size of 128MB.
> >
> > Can a user read the input.dat file with multiple threads?
> >
> > In other words, rather than reading the blocks sequentially, can
> > multiple blocks be read at the same time?
> >
> > If not, is it difficult to implement a multi-threaded block read?
> >
> > Best Regards,
> > Daegyu
> >
> >
>
>


Re: Does HDFS read blocks simultaneously in multi-threaded way?

2019-06-26 Thread Arpit Agarwal
HDFS reads blocks sequentially. We can implement a multi-threaded block reader 
in theory.
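
As a rough illustration of what such a reader could look like, here is a
minimal Python sketch that fetches block-sized byte ranges of one file
concurrently and reassembles them in order. It runs against an ordinary
local file as a stand-in and does not use the HDFS client at all; the
names block_ranges, read_range, and parallel_read are hypothetical.
Whether this is faster than a sequential read depends on where the
blocks live, since threads on one machine still share the same disks.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def block_ranges(file_size, block_size):
    """Return (offset, length) for each block-sized slice of the file."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]


def read_range(path, offset, length):
    """Read one range through its own handle so threads share no seek position."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def parallel_read(path, block_size, max_workers=4):
    """Fetch all ranges concurrently and join them in file order."""
    ranges = block_ranges(os.path.getsize(path), block_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, whatever the finish order.
        chunks = pool.map(lambda r: read_range(path, *r), ranges)
        return b"".join(chunks)


if __name__ == "__main__":
    # Demo on a small temporary file: 1024 bytes in 128-byte "blocks".
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "input.dat")
        payload = os.urandom(1024)
        with open(path, "wb") as f:
            f.write(payload)
        assert parallel_read(path, block_size=128) == payload
        print(len(block_ranges(len(payload), 128)), "ranges read")
```

Each worker opens its own handle instead of sharing one stream, which
avoids any contention over a single seek position. The closest analogue
in the Hadoop Java client would be positional reads on an open stream,
but that mapping is an assumption of this sketch, not something tested
here.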


> On Jun 26, 2019, at 5:05 AM, Daegyu Han  wrote:
> 
> Hi all,
> 
> Assuming HDFS has a 1GB file input.dat and a block size of 128MB.
> 
> Can a user read the input.dat file with multiple threads?
> 
> In other words, rather than reading the blocks sequentially, can
> multiple blocks be read at the same time?
> 
> If not, is it difficult to implement a multi-threaded block read?
> 
> Best Regards,
> Daegyu
> 
> 

