[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-09-12 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741935#comment-14741935
 ] 

Duo Zhang commented on HDFS-7966:
-

No, the testcase uses multiple connections... But yes, this is not a typical 
usage in real world. Let me try to deploy an HBase on top of HDFS and run YCSB 
to collect some performance data. Thanks.

> New Data Transfer Protocol via HTTP/2
> -
>
> Key: HDFS-7966
> URL: https://issues.apache.org/jira/browse/HDFS-7966
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Haohui Mai
>Assignee: Qianqian Shi
>  Labels: gsoc, gsoc2015, mentor
> Attachments: GSoC2015_Proposal.pdf, 
> TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg, 
> TestHttp2ReadBlockInsideEventLoop.svg
>
>
> The current Data Transfer Protocol (DTP) implements a rich set of features 
> that span across multiple layers, including:
> * Connection pooling and authentication (session layer)
> * Encryption (presentation layer)
> * Data writing pipeline (application layer)
> All these features are HDFS-specific and defined by implementation. As a 
> result it requires non-trivial amount of work to implement HDFS clients and 
> servers.
> This jira explores to delegate the responsibilities of the session and 
> presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
> connection multiplexing, QoS, authentication and encryption, reducing the 
> scope of DTP to the application layer only. By leveraging the existing HTTP/2 
> library, it should simplify the implementation of both HDFS clients and 
> servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-09-11 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14741145#comment-14741145
 ] 

Haohui Mai commented on HDFS-7966:
--

I never understand the the performance numbers. What does 4% means in the data? 
Do you repeat the experiment multiple times to get a 95% confidence intervals?

Can you please explain them a little bit more? A chart would definitely help.

It looks to me that the test case is only having a single connection performing 
reads for a single block? In production it is there are will be tens of 
thousands on concurrent reads. Does the thread pool help? How does the current 
implementation look like in this scenario?

> New Data Transfer Protocol via HTTP/2
> -
>
> Key: HDFS-7966
> URL: https://issues.apache.org/jira/browse/HDFS-7966
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Haohui Mai
>Assignee: Qianqian Shi
>  Labels: gsoc, gsoc2015, mentor
> Attachments: GSoC2015_Proposal.pdf, 
> TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg, 
> TestHttp2ReadBlockInsideEventLoop.svg
>
>
> The current Data Transfer Protocol (DTP) implements a rich set of features 
> that span across multiple layers, including:
> * Connection pooling and authentication (session layer)
> * Encryption (presentation layer)
> * Data writing pipeline (application layer)
> All these features are HDFS-specific and defined by implementation. As a 
> result it requires non-trivial amount of work to implement HDFS clients and 
> servers.
> This jira explores to delegate the responsibilities of the session and 
> presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
> connection multiplexing, QoS, authentication and encryption, reducing the 
> scope of DTP to the application layer only. By leveraging the existing HTTP/2 
> library, it should simplify the implementation of both HDFS clients and 
> servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-09-09 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736658#comment-14736658
 ] 

Duo Zhang commented on HDFS-7966:
-

Netty-4.1.0Beta6 is out so I'm back. I have added a simple {{asyncRead}} 
method(not fully asynchronous since this is only a POC) to {{DFSInputStream}} 
and write a performance test for it. Here is the test result(two times for 
every test)

{noformat}
./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest async /test 100 
5 4096 // 100 here means max concurrency which used to prevent OOM.
*** time based on http2 230946
./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest async /test 100 
5 4096
*** time based on http2 231066

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 100 
5 4096 pread
*** time based on tcp 231410
./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 100 
5 4096 pread
*** time based on tcp 231038

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 100 
5 4096 pread
*** time based on http2 236069
./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 100 
5 4096 pread
*** time based on http2 231773
{noformat}

The performance difference is ~±4% and async is a little better than tcp.

Thanks.

> New Data Transfer Protocol via HTTP/2
> -
>
> Key: HDFS-7966
> URL: https://issues.apache.org/jira/browse/HDFS-7966
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Haohui Mai
>Assignee: Qianqian Shi
>  Labels: gsoc, gsoc2015, mentor
> Attachments: GSoC2015_Proposal.pdf, 
> TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg, 
> TestHttp2ReadBlockInsideEventLoop.svg
>
>
> The current Data Transfer Protocol (DTP) implements a rich set of features 
> that span across multiple layers, including:
> * Connection pooling and authentication (session layer)
> * Encryption (presentation layer)
> * Data writing pipeline (application layer)
> All these features are HDFS-specific and defined by implementation. As a 
> result it requires non-trivial amount of work to implement HDFS clients and 
> servers.
> This jira explores to delegate the responsibilities of the session and 
> presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
> connection multiplexing, QoS, authentication and encryption, reducing the 
> scope of DTP to the application layer only. By leveraging the existing HTTP/2 
> library, it should simplify the implementation of both HDFS clients and 
> servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-08-03 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652579#comment-14652579
 ] 

Haohui Mai commented on HDFS-7966:
--

bq. What's the upside of this new implementation? 

Performance is definitely one important factor. One of the motivation is to 
improve the efficiency of DN when there are hundreds of thousands of reads by 
reducing the overhead of context switches. [~Apache9], do you have any 
performance numbers on this scenario?

HTTP/2-based DTP also serves as a building block of the next-level of 
innovation, just to quote the description in the jira:

{quote}
This jira explores to delegate the responsibilities of the session and 
presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
connection multiplexing, QoS, authentication and encryption, reducing the scope 
of DTP to the application layer only. By leveraging the existing HTTP/2 
library, it should simplify the implementation of both HDFS clients and servers.
{quote}

bq. If it were the same performance but had other redeeming qualities (e.g. 
less code) then it's still worth consideration.

This is designed to be a new code path so that it is compatible with older 
releases. You can still rely on the old DTP protocol depending on the 
application scenario.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf, 
 TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg, 
 TestHttp2ReadBlockInsideEventLoop.svg


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-08-03 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652698#comment-14652698
 ] 

Duo Zhang commented on HDFS-7966:
-

I do not have enough machines to test the scenario... What I see if I create 
lots of thread to read from datanode concurrently is that HTTP/2 will start the 
request almost at the same time, but TCP will start the request one by 
one(maybe tens by tens where the number is cpu count). So there won't be a 
situation that DN really handle lots of concurrent read from client, and the 
context switch maybe small than HTTP/2 implementation since we also have a 
ThreadPool besides EventLoopGroup in HTTP/2 connection. And what make things 
worse is that our client is not event driven so we can not reduce the thread 
count of client...
Let me see if I can make a scenario that HTTP/2 fast than TCP...
Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf, 
 TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg, 
 TestHttp2ReadBlockInsideEventLoop.svg


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-08-03 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652657#comment-14652657
 ] 

Andrew Wang commented on HDFS-7966:
---

Agree there potentially are performance advantages, but it looks like all the 
benchmarks thus far show worse performance. I'd be very happy to see positive 
results, since erasure coding will lead to a lot more remote reads and thus 
possibly hitting this code path.

There has to be some upside though for this to be merged. The existing DTP 
already implements a number of the features mentioned, so not sure how much we 
gain there. And if perf isn't as good or better, then we're increasing our 
maintenance burden for something that won't get used.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf, 
 TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg, 
 TestHttp2ReadBlockInsideEventLoop.svg


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-08-03 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14652549#comment-14652549
 ] 

Andrew Wang commented on HDFS-7966:
---

I guess my question here is similar to what [~stack] and [~tlipcon] posed at 
the beginning. What's the upside of this new implementation? Seems like it's 
between 10 to 30% slower than the current implementation, which is not good. If 
it were the same performance but had other redeeming qualities (e.g. less code) 
then it's still worth consideration.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf, 
 TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg, 
 TestHttp2ReadBlockInsideEventLoop.svg


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-08-02 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14651021#comment-14651021
 ] 

Duo Zhang commented on HDFS-7966:
-

I modified {{Http2ConnectionPool}} to allow creating multiple HTTP/2 
connections to one datanode(default is 10). Here is the test result

{noformat}
./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 10 
10 1024 pread
*** time based on tcp 38343

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 10 
10 1024 pread
*** time based on http2 45799

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 100 
1 1024 pread
*** time based on tcp 20206

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 100 
1 1024 pread
*** time based on http2 21980

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 500 
2000 1024 pread
*** time based on tcp 20146

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 500 
2000 1024 pread
*** time based on http2 22461
{noformat}

HTTP/2 is 19%, 9% and 11% slower than TCP. Notice that the {{DFSClient}} is not 
event-driven thus we have more threads when using HTTP/2 at client side, so I 
think the performance here is acceptable? We could introduce a new event-driven 
{{FileSystem}}(maybe like HDFS-8707?) later to improve client performance.

The performance testing is almost done here. Next I will begin to pick code 
from POC branch to HDFS-7966 branch. Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf, 
 TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg, 
 TestHttp2ReadBlockInsideEventLoop.svg


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-08-01 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650242#comment-14650242
 ] 

Duo Zhang commented on HDFS-7966:
-

OK I'm back. Here is the test result of HTTP/2 that remove context-switch 
overhead. See the 'noswitch' part in

https://github.com/Apache9/hadoop/blob/HDFS-7966-POC/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/http2/PerformanceTest.java

And I also remove the thread pool in ReadBlockHandler.

{noformat}
./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 1 
100 1024 pread
*** time based on tcp 260776

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 1 
100 1024 pread
*** time based on http2 301257

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest noswitch /test 
100 1024
*** time based on http2 264012
{noformat}

(264012 - 260776) / 260776 = 0.012
So if I remove context-switch, HTTP/2 is only 1.2% slower than TCP.

Of course it is not acceptable to write code like this in real production. It 
is only used to prove that context-switch is the primary overhead.

And in fact, although HTTP/2 is about 30% slower than TCP in this case, it is 
still fast enough I think? It is only 0.32ms per read, so maybe it is 
acceptable?

Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf, 
 TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg, 
 TestHttp2ReadBlockInsideEventLoop.svg


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-07-20 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633089#comment-14633089
 ] 

Duo Zhang commented on HDFS-7966:
-

Write a single threaded testcase that do all the test works inside event loop.

https://github.com/Apache9/hadoop/blob/HDFS-7966-POC/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/web/dtp/TestHttp2ReadBlockInsideEventLoop.java

And at server side, I remove the thread pool in {{ReadBlockHandler}}.

The result is
{noformat}
*** time based on tcp 17734ms
*** time based on http2 20019ms

*** time based on tcp 18878ms
*** time based on http2 21422ms

*** time based on tcp 17562ms
*** time based on http2 20568ms

*** time based on tcp 18726ms
*** time based on http2 20251ms

*** time based on tcp 18632ms
*** time based on http2 21227ms
{noformat}

The average time of original tcp is 18306.4ms, and HTTP/2 is 20697.4ms. 

20697.4 / 18306.4 = 1.13, so HTTP/2 is 13% slower than tcp. In the above test 
it is 30% slower, so I think context switch maybe one of the problem why HTTP/2 
is much slower than tcp. Will do this test on a real cluster to get more data.

And the one {{EventLoop}} per datanode problem, I think it is a problem on a 
small cluster. So I think we should allow creating multiple HTTP/2 connections 
to one datanode. I will modify {{Http2ConnectionPool}} and do the test again.

Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf, 
 TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-07-15 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627827#comment-14627827
 ] 

Duo Zhang commented on HDFS-7966:
-

Small read using {{PerformanceTest}}. Unit is millisecond.

{noformat}
./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 
1(thread number) 100(read count per thread) 1024(bytes per read) pread(use 
pread)
{noformat}

{noformat}
./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 1 
100 1024 pread
*** time based on tcp 242730

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 1 
100 1024 pread
*** time based on http2 324491

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 10 
10 1024 pread
*** time based on tcp 40688

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 10 
10 1024 pread
*** time based on http2 82819

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 100 
1 1024 pread
*** time based on tcp 21612

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 100 
1 1024 pread
*** time based on http2 69658

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 500 
2000 1024 pread
*** time based on tcp 19931

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 500 
2000 1024 pread
*** time based on http2 151727

./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 1000 
1000 1024 pread
*** time based on http2 251735
{noformat}

For the single threaded test, 324491/242730=1.34, so http2 is 30% slow than 
tcp. Will try to find the overhead later.

And for multi threaded test, http2 is much slow than tcp. And tcp failed the 
1000 threads test.

I think the problem is that I only use one connection in http2 so there is only 
one EventLoop(which means only one thread) which sends or receives data. And 
for tcp, the thread number is same with connection number. The {{%CPU}} of 
datanode when using http2 is always around 100% no matter the thread number is 
10 or 100 or 1000. But when using tcp the {{%CPU}} could be higher than 1500% 
when the number of thread increasing. Next I will write new test which can use 
multiple http2 connections.

Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf, 
 TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-07-14 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627512#comment-14627512
 ] 

Duo Zhang commented on HDFS-7966:
-

I'd say sorry... The performance result above is useless since the flow control 
part of my code does not work at that time... I found it when I tried to 
transfer 512MB block-I got an OOM...

I have rewritten the flow control part, and setup a cluster with 1 NN and DN to 
evaluate the performance. There is a netty 
bug(https://github.com/netty/netty/pull/3929) so I need to modify my code when 
running different tests.

The performance test code is here
https://github.com/Apache9/hadoop/blob/HDFS-7966-POC/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/web/http2/PerformanceTest.java

First I ran a large read test with 1 file with a 1GB block. Each ran 5 times 
with the command
{noformat}
./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest http2 /test 1 1 
1073741824
./bin/hadoop org.apache.hadoop.hdfs.web.http2.PerformanceTest tcp /test 1 1 
1073741824
{noformat}

Note that I set {{dfs.datanode.transferTo.allowed}} to {{false}} since http2 
implementation can not use transferTo(I'm currently working on implementing 
{{FileRegion}} support in netty-http2-codec, see 
https://github.com/netty/netty/issues/3927)

The result is
{noformat}
*** time based on http2 9953
*** time based on http2 9967
*** time based on http2 9954
*** time based on http2 9985
*** time based on http2 9976

*** time based on tcp 9383
*** time based on tcp 9375
*** time based on tcp 9377
*** time based on tcp 9373
*** time based on tcp 9376
{noformat}

The average latency of http2 is 9967ms, and for tcp it is 9376.8ms.

9967/9376.8=1.063, so http2 is about 6% slow than tcp. I think this is an 
acceptable result?

Let me test small read later and post the result here. Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf, 
 TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-07-12 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14623778#comment-14623778
 ] 

Duo Zhang commented on HDFS-7966:
-

And there is a good news, in {{TestHttp2RandomReadPerformance}}, HTTP/2 
implementation beats the old implementation. We create 4000 connections to one 
datanode, and use one thread to peek a connection and read a small chunk of 
data sequentially.

The result is
{noformat}
*** time based on http2 124ms
*** time based on tcp 274ms
{noformat}

And 5000 connections will cause OOM(can not create thread) in the tcp test. I 
think this is reasonable since NIO based framework has much less threads than 
OIO.

I'm busy these days so only have some time at weekend. Will do these tests on a 
cluster ASAP. Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf, 
 TestHttp2LargeReadPerformance.svg, TestHttp2Performance.svg


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-07-07 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617005#comment-14617005
 ] 

Haohui Mai commented on HDFS-7966:
--

Thanks for the benchmark. Some questions:

* Do you have an idea whether the overhead is coming from the server side or 
the client side implementation? Does it help when you're using {{curl}} as the 
client?
* How does the size of read affect performance? How does HTTP/2 perform for 
large reads (e.g., a full block size)?

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf, TestHttp2Performance.svg


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-07-07 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617985#comment-14617985
 ] 

Duo Zhang commented on HDFS-7966:
-

This is the worst scenario for testing a NIO framework I think. NIO is consider 
to have less threads than OIO,  but in this test, NIO at least needs 4 threads 
and OIO only needs 2. You can see that the context switching costs a lot in the 
flame graph(ThreadPoolExecutor related operations, EventLoop.execute, 
selector.wakeup, etc.). And the buffer pooling here is also redundant. In OIO, 
one buffer for server and one buffer for client. At last, I think test through 
localhost can make things worse since now the network speed and latency are not 
bottleneck any more.

I plan to test these things next:
1. Read a large block(256MB or more)
2. Simulate the scenario that datanode caches a lot of connections from 
different machine and only a few of them read at the same time.
3. Run all tests on a real cluster(which means read data from other machine).

Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf, TestHttp2Performance.svg


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-06-18 Thread Duo Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14591434#comment-14591434
 ] 

Duo Zhang commented on HDFS-7966:
-

https://github.com/Apache9/hadoop/tree/HDFS-7966-POC

Will implement read block at this branch and collect some performance results 
first.

Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-06-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588688#comment-14588688
 ] 

stack commented on HDFS-7966:
-

bq. The code path is completely separated so it should be few risks in terms of 
destabilizing the trunk.

My concern is not so much destabilization. My concern is a bunch of new code 
that may never get used.

Reviewing the patches so far, it seems like you fellas are working it out as 
you go. Nothing wrong with that. It just seems like something better done in a 
branch than in mainline.

The justification for this work is a little nebulous. It has it that [DTP on 
HTTP/2] ...should simplify the implementation of both HDFS clients and 
servers. Apart from the fact that DTP is but a severe subset, the 'easy' part, 
of what an alternative client/server would have to implement, what if HTTP/2 
complicates rather than simplifies new clients and servers?  Better to figure 
this, and 'fit criteria' that prove it simplifies, on a branch I'd say.

Also, why would folks move to using this new transport? Will it be more 
performant than current DTP? (I'd guess not... given HTTP/2 does a bunch of 
'extras' and going by the PoC done over in HBase) When complete, we might have 
a bunch of new code that is slower than what is currently there and that folks 
are wary to try given it is 'new'. This state of affairs could go on such that 
the code is never exercised.

I am suggesting a branch because there you work out implementation, perf 
characteristics, and answers to any questions such as the sample posed above 
and come merge time, you will have a more solid story to tell.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-06-16 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14588913#comment-14588913
 ] 

Todd Lipcon commented on HDFS-7966:
---

+1 to what Stack said. We already have many access mechanisms to HDFS. Adding 
another, unless it has some clear value, just adds more code to maintain, more 
dependencies, etc. There should be some clear criteria for merging this into 
HDFS proper -- either that there's a strong performance argument or proof that 
it's substantially easier to implement this vs the existing protocol (eg if the 
new server/client support all the features of the existing client at the same 
performance but end up being substantially fewer lines of code to maintain).



 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-06-16 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589122#comment-14589122
 ] 

Haohui Mai commented on HDFS-7966:
--

bq. Should we get Duo Zhang set up w/ commit rights on the branch?

This is a good idea. I'll start the discussion and get things going.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-06-16 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589024#comment-14589024
 ] 

Haohui Mai commented on HDFS-7966:
--

Thanks for the inputs. I think that the arguments of a feature branch are 
valid. I'll create a feature branch shortly.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-06-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589040#comment-14589040
 ] 

stack commented on HDFS-7966:
-

Lets me help if I can.  Should we get [~Apache9] set up w/ commit rights on the 
branch?

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-06-12 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14584160#comment-14584160
 ] 

Haohui Mai commented on HDFS-7966:
--

bq. Do you think this should go into a branch first?

I think the development can be done in mainline. The code path is completely 
separated so it should be few risks in terms of destabilizing the trunk.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-06-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578453#comment-14578453
 ] 

stack commented on HDFS-7966:
-

bq. Hopefully we can get things committed soon and continue to make progress.

Do you think this should go into a branch first [~wheat9]? It is shaping up as 
a nice set of sizeable, related patches. Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-06-08 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14578303#comment-14578303
 ] 

Haohui Mai commented on HDFS-7966:
--

Sorry for the late reply. Get caught up by multiple things.

bq. 2 - If you're using your own payload encoding then tagging flush points in 
a streaming RPC seems pretty trivial.

Thanks very much for the information. Based on the information it looks like 
that it is possible to use a standard GRPC client to talk the new DTP protocol. 
It would save a lot of effort on implementing a client of the DTP protocol.

I really appreciate if there are any pointers to the client / server code.

bq. Generally interested in progress if any. No harm if none. Thanks.

Currently [~Apache9] is making progress on HDFS-8515 and HDFS-8471. I have been 
closely working with [~Apache9] on this. Hopefully we can get things committed 
soon and continue to make progress.


 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-05-22 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14556920#comment-14556920
 ] 

stack commented on HDFS-7966:
-

bq. What I was saying is that I'm unsure whether a standard grpc client would 
be able to understand this variant.

Sounds like it might ([~louiscryan] might be able to help out here if issues 
according to above).

Generally interested in progress if any. No harm if none. Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-05-13 Thread Louis Ryan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542307#comment-14542307
 ] 

Louis Ryan commented on HDFS-7966:
--

Hey,

Im the one working on the grpc impl for HBase so I can help answer questions on 
that.

1 - grpc is a general purpose content type agnostic transport layer so you can 
happily use it without using proto. 

2 - If you're using your own payload encoding then tagging flush points in a 
streaming RPC seems pretty trivial.

If you'd like to try grpc let me know and I can provide pointers / answer 
questions



 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-05-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540331#comment-14540331
 ] 

stack commented on HDFS-7966:
-

For your consideration:

On 1., are you sure this the case? There is a hacked up grpc poc over in 
hbaseland that implements hbase's metadata-in-pb-and-data-follows-behind. Here 
is client write: 
https://github.com/louiscryan/hbase/commit/e2dae9e8e3f648125f1e13e0ff88944935ad39a3#diff-f5daefd77cbc7bbca73187505c2a9c13R179

On 2., does the write have to be streaming? It can't be a sized write with 
metadata to say h*() when all received?

Let me ask my migration question another way: We thinking this will be an 
incompatible change?

Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-05-12 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540286#comment-14540286
 ] 

Haohui Mai commented on HDFS-7966:
--

Let me try to answer the questions.

There are two reasons that why the proposal needs to diverge from grpc: (1) 
grpc requires the response to be fitted into a protobuf message where a large 
read request ( 2GB) fails to fit in, and (2) the write cannot be a single 
streaming rpc as the protocol needs to implement hflush() and hsync() as well. 

Note that evolving the read path is relatively straightforward as the 
implementation only needs to provide another implementation of {{BlockReader}}. 
The write path, however, might require implementing a new {{DFSOutputStream}}.

There should be no new port required -- the plan is to listen on the HTTP/HTTPS 
port that is available on DN today.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-05-12 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540405#comment-14540405
 ] 

Haohui Mai commented on HDFS-7966:
--

Thanks for the info. I think that the proposal handles the reads the same the 
link you posted when streaming large read requests. What I was saying is that 
I'm unsure whether a standard grpc client would be able to understand this 
variant.

For (2) I think it makes sense and I believe that this is what the current 
DataTransferProtocol is doing. We can definitely adopt this idea.

bq. Let me ask my migration question another way: We thinking this will be an 
incompatible change?

I'm probably missing something -- the new DTP will operate on the HTTPS port 
with a new URL, so it should be backward-compatible.





 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-05-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540424#comment-14540424
 ] 

stack commented on HDFS-7966:
-

bq. I'm probably missing something ..

You answered. New URL. So you have to ask for it. Ok. Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor
 Attachments: GSoC2015_Proposal.pdf


 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-05-11 Thread zhangduo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537805#comment-14537805
 ] 

zhangduo commented on HDFS-7966:


The latest patch of HDFS-5270 proves that the logic of FsDataset and writing 
pipeline could be compatible with a thread pool implementation.

But HDFS-5270 does not address the basic issue-one thread per connection(maybe 
two?). This makes client connection pooling which is very important for HBase 
impossible in large cluster. 

So I think it is time to pick up this issue.

Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor

 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-04-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14502137#comment-14502137
 ] 

stack commented on HDFS-7966:
-

Yes. Do you have pointers [~wheat9] on where to comment in GSoC site? Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor

 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-04-15 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14497003#comment-14497003
 ] 

Haohui Mai commented on HDFS-7966:
--

There is one proposal from Qianqian Shi in the GSoC system. I don't know the 
details of the process, but it looks like it is being reviewed. [~stack] can 
you please help on it? Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor

 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-04-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14493599#comment-14493599
 ] 

stack commented on HDFS-7966:
-

Any progress on this issue? (A student associated?) Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor

 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-03-24 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379298#comment-14379298
 ] 

Haohui Mai commented on HDFS-7966:
--

Hi students, please take a look at the GSoC 2015 FAQ and submit a proposal. 
Thanks.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor

 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-03-22 Thread SIVA NAGARAJU (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14375371#comment-14375371
 ] 

SIVA NAGARAJU commented on HDFS-7966:
-

I AM ALSO INTRESTED IN BEING PART OF IT.
 I KNOW HDFS AND MAP REDUCE PROGRAMMING

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor

 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-03-21 Thread Nisala Mendis (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14372698#comment-14372698
 ] 

Nisala Mendis commented on HDFS-7966:
-

Hi all,
I am Nisala Nirmana, post graduate of University of Colombo, Sri Lanka 
specialized in computer and network secuirity. Also partly working as a 
research engineer at Dialog communications at University of Moratuwa Sri Lanka. 
I am interested in this project as I am more familiar with some of the 
technologies. I really appriciate some mentors out there provide me some more 
insight to the project.
Regards
nisala

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Qianqian Shi
  Labels: gsoc, gsoc2015, mentor

 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7966) New Data Transfer Protocol via HTTP/2

2015-03-20 Thread Qianqian Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14372233#comment-14372233
 ] 

Qianqian Shi commented on HDFS-7966:


It sounds like an awesome project. I'm interested in being part of it.

 New Data Transfer Protocol via HTTP/2
 -

 Key: HDFS-7966
 URL: https://issues.apache.org/jira/browse/HDFS-7966
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Haohui Mai
  Labels: gsoc2015, mentor

 The current Data Transfer Protocol (DTP) implements a rich set of features 
 that span across multiple layers, including:
 * Connection pooling and authentication (session layer)
 * Encryption (presentation layer)
 * Data writing pipeline (application layer)
 All these features are HDFS-specific and defined by implementation. As a 
 result it requires non-trivial amount of work to implement HDFS clients and 
 servers.
 This jira explores to delegate the responsibilities of the session and 
 presentation layers to the HTTP/2 protocol. Particularly, HTTP/2 handles 
 connection multiplexing, QoS, authentication and encryption, reducing the 
 scope of DTP to the application layer only. By leveraging the existing HTTP/2 
 library, it should simplify the implementation of both HDFS clients and 
 servers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)