[jira] [Updated] (HDFS-6698) try to optimize DFSInputStream.getFileLength()

2015-05-05 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated HDFS-6698:
---
Labels: BB2015-05-TBR  (was: )

> try to optimize DFSInputStream.getFileLength()
> --
>
> Key: HDFS-6698
> URL: https://issues.apache.org/jira/browse/HDFS-6698
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
>  Labels: BB2015-05-TBR
> Attachments: HDFS-6698.txt, HDFS-6698.txt, HDFS-6698v2.txt, 
> HDFS-6698v2.txt, HDFS-6698v3.txt
>
>
> HBase prefers to invoke read() when serving scan requests and pread() when 
> serving get requests, because pread() holds almost no locks.
> Imagine there is a read() running. Its definition is:
> {code}
> public synchronized int read
> {code}
> so no other read() request can run concurrently. This is known, but pread() 
> cannot run either, because:
> {code}
>   public int read(long position, byte[] buffer, int offset, int length)
>       throws IOException {
>     // sanity checks
>     dfsClient.checkOpen();
>     if (closed) {
>       throw new IOException("Stream closed");
>     }
>     failures = 0;
>     long filelen = getFileLength();
> {code}
> getFileLength() also needs the lock, so we need to figure out a lock-free 
> implementation of getFileLength() before the HBase multi-stream feature is done. 
> [~saint@gmail.com]
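
The contention described in the issue can be reproduced in isolation. The following is a minimal sketch, not the real DFSInputStream: the `Stream` class, its methods, and the latch-based timing are all hypothetical stand-ins. It assumes only what the description states, namely that the stateful read is `synchronized` and that the positional read calls a `getFileLength()` that needs the same lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LockContentionSketch {
    /** Hypothetical stand-in for a stream; not the real DFSInputStream. */
    static class Stream {
        private long fileLength = 1024;

        // Stateful read: holds the object's monitor for the whole call.
        synchronized void read(CountDownLatch entered, CountDownLatch release) {
            entered.countDown();          // signal that the monitor is now held
            try {
                release.await();          // simulate a slow read
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // Needs the same monitor, so it waits until read() returns.
        synchronized long getFileLength() {
            return fileLength;
        }

        // Positional read: not synchronized itself, but it stalls
        // inside getFileLength() while read() is running.
        long pread() {
            return getFileLength();
        }
    }

    /** Returns true if pread() was still blocked while read() held the lock. */
    static boolean preadBlockedByRead() {
        Stream s = new Stream();
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch preadDone = new CountDownLatch(1);

        Thread reader = new Thread(() -> s.read(entered, release));
        Thread preader = new Thread(() -> {
            s.pread();
            preadDone.countDown();
        });
        try {
            reader.start();
            entered.await();              // read() now owns the monitor
            preader.start();
            // pread() cannot finish until read() is released.
            boolean finishedEarly = preadDone.await(200, TimeUnit.MILLISECONDS);
            release.countDown();          // let read() return
            reader.join();
            preader.join();
            return !finishedEarly;
        } catch (InterruptedException e) {
            release.countDown();
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("pread blocked by read: " + preadBlockedByRead());
    }
}
```

In this sketch, removing `synchronized` from `getFileLength()` lets `pread()` complete while `read()` is still in progress, which is the behavior the issue asks for. Doing that safely requires the length to be published in a thread-safe way.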



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-6698) try to optimize DFSInputStream.getFileLength()

2014-10-30 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HDFS-6698:

Attachment: HDFS-6698v3.txt

I just ran into this as well while debugging why HBase does not benefit from 
Snappy compression as much as it should. It turns out that a non-trivial amount 
of time (as determined by a sampler, not an instrumenting profiler) is spent in 
this method.

To be safe, I'd probably also turn LocatedBlocks into an immutable object (well, 
except for blocks); see the attached patch. All members of LocatedBlocks are 
now safely published.

With that, I don't think this patch can do any harm.
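
As a rough illustration of what "immutable and safely published" can look like in Java, here is a minimal sketch. The `Snapshot` and `Stream` classes and all of their members are hypothetical and are not taken from the attached patch or from the real LocatedBlocks. The sketch also freezes the block list, which goes further than the comment above describes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ImmutableSnapshotSketch {
    /** Hypothetical immutable snapshot; not the real LocatedBlocks class. */
    static final class Snapshot {
        // final fields are visible to other threads once the constructor
        // has completed, provided 'this' does not escape during construction.
        private final long fileLength;
        private final boolean underConstruction;
        private final List<String> blocks;

        Snapshot(long fileLength, boolean underConstruction, List<String> blocks) {
            this.fileLength = fileLength;
            this.underConstruction = underConstruction;
            // Defensive copy so later changes to the caller's list are not seen.
            this.blocks = Collections.unmodifiableList(new ArrayList<>(blocks));
        }

        long getFileLength() { return fileLength; }
        boolean isUnderConstruction() { return underConstruction; }
        List<String> getBlocks() { return blocks; }
    }

    /** Hypothetical stream that swaps whole snapshots instead of mutating one. */
    static final class Stream {
        // volatile makes each newly installed snapshot visible to all readers.
        private volatile Snapshot snapshot;

        Stream(Snapshot initial) { this.snapshot = initial; }

        // Writers replace the reference; they never modify an existing snapshot.
        void refresh(Snapshot next) { this.snapshot = next; }

        // Lock-free read: one volatile read, then a final-field read.
        long getFileLength() { return snapshot.getFileLength(); }
    }
}
```

With this shape, a reader needs no monitor to obtain a consistent length: it sees either the old snapshot or the new one, never a partially updated object.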


> try to optimize DFSInputStream.getFileLength()
> --
>
> Key: HDFS-6698
> URL: https://issues.apache.org/jira/browse/HDFS-6698
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6698.txt, HDFS-6698.txt, HDFS-6698v2.txt, 
> HDFS-6698v2.txt, HDFS-6698v3.txt


[jira] [Updated] (HDFS-6698) try to optimize DFSInputStream.getFileLength()

2014-08-08 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-6698:


Attachment: HDFS-6698v2.txt

Rebase. Retry.

> try to optimize DFSInputStream.getFileLength()
> --
>
> Key: HDFS-6698
> URL: https://issues.apache.org/jira/browse/HDFS-6698
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6698.txt, HDFS-6698.txt, HDFS-6698v2.txt, 
> HDFS-6698v2.txt



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6698) try to optimize DFSInputStream.getFileLength()

2014-08-08 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-6698:


Attachment: HDFS-6698v2.txt

v2 adds protection against the scenario Colin suggests (though it can't happen 
with the code as is).

This patch is conservative. It does not change semantics. It just speeds up 
getting the file length by keeping a cached copy, which it returns unless 
anything has changed since the file length was last computed.

Discussion of locking and concurrency in DFSInputStream (DFSIS) in general is 
going on in other issues, at levels above where this patch works.
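
The caching idea described above (return a cached length unless something has changed) can be sketched as follows. This is an assumption-laden illustration, not the attached patch: the class, the field names, the `UNSET` sentinel, and the two-part length are all hypothetical.

```java
class CachedLengthSketch {
    private static final long UNSET = -1L;

    // Guarded by 'this'. Hypothetical inputs to the file length.
    private long committedLength;
    private long inFlightLength;
    private int slowPathCalls;

    // Cached sum; read without taking the monitor.
    private volatile long cachedLength = UNSET;

    /** Any change to the inputs invalidates the cached value. */
    synchronized void update(long committed, long inFlight) {
        this.committedLength = committed;
        this.inFlightLength = inFlight;
        this.cachedLength = UNSET;
    }

    /** Fast path takes no lock; the slow path runs only after an update. */
    long getFileLength() {
        long cached = cachedLength;
        if (cached != UNSET) {
            return cached;
        }
        return recompute();
    }

    // Recomputes under the monitor and refreshes the cache.
    private synchronized long recompute() {
        slowPathCalls++;
        long length = committedLength + inFlightLength;
        cachedLength = length;
        return length;
    }

    /** Number of times the locked slow path ran (for demonstration only). */
    synchronized int slowPathCalls() {
        return slowPathCalls;
    }

    public static void main(String[] args) {
        CachedLengthSketch s = new CachedLengthSketch();
        s.update(100, 20);
        System.out.println(s.getFileLength());   // slow path, fills the cache
        System.out.println(s.getFileLength());   // fast path, no lock taken
        System.out.println(s.slowPathCalls());
    }
}
```

One caveat of this particular sketch: the first call after an invalidation still takes the monitor, so it can wait behind a long synchronized read. Recomputing the cached value inside `update()` instead would avoid that, at the cost of doing the work even when nobody asks for the length.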

> try to optimize DFSInputStream.getFileLength()
> --
>
> Key: HDFS-6698
> URL: https://issues.apache.org/jira/browse/HDFS-6698
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6698.txt, HDFS-6698.txt, HDFS-6698v2.txt


[jira] [Updated] (HDFS-6698) try to optimize DFSInputStream.getFileLength()

2014-07-22 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-6698:


Attachment: HDFS-6698.txt

Retry

> try to optimize DFSInputStream.getFileLength()
> --
>
> Key: HDFS-6698
> URL: https://issues.apache.org/jira/browse/HDFS-6698
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6698.txt, HDFS-6698.txt


[jira] [Updated] (HDFS-6698) try to optimize DFSInputStream.getFileLength()

2014-07-22 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-6698:


Issue Type: Sub-task  (was: Improvement)
Parent: HDFS-6735

> try to optimize DFSInputStream.getFileLength()
> --
>
> Key: HDFS-6698
> URL: https://issues.apache.org/jira/browse/HDFS-6698
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6698.txt


[jira] [Updated] (HDFS-6698) try to optimize DFSInputStream.getFileLength()

2014-07-21 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-6698:


Attachment: HDFS-6698.txt

> try to optimize DFSInputStream.getFileLength()
> --
>
> Key: HDFS-6698
> URL: https://issues.apache.org/jira/browse/HDFS-6698
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6698.txt


[jira] [Updated] (HDFS-6698) try to optimize DFSInputStream.getFileLength()

2014-07-21 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HDFS-6698:


Status: Patch Available  (was: Open)

> try to optimize DFSInputStream.getFileLength()
> --
>
> Key: HDFS-6698
> URL: https://issues.apache.org/jira/browse/HDFS-6698
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0
>Reporter: Liang Xie
>Assignee: Liang Xie
> Attachments: HDFS-6698.txt