[jira] [Commented] (HDFS-11529) Add libHDFS API to return last exception

2017-05-18 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16016353#comment-16016353
 ] 

Henry Robinson commented on HDFS-11529:
---

This patch introduces a deadlock in {{getGlobalJNIEnv()}} (see HDFS-11851).

> Add libHDFS API to return last exception
> 
>
> Key: HDFS-11529
> URL: https://issues.apache.org/jira/browse/HDFS-11529
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: libhdfs
>Affects Versions: 2.6.0
>Reporter: Sailesh Mukil
>Assignee: Sailesh Mukil
>Priority: Critical
>  Labels: errorhandling, libhdfs
> Fix For: 3.0.0-alpha3
>
> Attachments: HDFS-11529.000.patch, HDFS-11529.001.patch, 
> HDFS-11529.002.patch, HDFS-11529.003.patch, HDFS-11529.004.patch, 
> HDFS-11529.005.patch, HDFS-11529.006.patch
>
>
> libHDFS compares exceptions against a table and returns the corresponding 
> error code to the application when an error occurs.
> However, this table is populated by hand and is often not updated when new 
> exceptions are added.
> As a result, libHDFS returns EINTERNAL (reported as Unknown Error(255)) 
> whenever one of these exceptions is hit. Some exceptions that have been 
> observed to produce Error(255):
> - org.apache.hadoop.ipc.StandbyException (Operation category WRITE is not 
> supported in state standby)
> - java.io.EOFException: Cannot seek after EOF
> - javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
> An error code for every possible exception type is not feasible, so one 
> option is to add a call such as hdfsGetLastException() that returns the last 
> exception a libHDFS thread encountered. An application that receives 
> EINTERNAL could then call hdfsGetLastException() for details.
> This information can be kept in thread-local storage. Because the new call 
> is purely additive, the existing error-code behavior is preserved.
> This is a follow up from HDFS-4997.
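A minimal sketch of the thread-local idea, in C to match libhdfs. The names lastRootCause, setLastRootCause() and getLastRootCause() are hypothetical and are not the libhdfs API; only a root-cause string is stored here, though a stack-trace string could be kept the same way.

{code}
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical per-thread "last exception" slot. _Thread_local (C11) gives
 * every thread its own copy, so storing or reading it needs no mutex.
 * These names are illustrative only, not the libhdfs API.
 */
static _Thread_local char *lastRootCause = NULL;

/* Record a private copy of the root cause of the latest exception seen by
 * the calling thread. Passing NULL clears the slot. */
static void setLastRootCause(const char *msg)
{
    free(lastRootCause);
    lastRootCause = NULL;
    if (msg != NULL) {
        size_t len = strlen(msg) + 1;
        lastRootCause = malloc(len);
        if (lastRootCause != NULL)
            memcpy(lastRootCause, msg, len);
    }
}

/* Return the calling thread's last recorded root cause, or NULL if none.
 * The pointer stays valid until this thread records another exception. */
static const char *getLastRootCause(void)
{
    return lastRootCause;
}
{code}

An application that gets EINTERNAL from a libHDFS call would call the getter on the same thread, since each thread sees only its own slot. Keeping the slot purely thread-local also means the error path never has to take a global lock.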



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-11851) getGlobalJNIEnv() may deadlock if exception is thrown

2017-05-18 Thread Henry Robinson (JIRA)
Henry Robinson created HDFS-11851:
-

 Summary: getGlobalJNIEnv() may deadlock if exception is thrown
 Key: HDFS-11851
 URL: https://issues.apache.org/jira/browse/HDFS-11851
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: libhdfs
Reporter: Henry Robinson
Priority: Blocker


HDFS-11529 introduced a deadlock in {{getGlobalJNIEnv()}} when an exception is 
thrown. {{getGlobalJNIEnv()}} holds {{jvmMutex}}, but 
{{printExceptionAndFree()}} eventually tries to acquire the same lock in 
{{setTLSExceptionStrings()}}, so the thread blocks waiting on a lock it 
already holds.

For example, the exception can come from the {{loadFileSystems}} call:

{code}
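jthr = invokeMethod(env, NULL, STATIC, NULL,
                    "org/apache/hadoop/fs/FileSystem",
                    "loadFileSystems", "()V");
if (jthr) {
    /* jvmMutex is still held here; printExceptionAndFree() tries to lock
       it again via setTLSExceptionStrings(). */
    printExceptionAndFree(env, jthr, PRINT_EXC_ALL, "loadFileSystems");
}
} /* closes an enclosing block of getGlobalJNIEnv() not shown in this excerpt */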
{code}

Here are the relevant parts of the stack trace from where I call this API in 
Impala, which uses {{libhdfs}}:

{code}
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x74a8d657 in _L_lock_909 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x74a8d480 in __GI___pthread_mutex_lock (mutex=0x47ce960 ) at ../nptl/pthread_mutex_lock.c:79
#3  0x02f06056 in mutexLock (m=) at /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/os/posix/mutexes.c:28
#4  0x02efe817 in setTLSExceptionStrings (rootCause=0x0, stackTrace=0x0) at /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:581
#5  0x02f065d7 in printExceptionAndFreeV (env=0x513c1e8, exc=0x508a8c0, noPrintFlags=, fmt=0x34349cf "loadFileSystems", ap=0x7fffb660) at /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:183
#6  0x02f0683d in printExceptionAndFree (env=, exc=, noPrintFlags=, fmt=) at /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:213
#7  0x02eff60f in getGlobalJNIEnv () at /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:463
{code}
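The self-deadlock can be reproduced in miniature with POSIX threads. In this hypothetical sketch, demoMutex, recordException() and the two init functions are stand-ins, not libhdfs code. It uses an error-checking mutex so the relock returns EDEADLK instead of hanging, which is what the default mutex in the trace above does. It also shows one possible fix, releasing the lock before reporting; that fix is an assumption, not necessarily what was committed for this issue.

{code}
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for jvmMutex. PTHREAD_MUTEX_ERRORCHECK makes a
 * relock by the owning thread fail with EDEADLK; a default mutex would
 * block forever instead. */
static pthread_mutex_t demoMutex;

static void initDemoMutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&demoMutex, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Plays the role of setTLSExceptionStrings(): it takes demoMutex. */
static int recordException(void)
{
    int rc = pthread_mutex_lock(&demoMutex);
    if (rc != 0)
        return rc;                 /* EDEADLK: this thread already owns it */
    /* ... save the exception strings ... */
    pthread_mutex_unlock(&demoMutex);
    return 0;
}

/* Buggy shape: report the exception while still holding the lock. */
static int initThenReportLocked(void)
{
    pthread_mutex_lock(&demoMutex);
    /* ... JVM setup; loadFileSystems throws ... */
    int rc = recordException();
    pthread_mutex_unlock(&demoMutex);
    return rc;
}

/* One possible fix: leave the critical section, then report. */
static int initThenReportUnlocked(void)
{
    pthread_mutex_lock(&demoMutex);
    /* ... JVM setup; remember that an exception occurred ... */
    pthread_mutex_unlock(&demoMutex);
    return recordException();
}
{code}

The first variant fails because it reports from inside its own critical section; the second releases the lock first, at the cost of carrying the exception across the unlock.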






[jira] [Created] (HDFS-4824) FileInputStreamCache.close leaves dangling reference to FileInputStreamCache.cacheCleaner

2013-05-14 Thread Henry Robinson (JIRA)
Henry Robinson created HDFS-4824:


 Summary: FileInputStreamCache.close leaves dangling reference to 
FileInputStreamCache.cacheCleaner
 Key: HDFS-4824
 URL: https://issues.apache.org/jira/browse/HDFS-4824
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs-client
Affects Versions: 2.0.4-alpha
Reporter: Henry Robinson
Assignee: Colin Patrick McCabe


{{FileInputStreamCache}} leaves around a reference to its {{cacheCleaner}} 
after {{close()}}.

The {{cacheCleaner}} is created like this:

{code}
if (cacheCleaner == null) {
  cacheCleaner = new CacheCleaner();
  executor.scheduleAtFixedRate(cacheCleaner, expiryTimeMs, expiryTimeMs,
      TimeUnit.MILLISECONDS);
}
{code}

and supposedly removed like this:

{code}
if (cacheCleaner != null) {
  executor.remove(cacheCleaner);
}
{code}

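However, {{ScheduledThreadPoolExecutor.remove}} returns a boolean indicating 
success, which should be checked. From a quick read of that class, I _think_ 
the {{ScheduledFuture}} returned by {{scheduleAtFixedRate}} should be used as 
the argument to {{remove}}, since, as far as I can tell, the executor queues a 
wrapper task rather than {{cacheCleaner}} itself. 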

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3228) Use fadvise in local read path

2012-06-12 Thread Henry Robinson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293962#comment-13293962
 ] 

Henry Robinson commented on HDFS-3228:
--

These messages refer to HDFS-3428, not this issue. It looks like a typo in the 
commit message. 

 Use fadvise in local read path
 --

 Key: HDFS-3228
 URL: https://issues.apache.org/jira/browse/HDFS-3228
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs client, performance
Reporter: Henry Robinson
Assignee: Henry Robinson

 The read path through BlockReaderLocal does not take advantage of readahead 
 or drop-behind in the way that BlockSender does. We could arguably stand to 
 gain even more from hinting about read patterns to the kernel here, so we 
 should add the same mechanisms to BlockReaderLocal. 
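At the system-call level, readahead and drop-behind hints are posix_fadvise() calls. The C sketch below is illustrative only: readWithHints() is a hypothetical helper, BlockReaderLocal itself is Java, and this is not a claim about how HDFS wires the hints in. It issues POSIX_FADV_WILLNEED for the range about to be read and POSIX_FADV_DONTNEED for the range just consumed.

{code}
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes from fd at offset, hinting the
 * kernel before the read (readahead) and after it (drop-behind). */
static ssize_t readWithHints(int fd, off_t offset, char *buf, size_t len)
{
    /* Readahead: ask the kernel to start fetching the range we will read. */
    (void)posix_fadvise(fd, offset, (off_t)len, POSIX_FADV_WILLNEED);

    ssize_t total = 0;
    while ((size_t)total < len) {
        ssize_t n = pread(fd, buf + total, len - (size_t)total, offset + total);
        if (n < 0)
            return -1;
        if (n == 0)
            break;                  /* end of file */
        total += n;
    }

    /* Drop-behind: the bytes are consumed, so let the kernel evict them
     * from the page cache rather than pushing out other hot pages. */
    if (total > 0)
        (void)posix_fadvise(fd, offset, (off_t)total, POSIX_FADV_DONTNEED);
    return total;
}
{code}

Both hints are advisory, so the sketch ignores their return values: a hint that fails costs only performance, never correctness.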

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3514) Add missing TestParallelLocalRead

2012-06-06 Thread Henry Robinson (JIRA)
Henry Robinson created HDFS-3514:


 Summary: Add missing TestParallelLocalRead
 Key: HDFS-3514
 URL: https://issues.apache.org/jira/browse/HDFS-3514
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Henry Robinson


Somewhere in the later stages of HDFS-2834 I accidentally left 
TestParallelLocalRead out of the patch, and we didn't catch it before it got 
checked in. 

Here's that file, which provides test coverage for short-circuit local reads 
using the copying and non-copying paths. 





[jira] [Updated] (HDFS-3514) Add missing TestParallelLocalRead

2012-06-06 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated HDFS-3514:
-

Attachment: HDFS-3514.patch






[jira] [Assigned] (HDFS-3514) Add missing TestParallelLocalRead

2012-06-06 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson reassigned HDFS-3514:


Assignee: Henry Robinson






[jira] [Updated] (HDFS-3514) Add missing TestParallelLocalRead

2012-06-06 Thread Henry Robinson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Henry Robinson updated HDFS-3514:
-

Status: Patch Available  (was: Open)

