[jira] [Commented] (HADOOP-15403) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448566#comment-16448566
 ] 

Sergey Shelukhin commented on HADOOP-15403:
---

[~jlowe] would a change in config be ok? I think it is better to add another 
config, but we could also make the existing one tri-state ("true", "false", 
"ignore"), where "ignore" gets the new behavior. "false" can still work for 
people who override listFiles.

[~ste...@apache.org] I will fix both this and the other concerns once we 
decide on the config.
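
A minimal sketch of the tri-state idea; the enum, the method shape, and the 
"ignore" value are hypothetical, not taken from the attached patch:
{noformat}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;

public class DirectoryHandlingSketch {
  // Hypothetical tri-state replacing the boolean "recursive" setting.
  enum DirHandling { TRUE, FALSE, IGNORE }

  // A getSplits()-style size computation under the proposed setting.
  static long totalSize(FileStatus[] files, DirHandling mode) throws IOException {
    long totalSize = 0;
    for (FileStatus file : files) {
      if (file.isDirectory()) {
        if (mode == DirHandling.IGNORE) {
          continue;               // new behavior: silently skip directories
        }
        throw new IOException("Not a file: " + file.getPath()); // current "false"
      }
      totalSize += file.getLen();
    }
    return totalSize;
  }
}
{noformat}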

> FileInputFormat recursive=false fails instead of ignoring the directories.
> --
>
> Key: HADOOP-15403
> URL: https://issues.apache.org/jira/browse/HADOOP-15403
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HADOOP-15403.patch
>
>
> We are trying to create splits in Hive that read only the files in a 
> directory, not its subdirectories.
> That fails with the error below.
> Given how this error comes about (two pieces of code interact: one explicitly 
> adds directories to the results without failing, and the other fails on any 
> directory in the results), this seems like a bug.
> {noformat}
> Caused by: java.io.IOException: Not a file: 
> file:/,...warehouse/simple_to_mm_text/delta_001_001_
>   at 
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) 
> ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203)
>  ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
> {noformat}
> This code, when recursion is disabled, adds directories to the results:
> {noformat} 
> if (recursive && stat.isDirectory()) {
>   result.dirsNeedingRecursiveCalls.add(stat);
> } else {
>   result.locatedFileStatuses.add(stat);
> }
> {noformat} 
> However, the getSplits code that runs afterwards computes the total size like this:
> {noformat}
> long totalSize = 0;   // compute total size
> for (FileStatus file: files) {// check we have valid files
>   if (file.isDirectory()) {
> throw new IOException("Not a file: "+ file.getPath());
>   }
>   totalSize +=
> {noformat}
> which always fails when combined with the above code.






[jira] [Commented] (HADOOP-15403) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446460#comment-16446460
 ] 

Sergey Shelukhin commented on HADOOP-15403:
---

I fixed this in getSplits, since the list... methods can be overridden by 
implementations, which can still return directories. It would be good to 
ignore them all, especially if someone copy-pasted the fetching code.
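
To illustrate the concern about overrides, a hypothetical subclass (not from 
any real codebase):
{noformat}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

// Hypothetical subclass: an override of listStatus() (e.g. with copy-pasted
// fetching code) can surface directory entries in its result, which only a
// directory filter inside getSplits() would catch.
public class CustomListingInputFormat extends TextInputFormat {
  @Override
  protected FileStatus[] listStatus(JobConf job) throws IOException {
    FileStatus[] statuses = super.listStatus(job);
    // ... custom listing logic here could add directory statuses ...
    return statuses;
  }
}
{noformat}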




[jira] [Updated] (HADOOP-15403) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-15403:
--
Status: Patch Available  (was: Open)

[~gsaha] [~leftnoteasy] can you take a look?




[jira] [Updated] (HADOOP-15403) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-15403:
--
Attachment: HADOOP-15403.patch




[jira] [Assigned] (HADOOP-15403) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-20 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HADOOP-15403:
-

Assignee: Sergey Shelukhin




[jira] [Created] (HADOOP-15403) FileInputFormat recursive=false fails instead of ignoring the directories.

2018-04-20 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HADOOP-15403:
-

 Summary: FileInputFormat recursive=false fails instead of ignoring 
the directories.
 Key: HADOOP-15403
 URL: https://issues.apache.org/jira/browse/HADOOP-15403
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Sergey Shelukhin





[jira] [Commented] (HADOOP-15171) native ZLIB decompressor produces 0 bytes on the 2nd call; also incorrectly handles some zlib errors

2018-02-02 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351016#comment-16351016
 ] 

Sergey Shelukhin commented on HADOOP-15171:
---

Update: it turns out end() was a red herring after all - any reuse of the same 
object without calling reset() causes the issue.
Given that the object does not support the zlib library model of repeatedly 
calling inflate with more data, it basically never makes sense to call 
decompress() without calling reset() first.
Perhaps the call should be built in? I cannot find whether zlib itself actually 
requires a reset (at least for the continuous decompression case it does not 
look like it does), so perhaps the cleanup could be improved too.
At any rate, the error handling should be fixed to not return 0.
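
A minimal sketch of what building the reset in could look like (a hypothetical 
wrapper, not the committed fix):
{noformat}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.io.compress.DirectDecompressor;
import org.apache.hadoop.io.compress.zlib.ZlibDecompressor;

public class ResettingZlibDirectDecompressor implements DirectDecompressor {
  private final ZlibDecompressor.ZlibDirectDecompressor inner =
      new ZlibDecompressor.ZlibDirectDecompressor(
          ZlibDecompressor.CompressionHeader.NO_HEADER, 0);

  @Override
  public void decompress(ByteBuffer src, ByteBuffer dst) throws IOException {
    // The object does not support streaming inflate, so resetting before
    // every call never loses state a caller could rely on.
    inner.reset();
    inner.decompress(src, dst);
  }
}
{noformat}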

> native ZLIB decompressor produces 0 bytes on the 2nd call; also incorrectly 
> handles some zlib errors
> -
>
> Key: HADOOP-15171
> URL: https://issues.apache.org/jira/browse/HADOOP-15171
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.1.0
>Reporter: Sergey Shelukhin
>Assignee: Lokesh Jain
>Priority: Blocker
> Fix For: 3.1.0, 3.0.1
>
>
> While reading an ORC file via direct buffers, Hive gets a 0-sized buffer 
> for a particular compressed segment of the file. We narrowed it down to the 
> Hadoop native ZLIB codec; when the data is copied to a heap-based buffer and 
> the JDK Inflater is used, it produces the correct output. The input is only 
> 127 bytes, so I can paste it here.
> All the other (many) blocks of the file are decompressed without problems by 
> the same code.
> {noformat}
> 2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 
> (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Decompressing 
> 127 bytes to dest buffer pos 524288, limit 786432
> 2018-01-13T02:47:40,816  WARN [IO-Elevator-Thread-0 
> (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: The codec has 
> produced 0 bytes for 127 bytes at pos 0, data hash 1719565039: [e3 92 e1 62 
> 66 60 60 10 12 e5 98 e0 27 c4 c7 f1 e8 12 8f 40 c3 7b 5e 89 09 7f 6e 74 73 04 
> 30 70 c9 72 b1 30 14 4d 60 82 49 37 bd e7 15 58 d0 cd 2f 31 a1 a1 e3 35 4c fa 
> 15 a3 02 4c 7a 51 37 bf c0 81 e5 02 12 13 5a b6 9f e2 04 ea 96 e3 62 65 b8 c3 
> b4 01 ae fd d0 72 01 81 07 87 05 25 26 74 3c 5b c9 05 35 fd 0a b3 03 50 7b 83 
> 11 c8 f2 c3 82 02 0f 96 0b 49 34 7c fa ff 9f 2d 80 01 00
> 2018-01-13T02:47:40,816  WARN [IO-Elevator-Thread-0 
> (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Fell back to 
> JDK decompressor with memcopy; got 155 bytes
> {noformat}
> The Hadoop version is based on a 3.1 snapshot.
> FWIW, the size of libhadoop.so is 824403 bytes and libgplcompression is 
> 78273 bytes; not sure how to extract versions from those.






[jira] [Updated] (HADOOP-15171) native ZLIB decompressor produces 0 bytes on the 2nd call; also incorrectly handles some zlib errors

2018-02-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-15171:
--
Summary: native ZLIB decompressor produces 0 bytes on the 2nd call; also 
incorrectly handles some zlib errors  (was: native ZLIB decompressor produces 
0 bytes after end() is called on a different decompressor; also incorrectly 
handles some zlib errors)




[jira] [Updated] (HADOOP-15171) native ZLIB decompressor produces 0 bytes after end() is called on a different decompressor; also incorrectly handles some zlib errors

2018-02-01 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-15171:
--
Summary: native ZLIB decompressor produces 0 bytes after end() is called on 
a different decompressor; also incorrectly handles some zlib errors  (was: 
Hadoop native ZLIB decompressor produces 0 bytes after end() is called on a 
different decompressor)




[jira] [Updated] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes after end() is called on a different decompressor

2018-02-01 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-15171:
--
Summary: Hadoop native ZLIB decompressor produces 0 bytes after end() is 
called on a different decompressor  (was: Hadoop native ZLIB decompressor 
produces 0 bytes for some input)




[jira] [Commented] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input

2018-02-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349513#comment-16349513
 ] 

Sergey Shelukhin commented on HADOOP-15171:
---

Ok, here's the repro code. I get the "bar" exception if dd2 is added. Note 
that dd2 is not used for anything and is not related in any way to dd1.
Note that if I end dd1 and then reuse it, I get an NPE in Java code. But if I 
end dd2, the internals of the Java object dd1 are not affected; it looks like 
the native side has some issue.
{noformat}
// Assumes src and dest are direct ByteBuffers: src holds the compressed
// input, dest the output; startPos/startLim and startSrcPos/startSrcLim
// capture their initial positions and limits.
dest.position(startPos);
dest.limit(startLim);
src.position(startSrcPos);
src.limit(startSrcLim);

ZlibDecompressor.ZlibDirectDecompressor dd1 =
    new ZlibDecompressor.ZlibDirectDecompressor(CompressionHeader.NO_HEADER, 0);
dd1.decompress(src, dest);
dest.limit(dest.position()); // Set the new limit to where the decompressor stopped.
dest.position(startPos);
if (dest.remaining() == 0) {
  throw new RuntimeException("foo");
}

// A second, completely unrelated decompressor; it never decompresses anything.
ZlibDecompressor.ZlibDirectDecompressor dd2 =
    new ZlibDecompressor.ZlibDirectDecompressor(CompressionHeader.NO_HEADER, 0);
dest.position(startPos);
dest.limit(startLim);
src.position(startSrcPos);
src.limit(startSrcLim);
dd2.end();                   // ending dd2 breaks dd1 on the native side
dd1.decompress(src, dest);
dest.limit(dest.position()); // Set the new limit to where the decompressor stopped.
dest.position(startPos);
if (dest.remaining() == 0) {
  throw new RuntimeException("bar");
}
{noformat}

As a side note, the Z_BUF_ERROR error is not processed correctly in the native 
code. See the detailed example for this error at http://zlib.net/zlib_how.html; 
given that neither the Java nor the native code handles partial reads, and 
nothing propagates that state to the caller, it should throw an error just 
like Z_DATA_ERROR does.
The buffer address null checks should probably also throw rather than exit 
silently.
The Z_NEED_DICT handling is also suspicious. Does anything actually handle it?






[jira] [Commented] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input

2018-02-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349307#comment-16349307
 ] 

Sergey Shelukhin commented on HADOOP-15171:
---

Hmm, never mind - it might be a red herring.




[jira] [Commented] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input

2018-02-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349238#comment-16349238
 ] 

Sergey Shelukhin commented on HADOOP-15171:
---

Tentative cause (still confirming): calling end() on a ZlibDirectDecompressor 
breaks other, unrelated ZlibDirectDecompressor instances. So it may not be 
related to the buffers as such.





[jira] [Updated] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input

2018-01-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-15171:
--
Priority: Blocker  (was: Critical)




[jira] [Commented] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input

2018-01-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344184#comment-16344184
 ] 

Sergey Shelukhin commented on HADOOP-15171:
---

[~ste...@apache.org] [~jnp] is it possible to get some traction on this? We 
now also have to work around it in the ORC project, and it is becoming a pain.




[jira] [Updated] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input

2018-01-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-15171:
--
Fix Version/s: 3.0.1
   3.1.0




[jira] [Commented] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input

2018-01-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324924#comment-16324924
 ] 

Sergey Shelukhin commented on HADOOP-15171:
---

[~jnp] [~hagleitn] fyi




[jira] [Updated] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input

2018-01-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-15171:
--
Description: 
While reading an ORC file via direct buffers, Hive gets a 0-sized buffer for 
a particular compressed segment of the file. We narrowed it down to the Hadoop 
native ZLIB codec; when the data is copied to a heap-based buffer and the JDK 
Inflater is used, it produces the correct output. The input is only 127 bytes, 
so I can paste it here.
All the other (many) blocks of the file are decompressed without problems by 
the same code.

{noformat}
2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 
(1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Decompressing 
127 bytes to dest buffer pos 524288, limit 786432
2018-01-13T02:47:40,816  WARN [IO-Elevator-Thread-0 
(1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: The codec has 
produced 0 bytes for 127 bytes at pos 0, data hash 1719565039: [e3 92 e1 62 66 
60 60 10 12 e5 98 e0 27 c4 c7 f1 e8 12 8f 40 c3 7b 5e 89 09 7f 6e 74 73 04 30 
70 c9 72 b1 30 14 4d 60 82 49 37 bd e7 15 58 d0 cd 2f 31 a1 a1 e3 35 4c fa 15 
a3 02 4c 7a 51 37 bf c0 81 e5 02 12 13 5a b6 9f e2 04 ea 96 e3 62 65 b8 c3 b4 
01 ae fd d0 72 01 81 07 87 05 25 26 74 3c 5b c9 05 35 fd 0a b3 03 50 7b 83 11 
c8 f2 c3 82 02 0f 96 0b 49 34 7c fa ff 9f 2d 80 01 00
2018-01-13T02:47:40,816  WARN [IO-Elevator-Thread-0 
(1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Fell back to JDK 
decompressor with memcopy; got 155 bytes
{noformat}

The Hadoop version is based on a 3.1 snapshot.
FWIW, the size of libhadoop.so is 824403 bytes and libgplcompression is 78273 
bytes; not sure how to extract versions from those.


[jira] [Created] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input

2018-01-12 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HADOOP-15171:
-

 Summary: Hadoop native ZLIB decompressor produces 0 bytes for some 
input
 Key: HADOOP-15171
 URL: https://issues.apache.org/jira/browse/HADOOP-15171
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 3.1.0
Reporter: Sergey Shelukhin
Priority: Critical


While reading some ORC file via direct buffers, Hive gets a 0-sized buffer for 
a particular compressed segment of the file. We narrowed it down to Hadoop 
native ZLIB codec; when the data is copied to heap-based buffer and the JDK 
Inflater is used, it produces correct output. Input is only 127 bytes so I can 
paste it here.

{noformat}
2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 
(1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Decompressing 
127 bytes to dest buffer pos 524288, limit 786432
2018-01-13T02:47:40,816  WARN [IO-Elevator-Thread-0 
(1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: The codec has 
produced 0 bytes for 127 bytes at pos 0, data hash 1719565039: [e3 92 e1 62 66 
60 60 10 12 e5 98 e0 27 c4 c7 f1 e8 12 8f 40 c3 7b 5e 89 09 7f 6e 74 73 04 30 
70 c9 72 b1 30 14 4d 60 82 49 37 bd e7 15 58 d0 cd 2f 31 a1 a1 e3 35 4c fa 15 
a3 02 4c 7a 51 37 bf c0 81 e5 02 12 13 5a b6 9f e2 04 ea 96 e3 62 65 b8 c3 b4 
01 ae fd d0 72 01 81 07 87 05 25 26 74 3c 5b c9 05 35 fd 0a b3 03 50 7b 83 11 
c8 f2 c3 82 02 0f 96 0b 49 34 7c fa ff 9f 2d 80 01 00
2018-01-13T02:47:40,816  WARN [IO-Elevator-Thread-0 
(1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Fell back to JDK 
decompressor with memcopy; got 155 bytes
{noformat}

The Hadoop version is based on a 3.1 snapshot.
The size of libhadoop.so is 824403 bytes, and libgplcompression is 78273 bytes, 
FWIW. Not sure how to extract versions from those. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13500) Concurrency issues when using Configuration iterator

2017-09-27 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183481#comment-16183481
 ] 

Sergey Shelukhin commented on HADOOP-13500:
---

[~jnp] can we get this fixed eventually? ;)

> Concurrency issues when using Configuration iterator
> 
>
> Key: HADOOP-13500
> URL: https://issues.apache.org/jira/browse/HADOOP-13500
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: conf
>Reporter: Jason Lowe
>
> It is possible to encounter a ConcurrentModificationException while trying to 
> iterate a Configuration object.  The iterator method tries to walk the 
> underlying Property object without proper synchronization, so another thread 
> simultaneously calling the set method can trigger it.
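
For anyone picking this up, the race in the quoted description can be 
reproduced with a snippet along these lines (hypothetical reproducer, not from 
the issue):

{code}
import java.util.Map;
import org.apache.hadoop.conf.Configuration;

// One thread walks the Configuration iterator while another calls set();
// the unsynchronized walk over the underlying Properties can throw
// ConcurrentModificationException.
public class ConfIteratorRace {
  public static void main(String[] args) throws Exception {
    final Configuration conf = new Configuration();
    Thread writer = new Thread(() -> {
      for (int i = 0; i < 1_000_000; i++) {
        conf.set("key." + i, "value");          // concurrent set()
      }
    });
    writer.start();
    while (writer.isAlive()) {
      for (Map.Entry<String, String> e : conf) { // may throw CME mid-walk
        e.getKey();
      }
    }
    writer.join();
  }
}
{code}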



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-13500) Concurrency issues when using Configuration iterator

2017-09-27 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183478#comment-16183478
 ] 

Sergey Shelukhin edited comment on HADOOP-13500 at 9/28/17 12:07 AM:
-

Hive is also hitting this issue with a different call stack.


was (Author: sershe):
Hive is also hitting this issue.

> Concurrency issues when using Configuration iterator
> 
>
> Key: HADOOP-13500
> URL: https://issues.apache.org/jira/browse/HADOOP-13500
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: conf
>Reporter: Jason Lowe
>
> It is possible to encounter a ConcurrentModificationException while trying to 
> iterate a Configuration object.  The iterator method tries to walk the 
> underlying Property object without proper synchronization, so another thread 
> simultaneously calling the set method can trigger it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13500) Concurrency issues when using Configuration iterator

2017-09-27 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183478#comment-16183478
 ] 

Sergey Shelukhin commented on HADOOP-13500:
---

Hive is also hitting this issue.

> Concurrency issues when using Configuration iterator
> 
>
> Key: HADOOP-13500
> URL: https://issues.apache.org/jira/browse/HADOOP-13500
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: conf
>Reporter: Jason Lowe
>
> It is possible to encounter a ConcurrentModificationException while trying to 
> iterate a Configuration object.  The iterator method tries to walk the 
> underlying Property object without proper synchronization, so another thread 
> simultaneously calling the set method can trigger it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14683) FileStatus.compareTo binary compatible issue

2017-08-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16110045#comment-16110045
 ] 

Sergey Shelukhin commented on HADOOP-14683:
---

Thanks!

> FileStatus.compareTo binary compatible issue
> 
>
> Key: HADOOP-14683
> URL: https://issues.apache.org/jira/browse/HADOOP-14683
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.8.1
>Reporter: Sergey Shelukhin
>Assignee: Akira Ajisaka
>Priority: Blocker
> Fix For: 2.9.0, 3.0.0-beta1, 2.8.2
>
> Attachments: HADOOP-14683-branch-2-01.patch, 
> HADOOP-14683-branch-2-02.patch
>
>
> See HIVE-17133. Looks like the signature change is causing issues; according 
> to [~jnp] this is a public API.
> Is it possible to add the old overload back (keeping the new one presumably) 
> in a point release on 2.8? That way we can avoid creating yet another shim 
> for this in Hive.
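
For context, the compatibility shape being asked for above is roughly the 
following: keep the old compareTo(Object) entry point alongside the newer typed 
overload, so binaries compiled against 2.7 still link. An illustrative sketch 
(not the committed patch; the class and field are made up):

{code}
// Illustrative only: restore the 2.7-era compareTo(Object) signature and
// delegate to the typed 2.8-era overload.
public class FileStatusCompat implements Comparable<Object> {
  private final String path;

  public FileStatusCompat(String path) { this.path = path; }

  public int compareTo(FileStatusCompat other) {  // newer, typed overload
    return this.path.compareTo(other.path);
  }

  @Override
  public int compareTo(Object o) {                // old binary entry point
    return compareTo((FileStatusCompat) o);
  }
}
{code}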



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-14683) FileStatus.compareTo binary compat issue between 2.7 and 2.8

2017-07-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102470#comment-16102470
 ] 

Sergey Shelukhin commented on HADOOP-14683:
---

+1 non-binding

> FileStatus.compareTo binary compat issue between 2.7 and 2.8
> 
>
> Key: HADOOP-14683
> URL: https://issues.apache.org/jira/browse/HADOOP-14683
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.8.0, 2.8.1
>Reporter: Sergey Shelukhin
>Assignee: Akira Ajisaka
>Priority: Blocker
> Attachments: HADOOP-14683-branch-2-01.patch
>
>
> See HIVE-17133. Looks like the signature change is causing issues; according 
> to [~jnp] this is a public API.
> Is it possible to add the old overload back (keeping the new one presumably) 
> in a point release on 2.8? That way we can avoid creating yet another shim 
> for this in Hive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-14684) get rid of "skipCorrupt" flag

2017-07-24 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HADOOP-14684:
-

 Summary: get rid of "skipCorrupt" flag
 Key: HADOOP-14684
 URL: https://issues.apache.org/jira/browse/HADOOP-14684
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Sergey Shelukhin


The error that caused the issue happened a long time ago, and it's probably ok 
to get rid of this flag.
Perhaps we should provide a small tool to overwrite these files without the 
corrupt values.
cc [~prasanth_j]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-14683) FileStatus.compareTo binary compat issue between 2.7 and 2.8

2017-07-24 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-14683:
--
Description: 
See HIVE-17133. Looks like the signature change is causing issues; according to 
[~jnp] this is a public API.
Is it possible to add the old overload back (keeping the new one presumably) in 
a point release on 2.8? That way we can avoid creating yet another shim for 
this in Hive.

  was:
See HIVE-17133. Looks like the signature change is causing issues; according to 
[~jnp] this is a public API.
Is it possible to add the old overload back in a point release on 2.8? That way 
we can avoid creating yet another shim for this in Hive.


> FileStatus.compareTo binary compat issue between 2.7 and 2.8
> 
>
> Key: HADOOP-14683
> URL: https://issues.apache.org/jira/browse/HADOOP-14683
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> See HIVE-17133. Looks like the signature change is causing issues; according 
> to [~jnp] this is a public API.
> Is it possible to add the old overload back (keeping the new one presumably) 
> in a point release on 2.8? That way we can avoid creating yet another shim 
> for this in Hive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-14683) FileStatus.compareTo binary compat issue between 2.7 and 2.8

2017-07-24 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HADOOP-14683:
-

 Summary: FileStatus.compareTo binary compat issue between 2.7 and 
2.8
 Key: HADOOP-14683
 URL: https://issues.apache.org/jira/browse/HADOOP-14683
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Sergey Shelukhin


See HIVE-17133. Looks like the signature change is causing issues; according to 
[~jnp] this is a public API.
Is it possible to add the old overload back in a point release on 2.8? That way 
we can avoid creating yet another shim for this in Hive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-14214) DomainSocketWatcher::add()/delete() should not self interrupt while looping await()

2017-03-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937522#comment-15937522
 ] 

Sergey Shelukhin edited comment on HADOOP-14214 at 3/23/17 1:28 AM:


Hmm.. the reason we are interrupting the thread in question is because we want 
it to be interrupted (because the work it's performing is no longer relevant). 
Wouldn't this just cause it to be stuck forever anyway, or at best to continue 
a useless operation? cc [~sseth]


was (Author: sershe):
Hmm.. the reason we are interrupting the thread in question is because we want 
it to be interrupted (because the work it's performing is no longer relevant). 
Wouldn't this just cause it to be stuck forever anyway, or at best to continue 
a useless operation?

> DomainSocketWatcher::add()/delete() should not self interrupt while looping 
> await()
> ---
>
> Key: HADOOP-14214
> URL: https://issues.apache.org/jira/browse/HADOOP-14214
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Critical
> Attachments: HADOOP-14214.000.patch
>
>
> Our hive team found a TPCDS job whose queries running on LLAP seem to be 
> getting stuck. Dozens of threads were waiting for the 
> {{DfsClientShmManager::lock}}, as following jstack:
> {code}
> Thread 251 (IO-Elevator-Thread-5):
>   State: WAITING
>   Blocked count: 3871
>   Wtaited count: 4565
>   Waiting on 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@16ead198
>   Stack:
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUninterruptibly(AbstractQueuedSynchronizer.java:1976)
> 
> org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.allocSlot(DfsClientShmManager.java:255)
> 
> org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager.allocSlot(DfsClientShmManager.java:434)
> 
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.allocShmSlot(ShortCircuitCache.java:1017)
> 
> org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:476)
> 
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:784)
> 
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:718)
> 
> org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:422)
> 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:333)
> 
> org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1181)
> 
> org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1118)
> org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1478)
> org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1441)
> org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121)
> 
> org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:111)
> 
> org.apache.orc.impl.RecordReaderUtils$DefaultDataReader.readStripeFooter(RecordReaderUtils.java:166)
> 
> org.apache.hadoop.hive.llap.io.metadata.OrcStripeMetadata.(OrcStripeMetadata.java:64)
> 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.readStripesMetadata(OrcEncodedDataReader.java:622)
> {code}
> The thread that is expected to signal those threads is calling 
> {{DomainSocketWatcher::add()}} method, but it gets stuck there dealing with 
> InterruptedException infinitely. The jstack is like:
> {code}
> Thread 44417 (TezTR-257387_2840_12_10_52_0):
>   State: RUNNABLE
>   Blocked count: 3
>   Wtaited count: 5
>   Stack:
> java.lang.Throwable.fillInStackTrace(Native Method)
> java.lang.Throwable.fillInStackTrace(Throwable.java:783)
> java.lang.Throwable.(Throwable.java:250)
> java.lang.Exception.(Exception.java:54)
> java.lang.InterruptedException.(InterruptedException.java:57)
> 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2034)
> 
> org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:325)
> 
> org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.allocSlot(DfsClientShmManager.java:266)
> 
> org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager.allocSlot(DfsClientShmManager.java:434)
> 
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.allocShmSlot(ShortCircuitCache.java:1017)
> 
> org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFac

[jira] [Commented] (HADOOP-14214) DomainSocketWatcher::add()/delete() should not self interrupt while looping await()

2017-03-22 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937522#comment-15937522
 ] 

Sergey Shelukhin commented on HADOOP-14214:
---

Hmm.. the reason we are interrupting the thread in question is because we want 
it to be interrupted (because the work it's performing is no longer relevant). 
Wouldn't this just cause it to be stuck forever anyway, or at best to continue 
a useless operation?
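
For context, the direction suggested by the issue title is to stop re-asserting 
the interrupt inside the wait loop (with the flag already set, the next await() 
throws immediately, forever) and instead remember it and restore it once after 
the loop. An illustrative sketch of that pattern (not the actual patch):

{code}
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;

// Remember the interrupt, keep waiting, restore the flag exactly once after
// the loop instead of self-interrupting inside it.
final class WaitLoop {
  static void awaitDone(Lock lock, Condition cond,
      java.util.function.BooleanSupplier done) {
    boolean interrupted = false;
    lock.lock();
    try {
      while (!done.getAsBoolean()) {
        try {
          cond.await();
        } catch (InterruptedException e) {
          interrupted = true;          // note it, but do not set the flag yet
        }
      }
    } finally {
      lock.unlock();
      if (interrupted) {
        Thread.currentThread().interrupt();  // restore status once, at the end
      }
    }
  }
}
{code}

Condition.awaitUninterruptibly() achieves the same effect when the wait 
genuinely must not be interrupted.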

> DomainSocketWatcher::add()/delete() should not self interrupt while looping 
> await()
> ---
>
> Key: HADOOP-14214
> URL: https://issues.apache.org/jira/browse/HADOOP-14214
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: hdfs-client
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Critical
> Attachments: HADOOP-14214.000.patch
>
>
> Our hive team found a TPCDS job whose queries running on LLAP seem to be 
> getting stuck. Dozens of threads were waiting for the 
> {{DfsClientShmManager::lock}}, as following jstack:
> {code}
> Thread 251 (IO-Elevator-Thread-5):
>   State: WAITING
>   Blocked count: 3871
>   Wtaited count: 4565
>   Waiting on 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@16ead198
>   Stack:
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUninterruptibly(AbstractQueuedSynchronizer.java:1976)
> 
> org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.allocSlot(DfsClientShmManager.java:255)
> 
> org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager.allocSlot(DfsClientShmManager.java:434)
> 
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.allocShmSlot(ShortCircuitCache.java:1017)
> 
> org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:476)
> 
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:784)
> 
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:718)
> 
> org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:422)
> 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:333)
> 
> org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1181)
> 
> org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1118)
> org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1478)
> org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1441)
> org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121)
> 
> org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:111)
> 
> org.apache.orc.impl.RecordReaderUtils$DefaultDataReader.readStripeFooter(RecordReaderUtils.java:166)
> 
> org.apache.hadoop.hive.llap.io.metadata.OrcStripeMetadata.(OrcStripeMetadata.java:64)
> 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.readStripesMetadata(OrcEncodedDataReader.java:622)
> {code}
> The thread that is expected to signal those threads is calling 
> {{DomainSocketWatcher::add()}} method, but it gets stuck there dealing with 
> InterruptedException infinitely. The jstack is like:
> {code}
> Thread 44417 (TezTR-257387_2840_12_10_52_0):
>   State: RUNNABLE
>   Blocked count: 3
>   Wtaited count: 5
>   Stack:
> java.lang.Throwable.fillInStackTrace(Native Method)
> java.lang.Throwable.fillInStackTrace(Throwable.java:783)
> java.lang.Throwable.(Throwable.java:250)
> java.lang.Exception.(Exception.java:54)
> java.lang.InterruptedException.(InterruptedException.java:57)
> 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2034)
> 
> org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:325)
> 
> org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.allocSlot(DfsClientShmManager.java:266)
> 
> org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager.allocSlot(DfsClientShmManager.java:434)
> 
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.allocShmSlot(ShortCircuitCache.java:1017)
> 
> org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:476)
> 
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:784)
> 
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:718)
> 
> org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:422)
> 
> org.apache.hadoop.h

[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-10-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563221#comment-15563221
 ] 

Sergey Shelukhin commented on HADOOP-13081:
---

Sorry, I've posted this in the wrong JIRA apparently:

The scenario is like this; we accept work on behalf of clients that is, 
generally speaking, authorized on a higher level (those are fragments of Hive 
jobs right now, except unlike MR they all run in-process, and we are also 
making the external client, which is the crux of the matter). In the normal case, 
the service doing the auth (HiveServer2 in case of Hive) gathers the tokens and 
passes them on to the service running the fragment; the external client may 
supply some tokens too. However, apparently for some clients it's difficult (or 
not implemented yet) to gather tokens, so in the cases of perimeter security, 
we want to be able to configure access in such a way that they can access all of 
HDFS (for example; it could be some other service that their code touched that 
we have no idea about, hypothetically). The reasoning is that if the work item 
has passed thru the authorization that our service does, they don't care about 
HDFS security any more. In that case, our service would log in from keytab and 
run their item in that context. However, we neither want to require a 
super-user that is able to access all possible services (e.g. HBase), nor 
disable HDFS security altogether. So, the user work items would access HDFS (or 
HBase or whatever) as a user with lots of access, by design, and access other 
services via tokens.
This feature is off by default, obviously, and the rest of their code talking to 
services is based entirely on tokens by default.
I understand running as such a user is not an ideal situation, but it is 
apparently a valid scenario for some cases.
So, what we do now is create a master UGI/Subject; for every task, if this is 
enabled, we clone that via reflection and add the tokens. We haven't 
extensively tested this yet, since the external client is not production-ready, 
but it appears to work in some tests.

I hope this makes sense, feel free to clarify.
We are using reflection to get the subject and construct the UGI from subject.
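
For the record, the workaround sketched above has roughly this shape; 
getSubjectViaReflection is a stand-in for our reflective access, not a real UGI 
API:

{code}
import javax.security.auth.Subject;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;

// Clone the logged-in Subject so each task gets its own credential sets
// (tokens do not mix across tasks) while sharing the Kerberos login.
final class UgiClone {
  static UserGroupInformation clonePlusToken(UserGroupInformation master,
      Token<? extends TokenIdentifier> taskToken) throws java.io.IOException {
    Subject masterSubject = getSubjectViaReflection(master);
    // New Subject sharing principals/credentials, with copied credential sets.
    Subject clone = new Subject(false,
        masterSubject.getPrincipals(),
        masterSubject.getPublicCredentials(),
        masterSubject.getPrivateCredentials());
    UserGroupInformation perTask = UserGroupInformation.getUGIFromSubject(clone);
    perTask.addToken(taskToken);       // task-specific token, isolated per UGI
    return perTask;
  }

  static Subject getSubjectViaReflection(UserGroupInformation ugi) {
    throw new UnsupportedOperationException("reflective access, elided");
  }
}
{code}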


> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HADOOP-13081.01.patch, HADOOP-13081.02.patch, 
> HADOOP-13081.02.patch, HADOOP-13081.03.patch, HADOOP-13081.03.patch, 
> HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Issue Comment Deleted] (HADOOP-13066) UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe

2016-10-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-13066:
--
Comment: was deleted

(was: The scenario is like this; we accept work on behalf of clients that is, 
generally speaking, authorized on a higher level (those are fragments of Hive 
jobs right now, except unlike MR they all run in-process, and we are also 
making the external client, which is the crux of the matter). In the normal case, 
the service doing the auth (HiveServer2 in case of Hive) gathers the tokens and 
passes them on to the service running the fragment; the external client may 
supply some tokens too. However, apparently for some clients it's difficult (or 
not implemented yet) to gather tokens, so in the cases of perimeter security, 
we want to be able to configure access in such a way that they can access all of 
HDFS (for example; it could be some other service that their code touched that 
we have no idea about, hypothetically). The reasoning is that if the work item 
has passed thru the authorization that our service does, they don't care about 
HDFS security any more. In that case, our service would log in from keytab and 
run their item in that context. However, we neither want to require a 
super-user that is able to access all possible services (e.g. HBase), nor 
disable HDFS security altogether. So, the user work items would access HDFS (or 
HBase or whatever) as a user with lots of access, by design, and access other 
services via tokens.
This feature is off by default, obviously, and the rest of their code talking to 
services is based entirely on tokens by default.
I understand running as such a user is not an ideal situation, but it is 
apparently a valid scenario for some cases.
So, what we do now is create a master UGI/Subject; for every task, if this is 
enabled, we clone that via reflection and add the tokens. We haven't 
extensively tested this yet, since the external client is not production-ready, 
but it appears to work in some tests.

I hope this makes sense, feel free to clarify.
We are using reflection to get the subject and construct the UGI from subject.)

> UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe
> --
>
> Key: HADOOP-13066
> URL: https://issues.apache.org/jira/browse/HADOOP-13066
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Sergey Shelukhin
>
> When calling loginFromKerberos, a static variable is set up with the result. 
> If someone logs in as a different user from a different thread, the call to 
> getLoginUser will not return the correct UGI.
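
A hypothetical illustration of the static-state race in the quoted description 
(principals and keytab paths are made up):

{code}
import org.apache.hadoop.security.UserGroupInformation;

// Both logins race on the same static login-user field, so getLoginUser()
// may report either principal regardless of which thread asks.
public class LoginRace {
  public static void main(String[] args) throws Exception {
    Thread a = new Thread(() -> login("alice@EXAMPLE.COM", "/keytabs/alice.keytab"));
    Thread b = new Thread(() -> login("bob@EXAMPLE.COM", "/keytabs/bob.keytab"));
    a.start(); b.start(); a.join(); b.join();
    System.out.println(UserGroupInformation.getLoginUser().getUserName());
  }

  static void login(String principal, String keytab) {
    try {
      UserGroupInformation.loginUserFromKeytab(principal, keytab);
    } catch (java.io.IOException e) {
      throw new RuntimeException(e);
    }
  }
}
{code}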



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13066) UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe

2016-10-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563218#comment-15563218
 ] 

Sergey Shelukhin commented on HADOOP-13066:
---

Sorry, I commented on the wrong jira

> UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe
> --
>
> Key: HADOOP-13066
> URL: https://issues.apache.org/jira/browse/HADOOP-13066
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Sergey Shelukhin
>
> When calling loginFromKerberos, a static variable is set up with the result. 
> If someone logs in as a different user from a different thread, the call to 
> getLoginUser will not return the correct UGI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-13066) UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe

2016-10-05 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549646#comment-15549646
 ] 

Sergey Shelukhin edited comment on HADOOP-13066 at 10/5/16 6:57 PM:


The scenario is like this; we accept work on behalf of clients that is, 
generally speaking, authorized on a higher level (those are fragments of Hive 
jobs right now, except unlike MR they all run in-process, and we are also 
making the external client, which is the crux of the matter). In the normal case, 
the service doing the auth (HiveServer2 in case of Hive) gathers the tokens and 
passes them on to the service running the fragment; the external client may 
supply some tokens too. However, apparently for some clients it's difficult (or 
not implemented yet) to gather tokens, so in the cases of perimeter security, 
we want to be able to configure access in such a way that they can access all of 
HDFS (for example; it could be some other service that their code touched that 
we have no idea about, hypothetically). The reasoning is that if the work item 
has passed thru the authorization that our service does, they don't care about 
HDFS security any more. In that case, our service would log in from keytab and 
run their item in that context. However, we neither want to require a 
super-user that is able to access all possible services (e.g. HBase), nor 
disable HDFS security altogether. So, the user work items would access HDFS (or 
HBase or whatever) as a user with lots of access, by design, and access other 
services via tokens.
This feature is off by default, obviously, and the rest of their code talking to 
services is based entirely on tokens by default.
I understand running as such a user is not an ideal situation, but it is 
apparently a valid scenario for some cases.
So, what we do now is create a master UGI/Subject; for every task, if this is 
enabled, we clone that via reflection and add the tokens. We haven't 
extensively tested this yet, since the external client is not production-ready, 
but it appears to work in some tests.

I hope this makes sense, feel free to clarify.
We are using reflection to get the subject and construct the UGI from subject.


was (Author: sershe):
The scenario is like this; we accept work on behalf of clients that is, 
generally speaking, authorized on a higher level (those are fragments of Hive 
jobs right now, except unlike MR they all run in-process, and we are also 
making the external client, which is the crux of the matter). In the normal case, 
the service doing the auth gathers the tokens and passes them on; the external 
client may supply some tokens too. However, apparently for some clients it's 
difficult (or not implemented yet) to gather tokens, so in the cases of 
perimeter security, they want to configure access in such way that they can 
access all of HDFS (for example; it could be some other service that their code 
touched that we have no idea about, hypothetically). The reasoning is that if 
the work item has passed thru the authorization of our service, they don't care 
about HDFS security any more. In that case, our service would log in from 
keytab and run their item in that context. However, we neither want to require 
a super-user that is able to access all possible services (e.g. HBase), nor 
disable HDFS security altogether. So, the user work items would access HDFS (or 
HBase or whatever) as a user with lots of access, by design, and access other 
services via tokens.
This feature is off by default, obviously, and the rest of their code talking to 
services is based entirely on tokens by default.
I understand running as such a user is not an ideal situation, but it is 
apparently a valid scenario for some cases.
So, what we do now is create a master UGI/Subject; for every task, if this is 
enabled, we clone that via reflection and add the tokens. We haven't 
extensively tested this yet, since the external client is not production-ready, 
but it appears to work in some tests.

I hope this makes sense, feel free to clarify.
We are using reflection to get the subject and construct the UGI from subject.

> UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe
> --
>
> Key: HADOOP-13066
> URL: https://issues.apache.org/jira/browse/HADOOP-13066
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Sergey Shelukhin
>
> When calling loginFromKerberos, a static variable is set up with the result. 
> If someone logs in as a different user from a different thread, the call to 
> getLoginUser will not return the correct UGI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr..

[jira] [Commented] (HADOOP-13066) UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe

2016-10-05 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549646#comment-15549646
 ] 

Sergey Shelukhin commented on HADOOP-13066:
---

The scenario is like this; we accept work on behalf of clients that is, 
generally speaking, authorized on a higher level (those are fragments of Hive 
jobs right now, except unlike MR they all run in-process, and we are also 
making the external client, which is the crux of the matter). In the normal case, 
the service doing the auth gathers the tokens and passes them on; the external 
client may supply some tokens too. However, apparently for some clients it's 
difficult (or not implemented yet) to gather tokens, so in the cases of 
perimeter security, they want to configure access in such a way that they can 
access all of HDFS (for example; it could be some other service that their code 
touched that we have no idea about, hypothetically). The reasoning is that if 
the work item has passed thru the authorization of our service, they don't care 
about HDFS security any more. In that case, our service would log in from 
keytab and run their item in that context. However, we neither want to require 
a super-user that is able to access all possible services (e.g. HBase), nor 
disable HDFS security altogether. So, the user work items would access HDFS (or 
HBase or whatever) as a user with lots of access, by design, and access other 
services via tokens.
This feature is off by default, obviously, and the rest of their code talking to 
services is based entirely on tokens by default.
I understand running as such a user is not an ideal situation, but it is 
apparently a valid scenario for some cases.
So, what we do now is create a master UGI/Subject; for every task, if this is 
enabled, we clone that via reflection and add the tokens. We haven't 
extensively tested this yet, since the external client is not production-ready, 
but it appears to work in some tests.

I hope this makes sense, feel free to clarify.
We are using reflection to get the subject and construct the UGI from subject.

> UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe
> --
>
> Key: HADOOP-13066
> URL: https://issues.apache.org/jira/browse/HADOOP-13066
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Sergey Shelukhin
>
> When calling loginFromKerberos, a static variable is set up with the result. 
> If someone logs in as a different user from a different thread, the call to 
> getLoginUser will not return the correct UGI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-09-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15508229#comment-15508229
 ] 

Sergey Shelukhin edited comment on HADOOP-13081 at 9/21/16 12:22 AM:
-

The synchronization issues and preserving the order seem fixable. UGI already 
iterates credentials (e.g. in getTGT or getCredentialsInternal) synchronizing 
on itself or subject only.
User principal only uses the LoginContext to relogin. We could clear it and 
posit that clones cannot be used to relogin (this is rather arbitrary, 
admittedly...)



was (Author: sershe):
The synchronization issues and preserving the order seem fixable. UGI already 
iterates credentials (e.g. in getTGT or getCredentialsInternal) synchronizing 
on itself or subject only.
User principal only uses the LoginContext to relogin. We could clear it and 
posit that clones cannot be logged in.


> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HADOOP-13081.01.patch, HADOOP-13081.02.patch, 
> HADOOP-13081.02.patch, HADOOP-13081.03.patch, HADOOP-13081.03.patch, 
> HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-09-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15508229#comment-15508229
 ] 

Sergey Shelukhin commented on HADOOP-13081:
---

The synchronization issues and preserving the order seem fixable. UGI already 
iterates credentials (e.g. in getTGT or getCredentialsInternal) synchronizing 
on itself or subject only.
User principal only uses the LoginContext to relogin. We could clear it and 
posit that clones cannot be logged in.


> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HADOOP-13081.01.patch, HADOOP-13081.02.patch, 
> HADOOP-13081.02.patch, HADOOP-13081.03.patch, HADOOP-13081.03.patch, 
> HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-09-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507805#comment-15507805
 ] 

Sergey Shelukhin commented on HADOOP-13081:
---

We don't have control over which parts of the code need kerberos or tokens; I 
suspect that usually only one would be needed but we don't know which one.

> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HADOOP-13081.01.patch, HADOOP-13081.02.patch, 
> HADOOP-13081.02.patch, HADOOP-13081.03.patch, HADOOP-13081.03.patch, 
> HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-09-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507807#comment-15507807
 ] 

Sergey Shelukhin commented on HADOOP-13081:
---

Btw, we do already have the implementation using reflection ;)

> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HADOOP-13081.01.patch, HADOOP-13081.02.patch, 
> HADOOP-13081.02.patch, HADOOP-13081.03.patch, HADOOP-13081.03.patch, 
> HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-09-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507759#comment-15507759
 ] 

Sergey Shelukhin edited comment on HADOOP-13081 at 9/20/16 8:51 PM:


[~cnauroth] the concrete use case is where a service runs multiple pieces of 
work on behalf of users; it can be set to log in as a particular user using 
Kerberos (specifically when running these), but the users can also add their 
own tokens.
We cannot add tokens to a single kerberos-based UGI because they will all mix; 
we also cannot log in for every piece of work in most cases, as it would 
overload the KDC.
Ideally, we should be able to reuse the kerberos login and create a separate 
UGI with it for each user, adding the user-specific tokens.


was (Author: sershe):
[~cnauroth] the concrete use case is where a service runs multiple pieces of 
work on behalf of users; it can be set to log in as a particular user using 
Kerberos, but the users can also add their own tokens.
We cannot add tokens to a single kerberos-based UGI because they will all mix; 
we also cannot log in for every piece of work in most cases, as it would 
overload the KDC.
Ideally, we should be able to reuse the kerberos login and create a separate 
UGI with it for each user, adding the user-specific tokens.

> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HADOOP-13081.01.patch, HADOOP-13081.02.patch, 
> HADOOP-13081.02.patch, HADOOP-13081.03.patch, HADOOP-13081.03.patch, 
> HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-09-20 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507759#comment-15507759
 ] 

Sergey Shelukhin commented on HADOOP-13081:
---

[~cnauroth] the concrete use case is where a service runs multiple pieces of 
work on behalf of users; it can be set to log in as a particular user using 
Kerberos, but the users can also add their own tokens.
We cannot add tokens to a single kerberos-based UGI because they will all mix; 
we also cannot log in for every piece of work in most cases, as it would 
overload the KDC.
Ideally, we should be able to reuse the kerberos login and create a separate 
UGI with it for each user, adding the user-specific tokens.

> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.8.0, 3.0.0-alpha1
>
> Attachments: HADOOP-13081.01.patch, HADOOP-13081.02.patch, 
> HADOOP-13081.02.patch, HADOOP-13081.03.patch, HADOOP-13081.03.patch, 
> HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-08-01 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-13081:
--
Attachment: HADOOP-13081.03.patch

> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HADOOP-13081.01.patch, HADOOP-13081.02.patch, 
> HADOOP-13081.02.patch, HADOOP-13081.03.patch, HADOOP-13081.03.patch, 
> HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-08-01 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-13081:
--
Attachment: (was: HADOOP-13081.03.patch)

> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HADOOP-13081.01.patch, HADOOP-13081.02.patch, 
> HADOOP-13081.02.patch, HADOOP-13081.03.patch, HADOOP-13081.03.patch, 
> HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-08-01 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-13081:
--
Attachment: HADOOP-13081.03.patch

fixed checkstyle... sigh

> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: security
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HADOOP-13081.01.patch, HADOOP-13081.02.patch, 
> HADOOP-13081.02.patch, HADOOP-13081.03.patch, HADOOP-13081.03.patch, 
> HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-08-01 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-13081:
--
Attachment: HADOOP-13081.03.patch

Updated

> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HADOOP-13081.01.patch, HADOOP-13081.02.patch, 
> HADOOP-13081.02.patch, HADOOP-13081.03.patch, HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-07-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-13081:
--
Attachment: HADOOP-13081.02.patch

Updating the patch to remove tabs


> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HADOOP-13081.01.patch, HADOOP-13081.02.patch, 
> HADOOP-13081.02.patch, HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-07-29 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-13081:
--
Attachment: HADOOP-13081.02.patch

Added the test, plus some additional logic to clone the Hadoop Credentials 
object, which is apparently reused from the Subject's credential set rather 
than added to it anew.
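
For illustration, a hedged sketch of that credential-copy step (the variable names here are hypothetical, not from the patch):

{noformat}
// Deep-copy the Hadoop Credentials instance via its copy constructor instead
// of reusing the object from the original Subject's credential set, so tokens
// added to the clone don't show up in the login UGI.
Credentials copied = new Credentials(originalCreds);
clonedSubject.getPrivateCredentials().add(copied);
{noformat}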

> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HADOOP-13081.01.patch, HADOOP-13081.02.patch, 
> HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-07-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398592#comment-15398592
 ] 

Sergey Shelukhin commented on HADOOP-13081:
---

Yes, it's possible to add the test. It fails, however, probably due to problems 
with the mocks; I will finish it tomorrow, as I need to run now. Fixed the rest.

> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HADOOP-13081.01.patch, HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13422) ZKDelegationTokenSecretManager JaasConfig does not work well with other ZK users in process

2016-07-26 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-13422:
--
Attachment: HADOOP-13422.01.patch

Updated the patch.
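
Roughly the shape of the approach (a sketch under my reading of the issue, not the patch itself): instead of unconditionally installing a global JAAS Configuration, delegate lookups for entries we don't own to whatever Configuration was installed before ours.

{noformat}
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

// Hypothetical merging config: serves our ZK entry, falls back to the
// previously installed global Configuration for everything else.
class MergingJaasConfiguration extends Configuration {
  private final Configuration previous; // may be null
  private final String entryName;
  private final AppConfigurationEntry[] entries;

  MergingJaasConfiguration(String entryName, AppConfigurationEntry[] entries) {
    Configuration prev = null;
    try {
      prev = Configuration.getConfiguration();
    } catch (SecurityException e) {
      // no global configuration installed yet
    }
    this.previous = prev;
    this.entryName = entryName;
    this.entries = entries;
  }

  @Override
  public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
    if (entryName.equals(name)) {
      return entries;
    }
    return previous == null ? null : previous.getAppConfigurationEntry(name);
  }
}
{noformat}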

> ZKDelegationTokenSecretManager JaasConfig does not work well with other ZK 
> users in process
> ---
>
> Key: HADOOP-13422
> URL: https://issues.apache.org/jira/browse/HADOOP-13422
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HADOOP-13422.01.patch, HADOOP-13422.patch
>
>
> There's a race in the globals. The non-global APIs from ZOOKEEPER-2139 are 
> not yet available in a stable ZK version, and there's no timeline for their 
> availability, so for now it would help to make the secret manager aware of 
> other users of the global config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13422) ZKDelegationTokenSecretManager JaasConfig does not work well with other ZK users in process

2016-07-25 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-13422:
--
Attachment: HADOOP-13422.patch

The initial patch.

> ZKDelegationTokenSecretManager JaasConfig does not work well with other ZK 
> users in process
> ---
>
> Key: HADOOP-13422
> URL: https://issues.apache.org/jira/browse/HADOOP-13422
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HADOOP-13422.patch
>
>
> There's a race in the globals. The non-global APIs from ZOOKEEPER-2139 are 
> not yet available in a stable ZK version, and there's no timeline for their 
> availability, so for now it would help to make the secret manager aware of 
> other users of the global config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13422) ZKDelegationTokenSecretManager JaasConfig does not work well with other ZK users in process

2016-07-25 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-13422:
--
Status: Patch Available  (was: Open)

> ZKDelegationTokenSecretManager JaasConfig does not work well with other ZK 
> users in process
> ---
>
> Key: HADOOP-13422
> URL: https://issues.apache.org/jira/browse/HADOOP-13422
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HADOOP-13422.patch
>
>
> There's a race in the globals. The non-global APIs from ZOOKEEPER-2139 are 
> not yet available in a stable ZK version, and there's no timeline for their 
> availability, so for now it would help to make the secret manager aware of 
> other users of the global config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13422) ZKDelegationTokenSecretManager JaasConfig does not work well with other ZK users in process

2016-07-25 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-13422:
--
Description: There's a race in the globals. The non-global APIs from 
ZOOKEEPER-2139 are not yet available in a stable ZK version, and there's no 
timeline for their availability, so for now it would help to make the secret 
manager aware of other users of the global config.  (was: There's a race where old config )

> ZKDelegationTokenSecretManager JaasConfig does not work well with other ZK 
> users in process
> ---
>
> Key: HADOOP-13422
> URL: https://issues.apache.org/jira/browse/HADOOP-13422
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> There's a race in the globals. The non-global APIs from ZOOKEEPER-2139 are 
> not yet available in a stable ZK version, and there's no timeline for their 
> availability, so for now it would help to make the secret manager aware of 
> other users of the global config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13422) ZKDelegationTokenSecretManager JaasConfig does not work well with other ZK users in process

2016-07-25 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HADOOP-13422:
-

 Summary: ZKDelegationTokenSecretManager JaasConfig does not work 
well with other ZK users in process
 Key: HADOOP-13422
 URL: https://issues.apache.org/jira/browse/HADOOP-13422
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin


There's a race where old config 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-07-25 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392570#comment-15392570
 ] 

Sergey Shelukhin commented on HADOOP-13081:
---

ping? We are doing this via reflection in Hive now, in certain scenarios, and 
it appears to work as intended.
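
For reference, a sketch of what such a reflection-based clone looks like; it relies on UGI internals (a private "subject" field and a package-private Subject-taking constructor), which is exactly why a public API would be better:

{noformat}
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.HashSet;
import javax.security.auth.Subject;
import org.apache.hadoop.security.UserGroupInformation;

static UserGroupInformation cloneUgi(UserGroupInformation login) throws Exception {
  Field subjectField = UserGroupInformation.class.getDeclaredField("subject");
  subjectField.setAccessible(true);
  Subject original = (Subject) subjectField.get(login);
  // Share principals; copy the credential sets so per-task tokens added to
  // the clone don't leak back into the login UGI.
  Subject copy = new Subject(false, original.getPrincipals(),
      new HashSet<>(original.getPublicCredentials()),
      new HashSet<>(original.getPrivateCredentials()));
  Constructor<UserGroupInformation> ctor =
      UserGroupInformation.class.getDeclaredConstructor(Subject.class);
  ctor.setAccessible(true);
  return ctor.newInstance(copy);
}
{noformat}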

> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HADOOP-13081.01.patch, HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-07-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375753#comment-15375753
 ] 

Sergey Shelukhin commented on HADOOP-13081:
---

Also can someone please assign this to me?

> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: HADOOP-13081.01.patch, HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-07-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368247#comment-15368247
 ] 

Sergey Shelukhin commented on HADOOP-13081:
---

[~cnauroth] ping?

> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: HADOOP-13081.01.patch, HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-06-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15356326#comment-15356326
 ] 

Sergey Shelukhin commented on HADOOP-13081:
---

ping?

> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: HADOOP-13081.01.patch, HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-06-13 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-13081:
--
Attachment: HADOOP-13081.01.patch

Updated the patch accordingly.

> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: HADOOP-13081.01.patch, HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-06-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323541#comment-15323541
 ] 

Sergey Shelukhin commented on HADOOP-13081:
---

[~cnauroth] ping?

> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-05-27 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305103#comment-15305103
 ] 

Sergey Shelukhin commented on HADOOP-13081:
---

Simple patch. Can someone please review? (and also assign to me; looks like I 
don't have permissions to assign)

> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-05-27 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-13081:
--
Attachment: HADOOP-13081.patch

> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-05-27 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-13081:
--
Status: Patch Available  (was: Open)

> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: HADOOP-13081.patch
>
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13066) UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe

2016-05-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276928#comment-15276928
 ] 

Sergey Shelukhin commented on HADOOP-13066:
---

Thanks for the pointer; that method solves the problem. Interestingly, it sets 
the keytabFile and keytabPrincipal statics, but not loginUser.
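
Presumably this refers to UserGroupInformation.loginUserFromKeytabAndReturnUGI; for the scenario above, usage would look roughly like this (keytab paths are placeholders):

{noformat}
// Each thread gets its own UGI back; the static loginUser is left alone,
// so concurrent logins no longer clobber each other.
UserGroupInformation hdfsUgi =
    UserGroupInformation.loginUserFromKeytabAndReturnUGI(
        "hdfs", "/path/to/hdfs.keytab");
UserGroupInformation hbaseUgi =
    UserGroupInformation.loginUserFromKeytabAndReturnUGI(
        "hbase", "/path/to/hbase.keytab");
{noformat}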

> UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe
> --
>
> Key: HADOOP-13066
> URL: https://issues.apache.org/jira/browse/HADOOP-13066
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Reporter: Sergey Shelukhin
>
> When calling loginFromKerberos, a static variable is set up with the result. 
> If someone logs in as a different user from a different thread, the call to 
> getLoginUser will not return the correct UGI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-05-02 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267553#comment-15267553
 ] 

Sergey Shelukhin commented on HADOOP-13081:
---

[~cnauroth] [~sseth] fyi

> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-05-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-13081:
--
Description: 
We have a scenario where we log in with kerberos as a certain user for some 
tasks, but also want to add tokens to the resulting UGI that would be specific 
to each task. We don't want to authenticate with kerberos for every task.
I am not sure how this can be accomplished with the existing UGI interface. 
Perhaps some clone method would be helpful, similar to createProxyUser minus 
the proxy stuff; or it could just relogin anew from ticket cache. 
getUGIFromTicketCache seems like the best option in existing code, but there 
doesn't appear to be a consistent way of handling ticket cache location - the 
above method, that I only see called in test, is using a config setting that is 
not used anywhere else, and the env variable for the location that is used in 
the main ticket cache related methods is not set uniformly on all paths - 
therefore, trying to find the correct ticket cache and passing it via the 
config setting to getUGIFromTicketCache seems even hackier than doing the clone 
via reflection ;) Moreover, getUGIFromTicketCache ignores the user parameter on 
the main path - it logs a warning for multiple principals and then logs in with 
first available.

  was:
We have a scenario where we log in with kerberos as a certain user for some 
tasks, but also want to add tokens to the resulting UGI that would be specific 
to each task. We don't want to authenticate with kerberos for every task.
I am not sure how this can be accomplished with the existing UGI interface. 
Perhaps some clone method would be helpful, similar to createProxyUser minus 
the proxy stuff; or it could just relogin anew from ticket cache. 
getUGIFromTicketCache seems like the best option in existing code, but there 
doesn't appear to be a consistent way of handling ticket cache location - the 
above method, that I only see called in test, is using a config setting that is 
not used anywhere else, and the env variable for the location is not set on all 
paths - trying to find the correct ticket cache and setting it in the config 
for getUGIFromTicketCache seems even hackier than doing the clone via 
reflection ;)


> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location that is used 
> in the main ticket cache related methods is not set uniformly on all paths - 
> therefore, trying to find the correct ticket cache and passing it via the 
> config setting to getUGIFromTicketCache seems even hackier than doing the 
> clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user 
> parameter on the main path - it logs a warning for multiple principals and 
> then logs in with first available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-05-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-13081:
--
Description: 
We have a scenario where we log in with kerberos as a certain user for some 
tasks, but also want to add tokens to the resulting UGI that would be specific 
to each task. We don't want to authenticate with kerberos for every task.
I am not sure how this can be accomplished with the existing UGI interface. 
Perhaps some clone method would be helpful, similar to createProxyUser minus 
the proxy stuff; or it could just relogin anew from ticket cache. 
getUGIFromTicketCache seems like the best option in existing code, but there 
doesn't appear to be a consistent way of handling ticket cache location - the 
above method, that I only see called in test, is using a config setting that is 
not used anywhere else, and the env variable for the location is not set on all 
paths - trying to find the correct ticket cache and setting it in the config 
for getUGIFromTicketCache seems even hackier than doing the clone via 
reflection ;)

  was:
We have a scenario where we log in with kerberos as a certain user for some 
tasks, but also want to add tokens to the resulting UGI that would be specific 
to each task. 
I am not sure how this can be accomplished with the existing UGI interface. 
Perhaps some clone method would be helpful, similar to createProxyUser minus 
the proxy stuff; or it could just relogin anew from ticket cache. 
getUGIFromTicketCache seems like the best option in existing code, but there 
doesn't appear to be a consistent way of handling ticket cache location - the 
above method, that I only see called in test, is using a config setting that is 
not used anywhere else, and the env variable for the location is not set on all 
paths - trying to find the correct ticket cache and setting it in the config 
for getUGIFromTicketCache seems even hackier than doing the clone via 
reflection ;)


> add the ability to create multiple UGIs/subjects from one kerberos login
> 
>
> Key: HADOOP-13081
> URL: https://issues.apache.org/jira/browse/HADOOP-13081
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> We have a scenario where we log in with kerberos as a certain user for some 
> tasks, but also want to add tokens to the resulting UGI that would be 
> specific to each task. We don't want to authenticate with kerberos for every 
> task.
> I am not sure how this can be accomplished with the existing UGI interface. 
> Perhaps some clone method would be helpful, similar to createProxyUser minus 
> the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there 
> doesn't appear to be a consistent way of handling ticket cache location - the 
> above method, that I only see called in test, is using a config setting that 
> is not used anywhere else, and the env variable for the location is not set 
> on all paths - trying to find the correct ticket cache and setting it in the 
> config for getUGIFromTicketCache seems even hackier than doing the clone via 
> reflection ;)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login

2016-05-02 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HADOOP-13081:
-

 Summary: add the ability to create multiple UGIs/subjects from one 
kerberos login
 Key: HADOOP-13081
 URL: https://issues.apache.org/jira/browse/HADOOP-13081
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Sergey Shelukhin


We have a scenario where we log in with kerberos as a certain user for some 
tasks, but also want to add tokens to the resulting UGI that would be specific 
to each task. 
I am not sure how this can be accomplished with the existing UGI interface. 
Perhaps some clone method would be helpful, similar to createProxyUser minus 
the proxy stuff; or it could just relogin anew from ticket cache. 
getUGIFromTicketCache seems like the best option in existing code, but there 
doesn't appear to be a consistent way of handling ticket cache location - the 
above method, that I only see called in test, is using a config setting that is 
not used anywhere else, and the env variable for the location is not set on all 
paths - trying to find the correct ticket cache and setting it in the config 
for getUGIFromTicketCache seems even hackier than doing the clone via 
reflection ;)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-13066) UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe

2016-04-27 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261302#comment-15261302
 ] 

Sergey Shelukhin commented on HADOOP-13066:
---

Yeah, this is the method. The class lock doesn't protect across several method 
calls. See my example above...
Thread 1 calls loginUserFromKeytab("hdfs", ...)
Thread 2 calls loginUserFromKeytab("hbase", ...)
Thread 1 calls getLoginUser and will get the "hbase" loginUser.
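
In code form (a sketch of the interleaving; the keytab paths are placeholders):

{noformat}
// Thread 1
UserGroupInformation.loginUserFromKeytab("hdfs", "/path/to/hdfs.keytab");
// Thread 2 runs before thread 1 reads the static back
UserGroupInformation.loginUserFromKeytab("hbase", "/path/to/hbase.keytab");
// Thread 1: observes thread 2's login, not its own
UserGroupInformation ugi = UserGroupInformation.getLoginUser();
{noformat}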

> UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe
> --
>
> Key: HADOOP-13066
> URL: https://issues.apache.org/jira/browse/HADOOP-13066
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> When calling loginFromKerberos, a static variable is set up with the result. 
> If someone logs in as a different user from a different thread, the call to 
> getLoginUser will not return the correct UGI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-13066) UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe

2016-04-27 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-13066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261210#comment-15261210
 ] 

Sergey Shelukhin commented on HADOOP-13066:
---

Yes, but nothing synchronizes the two calls... t1: login(user1), t2: 
login(user2), t1: getLoginUser(), t2: getLoginUser()... I think the simplest 
way to fix this would be to have the login method also return the UGI.

> UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe
> --
>
> Key: HADOOP-13066
> URL: https://issues.apache.org/jira/browse/HADOOP-13066
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> When calling loginFromKerberos, a static variable is set up with the result. 
> If someone logs in as a different user from a different thread, the call to 
> getLoginUser will not return the correct UGI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-13066) UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe

2016-04-27 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HADOOP-13066:
-

 Summary: UserGroupInformation.loginWithKerberos/getLoginUser is 
not thread-safe
 Key: HADOOP-13066
 URL: https://issues.apache.org/jira/browse/HADOOP-13066
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Sergey Shelukhin


When calling loginFromKerberos, a static variable is set up with the result. If 
someone logs in as a different user from a different thread, the call to 
getLoginUser will not return the correct UGI.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12697) IPC retry policies should recognise that SASL auth failures are unrecoverable

2016-01-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089717#comment-15089717
 ] 

Sergey Shelukhin commented on HADOOP-12697:
---

The retries in this case were per-exception (see YARN RMProxy), but they don't 
specify SaslException/GSSException anywhere. I am not sure if this is an issue 
with retry policy setup in YARN, or in ipc.Client.
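
Something along these lines (a sketch of the per-exception map in the style RMProxy uses to build its policy; the SaslException entry is the piece the current setup lacks):

{noformat}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import javax.security.sasl.SaslException;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;

Map<Class<? extends Exception>, RetryPolicy> exceptionToPolicy = new HashMap<>();
// Fail fast on unrecoverable auth errors instead of retrying them.
exceptionToPolicy.put(SaslException.class, RetryPolicies.TRY_ONCE_THEN_FAIL);
RetryPolicy base = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
    30, 1, TimeUnit.SECONDS);
RetryPolicy policy = RetryPolicies.retryByException(base, exceptionToPolicy);
{noformat}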

> IPC retry policies should recognise that SASL auth failures are unrecoverable
> -
>
> Key: HADOOP-12697
> URL: https://issues.apache.org/jira/browse/HADOOP-12697
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: ipc
>Affects Versions: 2.7.1
> Environment: Cluster with kerberos on and client not calling with the 
> right credentials
>Reporter: Steve Loughran
>Priority: Minor
>
> SLIDER-1050 shows that if you don't have the right kerberos settings, the 
> Yarn client IPC channel blocks retrying to the talk to the RM, retrying 
> repeatedly
> {noformat}
> 2016-01-07 02:50:45,111 [main] WARN  ipc.Client - Exception encountered while 
> connecting to the server :
>  javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException:
>  No valid credentials provided (Mechanism level: Failed to find any Kerberos 
> tgt)]
> {noformat}
> SASL exceptions need to be recognised as irreconcilable authentication 
> failures, rather than generic IOEs that might go away if you retry



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12567) NPE in SaslRpcServer

2015-11-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-12567:
--
Attachment: HADOOP-12567.01.patch

Fixed. The time is ripe to increase the limit to 100 :P
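
The shape of the fix, roughly (a sketch, not necessarily the committed change; the original snippet is quoted below):

{noformat}
if (LOG.isDebugEnabled()) {
  UserGroupInformation user =
      getIdentifier(authzid, secretManager).getUser();
  // getUser() can return null for some TokenIdentifier implementations;
  // don't NPE inside a debug-only log statement.
  String username = (user == null) ? null : user.getUserName();
  LOG.debug("SASL server DIGEST-MD5 callback: setting "
      + "canonicalized client ID: " + username);
}
{noformat}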

> NPE in SaslRpcServer
> 
>
> Key: HADOOP-12567
> URL: https://issues.apache.org/jira/browse/HADOOP-12567
> Project: Hadoop Common
>  Issue Type: Task
>Affects Versions: 2.7.0, 2.7.1
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HADOOP-12567.01.patch, HADOOP-12567.patch
>
>
> {noformat}
> if (LOG.isDebugEnabled()) {
>   String username =
>       getIdentifier(authzid, secretManager).getUser().getUserName();
>   LOG.debug("SASL server DIGEST-MD5 callback: setting "
>       + "canonicalized client ID: " + username);
> }
> {noformat}
> Looking at identifier implementations, e.g. AbstractDelegationTokenIdentifier 
> (and others), I can see that getUser method can return null. If debug logging 
> is enabled, this NPEs.
> If getUser is not expected to return NULL, it should either be checked and 
> erred upon better here, or the error should be allowed to happen where it 
> would otherwise happen, not in some debug log path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-12567) NPE in SaslRpcServer

2015-11-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007111#comment-15007111
 ] 

Sergey Shelukhin commented on HADOOP-12567:
---

Test failures do not look related.

> NPE in SaslRpcServer
> 
>
> Key: HADOOP-12567
> URL: https://issues.apache.org/jira/browse/HADOOP-12567
> Project: Hadoop Common
>  Issue Type: Task
>Affects Versions: 2.7.0, 2.7.1
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HADOOP-12567.patch
>
>
> {noformat}
> if (LOG.isDebugEnabled()) {
>   String username =
>       getIdentifier(authzid, secretManager).getUser().getUserName();
>   LOG.debug("SASL server DIGEST-MD5 callback: setting "
>       + "canonicalized client ID: " + username);
> }
> {noformat}
> Looking at identifier implementations, e.g. AbstractDelegationTokenIdentifier 
> (and others), I can see that getUser method can return null. If debug logging 
> is enabled, this NPEs.
> If getUser is not expected to return NULL, it should either be checked and 
> erred upon better here, or the error should be allowed to happen where it 
> would otherwise happen, not in some debug log path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12567) NPE in SaslRpcServer

2015-11-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-12567:
--
Attachment: HADOOP-12567.patch

> NPE in SaslRpcServer
> 
>
> Key: HADOOP-12567
> URL: https://issues.apache.org/jira/browse/HADOOP-12567
> Project: Hadoop Common
>  Issue Type: Task
>Affects Versions: 2.7.0, 2.7.1
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HADOOP-12567.patch
>
>
> {noformat}
> if (LOG.isDebugEnabled()) {
>   String username =
>       getIdentifier(authzid, secretManager).getUser().getUserName();
>   LOG.debug("SASL server DIGEST-MD5 callback: setting "
>       + "canonicalized client ID: " + username);
> }
> {noformat}
> Looking at identifier implementations, e.g. AbstractDelegationTokenIdentifier 
> (and others), I can see that getUser method can return null. If debug logging 
> is enabled, this NPEs.
> If getUser is not expected to return NULL, it should either be checked and 
> erred upon better here, or the error should be allowed to happen where it 
> would otherwise happen, not in some debug log path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12567) NPE in SaslRpcServer

2015-11-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-12567:
--
Affects Version/s: 2.7.0
   2.7.1
   Status: Patch Available  (was: Open)

> NPE in SaslRpcServer
> 
>
> Key: HADOOP-12567
> URL: https://issues.apache.org/jira/browse/HADOOP-12567
> Project: Hadoop Common
>  Issue Type: Task
>Affects Versions: 2.7.1, 2.7.0
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HADOOP-12567.patch
>
>
> {noformat}
> if (LOG.isDebugEnabled()) {
> String username =
>   getIdentifier(authzid, secretManager).getUser().getUserName();
> LOG.debug("SASL server DIGEST-MD5 callback: setting "
> + "canonicalized client ID: " + username);
>   }
> {noformat}
> Looking at identifier implementations, e.g. AbstractDelegationTokenIdentifier 
> (and others), I can see that the getUser method can return null. If debug 
> logging is enabled, this throws an NPE.
> If getUser is not expected to return null, it should either be checked here 
> and reported with a clearer error, or the error should be allowed to surface 
> where it would otherwise occur, not in some debug-only logging path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HADOOP-12567) NPE in SaslRpcServer

2015-11-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-12567:
--
Description: 
{noformat}
if (LOG.isDebugEnabled()) {
String username =
  getIdentifier(authzid, secretManager).getUser().getUserName();
LOG.debug("SASL server DIGEST-MD5 callback: setting "
+ "canonicalized client ID: " + username);
  }
{noformat}

Looking at identifier implementations, e.g. AbstractDelegationTokenIdentifier 
(and others), I can see that the getUser method can return null. If debug 
logging is enabled, this throws an NPE.
If getUser is not expected to return null, it should either be checked here 
and reported with a clearer error, or the error should be allowed to surface 
where it would otherwise occur, not in some debug-only logging path.

  was:
{noformat}
if (LOG.isDebugEnabled()) {
String username =
  getIdentifier(authzid, secretManager).getUser().getUserName();
LOG.debug("SASL server DIGEST-MD5 callback: setting "
+ "canonicalized client ID: " + username);
  }
{noformat}

Looking at identifier implementations, e.g. AbstractDelegationTokenIdentifier 
(and others), I can see that the getUser method can return null. If debug 
logging is enabled, this throws an NPE.
If getUser is not expected to return null, it should either be checked here 
and reported with a clearer error, or the error should be allowed to surface 
where it would otherwise occur, not in some debug log statement.


> NPE in SaslRpcServer
> 
>
> Key: HADOOP-12567
> URL: https://issues.apache.org/jira/browse/HADOOP-12567
> Project: Hadoop Common
>  Issue Type: Task
>Reporter: Sergey Shelukhin
>
> {noformat}
> if (LOG.isDebugEnabled()) {
> String username =
>   getIdentifier(authzid, secretManager).getUser().getUserName();
> LOG.debug("SASL server DIGEST-MD5 callback: setting "
> + "canonicalized client ID: " + username);
>   }
> {noformat}
> Looking at identifier implementations, e.g. AbstractDelegationTokenIdentifier 
> (and others), I can see that the getUser method can return null. If debug 
> logging is enabled, this throws an NPE.
> If getUser is not expected to return null, it should either be checked here 
> and reported with a clearer error, or the error should be allowed to surface 
> where it would otherwise occur, not in some debug-only logging path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12567) NPE in SaslRpcServer

2015-11-12 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HADOOP-12567:
-

 Summary: NPE in SaslRpcServer
 Key: HADOOP-12567
 URL: https://issues.apache.org/jira/browse/HADOOP-12567
 Project: Hadoop Common
  Issue Type: Task
Reporter: Sergey Shelukhin


{noformat}
if (LOG.isDebugEnabled()) {
String username =
  getIdentifier(authzid, secretManager).getUser().getUserName();
LOG.debug("SASL server DIGEST-MD5 callback: setting "
+ "canonicalized client ID: " + username);
  }
{noformat}

Looking at identifier implementations, e.g. AbstractDelegationTokenIdentifier 
(and others), I can see that the getUser method can return null. If debug 
logging is enabled, this throws an NPE.
If getUser is not expected to return null, it should either be checked here 
and reported with a clearer error, or the error should be allowed to surface 
where it would otherwise occur, not in some debug log statement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HADOOP-12567) NPE in SaslRpcServer

2015-11-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HADOOP-12567:
-

Assignee: Sergey Shelukhin

> NPE in SaslRpcServer
> 
>
> Key: HADOOP-12567
> URL: https://issues.apache.org/jira/browse/HADOOP-12567
> Project: Hadoop Common
>  Issue Type: Task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> {noformat}
> if (LOG.isDebugEnabled()) {
> String username =
>   getIdentifier(authzid, secretManager).getUser().getUserName();
> LOG.debug("SASL server DIGEST-MD5 callback: setting "
> + "canonicalized client ID: " + username);
>   }
> {noformat}
> Looking at identifier implementations, e.g. AbstractDelegationTokenIdentifier 
> (and others), I can see that the getUser method can return null. If debug 
> logging is enabled, this throws an NPE.
> If getUser is not expected to return null, it should either be checked here 
> and reported with a clearer error, or the error should be allowed to surface 
> where it would otherwise occur, not in some debug-only logging path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-11771) Configuration::getClassByNameOrNull synchronizes on a static object

2015-03-31 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-11771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389062#comment-14389062
 ] 

Sergey Shelukhin commented on HADOOP-11771:
---

Why don't we just stop using ReflectionUtils? 

> Configuration::getClassByNameOrNull synchronizes on a static object
> ---
>
> Key: HADOOP-11771
> URL: https://issues.apache.org/jira/browse/HADOOP-11771
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: conf, io, ipc
>Reporter: Gopal V
> Attachments: configuration-cache-bt.png, configuration-sync-cache.png
>
>
> {code}
> private static final Map<ClassLoader, Map<String, WeakReference<Class<?>>>>
>   CACHE_CLASSES = new WeakHashMap<ClassLoader, Map<String, WeakReference<Class<?>>>>();
> ...
> synchronized (CACHE_CLASSES) {
>   map = CACHE_CLASSES.get(classLoader);
>   if (map == null) {
>     map = Collections.synchronizedMap(
>       new WeakHashMap<String, WeakReference<Class<?>>>());
>     CACHE_CLASSES.put(classLoader, map);
>   }
> }
> {code}
> !configuration-sync-cache.png!
> !configuration-cache-bt.png!
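For illustration, one way to remove the global monitor is to key the outer cache off a ConcurrentMap, so lookups from different class loaders no longer serialize on one lock. A hedged sketch, not the committed fix; the ClassCacheSketch class and mapFor helper are hypothetical names:

{code}
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ClassCacheSketch {
  // A ConcurrentMap lets concurrent lookups proceed without serializing
  // on one shared monitor, which is the contention the attached
  // screenshots point at.
  private static final ConcurrentMap<ClassLoader, Map<String, WeakReference<Class<?>>>>
      CACHE_CLASSES = new ConcurrentHashMap<>();

  static Map<String, WeakReference<Class<?>>> mapFor(ClassLoader classLoader) {
    Map<String, WeakReference<Class<?>>> map = CACHE_CLASSES.get(classLoader);
    if (map == null) {
      map = Collections.synchronizedMap(
          new WeakHashMap<String, WeakReference<Class<?>>>());
      // putIfAbsent resolves the race between two threads creating the map.
      Map<String, WeakReference<Class<?>>> prev =
          CACHE_CLASSES.putIfAbsent(classLoader, map);
      if (prev != null) {
        map = prev;
      }
    }
    return map;
  }
}
{code}

Trade-off: ConcurrentHashMap holds its keys strongly, so unlike the WeakHashMap above this can pin class loaders in memory; the sketch illustrates the contention fix only, not the weak-reference semantics of the original.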



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HADOOP-10555) add offset support to MurmurHash

2014-04-30 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986197#comment-13986197
 ] 

Sergey Shelukhin commented on HADOOP-10555:
---

[~t3rmin4t0r] fyi

> add offset support to MurmurHash
> 
>
> Key: HADOOP-10555
> URL: https://issues.apache.org/jira/browse/HADOOP-10555
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Priority: Trivial
> Attachments: HADOOP-10555.patch
>
>
> From HIVE-6430 code review
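For context, "offset support" presumably means an overload like hash(data, offset, length, seed) alongside the existing hash(data, length, seed). A self-contained 32-bit MurmurHash2 sketch with an offset parameter follows; it is hypothetical and not copied from HADOOP-10555.patch:

{code}
/**
 * Hypothetical offset-aware 32-bit MurmurHash2, mirroring the shape an
 * offset overload would take; not taken from the attached patch.
 */
public class MurmurHashSketch {
  public static int hash(byte[] data, int offset, int length, int seed) {
    final int m = 0x5bd1e995;
    final int r = 24;
    int h = seed ^ length;

    // Mix 4 bytes at a time, reading little-endian from data[offset..].
    int len4 = length >> 2;
    for (int i = 0; i < len4; i++) {
      int i4 = offset + (i << 2);
      int k = (data[i4] & 0xff)
          | ((data[i4 + 1] & 0xff) << 8)
          | ((data[i4 + 2] & 0xff) << 16)
          | ((data[i4 + 3] & 0xff) << 24);
      k *= m;
      k ^= k >>> r;
      k *= m;
      h *= m;
      h ^= k;
    }

    // Fold in the 0-3 remaining tail bytes.
    int tail = offset + (len4 << 2);
    switch (length & 3) {
      case 3: h ^= (data[tail + 2] & 0xff) << 16; // fall through
      case 2: h ^= (data[tail + 1] & 0xff) << 8;  // fall through
      case 1: h ^= (data[tail] & 0xff);
              h *= m;
    }

    h ^= h >>> 13;
    h *= m;
    h ^= h >>> 15;
    return h;
  }
}
{code}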



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10555) add offset support to MurmurHash

2014-04-30 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-10555:
--

Status: Patch Available  (was: Open)

> add offset support to MurmurHash
> 
>
> Key: HADOOP-10555
> URL: https://issues.apache.org/jira/browse/HADOOP-10555
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Priority: Trivial
> Attachments: HADOOP-10555.patch
>
>
> From HIVE-6430 code review



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10555) add offset support to MurmurHash

2014-04-30 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HADOOP-10555:
--

Attachment: HADOOP-10555.patch

Can someone please assign this to me? I don't have permissions to assign issues.

> add offset support to MurmurHash
> 
>
> Key: HADOOP-10555
> URL: https://issues.apache.org/jira/browse/HADOOP-10555
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Priority: Trivial
> Attachments: HADOOP-10555.patch
>
>
> From HIVE-6430 code review



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10555) add offset support to MurmurHash

2014-04-30 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HADOOP-10555:
-

 Summary: add offset support to MurmurHash
 Key: HADOOP-10555
 URL: https://issues.apache.org/jira/browse/HADOOP-10555
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Priority: Trivial


From HIVE-6430 code review



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9487) Deprecation warnings in Configuration should go to their own log or otherwise be suppressible

2013-08-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738771#comment-13738771
 ] 

Sergey Shelukhin commented on HADOOP-9487:
--

ping?

> Deprecation warnings in Configuration should go to their own log or otherwise 
> be suppressible
> -
>
> Key: HADOOP-9487
> URL: https://issues.apache.org/jira/browse/HADOOP-9487
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: conf
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
> Attachments: HADOOP-9487.patch, HADOOP-9487.patch
>
>
> Running local pig jobs triggers large quantities of warnings about deprecated 
> properties - something I don't care about, as I'm not in a position to fix it 
> without delving into Pig. 
> I can suppress them by changing the log level, but that can hide other 
> warnings that may actually matter.
> If there were a special Configuration.deprecated log for all deprecation 
> messages, it could be suppressed by people who don't want noisy logs.
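Assuming the dedicated logger materializes, suppressing it becomes a one-line log4j setting. The logger name below is the one later Hadoop releases use (org.apache.hadoop.conf.Configuration.deprecation); the description's proposed Configuration.deprecated would be handled the same way:

{noformat}
# log4j.properties: silence only the deprecation chatter, keep other warnings
log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=ERROR
{noformat}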

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9487) Deprecation warnings in Configuration should go to their own log or otherwise be suppressible

2013-07-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702457#comment-13702457
 ] 

Sergey Shelukhin commented on HADOOP-9487:
--

This warning is also output in HBase shell.
The latest patch looks reasonable.

> Deprecation warnings in Configuration should go to their own log or otherwise 
> be suppressible
> -
>
> Key: HADOOP-9487
> URL: https://issues.apache.org/jira/browse/HADOOP-9487
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: conf
>Affects Versions: 3.0.0
>Reporter: Steve Loughran
> Attachments: HADOOP-9487.patch, HADOOP-9487.patch
>
>
> Running local pig jobs triggers large quantities of warnings about deprecated 
> properties - something I don't care about, as I'm not in a position to fix it 
> without delving into Pig. 
> I can suppress them by changing the log level, but that can hide other 
> warnings that may actually matter.
> If there were a special Configuration.deprecated log for all deprecation 
> messages, it could be suppressed by people who don't want noisy logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira