[jira] [Commented] (HADOOP-15403) FileInputFormat recursive=false fails instead of ignoring the directories.
[ https://issues.apache.org/jira/browse/HADOOP-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448566#comment-16448566 ]
Sergey Shelukhin commented on HADOOP-15403:
--

[~jlowe] would a config change be ok? I think it is better to add another config, but we could also make the existing one three-valued: "true", "false", -file not found- "ignore", where "ignore" has the new behavior. "false" can still work for people if they override listFiles. [~ste...@apache.org] I will fix both, along with the other concerns, once we decide on those.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15403) FileInputFormat recursive=false fails instead of ignoring the directories.
[ https://issues.apache.org/jira/browse/HADOOP-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16446460#comment-16446460 ]
Sergey Shelukhin commented on HADOOP-15403:
--

I fixed it in getSplits since list... can be overridden by implementations, which can probably still return directories. It would be good to ignore them all, especially if someone copy-pasted the fetching code.
[jira] [Updated] (HADOOP-15403) FileInputFormat recursive=false fails instead of ignoring the directories.
[ https://issues.apache.org/jira/browse/HADOOP-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HADOOP-15403:
--
Status: Patch Available (was: Open)

[~gsaha] [~leftnoteasy] can you take a look?
[jira] [Updated] (HADOOP-15403) FileInputFormat recursive=false fails instead of ignoring the directories.
[ https://issues.apache.org/jira/browse/HADOOP-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HADOOP-15403:
--
Attachment: HADOOP-15403.patch
[jira] [Assigned] (HADOOP-15403) FileInputFormat recursive=false fails instead of ignoring the directories.
[ https://issues.apache.org/jira/browse/HADOOP-15403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin reassigned HADOOP-15403:
-
Assignee: Sergey Shelukhin
[jira] [Created] (HADOOP-15403) FileInputFormat recursive=false fails instead of ignoring the directories.
Sergey Shelukhin created HADOOP-15403:
-
Summary: FileInputFormat recursive=false fails instead of ignoring the directories.
Key: HADOOP-15403
URL: https://issues.apache.org/jira/browse/HADOOP-15403
Project: Hadoop Common
Issue Type: Bug
Reporter: Sergey Shelukhin

We are trying to create a split in Hive that will only read files in a directory and not subdirectories. That fails with the below error. Given how this error comes about (two pieces of code interact, one explicitly adding directories to results without failing, and one failing on any directories in results), this seems like a bug.

{noformat}
Caused by: java.io.IOException: Not a file: file:/,...warehouse/simple_to_mm_text/delta_001_001_
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:329) ~[hadoop-mapreduce-client-core-3.1.0.jar:?]
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:553) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:754) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:203) ~[hive-exec-3.1.0-SNAPSHOT.jar:3.1.0-SNAPSHOT]
{noformat}

This code, when recursion is disabled, adds directories to results:

{noformat}
if (recursive && stat.isDirectory()) {
  result.dirsNeedingRecursiveCalls.add(stat);
} else {
  result.locatedFileStatuses.add(stat);
}
{noformat}

However the getSplits code after that computes the size like this:

{noformat}
long totalSize = 0;                // compute total size
for (FileStatus file: files) {     // check we have valid files
  if (file.isDirectory()) {
    throw new IOException("Not a file: " + file.getPath());
  }
  totalSize +=
{noformat}

which would always fail combined with the above code.
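The two snippets above can only coexist if getSplits tolerates directories in the listing instead of throwing. A rough sketch of the "ignore the directories" behavior discussed in this thread; the Status class is a hypothetical stand-in for Hadoop's FileStatus, so this is illustrative only, not the attached patch:

```java
import java.util.List;

public class SplitSizeSketch {
    // Hypothetical stand-in for org.apache.hadoop.fs.FileStatus.
    static final class Status {
        final String path;
        final boolean dir;
        final long len;
        Status(String path, boolean dir, long len) {
            this.path = path; this.dir = dir; this.len = len;
        }
    }

    // Instead of throwing "Not a file", skip directories while summing
    // sizes, mirroring the "ignore" behavior proposed in the comments.
    static long totalSize(List<Status> files) {
        long total = 0;
        for (Status f : files) {
            if (f.dir) {
                continue; // recursive=false listed it; just ignore it
            }
            total += f.len;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Status> files = List.of(
            new Status("/warehouse/t/f1", false, 100),
            new Status("/warehouse/t/delta_001", true, 0), // directory
            new Status("/warehouse/t/f2", false, 27));
        long size = totalSize(files);
        if (size != 127) {
            throw new AssertionError("expected 127, got " + size);
        }
        System.out.println("totalSize=" + size);
    }
}
```

With the throwing check in place, the same input would fail on the second entry; skipping directories simply excludes them from the total.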
[jira] [Commented] (HADOOP-15171) native ZLIB decompressor produces 0 bytes on the 2nd call; also incorrectly handles some zlib errors
[ https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351016#comment-16351016 ]
Sergey Shelukhin commented on HADOOP-15171:
--

Update: it turns out end() was a red herring after all - any reuse of the same object without calling reset() causes the issue. Given that the object does not support the zlib library model of repeatedly calling inflate with more data, it basically never makes sense to call decompress() without calling reset(). Perhaps the reset() call should be built in? I cannot find whether zlib itself actually requires a reset (at least for the continuous-decompression case, it doesn't look like it does), so perhaps the cleanup could be improved too. At any rate, the error handling should be fixed so it does not return 0.

> native ZLIB decompressor produces 0 bytes on the 2nd call; also incorrectly handles some zlib errors
> -
> Key: HADOOP-15171
> URL: https://issues.apache.org/jira/browse/HADOOP-15171
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 3.1.0
> Reporter: Sergey Shelukhin
> Assignee: Lokesh Jain
> Priority: Blocker
> Fix For: 3.1.0, 3.0.1
>
> While reading some ORC file via direct buffers, Hive gets a 0-sized buffer for a particular compressed segment of the file. We narrowed it down to the Hadoop native ZLIB codec; when the data is copied to a heap-based buffer and the JDK Inflater is used, it produces correct output. Input is only 127 bytes so I can paste it here.
> All the other (many) blocks of the file are decompressed without problems by the same code.
>
> {noformat}
> 2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Decompressing 127 bytes to dest buffer pos 524288, limit 786432
> 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: The codec has produced 0 bytes for 127 bytes at pos 0, data hash 1719565039: [e3 92 e1 62 66 60 60 10 12 e5 98 e0 27 c4 c7 f1 e8 12 8f 40 c3 7b 5e 89 09 7f 6e 74 73 04 30 70 c9 72 b1 30 14 4d 60 82 49 37 bd e7 15 58 d0 cd 2f 31 a1 a1 e3 35 4c fa 15 a3 02 4c 7a 51 37 bf c0 81 e5 02 12 13 5a b6 9f e2 04 ea 96 e3 62 65 b8 c3 b4 01 ae fd d0 72 01 81 07 87 05 25 26 74 3c 5b c9 05 35 fd 0a b3 03 50 7b 83 11 c8 f2 c3 82 02 0f 96 0b 49 34 7c fa ff 9f 2d 80 01 00
> 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Fell back to JDK decompressor with memcopy; got 155 bytes
> {noformat}
>
> Hadoop version is based on 3.1 snapshot. The size of libhadoop.so is 824403 bytes, and libgplcompression is 78273 FWIW. Not sure how to extract versions from those.
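The reset-before-reuse rule described in this comment can be illustrated with the JDK's own zlib binding (java.util.zip.Inflater), which is a different implementation from the Hadoop native codec; this sketch only shows the defensive pattern of resetting a shared decompressor before each independent stream:

```java
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
import java.nio.charset.StandardCharsets;

public class ResetBetweenUses {
    // Decompress one complete zlib stream, resetting the shared Inflater
    // first so state left over from a previous stream cannot leak in.
    static byte[] inflateFresh(Inflater inf, byte[] compressed, int originalLen)
            throws DataFormatException {
        inf.reset();              // the "built-in reset" idea from the comment
        inf.setInput(compressed);
        byte[] out = new byte[originalLen];
        int n = inf.inflate(out);
        if (n != originalLen || !inf.finished()) {
            throw new DataFormatException("short read: " + n);
        }
        return out;
    }

    // Compress one buffer into a complete zlib stream.
    static byte[] deflate(byte[] data) {
        Deflater def = new Deflater();
        def.setInput(data);
        def.finish();
        byte[] buf = new byte[data.length + 64];
        int n = def.deflate(buf);
        def.end();
        byte[] out = new byte[n];
        System.arraycopy(buf, 0, out, 0, n);
        return out;
    }

    public static void main(String[] args) throws Exception {
        byte[] a = "first block".getBytes(StandardCharsets.UTF_8);
        byte[] b = "second, unrelated block".getBytes(StandardCharsets.UTF_8);
        Inflater inf = new Inflater();
        // Two independent streams through the same object: safe because
        // inflateFresh() resets before each use.
        byte[] ra = inflateFresh(inf, deflate(a), a.length);
        byte[] rb = inflateFresh(inf, deflate(b), b.length);
        inf.end();
        if (!new String(ra, StandardCharsets.UTF_8).equals("first block")) throw new AssertionError();
        if (!new String(rb, StandardCharsets.UTF_8).equals("second, unrelated block")) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Building the reset into the decompress helper, as suggested above, makes object reuse safe by construction rather than relying on every caller to remember it.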
[jira] [Updated] (HADOOP-15171) native ZLIB decompressor produces 0 bytes on the 2nd call; also incorrectly handles some zlib errors
[ https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HADOOP-15171:
--
Summary: native ZLIB decompressor produces 0 bytes on the 2nd call; also incorrectly handles some zlib errors (was: native ZLIB decompressor produces 0 bytes after end() is called on a different decompressor; also incorrectly handles some zlib errors)
[jira] [Updated] (HADOOP-15171) native ZLIB decompressor produces 0 bytes after end() is called on a different decompressor; also incorrectly handles some zlib errors
[ https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HADOOP-15171:
--
Summary: native ZLIB decompressor produces 0 bytes after end() is called on a different decompressor; also incorrectly handles some zlib errors (was: Hadoop native ZLIB decompressor produces 0 bytes after end() is called on a different decompressor)
[jira] [Updated] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes after end() is called on a different decompressor
[ https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin updated HADOOP-15171:
--
Summary: Hadoop native ZLIB decompressor produces 0 bytes after end() is called on a different decompressor (was: Hadoop native ZLIB decompressor produces 0 bytes for some input)
[jira] [Commented] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input
[ https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349513#comment-16349513 ]
Sergey Shelukhin commented on HADOOP-15171:
--

Ok, here's repro code. I get the "bar" exception if dd2 is added. Note that dd2 is not used for anything and is not related in any way to dd1. Note that if I end dd1 and then reuse it, I get an NPE in Java code. But if I end dd2, the internals of the Java object dd1 are not affected; it looks like the native side has some issue.

{noformat}
dest.position(startPos);
dest.limit(startLim);
src.position(startSrcPos);
src.limit(startSrcLim);
ZlibDecompressor.ZlibDirectDecompressor dd1 =
    new ZlibDecompressor.ZlibDirectDecompressor(CompressionHeader.NO_HEADER, 0);
dd1.decompress(src, dest);
dest.limit(dest.position()); // Set the new limit to where the decompressor stopped.
dest.position(startPos);
if (dest.remaining() == 0) {
  throw new RuntimeException("foo");
}

ZlibDecompressor.ZlibDirectDecompressor dd2 =
    new ZlibDecompressor.ZlibDirectDecompressor(CompressionHeader.NO_HEADER, 0);
dest.position(startPos);
dest.limit(startLim);
src.position(startSrcPos);
src.limit(startSrcLim);
dd2.end();
dd1.decompress(src, dest);
dest.limit(dest.position()); // Set the new limit to where the decompressor stopped.
dest.position(startPos);
if (dest.remaining() == 0) {
  throw new RuntimeException("bar");
}
{noformat}

As a side note, the Z_BUF_ERROR error in the native code is not processed correctly. See the detailed example for this error at http://zlib.net/zlib_how.html ; given that neither the Java nor the native code handles partial reads, and nothing propagates the state to the caller, this should throw an error just like Z_DATA_ERROR. The buffer address null checks should probably also throw and not exit silently. Z_NEED_DICT handling is also suspicious. Does anything actually handle this?
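As a point of comparison for the Z_BUF_ERROR discussion: the JDK binding does not leave the caller with a bare 0-byte result; needsInput() and finished() expose the stream state. A small raw-deflate round trip (nowrap mode, loosely comparable to the NO_HEADER setting in the repro) showing that signaling; the class and data here are illustrative, not the Hadoop native path:

```java
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
import java.nio.charset.StandardCharsets;

public class RawZlibSignals {
    // Raw-deflate round trip (nowrap = no zlib header, like NO_HEADER above).
    static byte[] roundTrip(byte[] data) throws DataFormatException {
        Deflater def = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        def.setInput(data);
        def.finish();
        byte[] comp = new byte[data.length + 64];
        int clen = def.deflate(comp);
        def.end();

        Inflater inf = new Inflater(true);
        byte[] out = new byte[data.length];

        // No input yet: inflate() returns 0, and needsInput() tells the
        // caller why, so a 0-byte result is never ambiguous.
        int n0 = inf.inflate(out);
        if (n0 != 0 || !inf.needsInput()) {
            throw new IllegalStateException("expected input-starved state");
        }

        inf.setInput(comp, 0, clen);
        int n = inf.inflate(out);
        inf.end();
        if (n != data.length) {
            throw new IllegalStateException("short inflate: " + n);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "orc compressed segment stand-in".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(roundTrip(data), StandardCharsets.UTF_8));
    }
}
```

A 0-byte return paired with an explicit state query is exactly the kind of propagation the comment asks for; returning 0 with no state, as described above for the native codec, hides both stalls and errors.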
[jira] [Comment Edited] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input
[ https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349513#comment-16349513 ]
Sergey Shelukhin edited comment on HADOOP-15171 at 2/1/18 11:43 PM:
--

Ok, here's repro code. I get the "bar" exception if dd2 is added. Note that dd2 is not used for anything and is not related in any way to dd1. If I instead end dd1 and then reuse it, I get an NPE in Java code. But if I end dd2, the internals of the Java object dd1 are not affected; it looks like the native side has some issue.

{noformat}
dest.position(startPos);
dest.limit(startLim);
src.position(startSrcPos);
src.limit(startSrcLim);
ZlibDecompressor.ZlibDirectDecompressor dd1 =
    new ZlibDecompressor.ZlibDirectDecompressor(CompressionHeader.NO_HEADER, 0);
dd1.decompress(src, dest);
dest.limit(dest.position()); // Set the new limit to where the decompressor stopped.
dest.position(startPos);
if (dest.remaining() == 0) {
  throw new RuntimeException("foo");
}

ZlibDecompressor.ZlibDirectDecompressor dd2 =
    new ZlibDecompressor.ZlibDirectDecompressor(CompressionHeader.NO_HEADER, 0);
dest.position(startPos);
dest.limit(startLim);
src.position(startSrcPos);
src.limit(startSrcLim);
dd2.end();
dd1.decompress(src, dest);
dest.limit(dest.position()); // Set the new limit to where the decompressor stopped.
dest.position(startPos);
if (dest.remaining() == 0) {
  throw new RuntimeException("bar");
}
{noformat}

As a side note, the Z_BUF_ERROR error in the native code is not processed correctly. See the detailed example for this error at http://zlib.net/zlib_how.html ; given that neither the Java nor the native code handles partial reads, and nothing propagates the state to the caller, this should throw an error just like Z_DATA_ERROR. The buffer address null checks should probably also throw and not exit silently. Z_NEED_DICT handling is also suspicious. Does anything actually handle this?
> Hadoop native ZLIB decompressor produces 0 bytes for some input > --- > > Key: HADOOP-15171 > URL: https://issues.apache.org/jira/browse/HADOOP-15171 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Sergey Shelukhin >Assignee: Lokesh Jain >Priority: Blocker > Fix For: 3.1.0, 3.0.1 > > > While reading some ORC file via direct buffers, Hive gets a 0-sized buffer > for a particular compressed segment of the file. We narrowed it down to > Hadoop native ZLIB codec; when the data is copied to heap-based buffer and > the JDK Inflater is used, it produces correct output. Input is only 127 bytes > so I can paste it here. > All the other (many) blocks of the file are decompressed without problems by > the same code. > {noformat} > 2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Decompressing > 1
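The repro above rests on a contract that each direct decompressor owns its own native stream, so end() on one instance cannot affect another. For comparison, here is a minimal sketch of that contract using only the JDK's java.util.zip classes (no Hadoop code): ending one raw-deflate Inflater leaves an unrelated one fully usable. The helper names are illustrative.

```java
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class IndependentInflaters {

    // Compress `data` as a raw deflate stream (no zlib header), matching the
    // CompressionHeader.NO_HEADER mode used in the repro above.
    public static byte[] deflateRaw(byte[] data) {
        Deflater def = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        try {
            def.setInput(data);
            def.finish();
            byte[] buf = new byte[data.length * 2 + 64];
            int len = 0;
            while (!def.finished()) {
                len += def.deflate(buf, len, buf.length - len);
            }
            byte[] out = new byte[len];
            System.arraycopy(buf, 0, out, 0, len);
            return out;
        } finally {
            def.end(); // releases only this instance's native state
        }
    }

    // Decompress a raw deflate stream with the given Inflater instance.
    public static byte[] inflateRaw(Inflater inf, byte[] comp, int maxLen)
            throws DataFormatException {
        // The Inflater javadoc requires one extra dummy byte of input in
        // "nowrap" (raw deflate) mode.
        byte[] in = new byte[comp.length + 1];
        System.arraycopy(comp, 0, in, 0, comp.length);
        inf.setInput(in);
        byte[] buf = new byte[maxLen];
        int n = inf.inflate(buf);
        byte[] out = new byte[n];
        System.arraycopy(buf, 0, out, 0, n);
        return out;
    }
}
```

With the JDK implementation, creating two `new Inflater(true)` instances and calling `end()` on the second leaves the first working; the report here is that the Hadoop native wrapper does not preserve that isolation.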
[jira] [Commented] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input
[ https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349307#comment-16349307 ] Sergey Shelukhin commented on HADOOP-15171: --- Hmm, nm, it might be a red herring > Hadoop native ZLIB decompressor produces 0 bytes for some input > --- > > Key: HADOOP-15171 > URL: https://issues.apache.org/jira/browse/HADOOP-15171 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Sergey Shelukhin >Assignee: Lokesh Jain >Priority: Blocker > Fix For: 3.1.0, 3.0.1 > > > While reading some ORC file via direct buffers, Hive gets a 0-sized buffer > for a particular compressed segment of the file. We narrowed it down to > Hadoop native ZLIB codec; when the data is copied to heap-based buffer and > the JDK Inflater is used, it produces correct output. Input is only 127 bytes > so I can paste it here. > All the other (many) blocks of the file are decompressed without problems by > the same code. > {noformat} > 2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Decompressing > 127 bytes to dest buffer pos 524288, limit 786432 > 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: The codec has > produced 0 bytes for 127 bytes at pos 0, data hash 1719565039: [e3 92 e1 62 > 66 60 60 10 12 e5 98 e0 27 c4 c7 f1 e8 12 8f 40 c3 7b 5e 89 09 7f 6e 74 73 04 > 30 70 c9 72 b1 30 14 4d 60 82 49 37 bd e7 15 58 d0 cd 2f 31 a1 a1 e3 35 4c fa > 15 a3 02 4c 7a 51 37 bf c0 81 e5 02 12 13 5a b6 9f e2 04 ea 96 e3 62 65 b8 c3 > b4 01 ae fd d0 72 01 81 07 87 05 25 26 74 3c 5b c9 05 35 fd 0a b3 03 50 7b 83 > 11 c8 f2 c3 82 02 0f 96 0b 49 34 7c fa ff 9f 2d 80 01 00 > 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Fell back to > JDK decompressor with memcopy; got 155 bytes > {noformat} > Hadoop version is based on 3.1 
snapshot. > The size of libhadoop.so is 824403 bytes, and libgplcompression is 78273 > FWIW. Not sure how to extract versions from those. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input
[ https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349238#comment-16349238 ] Sergey Shelukhin commented on HADOOP-15171: --- Tentative cause (still confirming) - calling end() on ZlibDirectDecompressor breaks other unrelated ZlibDirectDecompressor-s. So it may not be related to buffers as such. > Hadoop native ZLIB decompressor produces 0 bytes for some input > --- > > Key: HADOOP-15171 > URL: https://issues.apache.org/jira/browse/HADOOP-15171 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Sergey Shelukhin >Assignee: Lokesh Jain >Priority: Blocker > Fix For: 3.1.0, 3.0.1 > > > While reading some ORC file via direct buffers, Hive gets a 0-sized buffer > for a particular compressed segment of the file. We narrowed it down to > Hadoop native ZLIB codec; when the data is copied to heap-based buffer and > the JDK Inflater is used, it produces correct output. Input is only 127 bytes > so I can paste it here. > All the other (many) blocks of the file are decompressed without problems by > the same code. 
> {noformat} > 2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Decompressing > 127 bytes to dest buffer pos 524288, limit 786432 > 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: The codec has > produced 0 bytes for 127 bytes at pos 0, data hash 1719565039: [e3 92 e1 62 > 66 60 60 10 12 e5 98 e0 27 c4 c7 f1 e8 12 8f 40 c3 7b 5e 89 09 7f 6e 74 73 04 > 30 70 c9 72 b1 30 14 4d 60 82 49 37 bd e7 15 58 d0 cd 2f 31 a1 a1 e3 35 4c fa > 15 a3 02 4c 7a 51 37 bf c0 81 e5 02 12 13 5a b6 9f e2 04 ea 96 e3 62 65 b8 c3 > b4 01 ae fd d0 72 01 81 07 87 05 25 26 74 3c 5b c9 05 35 fd 0a b3 03 50 7b 83 > 11 c8 f2 c3 82 02 0f 96 0b 49 34 7c fa ff 9f 2d 80 01 00 > 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Fell back to > JDK decompressor with memcopy; got 155 bytes > {noformat} > Hadoop version is based on 3.1 snapshot. > The size of libhadoop.so is 824403 bytes, and libgplcompression is 78273 > FWIW. Not sure how to extract versions from those. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
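The "Fell back to JDK decompressor with memcopy" log lines describe a caller-side workaround: copy the direct buffers to heap arrays and decompress with java.util.zip.Inflater. A hedged sketch of that fallback follows (raw deflate, matching the NO_HEADER mode in the repro; the class and method names are illustrative, not Hive's actual code):

```java
import java.nio.ByteBuffer;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class HeapInflateFallback {

    // Decompress src into dest via heap copies, returning the number of bytes
    // produced. We work on duplicates so the caller's buffer positions are
    // left untouched, as a fallback path would want.
    public static int inflateViaHeap(ByteBuffer src, ByteBuffer dest)
            throws DataFormatException {
        // Copy the compressed bytes off the direct buffer; the extra dummy
        // byte is required by Inflater in "nowrap" (raw deflate) mode.
        byte[] in = new byte[src.remaining() + 1];
        src.duplicate().get(in, 0, src.remaining());
        byte[] out = new byte[dest.remaining()];

        Inflater inflater = new Inflater(true); // true = raw stream, no zlib header
        try {
            inflater.setInput(in);
            int n = inflater.inflate(out);
            dest.duplicate().put(out, 0, n);    // copy the result back
            return n;
        } finally {
            inflater.end();                     // release this instance's native state
        }
    }
}
```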
[jira] [Updated] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input
[ https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-15171: -- Priority: Blocker (was: Critical) > Hadoop native ZLIB decompressor produces 0 bytes for some input > --- > > Key: HADOOP-15171 > URL: https://issues.apache.org/jira/browse/HADOOP-15171 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Sergey Shelukhin >Priority: Blocker > Fix For: 3.1.0, 3.0.1 > > > While reading some ORC file via direct buffers, Hive gets a 0-sized buffer > for a particular compressed segment of the file. We narrowed it down to > Hadoop native ZLIB codec; when the data is copied to heap-based buffer and > the JDK Inflater is used, it produces correct output. Input is only 127 bytes > so I can paste it here. > All the other (many) blocks of the file are decompressed without problems by > the same code. > {noformat} > 2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Decompressing > 127 bytes to dest buffer pos 524288, limit 786432 > 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: The codec has > produced 0 bytes for 127 bytes at pos 0, data hash 1719565039: [e3 92 e1 62 > 66 60 60 10 12 e5 98 e0 27 c4 c7 f1 e8 12 8f 40 c3 7b 5e 89 09 7f 6e 74 73 04 > 30 70 c9 72 b1 30 14 4d 60 82 49 37 bd e7 15 58 d0 cd 2f 31 a1 a1 e3 35 4c fa > 15 a3 02 4c 7a 51 37 bf c0 81 e5 02 12 13 5a b6 9f e2 04 ea 96 e3 62 65 b8 c3 > b4 01 ae fd d0 72 01 81 07 87 05 25 26 74 3c 5b c9 05 35 fd 0a b3 03 50 7b 83 > 11 c8 f2 c3 82 02 0f 96 0b 49 34 7c fa ff 9f 2d 80 01 00 > 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Fell back to > JDK decompressor with memcopy; got 155 bytes > {noformat} > Hadoop version is based on 3.1 snapshot. 
> The size of libhadoop.so is 824403 bytes, and libgplcompression is 78273 > FWIW. Not sure how to extract versions from those.
[jira] [Commented] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input
[ https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344184#comment-16344184 ] Sergey Shelukhin commented on HADOOP-15171: --- [~ste...@apache.org] [~jnp] is it possible to get some traction on this actually? We now also have to work around this in ORC project, and this is becoming a pain > Hadoop native ZLIB decompressor produces 0 bytes for some input > --- > > Key: HADOOP-15171 > URL: https://issues.apache.org/jira/browse/HADOOP-15171 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Sergey Shelukhin >Priority: Critical > Fix For: 3.1.0, 3.0.1 > > > While reading some ORC file via direct buffers, Hive gets a 0-sized buffer > for a particular compressed segment of the file. We narrowed it down to > Hadoop native ZLIB codec; when the data is copied to heap-based buffer and > the JDK Inflater is used, it produces correct output. Input is only 127 bytes > so I can paste it here. > All the other (many) blocks of the file are decompressed without problems by > the same code. 
> {noformat} > 2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Decompressing > 127 bytes to dest buffer pos 524288, limit 786432 > 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: The codec has > produced 0 bytes for 127 bytes at pos 0, data hash 1719565039: [e3 92 e1 62 > 66 60 60 10 12 e5 98 e0 27 c4 c7 f1 e8 12 8f 40 c3 7b 5e 89 09 7f 6e 74 73 04 > 30 70 c9 72 b1 30 14 4d 60 82 49 37 bd e7 15 58 d0 cd 2f 31 a1 a1 e3 35 4c fa > 15 a3 02 4c 7a 51 37 bf c0 81 e5 02 12 13 5a b6 9f e2 04 ea 96 e3 62 65 b8 c3 > b4 01 ae fd d0 72 01 81 07 87 05 25 26 74 3c 5b c9 05 35 fd 0a b3 03 50 7b 83 > 11 c8 f2 c3 82 02 0f 96 0b 49 34 7c fa ff 9f 2d 80 01 00 > 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Fell back to > JDK decompressor with memcopy; got 155 bytes > {noformat} > Hadoop version is based on 3.1 snapshot. > The size of libhadoop.so is 824403 bytes, and libgplcompression is 78273 > FWIW. Not sure how to extract versions from those. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input
[ https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-15171: -- Fix Version/s: 3.0.1 3.1.0 > Hadoop native ZLIB decompressor produces 0 bytes for some input > --- > > Key: HADOOP-15171 > URL: https://issues.apache.org/jira/browse/HADOOP-15171 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Sergey Shelukhin >Priority: Critical > Fix For: 3.1.0, 3.0.1 > > > While reading some ORC file via direct buffers, Hive gets a 0-sized buffer > for a particular compressed segment of the file. We narrowed it down to > Hadoop native ZLIB codec; when the data is copied to heap-based buffer and > the JDK Inflater is used, it produces correct output. Input is only 127 bytes > so I can paste it here. > All the other (many) blocks of the file are decompressed without problems by > the same code. > {noformat} > 2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Decompressing > 127 bytes to dest buffer pos 524288, limit 786432 > 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: The codec has > produced 0 bytes for 127 bytes at pos 0, data hash 1719565039: [e3 92 e1 62 > 66 60 60 10 12 e5 98 e0 27 c4 c7 f1 e8 12 8f 40 c3 7b 5e 89 09 7f 6e 74 73 04 > 30 70 c9 72 b1 30 14 4d 60 82 49 37 bd e7 15 58 d0 cd 2f 31 a1 a1 e3 35 4c fa > 15 a3 02 4c 7a 51 37 bf c0 81 e5 02 12 13 5a b6 9f e2 04 ea 96 e3 62 65 b8 c3 > b4 01 ae fd d0 72 01 81 07 87 05 25 26 74 3c 5b c9 05 35 fd 0a b3 03 50 7b 83 > 11 c8 f2 c3 82 02 0f 96 0b 49 34 7c fa ff 9f 2d 80 01 00 > 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Fell back to > JDK decompressor with memcopy; got 155 bytes > {noformat} > Hadoop version is based on 3.1 snapshot. 
> The size of libhadoop.so is 824403 bytes, and libgplcompression is 78273 > FWIW. Not sure how to extract versions from those.
[jira] [Commented] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input
[ https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16324924#comment-16324924 ] Sergey Shelukhin commented on HADOOP-15171: --- [~jnp] [~hagleitn] fyi > Hadoop native ZLIB decompressor produces 0 bytes for some input > --- > > Key: HADOOP-15171 > URL: https://issues.apache.org/jira/browse/HADOOP-15171 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Sergey Shelukhin >Priority: Critical > > While reading some ORC file via direct buffers, Hive gets a 0-sized buffer > for a particular compressed segment of the file. We narrowed it down to > Hadoop native ZLIB codec; when the data is copied to heap-based buffer and > the JDK Inflater is used, it produces correct output. Input is only 127 bytes > so I can paste it here. > All the other (many) blocks of the file are decompressed without problems by > the same code. > {noformat} > 2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Decompressing > 127 bytes to dest buffer pos 524288, limit 786432 > 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: The codec has > produced 0 bytes for 127 bytes at pos 0, data hash 1719565039: [e3 92 e1 62 > 66 60 60 10 12 e5 98 e0 27 c4 c7 f1 e8 12 8f 40 c3 7b 5e 89 09 7f 6e 74 73 04 > 30 70 c9 72 b1 30 14 4d 60 82 49 37 bd e7 15 58 d0 cd 2f 31 a1 a1 e3 35 4c fa > 15 a3 02 4c 7a 51 37 bf c0 81 e5 02 12 13 5a b6 9f e2 04 ea 96 e3 62 65 b8 c3 > b4 01 ae fd d0 72 01 81 07 87 05 25 26 74 3c 5b c9 05 35 fd 0a b3 03 50 7b 83 > 11 c8 f2 c3 82 02 0f 96 0b 49 34 7c fa ff 9f 2d 80 01 00 > 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Fell back to > JDK decompressor with memcopy; got 155 bytes > {noformat} > Hadoop version is based on 3.1 snapshot. 
> The size of libhadoop.so is 824403 bytes, and libgplcompression is 78273 > FWIW. Not sure how to extract versions from those. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Updated] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input
[ https://issues.apache.org/jira/browse/HADOOP-15171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-15171: -- Description: While reading some ORC file via direct buffers, Hive gets a 0-sized buffer for a particular compressed segment of the file. We narrowed it down to Hadoop native ZLIB codec; when the data is copied to heap-based buffer and the JDK Inflater is used, it produces correct output. Input is only 127 bytes so I can paste it here. All the other (many) blocks of the file are decompressed without problems by the same code. {noformat} 2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Decompressing 127 bytes to dest buffer pos 524288, limit 786432 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: The codec has produced 0 bytes for 127 bytes at pos 0, data hash 1719565039: [e3 92 e1 62 66 60 60 10 12 e5 98 e0 27 c4 c7 f1 e8 12 8f 40 c3 7b 5e 89 09 7f 6e 74 73 04 30 70 c9 72 b1 30 14 4d 60 82 49 37 bd e7 15 58 d0 cd 2f 31 a1 a1 e3 35 4c fa 15 a3 02 4c 7a 51 37 bf c0 81 e5 02 12 13 5a b6 9f e2 04 ea 96 e3 62 65 b8 c3 b4 01 ae fd d0 72 01 81 07 87 05 25 26 74 3c 5b c9 05 35 fd 0a b3 03 50 7b 83 11 c8 f2 c3 82 02 0f 96 0b 49 34 7c fa ff 9f 2d 80 01 00 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Fell back to JDK decompressor with memcopy; got 155 bytes {noformat} Hadoop version is based on 3.1 snapshot. The size of libhadoop.so is 824403 bytes, and libgplcompression is 78273 FWIW. Not sure how to extract versions from those. was: While reading some ORC file via direct buffers, Hive gets a 0-sized buffer for a particular compressed segment of the file. We narrowed it down to Hadoop native ZLIB codec; when the data is copied to heap-based buffer and the JDK Inflater is used, it produces correct output. 
Input is only 127 bytes so I can paste it here. {noformat} 2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Decompressing 127 bytes to dest buffer pos 524288, limit 786432 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: The codec has produced 0 bytes for 127 bytes at pos 0, data hash 1719565039: [e3 92 e1 62 66 60 60 10 12 e5 98 e0 27 c4 c7 f1 e8 12 8f 40 c3 7b 5e 89 09 7f 6e 74 73 04 30 70 c9 72 b1 30 14 4d 60 82 49 37 bd e7 15 58 d0 cd 2f 31 a1 a1 e3 35 4c fa 15 a3 02 4c 7a 51 37 bf c0 81 e5 02 12 13 5a b6 9f e2 04 ea 96 e3 62 65 b8 c3 b4 01 ae fd d0 72 01 81 07 87 05 25 26 74 3c 5b c9 05 35 fd 0a b3 03 50 7b 83 11 c8 f2 c3 82 02 0f 96 0b 49 34 7c fa ff 9f 2d 80 01 00 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Fell back to JDK decompressor with memcopy; got 155 bytes {noformat} Hadoop version is based on 3.1 snapshot. The size of libhadoop.so is 824403 bytes, and libgplcompression is 78273 FWIW. Not sure how to extract versions from those. > Hadoop native ZLIB decompressor produces 0 bytes for some input > --- > > Key: HADOOP-15171 > URL: https://issues.apache.org/jira/browse/HADOOP-15171 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 3.1.0 >Reporter: Sergey Shelukhin >Priority: Critical > > While reading some ORC file via direct buffers, Hive gets a 0-sized buffer > for a particular compressed segment of the file. We narrowed it down to > Hadoop native ZLIB codec; when the data is copied to heap-based buffer and > the JDK Inflater is used, it produces correct output. Input is only 127 bytes > so I can paste it here. > All the other (many) blocks of the file are decompressed without problems by > the same code. 
> {noformat} > 2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Decompressing > 127 bytes to dest buffer pos 524288, limit 786432 > 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: The codec has > produced 0 bytes for 127 bytes at pos 0, data hash 1719565039: [e3 92 e1 62 > 66 60 60 10 12 e5 98 e0 27 c4 c7 f1 e8 12 8f 40 c3 7b 5e 89 09 7f 6e 74 73 04 > 30 70 c9 72 b1 30 14 4d 60 82 49 37 bd e7 15 58 d0 cd 2f 31 a1 a1 e3 35 4c fa > 15 a3 02 4c 7a 51 37 bf c0 81 e5 02 12 13 5a b6 9f e2 04 ea 96 e3 62 65 b8 c3 > b4 01 ae fd d0 72 01 81 07 87 05 25 26 74 3c 5b c9 05 35 fd 0a b3 03 50 7b 83 > 11 c8 f2 c3 82 02 0f 96 0b 49 34 7c fa ff 9f 2d 80 01 00 > 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 > (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderI
[jira] [Created] (HADOOP-15171) Hadoop native ZLIB decompressor produces 0 bytes for some input
Sergey Shelukhin created HADOOP-15171: - Summary: Hadoop native ZLIB decompressor produces 0 bytes for some input Key: HADOOP-15171 URL: https://issues.apache.org/jira/browse/HADOOP-15171 Project: Hadoop Common Issue Type: Bug Affects Versions: 3.1.0 Reporter: Sergey Shelukhin Priority: Critical While reading some ORC file via direct buffers, Hive gets a 0-sized buffer for a particular compressed segment of the file. We narrowed it down to Hadoop native ZLIB codec; when the data is copied to heap-based buffer and the JDK Inflater is used, it produces correct output. Input is only 127 bytes so I can paste it here. {noformat} 2018-01-13T02:47:40,815 TRACE [IO-Elevator-Thread-0 (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Decompressing 127 bytes to dest buffer pos 524288, limit 786432 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: The codec has produced 0 bytes for 127 bytes at pos 0, data hash 1719565039: [e3 92 e1 62 66 60 60 10 12 e5 98 e0 27 c4 c7 f1 e8 12 8f 40 c3 7b 5e 89 09 7f 6e 74 73 04 30 70 c9 72 b1 30 14 4d 60 82 49 37 bd e7 15 58 d0 cd 2f 31 a1 a1 e3 35 4c fa 15 a3 02 4c 7a 51 37 bf c0 81 e5 02 12 13 5a b6 9f e2 04 ea 96 e3 62 65 b8 c3 b4 01 ae fd d0 72 01 81 07 87 05 25 26 74 3c 5b c9 05 35 fd 0a b3 03 50 7b 83 11 c8 f2 c3 82 02 0f 96 0b 49 34 7c fa ff 9f 2d 80 01 00 2018-01-13T02:47:40,816 WARN [IO-Elevator-Thread-0 (1515637158315_0079_1_00_00_0)] encoded.EncodedReaderImpl: Fell back to JDK decompressor with memcopy; got 155 bytes {noformat} Hadoop version is based on 3.1 snapshot. The size of libhadoop.so is 824403 bytes, and libgplcompression is 78273 FWIW. Not sure how to extract versions from those. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13500) Concurrency issues when using Configuration iterator
[ https://issues.apache.org/jira/browse/HADOOP-13500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183481#comment-16183481 ] Sergey Shelukhin commented on HADOOP-13500: --- [~jnp] can we get this fixed eventually? ;) > Concurrency issues when using Configuration iterator > > > Key: HADOOP-13500 > URL: https://issues.apache.org/jira/browse/HADOOP-13500 > Project: Hadoop Common > Issue Type: Bug > Components: conf >Reporter: Jason Lowe > > It is possible to encounter a ConcurrentModificationException while trying to > iterate a Configuration object. The iterator method tries to walk the > underlying Property object without proper synchronization, so another thread > simultaneously calling the set method can trigger it.
[jira] [Comment Edited] (HADOOP-13500) Concurrency issues when using Configuration iterator
[ https://issues.apache.org/jira/browse/HADOOP-13500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183478#comment-16183478 ] Sergey Shelukhin edited comment on HADOOP-13500 at 9/28/17 12:07 AM: - Hive is also hitting this issue with a different call stack was (Author: sershe): Hive is also hitting this issue. > Concurrency issues when using Configuration iterator > > > Key: HADOOP-13500 > URL: https://issues.apache.org/jira/browse/HADOOP-13500 > Project: Hadoop Common > Issue Type: Bug > Components: conf >Reporter: Jason Lowe > > It is possible to encounter a ConcurrentModificationException while trying to > iterate a Configuration object. The iterator method tries to walk the > underlying Property object without proper synchronization, so another thread > simultaneously calling the set method can trigger it. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13500) Concurrency issues when using Configuration iterator
[ https://issues.apache.org/jira/browse/HADOOP-13500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16183478#comment-16183478 ] Sergey Shelukhin commented on HADOOP-13500: --- Hive is also hitting this issue. > Concurrency issues when using Configuration iterator > > > Key: HADOOP-13500 > URL: https://issues.apache.org/jira/browse/HADOOP-13500 > Project: Hadoop Common > Issue Type: Bug > Components: conf >Reporter: Jason Lowe > > It is possible to encounter a ConcurrentModificationException while trying to > iterate a Configuration object. The iterator method tries to walk the > underlying Property object without proper synchronization, so another thread > simultaneously calling the set method can trigger it. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
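One caller-side mitigation, until the iterator itself is synchronized: copy the live properties into a snapshot under a lock and iterate the copy. A hedged sketch against a plain java.util.Properties, standing in for Configuration's backing store (this is not Hadoop's actual fix):

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class ConfigSnapshotIteration {

    // Take a consistent point-in-time copy of `props`. Properties extends
    // Hashtable, so synchronizing on the object blocks concurrent put()
    // calls for the duration of the copy; iterating the returned map can
    // then never hit ConcurrentModificationException.
    public static Map<String, String> snapshot(Properties props) {
        Map<String, String> copy = new TreeMap<>();
        synchronized (props) {
            for (String name : props.stringPropertyNames()) {
                copy.put(name, props.getProperty(name));
            }
        }
        return copy; // iterate this copy, not the live object
    }
}
```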
[jira] [Commented] (HADOOP-14683) FileStatus.compareTo binary compatible issue
[ https://issues.apache.org/jira/browse/HADOOP-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16110045#comment-16110045 ] Sergey Shelukhin commented on HADOOP-14683: --- Thanks! > FileStatus.compareTo binary compatible issue > > > Key: HADOOP-14683 > URL: https://issues.apache.org/jira/browse/HADOOP-14683 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.8.0, 2.8.1 >Reporter: Sergey Shelukhin >Assignee: Akira Ajisaka >Priority: Blocker > Fix For: 2.9.0, 3.0.0-beta1, 2.8.2 > > Attachments: HADOOP-14683-branch-2-01.patch, > HADOOP-14683-branch-2-02.patch > > > See HIVE-17133. Looks like the signature change is causing issues; according > to [~jnp] this is a public API. > Is it possible to add the old overload back (keeping the new one presumably) > in a point release on 2.8? That way we can avoid creating yet another shim > for this in Hive. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-14683) FileStatus.compareTo binary compat issue between 2.7 and 2.8
[ https://issues.apache.org/jira/browse/HADOOP-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102470#comment-16102470 ] Sergey Shelukhin commented on HADOOP-14683: --- +1 non-binding > FileStatus.compareTo binary compat issue between 2.7 and 2.8 > > > Key: HADOOP-14683 > URL: https://issues.apache.org/jira/browse/HADOOP-14683 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.8.0, 2.8.1 >Reporter: Sergey Shelukhin >Assignee: Akira Ajisaka >Priority: Blocker > Attachments: HADOOP-14683-branch-2-01.patch > > > See HIVE-17133. Looks like the signature change is causing issues; according > to [~jnp] this is a public API. > Is it possible to add the old overload back (keeping the new one presumably) > in a point release on 2.8? That way we can avoid creating yet another shim > for this in Hive. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-14684) get rid of "skipCorrupt" flag
Sergey Shelukhin created HADOOP-14684: - Summary: get rid of "skipCorrupt" flag Key: HADOOP-14684 URL: https://issues.apache.org/jira/browse/HADOOP-14684 Project: Hadoop Common Issue Type: Bug Reporter: Sergey Shelukhin The error that caused this issue happened a long time ago, and it's probably OK to get rid of this flag. Perhaps we should provide a small tool to overwrite these files without the corrupt values. cc [~prasanth_j]
[jira] [Updated] (HADOOP-14683) FileStatus.compareTo binary compat issue between 2.7 and 2.8
[ https://issues.apache.org/jira/browse/HADOOP-14683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-14683: -- Description: See HIVE-17133. Looks like the signature change is causing issues; according to [~jnp] this is a public API. Is it possible to add the old overload back (keeping the new one presumably) in a point release on 2.8? That way we can avoid creating yet another shim for this in Hive. was: See HIVE-17133. Looks like the signature change is causing issues; according to [~jnp] this is a public API. Is it possible to add the old overload back in a point release on 2.8? That way we can avoid creating yet another shim for this in Hive. > FileStatus.compareTo binary compat issue between 2.7 and 2.8 > > > Key: HADOOP-14683 > URL: https://issues.apache.org/jira/browse/HADOOP-14683 > Project: Hadoop Common > Issue Type: Bug >Reporter: Sergey Shelukhin > > See HIVE-17133. Looks like the signature change is causing issues; according > to [~jnp] this is a public API. > Is it possible to add the old overload back (keeping the new one presumably) > in a point release on 2.8? That way we can avoid creating yet another shim > for this in Hive. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Created] (HADOOP-14683) FileStatus.compareTo binary compat issue between 2.7 and 2.8
Sergey Shelukhin created HADOOP-14683: - Summary: FileStatus.compareTo binary compat issue between 2.7 and 2.8 Key: HADOOP-14683 URL: https://issues.apache.org/jira/browse/HADOOP-14683 Project: Hadoop Common Issue Type: Bug Reporter: Sergey Shelukhin See HIVE-17133. Looks like the signature change is causing issues; according to [~jnp] this is a public API. Is it possible to add the old overload back in a point release on 2.8? That way we can avoid creating yet another shim for this in Hive. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
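The compatibility problem here is that code compiled against the 2.7-era erased `compareTo(Object)` signature fails to link once the method becomes `compareTo(FileStatus)`. Keeping both overloads, with the old one delegating to the new one, restores binary compatibility for already-compiled callers. A toy sketch of the idea (`Status` is a stand-in for FileStatus, not the actual patch):

```java
// Implementing Comparable<Object> keeps compareTo(Object) as the interface
// method, so the old erased signature stays in the class file.
public class Status implements Comparable<Object> {
    private final String path;

    public Status(String path) {
        this.path = path;
    }

    // New typed overload (the 2.8-style signature callers now compile against).
    public int compareTo(Status other) {
        return path.compareTo(other.path);
    }

    // Old erased signature (2.7-style), kept so bytecode that references
    // compareTo(Object) still links; it simply delegates.
    @Override
    public int compareTo(Object other) {
        return compareTo((Status) other);
    }
}
```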
[jira] [Comment Edited] (HADOOP-14214) DomainSocketWatcher::add()/delete() should not self interrupt while looping await()
[ https://issues.apache.org/jira/browse/HADOOP-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937522#comment-15937522 ] Sergey Shelukhin edited comment on HADOOP-14214 at 3/23/17 1:28 AM: Hmm.. the reason we are interrupting the thread in question is because we want it to be interrupted (because the work it's performing is no longer relevant). Wouldn't this just cause it to be stuck forever anyway, or at best to continue a useless operation? cc [~sseth] was (Author: sershe): Hmm.. the reason we are interrupting the thread in question is because we want it to be interrupted (because the work it's performing is no longer relevant). Wouldn't this just cause it to be stuck forever anyway, or at best to continue a useless operation? > DomainSocketWatcher::add()/delete() should not self interrupt while looping > await() > --- > > Key: HADOOP-14214 > URL: https://issues.apache.org/jira/browse/HADOOP-14214 > Project: Hadoop Common > Issue Type: Bug > Components: hdfs-client >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Critical > Attachments: HADOOP-14214.000.patch > > > Our hive team found a TPCDS job whose queries running on LLAP seem to be > getting stuck. 
Dozens of threads were waiting for the > {{DfsClientShmManager::lock}}, as following jstack: > {code} > Thread 251 (IO-Elevator-Thread-5): > State: WAITING > Blocked count: 3871 > Wtaited count: 4565 > Waiting on > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@16ead198 > Stack: > sun.misc.Unsafe.park(Native Method) > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUninterruptibly(AbstractQueuedSynchronizer.java:1976) > > org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.allocSlot(DfsClientShmManager.java:255) > > org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager.allocSlot(DfsClientShmManager.java:434) > > org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.allocShmSlot(ShortCircuitCache.java:1017) > > org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:476) > > org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:784) > > org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:718) > > org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:422) > > org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:333) > > org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1181) > > org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1118) > org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1478) > org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1441) > org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:111) > > org.apache.orc.impl.RecordReaderUtils$DefaultDataReader.readStripeFooter(RecordReaderUtils.java:166) > > 
org.apache.hadoop.hive.llap.io.metadata.OrcStripeMetadata.(OrcStripeMetadata.java:64) > > org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.readStripesMetadata(OrcEncodedDataReader.java:622) > {code} > The thread that is expected to signal those threads is calling > {{DomainSocketWatcher::add()}} method, but it gets stuck there dealing with > InterruptedException infinitely. The jstack is like: > {code} > Thread 44417 (TezTR-257387_2840_12_10_52_0): > State: RUNNABLE > Blocked count: 3 > Wtaited count: 5 > Stack: > java.lang.Throwable.fillInStackTrace(Native Method) > java.lang.Throwable.fillInStackTrace(Throwable.java:783) > java.lang.Throwable.(Throwable.java:250) > java.lang.Exception.(Exception.java:54) > java.lang.InterruptedException.(InterruptedException.java:57) > > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2034) > > org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:325) > > org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.allocSlot(DfsClientShmManager.java:266) > > org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager.allocSlot(DfsClientShmManager.java:434) > > org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.allocShmSlot(ShortCircuitCache.java:1017) > > org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFac
[jira] [Commented] (HADOOP-14214) DomainSocketWatcher::add()/delete() should not self interrupt while looping await()
[ https://issues.apache.org/jira/browse/HADOOP-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937522#comment-15937522 ] Sergey Shelukhin commented on HADOOP-14214: --- Hmm.. the reason we are interrupting the thread in question is because we want it to be interrupted (because the work it's performing is no longer relevant). Wouldn't this just cause it to be stuck forever anyway, or at best to continue a useless operation? > DomainSocketWatcher::add()/delete() should not self interrupt while looping > await() > --- > > Key: HADOOP-14214 > URL: https://issues.apache.org/jira/browse/HADOOP-14214 > Project: Hadoop Common > Issue Type: Bug > Components: hdfs-client >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Critical > Attachments: HADOOP-14214.000.patch > > > Our hive team found a TPCDS job whose queries running on LLAP seem to be > getting stuck. Dozens of threads were waiting for the > {{DfsClientShmManager::lock}}, as following jstack: > {code} > Thread 251 (IO-Elevator-Thread-5): > State: WAITING > Blocked count: 3871 > Wtaited count: 4565 > Waiting on > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@16ead198 > Stack: > sun.misc.Unsafe.park(Native Method) > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUninterruptibly(AbstractQueuedSynchronizer.java:1976) > > org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.allocSlot(DfsClientShmManager.java:255) > > org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager.allocSlot(DfsClientShmManager.java:434) > > org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.allocShmSlot(ShortCircuitCache.java:1017) > > org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:476) > > org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:784) > > 
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:718) > > org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:422) > > org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:333) > > org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1181) > > org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1118) > org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1478) > org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1441) > org.apache.hadoop.fs.FSInputStream.readFully(FSInputStream.java:121) > > org.apache.hadoop.fs.FSDataInputStream.readFully(FSDataInputStream.java:111) > > org.apache.orc.impl.RecordReaderUtils$DefaultDataReader.readStripeFooter(RecordReaderUtils.java:166) > > org.apache.hadoop.hive.llap.io.metadata.OrcStripeMetadata.(OrcStripeMetadata.java:64) > > org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.readStripesMetadata(OrcEncodedDataReader.java:622) > {code} > The thread that is expected to signal those threads is calling > {{DomainSocketWatcher::add()}} method, but it gets stuck there dealing with > InterruptedException infinitely. 
The jstack is like: > {code} > Thread 44417 (TezTR-257387_2840_12_10_52_0): > State: RUNNABLE > Blocked count: 3 > Wtaited count: 5 > Stack: > java.lang.Throwable.fillInStackTrace(Native Method) > java.lang.Throwable.fillInStackTrace(Throwable.java:783) > java.lang.Throwable.(Throwable.java:250) > java.lang.Exception.(Exception.java:54) > java.lang.InterruptedException.(InterruptedException.java:57) > > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2034) > > org.apache.hadoop.net.unix.DomainSocketWatcher.add(DomainSocketWatcher.java:325) > > org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager.allocSlot(DfsClientShmManager.java:266) > > org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager.allocSlot(DfsClientShmManager.java:434) > > org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.allocShmSlot(ShortCircuitCache.java:1017) > > org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:476) > > org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:784) > > org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:718) > > org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:422) > > org.apache.hadoop.h
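The stuck-forever pattern described above can be sketched with plain java.util.concurrent (a simplified stand-in, not the actual DomainSocketWatcher code): re-interrupting the current thread inside the wait loop makes every subsequent await() throw immediately, so the thread spins constructing InterruptedExceptions instead of parking, which matches the RUNNABLE state and fillInStackTrace frames in the jstack. The usual fix is to remember the interrupt and restore it once, after the condition is satisfied:

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Simplified sketch of the await-loop pattern at issue. Instead of calling
// Thread.currentThread().interrupt() inside the loop (which makes the next
// await() throw immediately, forever), we note the interrupt, keep waiting,
// and restore the interrupt status exactly once on the way out.
public class AwaitLoop {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition cond = lock.newCondition();
    private boolean ready = false;

    public void awaitReady() {
        boolean interrupted = false;          // note it, don't re-assert yet
        lock.lock();
        try {
            while (!ready) {
                try {
                    cond.await();
                } catch (InterruptedException e) {
                    interrupted = true;       // swallow for now, keep waiting
                }
            }
        } finally {
            lock.unlock();
            if (interrupted) {
                Thread.currentThread().interrupt();   // restore exactly once
            }
        }
    }

    public void markReady() {
        lock.lock();
        try {
            ready = true;
            cond.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Interrupt a waiter, then signal it; it should still terminate. */
    public static boolean demo() {
        AwaitLoop a = new AwaitLoop();
        Thread t = new Thread(a::awaitReady);
        try {
            t.start();
            t.interrupt();
            a.markReady();
            t.join(5000);
        } catch (InterruptedException e) {
            return false;
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note this trades responsiveness for progress: the waiter ignores cancellation until it is signalled, which is exactly the concern raised in the comment above.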
[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563221#comment-15563221 ] Sergey Shelukhin commented on HADOOP-13081: --- Sorry, I've posted this in the wrong JIRA apparently: The scenario is like this; we accept work on behalf of clients that is, generally speaking, authorized on a higher level (those are fragments of Hive jobs right now, except unlike MR they all run in-process, and we are also making the external client which is the crux of the matter). In the normal case, the service doing the auth (HiveServer2 in the case of Hive) gathers the tokens and passes them on to the service running the fragment; the external client may supply some tokens too. However, apparently for some clients it's difficult (or not implemented yet) to gather tokens, so in the cases of perimeter security, we want to be able to configure access in such a way that they can access all of HDFS (for example; it could be some other service that their code touched that we have no idea about, hypothetically). The reasoning is that if the work item has passed through the authorization that our service does, they don't care about HDFS security any more. In that case, our service would log in from the keytab and run their item in that context. However, we neither want to require a super-user that is able to access all possible services (e.g. HBase), nor disable HDFS security altogether. So, the user work items would access HDFS (or HBase or whatever) as a user with lots of access, by design, and access other services via tokens. This feature is off by default, obviously, and the auth of their code talking to services is based entirely on tokens by default. I understand running as such a user is not an ideal situation but it is apparently a valid scenario for some cases. So, what we do now is create a master UGI/Subject; for every task, if this is enabled, we clone that via reflection and add the tokens.
We haven't extensively tested this yet since external client is not production ready but it appears to work in some tests. I hope this makes sense, feel free to clarify. We are using reflection to get the subject and construct the UGI from subject. > add the ability to create multiple UGIs/subjects from one kerberos login > > > Key: HADOOP-13081 > URL: https://issues.apache.org/jira/browse/HADOOP-13081 > Project: Hadoop Common > Issue Type: Improvement > Components: security >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 2.8.0, 3.0.0-alpha1 > > Attachments: HADOOP-13081.01.patch, HADOOP-13081.02.patch, > HADOOP-13081.02.patch, HADOOP-13081.03.patch, HADOOP-13081.03.patch, > HADOOP-13081.patch > > > We have a scenario where we log in with kerberos as a certain user for some > tasks, but also want to add tokens to the resulting UGI that would be > specific to each task. We don't want to authenticate with kerberos for every > task. > I am not sure how this can be accomplished with the existing UGI interface. > Perhaps some clone method would be helpful, similar to createProxyUser minus > the proxy stuff; or it could just relogin anew from ticket cache. > getUGIFromTicketCache seems like the best option in existing code, but there > doesn't appear to be a consistent way of handling ticket cache location - the > above method, that I only see called in test, is using a config setting that > is not used anywhere else, and the env variable for the location that is used > in the main ticket cache related methods is not set uniformly on all paths - > therefore, trying to find the correct ticket cache and passing it via the > config setting to getUGIFromTicketCache seems even hackier than doing the > clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user > parameter on the main path - it logs a warning for multiple principals and > then logs in with first available. 
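The master-login-plus-per-task-tokens idea described in this comment can be sketched with only javax.security.auth (illustrative; the actual code clones a Hadoop UserGroupInformation via reflection, and TaskToken is a hypothetical stand-in for a delegation token): each per-task Subject shares the master's principals and credentials but owns its private-credential set, so tokens added for one task never leak into another.

```java
import javax.security.auth.Subject;

// Illustrative sketch of "master login, per-task clone" using plain
// javax.security.auth. A per-task Subject shares the master's identity but
// gets its own private-credential set, so a token added for one task is not
// visible to the master or to other tasks.
public class SubjectClones {

    /** Stand-in for a per-task delegation token (hypothetical type). */
    public static final class TaskToken {
        final String service;
        public TaskToken(String service) { this.service = service; }
    }

    /** Clone the master subject, sharing identity but not token storage. */
    public static Subject cloneForTask(Subject master, TaskToken token) {
        Subject task = new Subject();
        task.getPrincipals().addAll(master.getPrincipals());
        task.getPublicCredentials().addAll(master.getPublicCredentials());
        // Copy the shared secrets (e.g. the TGT), then add the task token.
        task.getPrivateCredentials().addAll(master.getPrivateCredentials());
        task.getPrivateCredentials().add(token);
        return task;
    }

    /** Two clones share identity but each sees only its own task token. */
    public static boolean demo() {
        Subject master = new Subject();
        master.getPrincipals().add(() -> "hive/host@EXAMPLE.COM");

        Subject t1 = cloneForTask(master, new TaskToken("hdfs"));
        Subject t2 = cloneForTask(master, new TaskToken("hbase"));

        boolean sameIdentity = t1.getPrincipals().equals(master.getPrincipals());
        boolean isolatedTokens =
                t1.getPrivateCredentials(TaskToken.class).size() == 1
                && t2.getPrivateCredentials(TaskToken.class).size() == 1
                && master.getPrivateCredentials(TaskToken.class).isEmpty();
        return sameIdentity && isolatedTokens;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```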
[jira] [Issue Comment Deleted] (HADOOP-13066) UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe
[ https://issues.apache.org/jira/browse/HADOOP-13066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-13066: -- Comment: was deleted (was: The scenario is like this; we accept work on behalf of clients that is, generally speaking, authorized on a higher level (those are fragments of Hive jobs right now, except unlike MR they all run in-process, and we are also making the external client which is the crux of the matter). In the normal case, the service doing the auth (HiveServer2 in the case of Hive) gathers the tokens and passes them on to the service running the fragment; the external client may supply some tokens too. However, apparently for some clients it's difficult (or not implemented yet) to gather tokens, so in the cases of perimeter security, we want to be able to configure access in such a way that they can access all of HDFS (for example; it could be some other service that their code touched that we have no idea about, hypothetically). The reasoning is that if the work item has passed through the authorization that our service does, they don't care about HDFS security any more. In that case, our service would log in from the keytab and run their item in that context. However, we neither want to require a super-user that is able to access all possible services (e.g. HBase), nor disable HDFS security altogether. So, the user work items would access HDFS (or HBase or whatever) as a user with lots of access, by design, and access other services via tokens. This feature is off by default, obviously, and the auth of their code talking to services is based entirely on tokens by default. I understand running as such a user is not an ideal situation but it is apparently a valid scenario for some cases. So, what we do now is create a master UGI/Subject; for every task, if this is enabled, we clone that via reflection and add the tokens.
We haven't extensively tested this yet since external client is not production ready but it appears to work in some tests. I hope this makes sense, feel free to clarify. We are using reflection to get the subject and construct the UGI from subject.) > UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe > -- > > Key: HADOOP-13066 > URL: https://issues.apache.org/jira/browse/HADOOP-13066 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Sergey Shelukhin > > When calling loginFromKerberos, a static variable is set up with the result. > If someone logs in as a different user from a different thread, the call to > getLoginUser will not return the correct UGI. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13066) UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe
[ https://issues.apache.org/jira/browse/HADOOP-13066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563218#comment-15563218 ] Sergey Shelukhin commented on HADOOP-13066: --- Sorry, I commented on the wrong jira > UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe > -- > > Key: HADOOP-13066 > URL: https://issues.apache.org/jira/browse/HADOOP-13066 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Sergey Shelukhin > > When calling loginFromKerberos, a static variable is set up with the result. > If someone logs in as a different user from a different thread, the call to > getLoginUser will not return the correct UGI. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HADOOP-13066) UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe
[ https://issues.apache.org/jira/browse/HADOOP-13066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549646#comment-15549646 ] Sergey Shelukhin edited comment on HADOOP-13066 at 10/5/16 6:57 PM: The scenario is like this; we accept work on behalf of clients that is, generally speaking, authorized on a higher level (those are fragments of Hive jobs right now, except unlike MR they all run in-process, and we are also making the external client which is the crux of the matter). In the normal case, the service doing the auth (HiveServer2 in the case of Hive) gathers the tokens and passes them on to the service running the fragment; the external client may supply some tokens too. However, apparently for some clients it's difficult (or not implemented yet) to gather tokens, so in the cases of perimeter security, we want to be able to configure access in such a way that they can access all of HDFS (for example; it could be some other service that their code touched that we have no idea about, hypothetically). The reasoning is that if the work item has passed through the authorization that our service does, they don't care about HDFS security any more. In that case, our service would log in from the keytab and run their item in that context. However, we neither want to require a super-user that is able to access all possible services (e.g. HBase), nor disable HDFS security altogether. So, the user work items would access HDFS (or HBase or whatever) as a user with lots of access, by design, and access other services via tokens. This feature is off by default, obviously, and the auth of their code talking to services is based entirely on tokens by default. I understand running as such a user is not an ideal situation but it is apparently a valid scenario for some cases. So, what we do now is create a master UGI/Subject; for every task, if this is enabled, we clone that via reflection and add the tokens.
We haven't extensively tested this yet since the external client is not production-ready but it appears to work in some tests. I hope this makes sense, feel free to clarify. We are using reflection to get the subject and construct the UGI from the subject. was (Author: sershe): The scenario is like this; we accept work on behalf of clients that is, generally speaking, authorized on a higher level (those are fragments of Hive jobs right now, except unlike MR they all run in-process, and we are also making the external client which is the crux of the matter). In the normal case, the service doing the auth gathers the tokens and passes them on; the external client may supply some tokens too. However, apparently for some clients it's difficult (or not implemented yet) to gather tokens, so in the cases of perimeter security, they want to configure access in such a way that they can access all of HDFS (for example; it could be some other service that their code touched that we have no idea about, hypothetically). The reasoning is that if the work item has passed through the authorization of our service, they don't care about HDFS security any more. In that case, our service would log in from the keytab and run their item in that context. However, we neither want to require a super-user that is able to access all possible services (e.g. HBase), nor disable HDFS security altogether. So, the user work items would access HDFS (or HBase or whatever) as a user with lots of access, by design, and access other services via tokens. This feature is off by default, obviously, and the auth of their code talking to services is based entirely on tokens by default. I understand running as such a user is not an ideal situation but it is apparently a valid scenario for some cases. So, what we do now is create a master UGI/Subject; for every task, if this is enabled, we clone that via reflection and add the tokens.
We haven't extensively tested this yet since external client is not production ready but it appears to work in some tests. I hope this makes sense, feel free to clarify. We are using reflection to get the subject and construct the UGI from subject. > UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe > -- > > Key: HADOOP-13066 > URL: https://issues.apache.org/jira/browse/HADOOP-13066 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Sergey Shelukhin > > When calling loginFromKerberos, a static variable is set up with the result. > If someone logs in as a different user from a different thread, the call to > getLoginUser will not return the correct UGI. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr..
[jira] [Commented] (HADOOP-13066) UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe
[ https://issues.apache.org/jira/browse/HADOOP-13066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15549646#comment-15549646 ] Sergey Shelukhin commented on HADOOP-13066: --- The scenario is like this; we accept work on behalf of clients that is, generally speaking, authorized on a higher level (those are fragments of Hive jobs right now, except unlike MR they all run in-process, and we are also making the external client which is the crux of the matter). In the normal case, the service doing the auth gathers the tokens and passes them on; the external client may supply some tokens too. However, apparently for some clients it's difficult (or not implemented yet) to gather tokens, so in the cases of perimeter security, they want to configure access in such a way that they can access all of HDFS (for example; it could be some other service that their code touched that we have no idea about, hypothetically). The reasoning is that if the work item has passed through the authorization of our service, they don't care about HDFS security any more. In that case, our service would log in from the keytab and run their item in that context. However, we neither want to require a super-user that is able to access all possible services (e.g. HBase), nor disable HDFS security altogether. So, the user work items would access HDFS (or HBase or whatever) as a user with lots of access, by design, and access other services via tokens. This feature is off by default, obviously, and the auth of their code talking to services is based entirely on tokens by default. I understand running as such a user is not an ideal situation but it is apparently a valid scenario for some cases. So, what we do now is create a master UGI/Subject; for every task, if this is enabled, we clone that via reflection and add the tokens. We haven't extensively tested this yet since the external client is not production-ready but it appears to work in some tests.
I hope this makes sense, feel free to clarify. We are using reflection to get the subject and construct the UGI from subject. > UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe > -- > > Key: HADOOP-13066 > URL: https://issues.apache.org/jira/browse/HADOOP-13066 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Sergey Shelukhin > > When calling loginFromKerberos, a static variable is set up with the result. > If someone logs in as a different user from a different thread, the call to > getLoginUser will not return the correct UGI. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
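The thread-safety hazard in the issue description is generic and can be shown with a toy model (not the actual UserGroupInformation code): the login result is published through a single static slot, so the last thread to log in silently replaces every other thread's view. A safer shape is to return the logged-in identity to the caller, which then holds its own reference:

```java
// Toy model of the race: loginFromKerberos publishes through one static slot,
// so getLoginUser() may return a login performed by a different thread. The
// per-caller variant avoids the shared state entirely.
public class LoginState {
    private static volatile String loginUser;    // one slot for all threads

    /** Racy pattern: publish the login result through a static. */
    public static void loginFromKerberos(String principal) {
        loginUser = principal;                   // last writer wins, globally
    }

    public static String getLoginUser() {
        return loginUser;                        // may be another thread's login
    }

    /** Safer pattern: return the identity; the caller keeps its own handle. */
    public static String loginReturningUser(String principal) {
        return principal;
    }

    public static void main(String[] args) {
        loginFromKerberos("alice@EXAMPLE.COM");
        loginFromKerberos("bob@EXAMPLE.COM");    // e.g. from another thread
        System.out.println(getLoginUser());      // alice's view is gone

        String mine = loginReturningUser("alice@EXAMPLE.COM");
        System.out.println(mine);                // stable per caller
    }
}
```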
[jira] [Comment Edited] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15508229#comment-15508229 ] Sergey Shelukhin edited comment on HADOOP-13081 at 9/21/16 12:22 AM: - The synchronization issues and preserving the order seem fixable. UGI already iterates credentials (e.g. in getTGT or getCredentialsInternal) synchronizing on itself or subject only. User principal only uses the LoginContext to relogin. We could clear it and posit that clones cannot be used to relogin (this is rather arbitrary, admittedly...) was (Author: sershe): The synchronization issues and preserving the order seem fixable. UGI already iterates credentials (e.g. in getTGT or getCredentialsInternal) synchronizing on itself or subject only. User principal only uses the LoginContext to relogin. We could clear it and posit that clones cannot be logged in. > add the ability to create multiple UGIs/subjects from one kerberos login > > > Key: HADOOP-13081 > URL: https://issues.apache.org/jira/browse/HADOOP-13081 > Project: Hadoop Common > Issue Type: Improvement > Components: security >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 2.8.0, 3.0.0-alpha1 > > Attachments: HADOOP-13081.01.patch, HADOOP-13081.02.patch, > HADOOP-13081.02.patch, HADOOP-13081.03.patch, HADOOP-13081.03.patch, > HADOOP-13081.patch > > > We have a scenario where we log in with kerberos as a certain user for some > tasks, but also want to add tokens to the resulting UGI that would be > specific to each task. We don't want to authenticate with kerberos for every > task. > I am not sure how this can be accomplished with the existing UGI interface. > Perhaps some clone method would be helpful, similar to createProxyUser minus > the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there > doesn't appear to be a consistent way of handling ticket cache location - the > above method, that I only see called in test, is using a config setting that > is not used anywhere else, and the env variable for the location that is used > in the main ticket cache related methods is not set uniformly on all paths - > therefore, trying to find the correct ticket cache and passing it via the > config setting to getUGIFromTicketCache seems even hackier than doing the > clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user > parameter on the main path - it logs a warning for multiple principals and > then logs in with first available. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15508229#comment-15508229 ] Sergey Shelukhin commented on HADOOP-13081: --- The synchronization issues and preserving the order seem fixable. UGI already iterates credentials (e.g. in getTGT or getCredentialsInternal) synchronizing on itself or subject only. User principal only uses the LoginContext to relogin. We could clear it and posit that clones cannot be logged in. > add the ability to create multiple UGIs/subjects from one kerberos login > > > Key: HADOOP-13081 > URL: https://issues.apache.org/jira/browse/HADOOP-13081 > Project: Hadoop Common > Issue Type: Improvement > Components: security >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 2.8.0, 3.0.0-alpha1 > > Attachments: HADOOP-13081.01.patch, HADOOP-13081.02.patch, > HADOOP-13081.02.patch, HADOOP-13081.03.patch, HADOOP-13081.03.patch, > HADOOP-13081.patch > > > We have a scenario where we log in with kerberos as a certain user for some > tasks, but also want to add tokens to the resulting UGI that would be > specific to each task. We don't want to authenticate with kerberos for every > task. > I am not sure how this can be accomplished with the existing UGI interface. > Perhaps some clone method would be helpful, similar to createProxyUser minus > the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there > doesn't appear to be a consistent way of handling ticket cache location - the > above method, that I only see called in test, is using a config setting that > is not used anywhere else, and the env variable for the location that is used > in the main ticket cache related methods is not set uniformly on all paths - > therefore, trying to find the correct ticket cache and passing it via the > config setting to getUGIFromTicketCache seems even hackier than doing the > clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user > parameter on the main path - it logs a warning for multiple principals and > then logs in with first available. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507805#comment-15507805 ] Sergey Shelukhin commented on HADOOP-13081: --- We don't have control over which parts of the code need kerberos or tokens; I suspect that usually only one would be needed but we don't know which one. > add the ability to create multiple UGIs/subjects from one kerberos login > > > Key: HADOOP-13081 > URL: https://issues.apache.org/jira/browse/HADOOP-13081 > Project: Hadoop Common > Issue Type: Improvement > Components: security >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 2.8.0, 3.0.0-alpha1 > > Attachments: HADOOP-13081.01.patch, HADOOP-13081.02.patch, > HADOOP-13081.02.patch, HADOOP-13081.03.patch, HADOOP-13081.03.patch, > HADOOP-13081.patch > > > We have a scenario where we log in with kerberos as a certain user for some > tasks, but also want to add tokens to the resulting UGI that would be > specific to each task. We don't want to authenticate with kerberos for every > task. > I am not sure how this can be accomplished with the existing UGI interface. > Perhaps some clone method would be helpful, similar to createProxyUser minus > the proxy stuff; or it could just relogin anew from ticket cache. 
> getUGIFromTicketCache seems like the best option in existing code, but there > doesn't appear to be a consistent way of handling ticket cache location - the > above method, that I only see called in test, is using a config setting that > is not used anywhere else, and the env variable for the location that is used > in the main ticket cache related methods is not set uniformly on all paths - > therefore, trying to find the correct ticket cache and passing it via the > config setting to getUGIFromTicketCache seems even hackier than doing the > clone via reflection ;) Moreover, getUGIFromTicketCache ignores the user > parameter on the main path - it logs a warning for multiple principals and > then logs in with first available. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507807#comment-15507807 ] Sergey Shelukhin commented on HADOOP-13081: --- Btw, we do already have the implementation using reflection ;)
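The Hive reflection workaround mentioned above is not shown here, so the following is a hypothetical sketch of the technique against a stand-in class (`StandInUgi` is invented for illustration; the real `UserGroupInformation` internals may differ): reach the non-public Subject-taking constructor via reflection and build a second instance around a copied Subject.

```java
import java.lang.reflect.Constructor;
import javax.security.auth.Subject;

public class ReflectiveCloneSketch {
    // Stand-in for UserGroupInformation: a class whose Subject-taking
    // constructor is not public. The real UGI layout is an assumption here.
    static class StandInUgi {
        private final Subject subject;
        private StandInUgi(Subject subject) { this.subject = subject; }
        Subject getSubject() { return subject; }
    }

    // The reflection trick: reach the non-public constructor and wrap a
    // fresh Subject (the Subject constructor copies the passed-in sets,
    // so the clone's credentials are independent of the original's).
    static StandInUgi cloneViaReflection(StandInUgi original) throws Exception {
        Subject fresh = new Subject(false,
                original.getSubject().getPrincipals(),
                original.getSubject().getPublicCredentials(),
                original.getSubject().getPrivateCredentials());
        Constructor<StandInUgi> ctor =
                StandInUgi.class.getDeclaredConstructor(Subject.class);
        ctor.setAccessible(true); // bypass the non-public access modifier
        return ctor.newInstance(fresh);
    }

    static boolean demo() {
        try {
            StandInUgi login = new StandInUgi(new Subject());
            StandInUgi clone = cloneViaReflection(login);
            clone.getSubject().getPrivateCredentials().add("per-task-token");
            // Token added to the clone does not appear in the original.
            return login.getSubject().getPrivateCredentials().isEmpty()
                    && clone.getSubject().getPrivateCredentials().contains("per-task-token");
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```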
[jira] [Comment Edited] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507759#comment-15507759 ] Sergey Shelukhin edited comment on HADOOP-13081 at 9/20/16 8:51 PM: [~cnauroth] the concrete use case is where a service runs multiple pieces of work on behalf of users; it can be set to log in as a particular user using Kerberos (specifically when running these), but the users can also add their own tokens. We cannot add tokens to a single kerberos-based UGI because they will all mix; we also cannot log in for every piece of work in most cases, as it would overload the KDC. Ideally, we should be able to reuse the kerberos login and create a separate UGI with it for each user, adding the user-specific tokens. was (Author: sershe): [~cnauroth] the concrete use case is where a service runs multiple pieces of work on behalf of users; it can be set to log in as a particular user using Kerberos, but the users can also add their own tokens. We cannot add tokens to a single kerberos-based UGI because they will all mix; we also cannot log in for every piece of work in most cases, as it would overload the KDC. Ideally, we should be able to reuse the kerberos login and create a separate UGI with it for each user, adding the user-specific tokens.
[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507759#comment-15507759 ] Sergey Shelukhin commented on HADOOP-13081: --- [~cnauroth] the concrete use case is where a service runs multiple pieces of work on behalf of users; it can be set to log in as a particular user using Kerberos, but the users can also add their own tokens. We cannot add tokens to a single kerberos-based UGI because they will all mix; we also cannot log in for every piece of work in most cases, as it would overload the KDC. Ideally, we should be able to reuse the kerberos login and create a separate UGI with it for each user, adding the user-specific tokens.
[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-13081: -- Attachment: HADOOP-13081.03.patch
[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-13081: -- Attachment: (was: HADOOP-13081.03.patch)
[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-13081: -- Attachment: HADOOP-13081.03.patch fixed checkstyle... sigh
[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-13081: -- Attachment: HADOOP-13081.03.patch Updated
[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-13081: -- Attachment: HADOOP-13081.02.patch Updating the patch to remove tabs
[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-13081: -- Attachment: HADOOP-13081.02.patch Added the test and some additional logic to clone Hadoop credentials, which are apparently reused from the set, rather than adding to the set.
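The point above about Hadoop `Credentials` being reused from the Subject's credential set is why a shallow clone is not enough: both UGIs would share one mutable object, and per-task tokens would still mix. A minimal stand-in (`StandInCredentials` is invented for illustration; the real `Credentials` class differs) shows the leak and the deep-copy fix.

```java
import java.util.HashMap;
import java.util.Map;

public class CredentialsCopySketch {
    // Stand-in for Hadoop's Credentials: a mutable token map that lives in
    // the Subject's private credential set (an assumption about the layout).
    static class StandInCredentials {
        final Map<String, String> tokens = new HashMap<>();
        StandInCredentials() {}
        StandInCredentials(StandInCredentials other) { // deep copy at clone time
            tokens.putAll(other.tokens);
        }
    }

    static boolean demo() {
        StandInCredentials loginCreds = new StandInCredentials();

        // Shallow clone: both "UGIs" reference the same Credentials object,
        // so a token added for one task is visible to the other.
        StandInCredentials shallow = loginCreds;
        shallow.tokens.put("task-A", "tokenA");
        boolean leaked = loginCreds.tokens.containsKey("task-A");

        // Deep copy at clone time keeps later additions isolated.
        StandInCredentials deep = new StandInCredentials(loginCreds);
        deep.tokens.put("task-B", "tokenB");
        boolean isolated = !loginCreds.tokens.containsKey("task-B");

        return leaked && isolated;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```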
[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398592#comment-15398592 ] Sergey Shelukhin commented on HADOOP-13081: --- Yes, it's possible to add the test. It fails however, probably due to problems with mocks, I will finish it tomorrow, need to run now. Fixed the rest.
[jira] [Updated] (HADOOP-13422) ZKDelegationTokenSecretManager JaasConfig does not work well with other ZK users in process
[ https://issues.apache.org/jira/browse/HADOOP-13422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-13422: -- Attachment: HADOOP-13422.01.patch Updated the patch. > ZKDelegationTokenSecretManager JaasConfig does not work well with other ZK > users in process > --- > > Key: HADOOP-13422 > URL: https://issues.apache.org/jira/browse/HADOOP-13422 > Project: Hadoop Common > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HADOOP-13422.01.patch, HADOOP-13422.patch > > > There's a race in the globals. The non-global APIs from ZOOKEEPER-2139 are > not available yet in a stable ZK version and there's no timeline for > availability, so for now it would help to make SM aware of other users of the > global config.
[jira] [Updated] (HADOOP-13422) ZKDelegationTokenSecretManager JaasConfig does not work well with other ZK users in process
[ https://issues.apache.org/jira/browse/HADOOP-13422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-13422: -- Attachment: HADOOP-13422.patch The initial patch.
[jira] [Updated] (HADOOP-13422) ZKDelegationTokenSecretManager JaasConfig does not work well with other ZK users in process
[ https://issues.apache.org/jira/browse/HADOOP-13422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-13422: -- Status: Patch Available (was: Open)
[jira] [Updated] (HADOOP-13422) ZKDelegationTokenSecretManager JaasConfig does not work well with other ZK users in process
[ https://issues.apache.org/jira/browse/HADOOP-13422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-13422: -- Description: There's a race in the globals. The non-global APIs from ZOOKEEPER-2139 are not available yet in a stable ZK version and there's no timeline for availability, so for now it would help to make SM aware of other users of the global config. (was: There's a race where old config )
[jira] [Created] (HADOOP-13422) ZKDelegationTokenSecretManager JaasConfig does not work well with other ZK users in process
Sergey Shelukhin created HADOOP-13422: - Summary: ZKDelegationTokenSecretManager JaasConfig does not work well with other ZK users in process Key: HADOOP-13422 URL: https://issues.apache.org/jira/browse/HADOOP-13422 Project: Hadoop Common Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin There's a race where old config
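One way to make the secret manager "aware of other users of the global config" is a delegating `javax.security.auth.login.Configuration` that serves its own JAAS sections and falls back to the previously installed configuration for everything else. The sketch below illustrates that idea under stated assumptions (class, section, and the `com.example.OtherLoginModule` names are hypothetical; this is not the patch's actual code).

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.Configuration;

public class DelegatingJaasConfigSketch {
    // Serves its own sections; forwards unknown section names to whatever
    // Configuration was installed before it, so two in-process ZK users
    // don't clobber each other's global JAAS state.
    static class DelegatingConfiguration extends Configuration {
        private final Map<String, AppConfigurationEntry[]> own = new HashMap<>();
        private final Configuration fallback; // may be null

        DelegatingConfiguration(Configuration fallback) { this.fallback = fallback; }

        void put(String section, AppConfigurationEntry[] entries) {
            own.put(section, entries);
        }

        @Override
        public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
            AppConfigurationEntry[] mine = own.get(name);
            if (mine != null) return mine;
            return fallback == null ? null : fallback.getAppConfigurationEntry(name);
        }
    }

    static AppConfigurationEntry[] entry(String loginModule) {
        return new AppConfigurationEntry[] {
            new AppConfigurationEntry(loginModule,
                AppConfigurationEntry.LoginModuleControlFlag.REQUIRED,
                new HashMap<String, String>())
        };
    }

    static boolean demo() {
        // The configuration some other ZK user installed earlier.
        DelegatingConfiguration older = new DelegatingConfiguration(null);
        older.put("OtherZkClient", entry("com.example.OtherLoginModule")); // hypothetical

        // Our configuration wraps the older one instead of replacing it.
        DelegatingConfiguration merged = new DelegatingConfiguration(older);
        merged.put("Client", entry("com.sun.security.auth.module.Krb5LoginModule"));
        // Configuration.setConfiguration(merged); // would install as the new global

        return "com.sun.security.auth.module.Krb5LoginModule".equals(
                    merged.getAppConfigurationEntry("Client")[0].getLoginModuleName())
            && "com.example.OtherLoginModule".equals(
                    merged.getAppConfigurationEntry("OtherZkClient")[0].getLoginModuleName())
            && merged.getAppConfigurationEntry("Missing") == null;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```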
[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392570#comment-15392570 ] Sergey Shelukhin commented on HADOOP-13081: --- ping? We are doing this via reflection in Hive now, in certain scenarios, and it appears to work as intended.
[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375753#comment-15375753 ] Sergey Shelukhin commented on HADOOP-13081: --- Also can someone please assign this to me?
[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368247#comment-15368247 ] Sergey Shelukhin commented on HADOOP-13081: --- [~cnauroth] ping?
[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15356326#comment-15356326 ] Sergey Shelukhin commented on HADOOP-13081: --- ping?
[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-13081: -- Attachment: HADOOP-13081.01.patch Updated the patch accordingly.
[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323541#comment-15323541 ] Sergey Shelukhin commented on HADOOP-13081: --- [~cnauroth] ping?
[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15305103#comment-15305103 ] Sergey Shelukhin commented on HADOOP-13081: --- Simple patch. Can someone please review? (and also assign to me; it looks like I don't have permissions to assign)
[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-13081: -- Attachment: HADOOP-13081.patch
[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-13081: -- Status: Patch Available (was: Open)
[jira] [Commented] (HADOOP-13066) UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe
[ https://issues.apache.org/jira/browse/HADOOP-13066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276928#comment-15276928 ] Sergey Shelukhin commented on HADOOP-13066: --- Thanks for the pointer, that method solves the problem. Interestingly, it sets keytabFile and keytabPrincipal statics, but not loginUser. > UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe > -- > > Key: HADOOP-13066 > URL: https://issues.apache.org/jira/browse/HADOOP-13066 > Project: Hadoop Common > Issue Type: Bug > Components: security >Reporter: Sergey Shelukhin > > When calling loginFromKerberos, a static variable is set up with the result. > If someone logs in as a different user from a different thread, the call to > getLoginUser will not return the correct UGI.
[jira] [Commented] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267553#comment-15267553 ] Sergey Shelukhin commented on HADOOP-13081: --- [~cnauroth] [~sseth] fyi
[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-13081: -- Description: updated (expands the ticket cache discussion: the env variable used by the main ticket cache methods is not set uniformly on all paths, and getUGIFromTicketCache ignores the user parameter on the main path; full text is quoted in the first message above)
[jira] [Updated] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
[ https://issues.apache.org/jira/browse/HADOOP-13081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-13081: -- Description: updated (adds that we don't want to authenticate with kerberos for every task)
[jira] [Created] (HADOOP-13081) add the ability to create multiple UGIs/subjects from one kerberos login
Sergey Shelukhin created HADOOP-13081: - Summary: add the ability to create multiple UGIs/subjects from one kerberos login Key: HADOOP-13081 URL: https://issues.apache.org/jira/browse/HADOOP-13081 Project: Hadoop Common Issue Type: Bug Reporter: Sergey Shelukhin (the original description is an earlier revision of the text quoted in the first message above)
[jira] [Commented] (HADOOP-13066) UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe
[ https://issues.apache.org/jira/browse/HADOOP-13066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261302#comment-15261302 ] Sergey Shelukhin commented on HADOOP-13066: --- Yeah, this is the method. The class lock doesn't protect across several method calls. See my example above... Thread 1 calls loginUserFromKeytab("hdfs", ...) Thread 2 calls loginUserFromKeytab("hbase", ...) Thread 1 calls getLoginUser and will get loginUser hbase.
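The interleaving described in this comment can be sketched with a plain static field standing in for UserGroupInformation's static loginUser (names here are illustrative, not the real Hadoop internals). Each method is individually synchronized, but nothing ties a thread's login to the getLoginUser call that follows it:

```java
public class LoginRaceSketch {
    private static String loginUser; // stands in for the static UGI field

    // Each call holds the class lock, but the lock is released between a
    // thread's login and its later read, so another login can slip in between.
    public static synchronized void loginUserFromKeytab(String user) {
        loginUser = user;
    }

    public static synchronized String getLoginUser() {
        return loginUser;
    }

    public static void main(String[] args) {
        loginUserFromKeytab("hdfs");  // thread 1 logs in
        loginUserFromKeytab("hbase"); // thread 2 logs in before thread 1 reads
        // Thread 1 now asks for "its" login user and gets the wrong one.
        System.out.println(getLoginUser()); // prints "hbase"
    }
}
```

This is also why the suggested fix of having the login method return the UGI works: the result is handed back on the caller's stack instead of going through shared static state.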
[jira] [Commented] (HADOOP-13066) UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe
[ https://issues.apache.org/jira/browse/HADOOP-13066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15261210#comment-15261210 ] Sergey Shelukhin commented on HADOOP-13066: --- Yes, but nothing synchronizes the two calls... t1: login(user1), t2: login(user2), t1: getLoginUser(), t2: getLoginUser()... I think the simplest fix would be to have the login method also return the UGI.
[jira] [Created] (HADOOP-13066) UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe
Sergey Shelukhin created HADOOP-13066: - Summary: UserGroupInformation.loginWithKerberos/getLoginUser is not thread-safe Key: HADOOP-13066 URL: https://issues.apache.org/jira/browse/HADOOP-13066 Project: Hadoop Common Issue Type: Bug Reporter: Sergey Shelukhin (description as quoted in the comments above)
[jira] [Commented] (HADOOP-12697) IPC retry policies should recognise that SASL auth failures are unrecoverable
[ https://issues.apache.org/jira/browse/HADOOP-12697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089717#comment-15089717 ] Sergey Shelukhin commented on HADOOP-12697: --- The retries in this case were per-exception (see YARN RMProxy), but they don't specify SaslException/GSSException anywhere. I am not sure if this is an issue with retry policy setup in YARN, or in ipc.Client. > IPC retry policies should recognise that SASL auth failures are unrecoverable > - > > Key: HADOOP-12697 > URL: https://issues.apache.org/jira/browse/HADOOP-12697 > Project: Hadoop Common > Issue Type: Bug > Components: ipc >Affects Versions: 2.7.1 > Environment: Cluster with kerberos on and client not calling with the > right credentials >Reporter: Steve Loughran >Priority: Minor > > SLIDER-1050 shows that if you don't have the right kerberos settings, the > Yarn client IPC channel blocks retrying to talk to the RM, retrying > repeatedly > {noformat} > 2016-01-07 02:50:45,111 [main] WARN ipc.Client - Exception encountered while > connecting to the server : > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: > No valid credentials provided (Mechanism level: Failed to find any Kerberos > tgt)] > {noformat} > SASL exceptions need to be recognised as irreconcilable authentication > failures, rather than generic IOEs that might go away if you retry
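One shape the requested fail-fast behavior could take, sketched with JDK types only (the retry wrapper and its name are hypothetical, not the actual ipc.Client or RetryPolicy code): walk the IOException's cause chain and rethrow immediately when a SaslException is found, since re-dialing will not produce new credentials.

```java
import javax.security.sasl.SaslException;
import java.io.IOException;
import java.util.concurrent.Callable;

public class FailFastRetrySketch {
    // Walk the cause chain: SASL/GSS failures will not go away on retry.
    static boolean isUnrecoverableAuthFailure(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof SaslException) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical retry wrapper (maxRetries >= 0 assumed): ordinary
    // IOExceptions are retried, auth failures are rethrown immediately.
    static <T> T callWithRetries(Callable<T> call, int maxRetries) throws Exception {
        IOException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                if (isUnrecoverableAuthFailure(e)) {
                    throw e; // fail fast instead of re-dialing
                }
                last = e;
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        IOException auth = new IOException("connect failed",
                new SaslException("GSS initiate failed: no valid credentials"));
        try {
            callWithRetries(() -> { throw auth; }, 10);
        } catch (IOException e) {
            // Thrown on the first attempt; no pointless re-dials.
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```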
[jira] [Updated] (HADOOP-12567) NPE in SaslRpcServer
[ https://issues.apache.org/jira/browse/HADOOP-12567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-12567: -- Attachment: HADOOP-12567.01.patch Fixed. The time is ripe to increase the limit to 100 :P > NPE in SaslRpcServer > > > Key: HADOOP-12567 > URL: https://issues.apache.org/jira/browse/HADOOP-12567 > Project: Hadoop Common > Issue Type: Task >Affects Versions: 2.7.0, 2.7.1 >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HADOOP-12567.01.patch, HADOOP-12567.patch > > > {noformat} > if (LOG.isDebugEnabled()) { > String username = > getIdentifier(authzid, secretManager).getUser().getUserName(); > LOG.debug("SASL server DIGEST-MD5 callback: setting " > + "canonicalized client ID: " + username); > } > {noformat} > Looking at identifier implementations, e.g. AbstractDelegationTokenIdentifier > (and others), I can see that the getUser method can return null. If debug logging > is enabled, this NPEs. > If getUser is not expected to return null, it should either be checked here with a clearer error, or the error > should be allowed to happen where it would otherwise happen, not on a debug-only logging path.
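The null guard the report asks for can be sketched with simplified stand-in types (illustrative only; the real code path is SaslRpcServer's DIGEST-MD5 callback and the token identifier classes):

```java
public class NullSafeDebugLog {
    // Simplified stand-in for the token identifier's getUser() result.
    interface User { String getUserName(); }

    // getUser() can legitimately return null for some identifier
    // implementations, so the debug-only path must not dereference it blindly.
    static String describeUser(User user) {
        return user == null ? "<null user>" : user.getUserName();
    }

    public static void main(String[] args) {
        // With the guard, debug logging survives a null user instead of NPEing.
        System.out.println("SASL server DIGEST-MD5 callback: setting "
                + "canonicalized client ID: " + describeUser(null));
    }
}
```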
[jira] [Commented] (HADOOP-12567) NPE in SaslRpcServer
[ https://issues.apache.org/jira/browse/HADOOP-12567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15007111#comment-15007111 ] Sergey Shelukhin commented on HADOOP-12567: --- Test failures do not look related.
[jira] [Updated] (HADOOP-12567) NPE in SaslRpcServer
[ https://issues.apache.org/jira/browse/HADOOP-12567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-12567: -- Attachment: HADOOP-12567.patch
[jira] [Updated] (HADOOP-12567) NPE in SaslRpcServer
[ https://issues.apache.org/jira/browse/HADOOP-12567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-12567: -- Affects Version/s: 2.7.0 2.7.1 Status: Patch Available (was: Open)
[jira] [Updated] (HADOOP-12567) NPE in SaslRpcServer
[ https://issues.apache.org/jira/browse/HADOOP-12567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-12567: -- Description: {noformat} if (LOG.isDebugEnabled()) { String username = getIdentifier(authzid, secretManager).getUser().getUserName(); LOG.debug("SASL server DIGEST-MD5 callback: setting " + "canonicalized client ID: " + username); } {noformat} Looking at identifier implementations, e.g. AbstractDelegationTokenIdentifier (and others), I can see that getUser method can return null. If debug logging is enabled, this NPEs. If getUser is not expected to return NULL, it should either be checked and erred upon better here, or the error should be allowed to happen where it would otherwise happen, not in some debug log path. was: {noformat} if (LOG.isDebugEnabled()) { String username = getIdentifier(authzid, secretManager).getUser().getUserName(); LOG.debug("SASL server DIGEST-MD5 callback: setting " + "canonicalized client ID: " + username); } {noformat} Looking at identifier implementations, e.g. AbstractDelegationTokenIdentifier (and others), I can see that getUser method can return null. If debug logging is enabled, this NPEs. If getUser is not expected to return NULL, it should either be checked and erred upon better here, or the error should be allowed to happen where it would otherwise happen, not in some debug log statement.
[jira] [Created] (HADOOP-12567) NPE in SaslRpcServer
Sergey Shelukhin created HADOOP-12567: - Summary: NPE in SaslRpcServer Key: HADOOP-12567 URL: https://issues.apache.org/jira/browse/HADOOP-12567 Project: Hadoop Common Issue Type: Task Reporter: Sergey Shelukhin
[jira] [Assigned] (HADOOP-12567) NPE in SaslRpcServer
[ https://issues.apache.org/jira/browse/HADOOP-12567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HADOOP-12567: - Assignee: Sergey Shelukhin
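The null-safe guard the report asks for can be sketched in isolation. The class and method names below (`TokenId`, `describeClient`) are hypothetical stand-ins, not Hadoop's real SASL callback types; the point is only that the debug path tolerates a null `getUser()`:

```java
// Sketch of a null-tolerant version of the debug-log path described above.
// TokenId and describeClient are illustrative stand-ins, not Hadoop APIs.
public class SaslDebugSketch {
    /** Minimal stand-in for an identifier whose getUser() may return null. */
    static class TokenId {
        private final String user;
        TokenId(String user) { this.user = user; }
        String getUser() { return user; }  // may be null, as with AbstractDelegationTokenIdentifier
    }

    /** Null-safe rendering of the client ID for debug logging. */
    static String describeClient(TokenId id) {
        String user = (id == null) ? null : id.getUser();
        return (user == null) ? "<unknown>" : user;
    }

    public static void main(String[] args) {
        System.out.println("client: " + describeClient(new TokenId("alice")));
        System.out.println("client: " + describeClient(new TokenId(null)));  // no NPE
    }
}
```

This keeps the failure out of the `isDebugEnabled()` branch entirely: if a null user is genuinely illegal, the error then surfaces wherever the identifier is actually consumed, which is the behavior the description argues for.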
[jira] [Commented] (HADOOP-11771) Configuration::getClassByNameOrNull synchronizes on a static object
[ https://issues.apache.org/jira/browse/HADOOP-11771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389062#comment-14389062 ] Sergey Shelukhin commented on HADOOP-11771: --- Why don't we just stop using ReflectionUtils? > Configuration::getClassByNameOrNull synchronizes on a static object > --- > > Key: HADOOP-11771 > URL: https://issues.apache.org/jira/browse/HADOOP-11771 > Project: Hadoop Common > Issue Type: Sub-task > Components: conf, io, ipc >Reporter: Gopal V > Attachments: configuration-cache-bt.png, configuration-sync-cache.png > > > {code} > private static final Map<ClassLoader, Map<String, WeakReference<Class<?>>>> > CACHE_CLASSES = new WeakHashMap<ClassLoader, Map<String, WeakReference<Class<?>>>>(); > ... > synchronized (CACHE_CLASSES) { > map = CACHE_CLASSES.get(classLoader); > if (map == null) { > map = Collections.synchronizedMap( > new WeakHashMap<String, WeakReference<Class<?>>>()); > CACHE_CLASSES.put(classLoader, map); > } > } > {code} > !configuration-sync-cache.png! > !configuration-cache-bt.png! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
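The contention comes from every class lookup funneling through `synchronized (CACHE_CLASSES)`. One way to remove the global lock, sketched below and not taken from the HADOOP-11771 patch, is a `ConcurrentHashMap` keyed by classloader. Note the trade-off called out in the comments: unlike `Configuration`'s `WeakHashMap`, this sketch holds strong references to classloaders, so a production fix would need weak keys (e.g. `ClassValue` or a weak-key cache):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: replace the global synchronized(CACHE_CLASSES) block with lock-free
// reads via ConcurrentHashMap.computeIfAbsent. Caveat: this pins classloaders
// with strong references, where the original WeakHashMap lets them be
// collected; a real fix would use weak keys.
public class ClassCacheSketch {
    private static final Map<ClassLoader, Map<String, Class<?>>> CACHE =
        new ConcurrentHashMap<>();

    static Class<?> getClassByNameOrNull(ClassLoader loader, String name) {
        Map<String, Class<?>> perLoader =
            CACHE.computeIfAbsent(loader, l -> new ConcurrentHashMap<>());
        // computeIfAbsent records no mapping when the function returns null,
        // so "class not found" results are simply not cached here.
        return perLoader.computeIfAbsent(name, n -> {
            try {
                return Class.forName(n, true, loader);
            } catch (ClassNotFoundException e) {
                return null;
            }
        });
    }
}
```

Reads on the hot path then proceed without any shared monitor, which addresses the profile in the attached screenshots; Sergey's comment suggests the simpler alternative of bypassing `ReflectionUtils` (and this cache) altogether.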
[jira] [Commented] (HADOOP-10555) add offset support to MurmurHash
[ https://issues.apache.org/jira/browse/HADOOP-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986197#comment-13986197 ] Sergey Shelukhin commented on HADOOP-10555: --- [~t3rmin4t0r] fyi > add offset support to MurmurHash > > > Key: HADOOP-10555 > URL: https://issues.apache.org/jira/browse/HADOOP-10555 > Project: Hadoop Common > Issue Type: Improvement >Reporter: Sergey Shelukhin >Priority: Trivial > Attachments: HADOOP-10555.patch > > > From HIVE-6430 code review -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HADOOP-10555) add offset support to MurmurHash
[ https://issues.apache.org/jira/browse/HADOOP-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-10555: -- Status: Patch Available (was: Open)
[jira] [Updated] (HADOOP-10555) add offset support to MurmurHash
[ https://issues.apache.org/jira/browse/HADOOP-10555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HADOOP-10555: -- Attachment: HADOOP-10555.patch can someone please assign to me? I don't have permissions to assign
[jira] [Created] (HADOOP-10555) add offset support to MurmurHash
Sergey Shelukhin created HADOOP-10555: - Summary: add offset support to MurmurHash Key: HADOOP-10555 URL: https://issues.apache.org/jira/browse/HADOOP-10555 Project: Hadoop Common Issue Type: Improvement Reporter: Sergey Shelukhin Priority: Trivial From HIVE-6430 code review -- This message was sent by Atlassian JIRA (v6.2#6252)
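The attached patch itself isn't quoted in this thread, but the shape of the change is the usual one for array hashers: thread an `(offset, length)` pair through the loop so callers can hash a slice of a buffer without copying it first. A generic sketch of that plumbing follows; the mixing function is FNV-1a, chosen only for brevity, not Murmur's actual constants:

```java
import java.util.Arrays;

// Sketch of adding (offset, length) support to a byte[] hash. FNV-1a mixing
// stands in for Murmur here; HADOOP-10555 applies the same offset plumbing
// to MurmurHash's loop.
public class OffsetHashSketch {
    /** New-style entry point: hash `length` bytes starting at `offset`. */
    static int hash(byte[] data, int offset, int length) {
        int h = 0x811c9dc5;                           // FNV offset basis
        for (int i = offset; i < offset + length; i++) {
            h = (h ^ (data[i] & 0xff)) * 0x01000193;  // FNV prime
        }
        return h;
    }

    /** Old-style entry point, kept compatible: hash from the start. */
    static int hash(byte[] data, int length) {
        return hash(data, 0, length);
    }

    public static void main(String[] args) {
        byte[] buf = "xxpayloadxx".getBytes();
        // Hashing a slice in place must equal hashing a copy of that slice.
        int inPlace = hash(buf, 2, 7);
        int copied = hash(Arrays.copyOfRange(buf, 2, 9), 7);
        System.out.println(inPlace == copied);  // true
    }
}
```

The in-place/copy equivalence shown in `main` is the natural unit test for such a patch, and the motivation from HIVE-6430 is exactly that copy: avoiding an `Arrays.copyOfRange` before each hash call.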
[jira] [Commented] (HADOOP-9487) Deprecation warnings in Configuration should go to their own log or otherwise be suppressible
[ https://issues.apache.org/jira/browse/HADOOP-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738771#comment-13738771 ] Sergey Shelukhin commented on HADOOP-9487: -- ping? > Deprecation warnings in Configuration should go to their own log or otherwise > be suppressible > - > > Key: HADOOP-9487 > URL: https://issues.apache.org/jira/browse/HADOOP-9487 > Project: Hadoop Common > Issue Type: Improvement > Components: conf >Affects Versions: 3.0.0 >Reporter: Steve Loughran > Attachments: HADOOP-9487.patch, HADOOP-9487.patch > > > Running local pig jobs triggers large quantities of warnings about deprecated > properties -something I don't care about as I'm not in a position to fix > without delving into Pig. > I can suppress them by changing the log level, but that can hide other > warnings that may actually matter. > If there was a special Configuration.deprecated log for all deprecation > messages, this log could be suppressed by people who don't want noisy logs -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HADOOP-9487) Deprecation warnings in Configuration should go to their own log or otherwise be suppressible
[ https://issues.apache.org/jira/browse/HADOOP-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702457#comment-13702457 ] Sergey Shelukhin commented on HADOOP-9487: -- This warning is also output in HBase shell. The latest patch looks reasonable
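The fix Steve describes, a dedicated logger name for deprecation noise, can be sketched with `java.util.logging` (Hadoop itself uses commons-logging/slf4j; the logger names below are illustrative, not Hadoop's real ones). Because the deprecation logger has its own name, users can turn it off without touching the level of any other logger:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: route deprecation warnings to their own named logger so users can
// silence them without hiding other warnings. Logger names are illustrative.
public class DeprecationLogSketch {
    private static final Logger LOG = Logger.getLogger("conf.Configuration");
    private static final Logger DEPRECATION_LOG =
        Logger.getLogger("conf.Configuration.deprecation");

    static void warnDeprecated(String oldKey, String newKey) {
        DEPRECATION_LOG.warning(oldKey + " is deprecated. Instead, use " + newKey);
    }

    public static void main(String[] args) {
        // Suppress only the deprecation chatter; ordinary warnings still flow.
        DEPRECATION_LOG.setLevel(Level.OFF);
        warnDeprecated("mapred.task.id", "mapreduce.task.attempt.id");  // silenced
        LOG.warning("a real problem");                                  // still visible
    }
}
```

This is exactly the property the issue asks for: a Pig or HBase-shell user drowning in deprecated-key warnings sets one logger to OFF in their logging config, and genuine warnings from the parent logger keep flowing.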