Zbigniew Baranowski created RANGER-5125:
-------------------------------------------
Summary: Missing the result column value in ORC File Logging
Key: RANGER-5125
URL: https://issues.apache.org/jira/browse/RANGER-5125
Project: Ranger
Issue Type: Bug
Components: audit
Affects Versions: 2.5.0, 2.4.0, 2.3.0
Reporter: Zbigniew Baranowski

h4. {*}Description{*}:

There is an issue in {{ORCFileUtil.log()}} when writing audit logs in ORC format. The _result_ field in the audit schema is of type {{short}} and is not handled when it is cast to a string. As a result, the corresponding _accessResult_ column in the ORC file is empty.

h4. {*}Affected Component{*}:
 * {{org.apache.ranger.audit.utils.ORCFileUtil}}
 * {{castStringObject(Object object)}} method

h4. {*}Steps to Reproduce{*}:
 # Run {{main()}} from the {{ORCFileUtil}} class: [https://github.com/apache/ranger/blob/a90a77e1ce12a0f7193533e846c504caea293d21/agents-audit/src/main/java/org/apache/ranger/audit/utils/ORCFileUtil.java#L85]
 # This writes an ORC file to {{/tmp/test.orc}}.
 # Open the file, for example with Spark, and read its contents. The {{accessResult}} column has no value in any row, even when the corresponding event had it set.

{code:java}
val df = spark.read.orc("/tmp/test.orc")
df: org.apache.spark.sql.DataFrame = [repositoryType: int, repositoryName: string ... 24 more fields]

scala> df.show(false)
25/01/29 19:28:12 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
+--------------+--------------+----+-------------------+----------+------------------------+------------+------+------------+-------+--------+------------+-----------+---------+----------+---------+-----------+-------------+-------+-------+------+----------+---------------+--------------+-----------+--------+
|repositoryType|repositoryName|user|eventTime          |accessType|resourcePath            |resourceType|action|accessResult|agentId|policyId|resultReason|aclEnforcer|sessionId|clientType|clientIP |requestData|agentHostname|logType|eventId|seqNum|eventCount|eventDurationMS|additionalInfo|clusterName|zoneName|
+--------------+--------------+----+-------------------+----------+------------------------+------------+------+------------+-------+--------+------------+-----------+---------+----------+---------+-----------+-------------+-------+-------+------+----------+---------------+--------------+-----------+--------+
|1             |hdfsdev       |    |2025-01-29 19:25:10|read      |/tmp/test-audit.log001  |file        |      |            |       |0       |1           |ranger-acl |         |          |127.0.0.1|           |             |       |0      |0     |1         |0              |              |           |        |
|1             |hdfsdev       |    |2025-01-29 19:25:10|read      |/tmp/test-audit.log111  |file        |      |            |       |0       |1           |ranger-acl |         |          |127.0.0.1|           |             |       |1      |0     |1         |0              |              |           |        |
|1             |hdfsdev       |    |2025-01-29 19:25:10|read      |/tmp/test-audit.log221  |file        |      |            |       |0       |1           |ranger-acl |         |          |127.0.0.1|           |             |       |2      |0     |1         |0              |              |           |        |
|1             |hdfsdev       |    |2025-01-29 19:25:10|read      |/tmp/test-audit.log331  |file        |      |            |       |0       |1           |ranger-acl |         |          |127.0.0.1|           |             |       |3      |0     |1         |0              |              |           |        |
|1             |hdfsdev       |    |2025-01-29 19:25:10|read      |/tmp/test-audit.log441  |file        |      |            |       |0       |1           |ranger-acl |         |          |127.0.0.1|           |             |       |4      |0     |1         |0              |              |           |        |
|1             |hdfsdev       |    |2025-01-29 19:25:10|read      |/tmp/test-audit.log551  |file        |      |            |       |0       |1           |ranger-acl |         |          |127.0.0.1|           |             |       |5      |0     |1         |0              |              |           |        |
|1             |hdfsdev       |    |2025-01-29 19:25:10|read      |/tmp/test-audit.log661  |file        |      |            |       |0       |1           |ranger-acl |         |          |127.0.0.1|           |             |       |6      |0     |1         |0              |              |           |        |
|1             |hdfsdev       |    |2025-01-29 19:25:10|read      |/tmp/test-audit.log771  |file        |      |            |       |0       |1           |ranger-acl |         |          |127.0.0.1|           |             |       |7      |0     |1         |0              |              |           |        |
|1             |hdfsdev       |    |2025-01-29 19:25:10|read      |/tmp/test-audit.log881  |file        |      |            |       |0       |1           |ranger-acl |         |          |127.0.0.1|           |             |       |8      |0     |1         |0              |              |           |        |
|1             |hdfsdev       |    |2025-01-29 19:25:10|read      |/tmp/test-audit.log991  |file        |      |            |       |0       |1           |ranger-acl |         |          |127.0.0.1|           |             |       |9      |0     |1         |0              |              |           |        |
|1             |hdfsdev       |    |2025-01-29 19:25:10|read      |/tmp/test-audit.log10101|file        |      |            |       |0       |1           |ranger-acl |         |          |127.0.0.1|           |             |       |10     |0     |1         |0              |              |           |        |
|1             |hdfsdev       |    |2025-01-29 19:25:10|read      |/tmp/test-audit.log11111|file        |      |            |       |0       |1           |ranger-acl |         |          |127.0.0.1|           |             |       |11     |0     |1         |0              |              |           |        |
|1             |hdfsdev       |    |2025-01-29 19:25:10|read      |/tmp/test-audit.log12121|file        |      |            |       |0       |1           |ranger-acl |         |          |127.0.0.1|           |             |       |12     |0     |1         |0              |              |           |        |
|1             |hdfsdev       |    |2025-01-29 19:25:10|read      |/tmp/test-audit.log13131|file        |      |            |       |0       |1           |ranger-acl |         |          |127.0.0.1|           |             |       |13     |0     |1         |0              |              |           |        |
|1             |hdfsdev       |    |2025-01-29 19:25:10|read      |/tmp/test-audit.log14141|file        |      |            |       |0       |1           |ranger-acl |         |          |127.0.0.1|           |             |       |14     |0     |1         |0              |              |           |        |
|1             |hdfsdev       |    |2025-01-29 19:25:10|read      |/tmp/test-audit.log15151|file        |      |            |       |0       |1           |ranger-acl |         |          |127.0.0.1|           |             |       |15     |0     |1         |0              |              |           |        |
|1             |hdfsdev       |    |2025-01-29 19:25:10|read      |/tmp/test-audit.log16161|file        |      |            |       |0       |1           |ranger-acl |         |          |127.0.0.1|           |             |       |16     |0     |1         |0              |              |           |        |
|1             |hdfsdev       |    |2025-01-29 19:25:10|read      |/tmp/test-audit.log17171|file        |      |            |       |0       |1           |ranger-acl |         |          |127.0.0.1|           |             |       |17     |0     |1         |0              |              |           |        |
|1             |hdfsdev       |    |2025-01-29 19:25:10|read      |/tmp/test-audit.log18181|file        |      |            |       |0       |1           |ranger-acl |         |          |127.0.0.1|           |             |       |18     |0     |1         |0              |              |           |        |
|1             |hdfsdev       |    |2025-01-29 19:25:10|read      |/tmp/test-audit.log19191|file        |      |            |       |0       |1           |ranger-acl |         |          |127.0.0.1|           |             |       |19     |0     |1         |0              |              |           |        |
+--------------+--------------+----+-------------------+----------+------------------------+------------+------+------------+-------+--------+------------+-----------+---------+----------+---------+-----------+-------------+-------+-------+------+----------+---------------+--------------+-----------+--------+
{code}

h4. {*}Expected Behavior{*}:
 * {{short}} values (the _result_ field) are converted to strings before being written to ORC.

h4. {*}Root Cause{*}:
 * The {{castStringObject(Object object)}} method is missing a case for {{Short}}.
 * With no matching branch, the method returns {{null}} for a {{short}} value, so the column is written empty.

h4. {*}Proposed Fix{*}:

Modify {{castStringObject(Object object)}} in {{ORCFileUtil.java}} to handle {{Short}} values:

{code:java}
protected String castStringObject(Object object) {
    String ret = null;
    try {
        if (object instanceof String) {
            ret = (String) object;
        } else if (object instanceof Date) {
            ret = getDateString((Date) object);
        } else if (object instanceof Short) { // Fix: Added case for Short
            ret = ((Short) object).toString();
        }
    } catch (Exception e) {
        logger.error("Error while writing into ORC File:", e);
    }
    return ret;
}
{code}

--
This message was sent by Atlassian Jira (v8.20.10#820010)
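
The root cause and the proposed fix can be checked in isolation with a small stand-alone sketch. This is not the Ranger code: the class name {{ShortCastSketch}}, the methods {{castCurrent}} and {{castFixed}}, and the date format used in {{formatDate}} (a stand-in for {{getDateString}}) are all assumptions made for illustration. It only mirrors the branch structure described in this issue.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical stand-alone sketch; not the actual ORCFileUtil class.
class ShortCastSketch {

    // Mirrors the branch structure described in this issue: only String and Date are handled.
    static String castCurrent(Object object) {
        String ret = null;
        if (object instanceof String) {
            ret = (String) object;
        } else if (object instanceof Date) {
            ret = formatDate((Date) object);
        }
        return ret; // no branch matches a Short, so ret is still null here
    }

    // Same method with the proposed Short branch added.
    static String castFixed(Object object) {
        String ret = null;
        if (object instanceof String) {
            ret = (String) object;
        } else if (object instanceof Date) {
            ret = formatDate((Date) object);
        } else if (object instanceof Short) {
            ret = object.toString();
        }
        return ret;
    }

    // Assumed stand-in for getDateString(); the pattern is illustrative only.
    static String formatDate(Date date) {
        return new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(date);
    }

    public static void main(String[] args) {
        Object result = Short.valueOf((short) 1); // a boxed short, like the audit result field

        System.out.println("current: " + castCurrent(result)); // prints "current: null"
        System.out.println("fixed:   " + castFixed(result));   // prints "fixed:   1"
    }
}
```

If other boxed numeric types can reach this method, a broader {{instanceof Number}} branch might be worth considering, but that goes beyond what this issue reports.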