Thomas Rebele created HIVE-29204:
------------------------------------
Summary: Hive-site: cleanup attachments and links to attachments
Key: HIVE-29204
URL: https://issues.apache.org/jira/browse/HIVE-29204
Project: Hive
Issue Type: Task
Reporter: Thomas Rebele
Some links to attachments return a 404 Not Found, e.g.
[attachments/40509928/42696874-txt|https://hive.apache.org/attachments/40509928/42696874-txt]
in [SQL Standard Based Hive Authorization|https://hive.apache.org/docs/latest/language/sql-standard-based-hive-authorization/#hive-013].
Some link texts use a dash in place of the dot before the file extension (e.g.,
in content/community/resources/presentations.md). In general, it would be better
to use the title of the document instead of numbers as the file name and link text.
{code:java}
50:* [attachments/27362054/35193149-pptx](/attachments/27362054/35193149.pptx) (Ashutosh Chauhan)
{code}
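To locate other link texts of this shape, something like the following might work. It is only a sketch: the function name `dashed_link_texts` is made up for illustration, and the pattern assumes that affected links always look like `[attachments/<digits>/<digits>-<extension>](`.

```shell
# Hypothetical helper, not part of the Hive site repository.
# Prints file:line:content for every Markdown link whose text is an
# attachment path ending in "-<extension>" instead of ".<extension>".
dashed_link_texts() {
  # -r: recurse into directories, -n: show line numbers, -E: extended regex.
  grep -rnE '\[attachments/[0-9]+/[0-9]+-[A-Za-z0-9]+\]\(' "$@"
}
```

It could be called as `dashed_link_texts content/` from the root of the site repository.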
A few shell commands that might be helpful:
{code:java}
find themes/hive/static/attachments -type f |
  sed 's#themes/hive/static/##' |
  sort -u > available-attachments.txt
rg "attachments/" |
  sed 's#attachments/#\nattachments/#g;' |
  grep '^attachments' |
  sed 's/\([?"<> )]\|\]\).*//' |
  sort -u > needed-attachments.txt
{code}
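Once both lists exist, the referenced-but-missing attachments could be found by comparing them. A minimal sketch, assuming the two files contain one path per line as produced above; the function name `missing_attachments` is hypothetical:

```shell
# Hypothetical helper, not part of the Hive site repository.
# $1: list of attachments that exist, $2: list of attachments that are referenced.
missing_attachments() {
  # -F: treat patterns as fixed strings, -x: match whole lines only,
  # -v: invert the match, -f: read patterns from file $1.
  # Result: the lines of $2 that do not occur in $1.
  grep -Fxv -f "$1" "$2"
}
```

For example, `missing_attachments available-attachments.txt needed-attachments.txt` prints every referenced path that has no file with exactly that name, so a reference such as attachments/40509928/42696874-txt is listed whenever no file of that name exists.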
There are also some duplicate files:
{code:java}
$ cat available-attachments.txt | sed 's#^#themes/hive/static/#' | xargs md5sum | sort
...
f9f26fe37b0c5276d0b63f98e1188324  themes/hive/static/attachments/27362075/34177489.pdf
f9f26fe37b0c5276d0b63f98e1188324  themes/hive/static/attachments/27362075/34177517.pdf
f9f26fe37b0c5276d0b63f98e1188324  themes/hive/static/attachments/27362075/35193010.pdf
f9f26fe37b0c5276d0b63f98e1188324  themes/hive/static/attachments/27362075/35193011.pdf
...
{code}
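To show only the duplicates instead of scanning the full sorted list, the md5sum output could be filtered by checksum. This is a sketch under the assumption that each input line starts with the checksum followed by whitespace and the path; the function name `duplicate_groups` is hypothetical:

```shell
# Hypothetical helper, not part of the Hive site repository.
# Reads md5sum output on stdin and prints only the lines whose
# checksum occurs more than once.
duplicate_groups() {
  LC_ALL=C sort | awk '
    # Count each checksum and collect the lines that carry it.
    { count[$1]++; lines[$1] = lines[$1] $0 "\n" }
    # Print only the checksums seen more than once (group order is unspecified).
    END { for (h in count) if (count[h] > 1) printf "%s", lines[h] }
  '
}
```

It could be appended to the pipeline above in place of the final `sort`, e.g. `... | xargs md5sum | duplicate_groups`.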