[ 
https://issues.apache.org/jira/browse/OPENNLP-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653610#comment-17653610
 ] 

ASF GitHub Bot commented on OPENNLP-1428:
-----------------------------------------

jzonthemtn commented on code in PR #473:
URL: https://github.com/apache/opennlp/pull/473#discussion_r1060051880


##########
opennlp-tools/src/main/java/opennlp/tools/util/DownloadUtil.java:
##########
@@ -174,4 +143,82 @@ public static <T extends BaseModel> T downloadModel(URL url, Class<T> type) thro
     }
   }
 
+  @Internal
+  static class DownloadParser {
+
+    private static final Pattern LINK_PATTERN = Pattern.compile("<a href=\\\"(.*?)\\\">(.*?)</a>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
+    private final URL indexUrl;
+
+    DownloadParser(URL indexUrl) {
+      Objects.requireNonNull(indexUrl);
+      this.indexUrl = indexUrl;
+    }
+
+    Map<String, Map<ModelType, String>> getAvailableModels() {
+
+      final Matcher matcher = LINK_PATTERN.matcher(fetchPageIndex());
+
+      final List<String> links = new ArrayList<>();
+      while (matcher.find()) {
+        links.add(matcher.group(1));
+      }
+
+      return toMap(links);
+    }
+
+    private Map<String, Map<ModelType, String>> toMap(List<String> links) {
+
+      final Map<String, Map<ModelType, String>> result = new HashMap<>();
+
+      for (String link : links) {

Review Comment:
   That sounds reasonable. I wrote 
[OPENNLP-1433](https://issues.apache.org/jira/browse/OPENNLP-1433) for it and 
referenced this conversation.





> Enhance DownloadUtil to avoid the use of hard-coded model urls
> --------------------------------------------------------------
>
>                 Key: OPENNLP-1428
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1428
>             Project: OpenNLP
>          Issue Type: Improvement
>            Reporter: Richard Zowalla
>            Assignee: Richard Zowalla
>            Priority: Major
>
> As pointed out in https://github.com/apache/opennlp/pull/472, we should not 
> rely on hard-coded URLs in DownloadUtil.
> Instead, we can parse the content of 
> https://dlcdn.apache.org/opennlp/models/ud-models-1.0/ and automatically 
> derive the related model files from it.
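
The approach described in the issue (read the index page, collect its links, keep the ones that point at model files) might look roughly like the sketch below. It is not the code from the PR: the class name `IndexLinkSketch`, the method `modelLinks`, the `.bin` suffix filter, and the sample file names are assumptions made for illustration. The page content is passed in as a string so the example runs without network access.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names and the ".bin" filter are assumptions, not the PR's code.
final class IndexLinkSketch {

  // Same idea as LINK_PATTERN in the diff: group 1 is the href, group 2 the link text.
  private static final Pattern LINK_PATTERN = Pattern.compile(
      "<a href=\"(.*?)\">(.*?)</a>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

  // Returns baseUrl + href for every link in the index page whose href ends in ".bin".
  public static List<String> modelLinks(String baseUrl, String indexHtml) {
    final List<String> result = new ArrayList<>();
    final Matcher matcher = LINK_PATTERN.matcher(indexHtml);
    while (matcher.find()) {
      final String href = matcher.group(1);
      if (href.endsWith(".bin")) {
        result.add(baseUrl + href);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Made-up index page content; real directory listings may be formatted differently.
    final String html = "<a href=\"../\">Parent Directory</a>\n"
        + "<a href=\"example-model.bin\">example-model.bin</a>\n"
        + "<a href=\"example-model.bin.asc\">example-model.bin.asc</a>";
    System.out.println(
        modelLinks("https://dlcdn.apache.org/opennlp/models/ud-models-1.0/", html));
  }
}
```

A regular expression keeps the sketch dependency-free, but it only handles anchors written exactly as `<a href="...">`; links with extra attributes or single quotes would be skipped.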



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
