Hi Graydon,

Maybe it’s TagSoup that has problems converting some specific HTML files to XML. Did you try to write the responses to disk and parse them in a second step?
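
A minimal sketch of that two-step approach, assuming $paths, $targetBase, $id and $pass are bound as in your query. The per-file report in the second step is only an illustration, and the two steps can just as well be run as separate queries:

```xquery
(: Assumes $paths, $targetBase, $id and $pass are defined as in the original query. :)
(
  (: Step 1: fetch each resource and write the raw bytes to disk, without parsing. :)
  for $remote in $paths
  let $target := file:resolve-path(file:name($remote), $targetBase)
  let $fetched := http:send-request(
    <http:request method='get' override-media-type='application/octet-stream'
                  username='{$id}' password='{$pass}'/>,
    $remote
  )[2]
  return file:write-binary($target, $fetched),

  (: Step 2: parse the stored files one by one and report what happened to each. :)
  for $name in file:list($targetBase)
  let $path := file:resolve-path($name, $targetBase)
  return try {
    let $doc := html:parse(file:read-binary($path))
    return $name || ': parsed, ' || count($doc//*) || ' elements'
  } catch * {
    $name || ': ' || $err:description
  }
)
```

If the first step completes for all documents and the second one fails, that would point to the HTML conversion rather than to the download.
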
If your input data is not confidential, could you possibly provide us with an example that runs out of the box?

Best,
Christian

> I'm using the basexgui to run (minus some identifying actual values defined
> previously in the query)
>
> (: for each path, retrieve the document :)
> for $remote in $paths
> let $name as xs:string := file:name($remote)
> let $target as xs:string := file:resolve-path($name,$targetBase)
> let $fetched :=
>   http:send-request(<http:request method='get'
>     override-media-type='application/octet-stream' username='{$id}'
>     password='{$pass}' />,
>   $remote)[2]
> let $use as item() := try {
>   html:parse($fetched)
> } catch * {
>   $fetched
> }
> return if ($use instance of document-node())
>   then file:write($target,$use)
>   else file:write-binary($target,$use)
>
> It works, in that I get exactly 100 documents retrieved. (There are
> unfortunately 140+ documents in the list.)
>
> However, the query fails with an "out of main memory" error when using a
> recent 10.0 beta or 9.7 with Xmx set to 2g. Setting Xmx to 16g with 9.7
> produces the same "out of memory" error in the same length of time (about 5
> minutes).
>
> java -version says
> 20:27 test % java -version
> openjdk version "11.0.14.1" 2022-02-08
> OpenJDK Runtime Environment 18.9 (build 11.0.14.1+1)
> OpenJDK 64-Bit Server VM 18.9 (build 11.0.14.1+1, mixed mode, sharing)
>
> It's entirely possible I'm going about fetching files off a web server the
> wrong way; it's possible there's something there that's rather large, but I
> doubt it's that large.
>
> What should I be doing instead?
>
> Thanks!
> Graydon