+1 for BaseX. (I believe you can do pretty much the same with eXist.) For a web app, look at RESTXQ (https://docs.basex.org/wiki/RESTXQ), which uses annotations to map HTTP routes to XQuery functions and their arguments.
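To make the annotation idea a bit more concrete, here is a rough analogy in Python rather than XQuery, so please don't read it as BaseX code. A decorator stands in for the `%rest:GET` and `%rest:path` annotations, and the path template supplies the function argument. The names `route`, `dispatch`, `search` and the `/search/{term}` path are all invented for illustration; in RESTXQ itself the template variable would be written `{$term}`.

```python
import re
from typing import Callable, List, Optional, Pattern, Tuple

# Hypothetical registry of (HTTP method, compiled path pattern, handler).
ROUTES: List[Tuple[str, Pattern[str], Callable[..., str]]] = []


def route(method: str, path: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
    """Register a handler for a method and path template such as '/search/{term}'."""
    # Turn each '{name}' placeholder into a named regex group matching one path segment.
    pattern = re.compile("^" + re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", path) + "$")

    def decorator(func: Callable[..., str]) -> Callable[..., str]:
        ROUTES.append((method, pattern, func))
        return func

    return decorator


@route("GET", "/search/{term}")
def search(term: str) -> str:
    # A real handler would query the XML database; this one only echoes the term.
    return f"<results query='{term}'/>"


def dispatch(method: str, path: str) -> Optional[str]:
    """Call the first handler whose method and path match, or return None."""
    for registered_method, pattern, func in ROUTES:
        match = pattern.match(path)
        if registered_method == method and match:
            # Path variables become keyword arguments of the handler.
            return func(**match.groupdict())
    return None
```

Calling `dispatch("GET", "/search/lincoln")` ends up in `search(term="lincoln")`, which is roughly what the RESTXQ annotations arrange for an XQuery function, with the server doing the dispatching for you.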
-- Steve M.

> On Dec 17, 2020, at 3:57 PM, Pennington, Buddy D. <[email protected]>
> wrote:
>
> Interesting. Our XML is being provided by ProQuest. We're cancelling our
> subscription to their Historical NYT database and they are offering up XML of
> the articles for 1851-1938.
>
> Buddy Pennington
> Head of Electronic Resources & Systems
> University Libraries
> University of Missouri - Kansas City
> (he/him/his)
>
>
> -----Original Message-----
> From: Code for Libraries <[email protected]> On Behalf Of Custer, Mark
> Sent: Thursday, December 17, 2020 2:46 PM
> To: [email protected]
> Subject: Re: [CODE4LIB] Web app to search XML files
>
> WARNING: This message has originated from an External Source. This may be a
> phishing expedition that can result in unauthorized access to our IT System.
> Please use proper judgment and caution when opening attachments, clicking
> links, or responding to this email.
>
> Exactly my thoughts, as well, Buddy.
>
> I was going to recommend BaseX, at least as a first step to investigate the
> full corpus (https://docs.basex.org/wiki/Getting_Started). It indexes XML
> documents very quickly (and in their entirety, which is important whether you
> want to use those documents as is or transform them to something else). You
> can do that from the command line or, without even taking the time to learn
> its commands, you can use the GUI to 1) index and 2) get an overview of your
> new database properties, including an exhaustive summary of attributes,
> elements, path structure, etc.
>
> That said, I'm now curious about the NYT XML dataset in general. Can you
> provide a link to more info about it?
>
> I just did a very quick bit of searching, and found this interesting blog
> post,
> https://open.blogs.nytimes.com/2016/07/26/the-future-of-the-past-modernizing-the-new-york-times-archive,
> which I believe describes how that dataset (or a similar one) was handled
> internally, converting it and HTML documents for missing XML docs into JSON.
> Due to that, I expect it's not the type of XML that you'll need to retain as
> XML, but getting a full view of the entire forest is probably the only way to
> know for sure.
>
> Mark
>
>
>
> -----Original Message-----
> From: Code for Libraries [mailto:[email protected]] On Behalf Of
> Pennington, Buddy D.
> Sent: Thursday, 17 December, 2020 3:31 PM
> To: [email protected]
> Subject: Re: [CODE4LIB] Web app to search XML files
>
> Yes, lots of excellent suggestions from folks. I was actually looking at
> Basex earlier today as a tool to review the XML once we have it.
>
> Thanks!
>
> Buddy Pennington
> Head of Electronic Resources & Systems
> University Libraries
> University of Missouri - Kansas City
> (he/him/his)
>
> -----Original Message-----
> From: Code for Libraries <[email protected]> On Behalf Of David Mayo
> Sent: Thursday, December 17, 2020 2:15 PM
> To: [email protected]
> Subject: Re: [CODE4LIB] Web app to search XML files
>
> WARNING: This message has originated from an External Source. This may be a
> phishing expedition that can result in unauthorized access to our IT System.
> Please use proper judgment and caution when opening attachments, clicking
> links, or responding to this email.
>
> A lot of good suggestions; if you're looking for fast turnaround without
> having to decompose and shift the data, it might be worth looking at
> dedicated XML databases like eXistDB and Basex
>
> http://exist-db.org/exist/apps/homepage/index.html
> https://basex.org/
>
> IIRC, eXist-db has dedicated functionality for building applications built
> in; even if you don't go that way, I've found these very useful for analysis
> of XML corpuses prior to running other software to transform them.
>
> - Dave Mayo (He/Him)
> Software Dev @ Harvard LTS
>
>
> On Thu, Dec 17, 2020 at 2:53 PM Stuart A. Yeates <[email protected]> wrote:
>
>> There's XML and XML.
>>
>> I suggest that you enquire about the exact format that you're going to
>> be receiving and ask around for systems that support it out of the
>> box.
>>
>> cheers
>> stuart
>>
>>
>> --
>> ...let us be heard from red core to black sky
>>
>> On Fri, 18 Dec 2020 at 07:37, Pennington, Buddy D.
>> <[email protected]>
>> wrote:
>>>
>>> Hi all,
>>>
>>> We're purchasing an XML dataset for the historical NY Times and I am
>>> curious about any suggestions to quickly build a web app to search and
>>> display those records for end users.
>>>
>>> Buddy Pennington
>>> Head of Electronic Resources & Systems University Libraries
>>> University of Missouri - Kansas City
>>> (he/him/his)
>>
