Thanks! Today is our Thanksgiving holiday, so I am not working. I will look at this soon.
Consider a file filter that uses regular expression(s).

Wray Johnson
(m) 704-293-9008

> On Nov 26, 2015, at 11:43 AM, "Christian Grün" <christian.gr...@gmail.com> wrote:
>
> Hi E. Wray,
>
> I have attached a little example of some XQuery code, which adds
> files, archives and archive contents to a database. It’s probably not
> the most efficient solution, so feel free to enhance it or ask more
> questions.
>
> I agree that your use case is an enticing one: We also use BaseX to
> process office files, and Rositsa Shadura wrote an interesting thesis
> on that topic [1]. As Dirk pointed out, it turned out that we didn’t
> want to choose one particular solution, and the XQuery approach is
> currently the most flexible one.
>
> Hope this helps,
> Christian
>
> [1] http://basex.org/about-us/publications
> ___________________________
>
>> On Wed, Nov 25, 2015 at 5:43 PM, Dirk Kirsten <d...@basex.org> wrote:
>> Hello,
>>
>> Which problems did you encounter? This problem should be solvable using a
>> small XQuery, basically expressing what you describe in natural language in
>> XQuery so our processor understands it.
>>
>> I don't think it would make any sense to add such a specific format. There
>> are simply way too many possible combinations: you want archive files
>> extracted, others might not. In the end we would end up with a very
>> complex definition language. And what's the point if we already have
>> a standardized query language like XQuery, which can achieve the same thing?
>>
>> Cheers
>> Dirk
>>
>> On 11/25/2015 05:38 PM, E. Wray Johnson wrote:
>>
>> Here is what I want to do: For a given folder and all its subfolders on my
>> physical drive, mirror its contents, including the contents of archives,
>> parsing XML, JSON, HTML, text, etc. with their respective parsers, skipping
>> invalid files, and adding all other files as raw. I want archive files (*.zip,
>> *.docx) to be added as raw; however, I want the text inside archive files
>> such as .docx (MS Word) to be indexed, and any files in the archive files
>> that match a filter to be indexed.
>>
>> Note: It would be nice if there were a single db:add method that allowed me
>> to specify a map of filters to parsers with options, where all files that do
>> not match a filter (or are invalid) will optionally be added as raw.
> <add-to-db.xq>