If it's feasible to do your metadata extraction upstream of MarkLogic (i.e., before insertion), you might take a look at Apache Tika. It's designed for this sort of thing.
You could also set it up as a simple web service callable from MarkLogic: POST the spreadsheet to it and have it return the metadata in whatever form you like.

---
Ron Hitchens {r...@overstory.co.uk}  +44 7879 358212

On Oct 17, 2014, at 3:35 PM, Gary Russo <garyru...@hotmail.com> wrote:

> Hello Dennis,
>
> Thanks for the info.
>
> Yes, I tried xdmp:excel-convert() but this does not get the worksheet
> metadata either.
>
> The metadata that I need to retrieve from the older Excel format is the
> “Named Fields”.
>
> Users create them using the Excel “Name Box” feature as shown here.
> => http://spreadsheets.about.com/od/exceltips/qt/81225namebox.htm
>
> It looks like my only option is to use the Apache POI Java API to extract
> the named fields or use it to convert xls-to-xlsx on-the-fly.
> => https://poi.apache.org/apidocs
>
> I know there’s a hidden way to use MarkLogic’s underlying JVM.
>
> It would be great if I could use it to call the Apache POI code.
>
> But that’s a question for another day.
>
> Thanks again,
>
> Gary Russo
> Enterprise NoSQL Developer
> http://garyrusso.wordpress.com
> http://twitter.com/garyprusso
>
>
> From: general-boun...@developer.marklogic.com
> [mailto:general-boun...@developer.marklogic.com] On Behalf Of David Ennis
> Sent: Thursday, October 16, 2014 5:02 PM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet
> metadata from an Excel 97/2003?
>
> Hi.
>
> I believe that with the conversion licence, you can do what you want with:
> xdmp:excel-convert
>
> Barring that, you could always run OpenOffice as a headless server for
> conversion purposes.
>
> Kind Regards,
> David Ennis
>
> David Ennis
> Content Engineer
>
> Mastering the value of content
> creative | technology | content
>
> Delftechpark 37i
> 2628 XJ Delft
> The Netherlands
> T: +31 88 268 25 00
> M: +31 63 091 72 80
>
>
> On 16 October 2014 20:00, Gary Russo <garyru...@hotmail.com> wrote:
>
> I need to extract worksheet metadata called “defined name” from Excel
> 97/2003 formatted spreadsheets.
>
> The ISYS xdmp:document-filter() API is limiting because it only extracts
> the text.
>
> It does not extract any worksheet metadata.
>
> Does anyone know of a workaround for this?
>
> My only thought is to upload the “Excel 97/2003” xls file and then convert
> it on the server to an “Excel 2010” xlsx format.
>
> Once it’s in an Excel 2010 format, I can easily extract the “defined name”
> metadata.
>
> This is what it looks like in “Excel 2010” files.
>
> <definedNames>
>   <definedName name="LastYr">Revenue!$B$6:$B$15</definedName>
>   <definedName name="ThisYr">Revenue!$C$6:$C$15</definedName>
>   <definedName name="Variance">Revenue!$D$6:$D$15</definedName>
> </definedNames>
>
> Thanks,
> Gary Russo
>
> Gary Russo
> Enterprise NoSQL Developer
> Phone: 212-404-8639
> Skype: garyprusso
> http://garyrusso.wordpress.com
_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general