If it's feasible to do your metadata extraction upstream of MarkLogic (i.e., 
before insertion) you might take a look at Apache Tika.  It's designed for this 
sort of thing.

   You could also setup it up in a simple web service callable from MarkLogic.  
POST the spreadsheet to it and have it return the metadata in whatever form you 
like.

---
Ron Hitchens {r...@overstory.co.uk}  +44 7879 358212

On Oct 17, 2014, at 3:35 PM, Gary Russo <garyru...@hotmail.com> wrote:

> Hello Dennis,
>  
> Thanks for the info.
>  
> Yes, I tried xdmp:excel-convert() but this does not get the worksheet 
> metadata either.
>  
> The metadata that I need to retrieve from the older excel format is the 
> “Named Fields”.
>  
> Users create them using the Excel “Named Box” feature as shown here. => 
> http://spreadsheets.about.com/od/exceltips/qt/81225namebox.htm
>  
> It looks like my only option is to use the Apache POI Java API to extract the 
> named fields or use it to convert xls-to-xlsx on-the-fly. 
> =>https://poi.apache.org/apidocs
>  
> I know there’s a hidden way to use MarkLogic’s underlying JVM.
>  
> It would be great if I could use it to call the Apache POI code.
>  
> But that’s a question for another day.
>  
> Thanks again,
>  
> Gary Russo
>  
>  
> Gary Russo
> Enterprise NoSQL Developer
> http://garyrusso.wordpress.com
> http://twitter.com/garyprusso
>  
>  
>  
> From: general-boun...@developer.marklogic.com 
> [mailto:general-boun...@developer.marklogic.com] On Behalf Of David Ennis
> Sent: Thursday, October 16, 2014 5:02 PM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet 
> metadata from an Excel 97/2003?
>  
> HI.
>  
> I believe that with the conversion licence, you can do what you want with: 
> xdmp:excel-convert
>  
> Barring that, you could always run openoffice as a headless server for 
> conversion purposes.
>  
> Kind Regards,
> David Ennis
>  
>  
> 
>  
>  
> Kind Regards,
> David Ennis
>  
>  
> David Ennis
> Content Engineer
> 
>  
> Mastering the value of content
> creative | technology | content
> 
> Delftechpark 37i
> 2628 XJ Delft
> The Netherlands
> T: +31 88 268 25 00
> M: +31 63 091 72 80 
> 
>    
>  
> On 16 October 2014 20:00, Gary Russo <garyru...@hotmail.com> wrote:
> I need to extract worksheet metadata called “defined name” from Excel 97/2003 
> formatted spreadsheets.
>  
> The ISYS xdmp:document-filter() API is limiting because it only extracts the 
> text.
>  
> It does not extract any worksheet metadata.
>  
> Does anyone know of a workaround for this?
>  
> My only thought is to upload the “Excel 97/2003” xls file and then convert it 
> on the server to an “Excel 2010” xlsx format.
>  
> Once it’s in an Excel 2010 format, I can easily extract the “defined name” 
> metadata.
>  
> This is what it looks like in “Excel 2010” files.
>  
>   <definedNames>
>     <definedName name="LastYr">Revenue!$B$6:$B$15</definedName>
>     <definedName name="ThisYr">Revenue!$C$6:$C$15</definedName>
>     <definedName name="Variance">Revenue!$D$6:$D$15</definedName>
>   </definedNames>
>  
>  
> Thanks,
> Gary Russo
>  
>  
> Gary Russo
> Enterprise NoSQL Developer
> Phone: 212-404-8639
> Skype: garyprusso
> http://garyrusso.wordpress.com
>  
> 
> _______________________________________________
> General mailing list
> General@developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general
> 
>  
> _______________________________________________
> General mailing list
> General@developer.marklogic.com
> http://developer.marklogic.com/mailman/listinfo/general

_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general

Reply via email to