Thanks Pete

 

I will definitely consider using this bulk conversion utility.

 

The application that I'm building will ingest a wide range of Excel
spreadsheets.

 

Once ingested, the system will aggregate the data. It will also generate
"unified spreadsheets".

 

Your Excel XQuery blog post was helpful. =>
http://developer.marklogic.com/blog/smallchanges/2007-12-04

 

 

 

From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Pete Aven
Sent: Monday, October 20, 2014 12:23 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet
metadata from an Excel 97/2003?

 

In case it's useful, Microsoft also provides the Microsoft Office Planning
Manager and Office Compatibility Packs free, which will allow you to bulk
convert older Office formats ( 2003 and earlier ) to the new OOXML formats.

 

https://www.microsoft.com/en-us/download/details.aspx?id=21888#filelist

https://www.microsoft.com/en-us/download/details.aspx?id=3

 

Pete

 

From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Gary Russo
Sent: Monday, October 20, 2014 10:44 AM
To: 'MarkLogic Developer Discussion'
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet
metadata from an Excel 97/2003?

 

Hello Ron,

 

Yes, it is feasible to do the metadata extraction upstream of MarkLogic.

 

It complicates things a little bit but it will be ok.

 

Apache Tika looks like a nice solution.

 

My client is a Microsoft shop and they use a product called Aspose to
convert/extract data from spreadsheets.

 

The majority of spreadsheet formats that I need to ingest use the older
97/2003 format. I can use the Aspose API to covert the older format to OOXML
on the fly.

 

It's unfortunate that the MarkLogic xdmp:document-filter() API is not able
to extract the "defined name" metadata from the "97/2003" file format.

 

I consider it to be a bug in the MarkLogic API because other Excel
Spreadsheet extraction APIs (e.g., Aspose, Tika, Apache POI) can extract
this data from the older file format. 

 

Anyway, thanks for the info.

 

-          Gary R

 

 

 

From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Ron Hitchens
Sent: Friday, October 17, 2014 11:52 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet
metadata from an Excel 97/2003?

 

 

   If it's feasible to do your metadata extraction upstream of MarkLogic
(i.e., before insertion) you might take a look at Apache Tika.  It's
designed for this sort of thing.

 

   You could also setup it up in a simple web service callable from
MarkLogic.  POST the spreadsheet to it and have it return the metadata in
whatever form you like.


---

Ron Hitchens {r...@overstory.co.uk}  +44 7879 358212

 

On Oct 17, 2014, at 3:35 PM, Gary Russo <garyru...@hotmail.com> wrote:

 

Hello Dennis,

 

Thanks for the info.

 

Yes, I tried xdmp:excel-convert() but this does not get the worksheet
metadata either.

 

The metadata that I need to retrieve from the older excel format is the
"Named Fields".

 

Users create them using the Excel "Named Box" feature as shown here. =>
<http://spreadsheets.about.com/od/exceltips/qt/81225namebox.htm>
http://spreadsheets.about.com/od/exceltips/qt/81225namebox.htm

 

It looks like my only option is to use the Apache POI Java API to extract
the named fields or use it to convert xls-to-xlsx on-the-fly. =>
<https://poi.apache.org/apidocs> https://poi.apache.org/apidocs

 

I know there's a hidden way to use MarkLogic's underlying JVM.

 

It would be great if I could use it to call the Apache POI code.

 

But that's a question for another day.

 

Thanks again,

 

Gary Russo

 

 

Gary Russo

Enterprise NoSQL Developer

http://garyrusso.wordpress.com

 <http://twitter.com/garyprusso> http://twitter.com/garyprusso

 

 

 

From:  <mailto:general-boun...@developer.marklogic.com>
general-boun...@developer.marklogic.com [mailto:general-
<mailto:boun...@developer.marklogic.com> boun...@developer.marklogic.com] On
Behalf Of David Ennis
Sent: Thursday, October 16, 2014 5:02 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet
metadata from an Excel 97/2003?

 

HI.

 

I believe that with the conversion licence, you can do what you want with:
xdmp:excel-convert

 

Barring that, you could always run openoffice as a headless server for
conversion purposes.

 

Kind Regards,

David Ennis

 

 




 

 

Kind Regards,

David Ennis

 

 

David Ennis
Content Engineer

 <http://www.hinttech.com/> HintTech 
Mastering the value of content
creative | technology | content

Delftechpark 37i
2628 XJ Delft
The Netherlands
T: +31 88 268 25 00
M: +31 63 091 72 80 

 <http://www.hinttech.com> http://www.hinttech.com
<https://twitter.com/HintTech>   <http://www.facebook.com/HintTech>
<http://www.linkedin.com/company/HintTech> 

 

On 16 October 2014 20:00, Gary Russo < <mailto:garyru...@hotmail.com>
garyru...@hotmail.com> wrote:

I need to extract worksheet metadata called "defined name" from Excel
97/2003 formatted spreadsheets.

 

The ISYS xdmp:document-filter() API is limiting because it only extracts the
text.

 

It does not extract any worksheet metadata.

 

Does anyone know of a workaround for this?

 

My only thought is to upload the "Excel 97/2003" xls file and then convert
it on the server to an "Excel 2010" xlsx format.

 

Once it's in an Excel 2010 format, I can easily extract the "defined name"
metadata.

 

This is what it looks like in "Excel 2010" files.

 

  <definedNames>
    <definedName name="LastYr">Revenue!$B$6:$B$15</definedName>
    <definedName name="ThisYr">Revenue!$C$6:$C$15</definedName>
    <definedName name="Variance">Revenue!$D$6:$D$15</definedName>
  </definedNames>

 

 

Thanks,

Gary Russo

 

 

Gary Russo

Enterprise NoSQL Developer

Phone:  <tel:212-404-8639> 212-404-8639

Skype: garyprusso

 <http://garyrusso.wordpress.com> http://garyrusso.wordpress.com

 


_______________________________________________
General mailing list
 <mailto:General@developer.marklogic.com> General@developer.marklogic.com
 <http://developer.marklogic.com/mailman/listinfo/general>
http://developer.marklogic.com/mailman/listinfo/general

 

_______________________________________________
General mailing list
 <mailto:General@developer.marklogic.com> General@developer.marklogic.com
 <http://developer.marklogic.com/mailman/listinfo/general>
http://developer.marklogic.com/mailman/listinfo/general

 

_______________________________________________
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general

Reply via email to