Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003?

2014-10-27 Thread Gary Russo
Thanks Pete

 

I will definitely consider using this bulk conversion utility.

 

The application that I'm building will ingest a wide range of Excel
spreadsheets.

 

Once ingested, the system will aggregate the data. It will also generate
unified spreadsheets.

 

Your Excel XQuery blog post was helpful. =
http://developer.marklogic.com/blog/smallchanges/2007-12-04

 

 

 

From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Pete Aven
Sent: Monday, October 20, 2014 12:23 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet
metadata from an Excel 97/2003?

 

In case it's useful, Microsoft also provides the Microsoft Office Planning
Manager and Office Compatibility Packs free, which will allow you to bulk
convert older Office formats ( 2003 and earlier ) to the new OOXML formats.

 

https://www.microsoft.com/en-us/download/details.aspx?id=21888#filelist

https://www.microsoft.com/en-us/download/details.aspx?id=3

 

Pete

 

From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Gary Russo
Sent: Monday, October 20, 2014 10:44 AM
To: 'MarkLogic Developer Discussion'
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet
metadata from an Excel 97/2003?

 

Hello Ron,

 

Yes, it is feasible to do the metadata extraction upstream of MarkLogic.

 

It complicates things a little bit but it will be ok.

 

Apache Tika looks like a nice solution.

 

My client is a Microsoft shop and they use a product called Aspose to
convert/extract data from spreadsheets.

 

The majority of spreadsheet formats that I need to ingest use the older
97/2003 format. I can use the Aspose API to covert the older format to OOXML
on the fly.

 

It's unfortunate that the MarkLogic xdmp:document-filter() API is not able
to extract the defined name metadata from the 97/2003 file format.

 

I consider it to be a bug in the MarkLogic API because other Excel
Spreadsheet extraction APIs (e.g., Aspose, Tika, Apache POI) can extract
this data from the older file format. 

 

Anyway, thanks for the info.

 

-  Gary R

 

 

 

From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Ron Hitchens
Sent: Friday, October 17, 2014 11:52 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet
metadata from an Excel 97/2003?

 

 

   If it's feasible to do your metadata extraction upstream of MarkLogic
(i.e., before insertion) you might take a look at Apache Tika.  It's
designed for this sort of thing.

 

   You could also setup it up in a simple web service callable from
MarkLogic.  POST the spreadsheet to it and have it return the metadata in
whatever form you like.


---

Ron Hitchens {r...@overstory.co.uk}  +44 7879 358212

 

On Oct 17, 2014, at 3:35 PM, Gary Russo garyru...@hotmail.com wrote:

 

Hello Dennis,

 

Thanks for the info.

 

Yes, I tried xdmp:excel-convert() but this does not get the worksheet
metadata either.

 

The metadata that I need to retrieve from the older excel format is the
Named Fields.

 

Users create them using the Excel Named Box feature as shown here. =
http://spreadsheets.about.com/od/exceltips/qt/81225namebox.htm
http://spreadsheets.about.com/od/exceltips/qt/81225namebox.htm

 

It looks like my only option is to use the Apache POI Java API to extract
the named fields or use it to convert xls-to-xlsx on-the-fly. =
https://poi.apache.org/apidocs https://poi.apache.org/apidocs

 

I know there's a hidden way to use MarkLogic's underlying JVM.

 

It would be great if I could use it to call the Apache POI code.

 

But that's a question for another day.

 

Thanks again,

 

Gary Russo

 

 

Gary Russo

Enterprise NoSQL Developer

http://garyrusso.wordpress.com

 http://twitter.com/garyprusso http://twitter.com/garyprusso

 

 

 

From:  mailto:general-boun...@developer.marklogic.com
general-boun...@developer.marklogic.com [mailto:general-
mailto:boun...@developer.marklogic.com boun...@developer.marklogic.com] On
Behalf Of David Ennis
Sent: Thursday, October 16, 2014 5:02 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet
metadata from an Excel 97/2003?

 

HI.

 

I believe that with the conversion licence, you can do what you want with:
xdmp:excel-convert

 

Barring that, you could always run openoffice as a headless server for
conversion purposes.

 

Kind Regards,

David Ennis

 

 




 

 

Kind Regards,

David Ennis

 

 

David Ennis
Content Engineer

 http://www.hinttech.com/ HintTech 
Mastering the value of content
creative | technology | content

Delftechpark 37i
2628 XJ Delft
The Netherlands
T: +31 88 268 25 00
M: +31 63 091 72 80 

 http://www.hinttech.com http://www.hinttech.com
https://twitter.com/HintTech   

Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003?

2014-10-20 Thread Gary Russo
Hello Ron,

 

Yes, it is feasible to do the metadata extraction upstream of MarkLogic.

 

It complicates things a little bit but it will be ok.

 

Apache Tika looks like a nice solution.

 

My client is a Microsoft shop and they use a product called Aspose to
convert/extract data from spreadsheets.

 

The majority of spreadsheet formats that I need to ingest use the older
97/2003 format. I can use the Aspose API to covert the older format to OOXML
on the fly.

 

It's unfortunate that the MarkLogic xdmp:document-filter() API is not able
to extract the defined name metadata from the 97/2003 file format.

 

I consider it to be a bug in the MarkLogic API because other Excel
Spreadsheet extraction APIs (e.g., Aspose, Tika, Apache POI) can extract
this data from the older file format. 

 

Anyway, thanks for the info.

 

-  Gary R

 

 

 

From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Ron Hitchens
Sent: Friday, October 17, 2014 11:52 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet
metadata from an Excel 97/2003?

 

 

   If it's feasible to do your metadata extraction upstream of MarkLogic
(i.e., before insertion) you might take a look at Apache Tika.  It's
designed for this sort of thing.

 

   You could also setup it up in a simple web service callable from
MarkLogic.  POST the spreadsheet to it and have it return the metadata in
whatever form you like.


---

Ron Hitchens {r...@overstory.co.uk}  +44 7879 358212

 

On Oct 17, 2014, at 3:35 PM, Gary Russo garyru...@hotmail.com wrote:





Hello Dennis,

 

Thanks for the info.

 

Yes, I tried xdmp:excel-convert() but this does not get the worksheet
metadata either.

 

The metadata that I need to retrieve from the older excel format is the
Named Fields.

 

Users create them using the Excel Named Box feature as shown here. =
http://spreadsheets.about.com/od/exceltips/qt/81225namebox.htm
http://spreadsheets.about.com/od/exceltips/qt/81225namebox.htm

 

It looks like my only option is to use the Apache POI Java API to extract
the named fields or use it to convert xls-to-xlsx on-the-fly. =
https://poi.apache.org/apidocs https://poi.apache.org/apidocs

 

I know there's a hidden way to use MarkLogic's underlying JVM.

 

It would be great if I could use it to call the Apache POI code.

 

But that's a question for another day.

 

Thanks again,

 

Gary Russo

 

 

Gary Russo

Enterprise NoSQL Developer

http://garyrusso.wordpress.com

 http://twitter.com/garyprusso http://twitter.com/garyprusso

 

 

 

From:  mailto:general-boun...@developer.marklogic.com
general-boun...@developer.marklogic.com [mailto:general-
mailto:boun...@developer.marklogic.com boun...@developer.marklogic.com] On
Behalf Of David Ennis
Sent: Thursday, October 16, 2014 5:02 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet
metadata from an Excel 97/2003?

 

HI.

 

I believe that with the conversion licence, you can do what you want with:
xdmp:excel-convert

 

Barring that, you could always run openoffice as a headless server for
conversion purposes.

 

Kind Regards,

David Ennis

 

 




 

 

Kind Regards,

David Ennis

 

 

David Ennis
Content Engineer

 http://www.hinttech.com/ HintTech 
Mastering the value of content
creative | technology | content

Delftechpark 37i
2628 XJ Delft
The Netherlands
T: +31 88 268 25 00
M: +31 63 091 72 80 

 http://www.hinttech.com http://www.hinttech.com
https://twitter.com/HintTech   http://www.facebook.com/HintTech
http://www.linkedin.com/company/HintTech 

 

On 16 October 2014 20:00, Gary Russo  mailto:garyru...@hotmail.com
garyru...@hotmail.com wrote:

I need to extract worksheet metadata called defined name from Excel
97/2003 formatted spreadsheets.

 

The ISYS xdmp:document-filter() API is limiting because it only extracts the
text.

 

It does not extract any worksheet metadata.

 

Does anyone know of a workaround for this?

 

My only thought is to upload the Excel 97/2003 xls file and then convert
it on the server to an Excel 2010 xlsx format.

 

Once it's in an Excel 2010 format, I can easily extract the defined name
metadata.

 

This is what it looks like in Excel 2010 files.

 

  definedNames
definedName name=LastYrRevenue!$B$6:$B$15/definedName
definedName name=ThisYrRevenue!$C$6:$C$15/definedName
definedName name=VarianceRevenue!$D$6:$D$15/definedName
  /definedNames

 

 

Thanks,

Gary Russo

 

 

Gary Russo

Enterprise NoSQL Developer

Phone:  tel:212-404-8639 212-404-8639

Skype: garyprusso

 http://garyrusso.wordpress.com http://garyrusso.wordpress.com

 


___
General mailing list
 mailto:General@developer.marklogic.com General@developer.marklogic.com
 http://developer.marklogic.com/mailman/listinfo/general
http://developer.marklogic.com/mailman/listinfo/general

 

Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003?

2014-10-20 Thread Pete Aven
In case it's useful, Microsoft also provides the Microsoft Office Planning 
Manager and Office Compatibility Packs free, which will allow you to bulk 
convert older Office formats ( 2003 and earlier ) to the new OOXML formats.

https://www.microsoft.com/en-us/download/details.aspx?id=21888#filelist
https://www.microsoft.com/en-us/download/details.aspx?id=3

Pete

From: general-boun...@developer.marklogic.com 
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Gary Russo
Sent: Monday, October 20, 2014 10:44 AM
To: 'MarkLogic Developer Discussion'
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet 
metadata from an Excel 97/2003?

Hello Ron,

Yes, it is feasible to do the metadata extraction upstream of MarkLogic.

It complicates things a little bit but it will be ok.

Apache Tika looks like a nice solution.

My client is a Microsoft shop and they use a product called Aspose to 
convert/extract data from spreadsheets.

The majority of spreadsheet formats that I need to ingest use the older 97/2003 
format. I can use the Aspose API to covert the older format to OOXML on the fly.

It's unfortunate that the MarkLogic xdmp:document-filter() API is not able to 
extract the defined name metadata from the 97/2003 file format.

I consider it to be a bug in the MarkLogic API because other Excel Spreadsheet 
extraction APIs (e.g., Aspose, Tika, Apache POI) can extract this data from the 
older file format.

Anyway, thanks for the info.


-  Gary R



From: general-boun...@developer.marklogic.com 
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Ron Hitchens
Sent: Friday, October 17, 2014 11:52 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet 
metadata from an Excel 97/2003?


   If it's feasible to do your metadata extraction upstream of MarkLogic (i.e., 
before insertion) you might take a look at Apache Tika.  It's designed for this 
sort of thing.

   You could also setup it up in a simple web service callable from MarkLogic.  
POST the spreadsheet to it and have it return the metadata in whatever form you 
like.

---
Ron Hitchens {r...@overstory.co.ukmailto:r...@overstory.co.uk}  +44 7879 
358212

On Oct 17, 2014, at 3:35 PM, Gary Russo 
garyru...@hotmail.commailto:garyru...@hotmail.com wrote:

Hello Dennis,

Thanks for the info.

Yes, I tried xdmp:excel-convert() but this does not get the worksheet metadata 
either.

The metadata that I need to retrieve from the older excel format is the Named 
Fields.

Users create them using the Excel Named Box feature as shown here. = 
http://spreadsheets.about.com/od/exceltips/qt/81225namebox.htm

It looks like my only option is to use the Apache POI Java API to extract the 
named fields or use it to convert xls-to-xlsx on-the-fly. 
=https://poi.apache.org/apidocs

I know there's a hidden way to use MarkLogic's underlying JVM.

It would be great if I could use it to call the Apache POI code.

But that's a question for another day.

Thanks again,

Gary Russo


Gary Russo
Enterprise NoSQL Developer
http://garyrusso.wordpress.com
http://twitter.com/garyprusso



From: 
general-boun...@developer.marklogic.commailto:general-boun...@developer.marklogic.com
 
[mailto:general-boun...@developer.marklogic.commailto:boun...@developer.marklogic.com]
 On Behalf Of David Ennis
Sent: Thursday, October 16, 2014 5:02 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet 
metadata from an Excel 97/2003?

HI.

I believe that with the conversion licence, you can do what you want with: 
xdmp:excel-convert

Barring that, you could always run openoffice as a headless server for 
conversion purposes.

Kind Regards,
David Ennis





Kind Regards,
David Ennis


David Ennis
Content Engineer

[HintTech] http://www.hinttech.com/
Mastering the value of content
creative | technology | content

Delftechpark 37i
2628 XJ Delft
The Netherlands
T: +31 88 268 25 00
M: +31 63 091 72 80

[http://www.hinttech.com]http://www.hinttech.com 
[http://www.hinttech.com/signature/Twitter_HintTech.png] 
https://twitter.com/HintTech  
[http://www.hinttech.com/signature/Facebook_HintTech.png] 
http://www.facebook.com/HintTech  
[http://www.hinttech.com/signature/Linkedin_HintTech.png] 
http://www.linkedin.com/company/HintTech

On 16 October 2014 20:00, Gary Russo 
garyru...@hotmail.commailto:garyru...@hotmail.com wrote:
I need to extract worksheet metadata called defined name from Excel 97/2003 
formatted spreadsheets.

The ISYS xdmp:document-filter() API is limiting because it only extracts the 
text.

It does not extract any worksheet metadata.

Does anyone know of a workaround for this?

My only thought is to upload the Excel 97/2003 xls file and then convert it 
on the server to an Excel 2010 xlsx format.

Once it's in an Excel 2010 format, I can easily extract the defined name 
metadata.

This is what it looks like in Excel 2010 files.

  definedNames

Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003?

2014-10-17 Thread Ron Hitchens

   If it's feasible to do your metadata extraction upstream of MarkLogic (i.e., 
before insertion) you might take a look at Apache Tika.  It's designed for this 
sort of thing.

   You could also setup it up in a simple web service callable from MarkLogic.  
POST the spreadsheet to it and have it return the metadata in whatever form you 
like.

---
Ron Hitchens {r...@overstory.co.uk}  +44 7879 358212

On Oct 17, 2014, at 3:35 PM, Gary Russo garyru...@hotmail.com wrote:

 Hello Dennis,
  
 Thanks for the info.
  
 Yes, I tried xdmp:excel-convert() but this does not get the worksheet 
 metadata either.
  
 The metadata that I need to retrieve from the older excel format is the 
 “Named Fields”.
  
 Users create them using the Excel “Named Box” feature as shown here. = 
 http://spreadsheets.about.com/od/exceltips/qt/81225namebox.htm
  
 It looks like my only option is to use the Apache POI Java API to extract the 
 named fields or use it to convert xls-to-xlsx on-the-fly. 
 =https://poi.apache.org/apidocs
  
 I know there’s a hidden way to use MarkLogic’s underlying JVM.
  
 It would be great if I could use it to call the Apache POI code.
  
 But that’s a question for another day.
  
 Thanks again,
  
 Gary Russo
  
  
 Gary Russo
 Enterprise NoSQL Developer
 http://garyrusso.wordpress.com
 http://twitter.com/garyprusso
  
  
  
 From: general-boun...@developer.marklogic.com 
 [mailto:general-boun...@developer.marklogic.com] On Behalf Of David Ennis
 Sent: Thursday, October 16, 2014 5:02 PM
 To: MarkLogic Developer Discussion
 Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet 
 metadata from an Excel 97/2003?
  
 HI.
  
 I believe that with the conversion licence, you can do what you want with: 
 xdmp:excel-convert
  
 Barring that, you could always run openoffice as a headless server for 
 conversion purposes.
  
 Kind Regards,
 David Ennis
  
  
 
  
  
 Kind Regards,
 David Ennis
  
  
 David Ennis
 Content Engineer
 
  
 Mastering the value of content
 creative | technology | content
 
 Delftechpark 37i
 2628 XJ Delft
 The Netherlands
 T: +31 88 268 25 00
 M: +31 63 091 72 80 
 

  
 On 16 October 2014 20:00, Gary Russo garyru...@hotmail.com wrote:
 I need to extract worksheet metadata called “defined name” from Excel 97/2003 
 formatted spreadsheets.
  
 The ISYS xdmp:document-filter() API is limiting because it only extracts the 
 text.
  
 It does not extract any worksheet metadata.
  
 Does anyone know of a workaround for this?
  
 My only thought is to upload the “Excel 97/2003” xls file and then convert it 
 on the server to an “Excel 2010” xlsx format.
  
 Once it’s in an Excel 2010 format, I can easily extract the “defined name” 
 metadata.
  
 This is what it looks like in “Excel 2010” files.
  
   definedNames
 definedName name=LastYrRevenue!$B$6:$B$15/definedName
 definedName name=ThisYrRevenue!$C$6:$C$15/definedName
 definedName name=VarianceRevenue!$D$6:$D$15/definedName
   /definedNames
  
  
 Thanks,
 Gary Russo
  
  
 Gary Russo
 Enterprise NoSQL Developer
 Phone: 212-404-8639
 Skype: garyprusso
 http://garyrusso.wordpress.com
  
 
 ___
 General mailing list
 General@developer.marklogic.com
 http://developer.marklogic.com/mailman/listinfo/general
 
  
 ___
 General mailing list
 General@developer.marklogic.com
 http://developer.marklogic.com/mailman/listinfo/general

___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general


[MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003?

2014-10-16 Thread Gary Russo
I need to extract worksheet metadata called defined name from Excel
97/2003 formatted spreadsheets.

 

The ISYS xdmp:document-filter() API is limiting because it only extracts the
text.

 

It does not extract any worksheet metadata.

 

Does anyone know of a workaround for this?

 

My only thought is to upload the Excel 97/2003 xls file and then convert
it on the server to an Excel 2010 xlsx format.

 

Once it's in an Excel 2010 format, I can easily extract the defined name
metadata.

 

This is what it looks like in Excel 2010 files.

 

  definedNames
definedName name=LastYrRevenue!$B$6:$B$15/definedName
definedName name=ThisYrRevenue!$C$6:$C$15/definedName
definedName name=VarianceRevenue!$D$6:$D$15/definedName
  /definedNames

 

 

Thanks,

Gary Russo

 

 

Gary Russo

Enterprise NoSQL Developer

Phone: 212-404-8639

Skype: garyprusso

http://garyrusso.wordpress.com

 

___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general


Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003?

2014-10-16 Thread David Ennis
HI.

I believe that with the conversion licence, you can do what you want
with: xdmp:excel-convert

Barring that, you could always run openoffice as a headless server for
conversion purposes.

Kind Regards,
David Ennis





Kind Regards,
David Ennis


David Ennis
*Content Engineer*

[image: HintTech]  http://www.hinttech.com/
Mastering the value of content
creative | technology | content

Delftechpark 37i
2628 XJ Delft
The Netherlands
T: +31 88 268 25 00
M: +31 63 091 72 80

[image: http://www.hinttech.com] http://www.hinttech.com
https://twitter.com/HintTech  http://www.facebook.com/HintTech
http://www.linkedin.com/company/HintTech

On 16 October 2014 20:00, Gary Russo garyru...@hotmail.com wrote:

 I need to extract worksheet metadata called “defined name” from Excel
 97/2003 formatted spreadsheets.



 The ISYS xdmp:document-filter() API is limiting because it only extracts
 the text.



 It does not extract any worksheet metadata.



 Does anyone know of a workaround for this?



 My only thought is to upload the “Excel 97/2003” xls file and then convert
 it on the server to an “Excel 2010” xlsx format.



 Once it’s in an Excel 2010 format, I can easily extract the “defined name”
 metadata.



 This is what it looks like in “Excel 2010” files.



   definedNames
 definedName name=LastYrRevenue!$B$6:$B$15/definedName
 definedName name=ThisYrRevenue!$C$6:$C$15/definedName
 definedName name=VarianceRevenue!$D$6:$D$15/definedName
   /definedNames





 Thanks,

 Gary Russo





 *Gary Russo*

 *Enterprise NoSQL Developer*

 *Phone: 212-404-8639 212-404-8639*

 *Skype: garyprusso*

 *http://garyrusso.wordpress.com http://garyrusso.wordpress.com*



 ___
 General mailing list
 General@developer.marklogic.com
 http://developer.marklogic.com/mailman/listinfo/general


___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general