Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003?
Thanks Pete I will definitely consider using this bulk conversion utility. The application that I'm building will ingest a wide range of Excel spreadsheets. Once ingested, the system will aggregate the data. It will also generate unified spreadsheets. Your Excel XQuery blog post was helpful. = http://developer.marklogic.com/blog/smallchanges/2007-12-04 From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Pete Aven Sent: Monday, October 20, 2014 12:23 PM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003? In case it's useful, Microsoft also provides the Microsoft Office Planning Manager and Office Compatibility Packs free, which will allow you to bulk convert older Office formats ( 2003 and earlier ) to the new OOXML formats. https://www.microsoft.com/en-us/download/details.aspx?id=21888#filelist https://www.microsoft.com/en-us/download/details.aspx?id=3 Pete From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Gary Russo Sent: Monday, October 20, 2014 10:44 AM To: 'MarkLogic Developer Discussion' Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003? Hello Ron, Yes, it is feasible to do the metadata extraction upstream of MarkLogic. It complicates things a little bit but it will be ok. Apache Tika looks like a nice solution. My client is a Microsoft shop and they use a product called Aspose to convert/extract data from spreadsheets. The majority of spreadsheet formats that I need to ingest use the older 97/2003 format. I can use the Aspose API to covert the older format to OOXML on the fly. It's unfortunate that the MarkLogic xdmp:document-filter() API is not able to extract the defined name metadata from the 97/2003 file format. I consider it to be a bug in the MarkLogic API because other Excel Spreadsheet extraction APIs (e.g., Aspose, Tika, Apache POI) can extract this data from the older file format. Anyway, thanks for the info. - Gary R From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Ron Hitchens Sent: Friday, October 17, 2014 11:52 AM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003? If it's feasible to do your metadata extraction upstream of MarkLogic (i.e., before insertion) you might take a look at Apache Tika. It's designed for this sort of thing. You could also setup it up in a simple web service callable from MarkLogic. POST the spreadsheet to it and have it return the metadata in whatever form you like. --- Ron Hitchens {r...@overstory.co.uk} +44 7879 358212 On Oct 17, 2014, at 3:35 PM, Gary Russo garyru...@hotmail.com wrote: Hello Dennis, Thanks for the info. Yes, I tried xdmp:excel-convert() but this does not get the worksheet metadata either. The metadata that I need to retrieve from the older excel format is the Named Fields. Users create them using the Excel Named Box feature as shown here. = http://spreadsheets.about.com/od/exceltips/qt/81225namebox.htm http://spreadsheets.about.com/od/exceltips/qt/81225namebox.htm It looks like my only option is to use the Apache POI Java API to extract the named fields or use it to convert xls-to-xlsx on-the-fly. = https://poi.apache.org/apidocs https://poi.apache.org/apidocs I know there's a hidden way to use MarkLogic's underlying JVM. It would be great if I could use it to call the Apache POI code. But that's a question for another day. Thanks again, Gary Russo Gary Russo Enterprise NoSQL Developer http://garyrusso.wordpress.com http://twitter.com/garyprusso http://twitter.com/garyprusso From: mailto:general-boun...@developer.marklogic.com general-boun...@developer.marklogic.com [mailto:general- mailto:boun...@developer.marklogic.com boun...@developer.marklogic.com] On Behalf Of David Ennis Sent: Thursday, October 16, 2014 5:02 PM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003? HI. I believe that with the conversion licence, you can do what you want with: xdmp:excel-convert Barring that, you could always run openoffice as a headless server for conversion purposes. Kind Regards, David Ennis Kind Regards, David Ennis David Ennis Content Engineer http://www.hinttech.com/ HintTech Mastering the value of content creative | technology | content Delftechpark 37i 2628 XJ Delft The Netherlands T: +31 88 268 25 00 M: +31 63 091 72 80 http://www.hinttech.com http://www.hinttech.com https://twitter.com/HintTech
Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003?
Hello Ron, Yes, it is feasible to do the metadata extraction upstream of MarkLogic. It complicates things a little bit but it will be ok. Apache Tika looks like a nice solution. My client is a Microsoft shop and they use a product called Aspose to convert/extract data from spreadsheets. The majority of spreadsheet formats that I need to ingest use the older 97/2003 format. I can use the Aspose API to covert the older format to OOXML on the fly. It's unfortunate that the MarkLogic xdmp:document-filter() API is not able to extract the defined name metadata from the 97/2003 file format. I consider it to be a bug in the MarkLogic API because other Excel Spreadsheet extraction APIs (e.g., Aspose, Tika, Apache POI) can extract this data from the older file format. Anyway, thanks for the info. - Gary R From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Ron Hitchens Sent: Friday, October 17, 2014 11:52 AM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003? If it's feasible to do your metadata extraction upstream of MarkLogic (i.e., before insertion) you might take a look at Apache Tika. It's designed for this sort of thing. You could also setup it up in a simple web service callable from MarkLogic. POST the spreadsheet to it and have it return the metadata in whatever form you like. --- Ron Hitchens {r...@overstory.co.uk} +44 7879 358212 On Oct 17, 2014, at 3:35 PM, Gary Russo garyru...@hotmail.com wrote: Hello Dennis, Thanks for the info. Yes, I tried xdmp:excel-convert() but this does not get the worksheet metadata either. The metadata that I need to retrieve from the older excel format is the Named Fields. Users create them using the Excel Named Box feature as shown here. = http://spreadsheets.about.com/od/exceltips/qt/81225namebox.htm http://spreadsheets.about.com/od/exceltips/qt/81225namebox.htm It looks like my only option is to use the Apache POI Java API to extract the named fields or use it to convert xls-to-xlsx on-the-fly. = https://poi.apache.org/apidocs https://poi.apache.org/apidocs I know there's a hidden way to use MarkLogic's underlying JVM. It would be great if I could use it to call the Apache POI code. But that's a question for another day. Thanks again, Gary Russo Gary Russo Enterprise NoSQL Developer http://garyrusso.wordpress.com http://twitter.com/garyprusso http://twitter.com/garyprusso From: mailto:general-boun...@developer.marklogic.com general-boun...@developer.marklogic.com [mailto:general- mailto:boun...@developer.marklogic.com boun...@developer.marklogic.com] On Behalf Of David Ennis Sent: Thursday, October 16, 2014 5:02 PM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003? HI. I believe that with the conversion licence, you can do what you want with: xdmp:excel-convert Barring that, you could always run openoffice as a headless server for conversion purposes. Kind Regards, David Ennis Kind Regards, David Ennis David Ennis Content Engineer http://www.hinttech.com/ HintTech Mastering the value of content creative | technology | content Delftechpark 37i 2628 XJ Delft The Netherlands T: +31 88 268 25 00 M: +31 63 091 72 80 http://www.hinttech.com http://www.hinttech.com https://twitter.com/HintTech http://www.facebook.com/HintTech http://www.linkedin.com/company/HintTech On 16 October 2014 20:00, Gary Russo mailto:garyru...@hotmail.com garyru...@hotmail.com wrote: I need to extract worksheet metadata called defined name from Excel 97/2003 formatted spreadsheets. The ISYS xdmp:document-filter() API is limiting because it only extracts the text. It does not extract any worksheet metadata. Does anyone know of a workaround for this? My only thought is to upload the Excel 97/2003 xls file and then convert it on the server to an Excel 2010 xlsx format. Once it's in an Excel 2010 format, I can easily extract the defined name metadata. This is what it looks like in Excel 2010 files. definedNames definedName name=LastYrRevenue!$B$6:$B$15/definedName definedName name=ThisYrRevenue!$C$6:$C$15/definedName definedName name=VarianceRevenue!$D$6:$D$15/definedName /definedNames Thanks, Gary Russo Gary Russo Enterprise NoSQL Developer Phone: tel:212-404-8639 212-404-8639 Skype: garyprusso http://garyrusso.wordpress.com http://garyrusso.wordpress.com ___ General mailing list mailto:General@developer.marklogic.com General@developer.marklogic.com http://developer.marklogic.com/mailman/listinfo/general http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003?
In case it's useful, Microsoft also provides the Microsoft Office Planning Manager and Office Compatibility Packs free, which will allow you to bulk convert older Office formats ( 2003 and earlier ) to the new OOXML formats. https://www.microsoft.com/en-us/download/details.aspx?id=21888#filelist https://www.microsoft.com/en-us/download/details.aspx?id=3 Pete From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Gary Russo Sent: Monday, October 20, 2014 10:44 AM To: 'MarkLogic Developer Discussion' Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003? Hello Ron, Yes, it is feasible to do the metadata extraction upstream of MarkLogic. It complicates things a little bit but it will be ok. Apache Tika looks like a nice solution. My client is a Microsoft shop and they use a product called Aspose to convert/extract data from spreadsheets. The majority of spreadsheet formats that I need to ingest use the older 97/2003 format. I can use the Aspose API to covert the older format to OOXML on the fly. It's unfortunate that the MarkLogic xdmp:document-filter() API is not able to extract the defined name metadata from the 97/2003 file format. I consider it to be a bug in the MarkLogic API because other Excel Spreadsheet extraction APIs (e.g., Aspose, Tika, Apache POI) can extract this data from the older file format. Anyway, thanks for the info. - Gary R From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of Ron Hitchens Sent: Friday, October 17, 2014 11:52 AM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003? If it's feasible to do your metadata extraction upstream of MarkLogic (i.e., before insertion) you might take a look at Apache Tika. It's designed for this sort of thing. You could also setup it up in a simple web service callable from MarkLogic. POST the spreadsheet to it and have it return the metadata in whatever form you like. --- Ron Hitchens {r...@overstory.co.ukmailto:r...@overstory.co.uk} +44 7879 358212 On Oct 17, 2014, at 3:35 PM, Gary Russo garyru...@hotmail.commailto:garyru...@hotmail.com wrote: Hello Dennis, Thanks for the info. Yes, I tried xdmp:excel-convert() but this does not get the worksheet metadata either. The metadata that I need to retrieve from the older excel format is the Named Fields. Users create them using the Excel Named Box feature as shown here. = http://spreadsheets.about.com/od/exceltips/qt/81225namebox.htm It looks like my only option is to use the Apache POI Java API to extract the named fields or use it to convert xls-to-xlsx on-the-fly. =https://poi.apache.org/apidocs I know there's a hidden way to use MarkLogic's underlying JVM. It would be great if I could use it to call the Apache POI code. But that's a question for another day. Thanks again, Gary Russo Gary Russo Enterprise NoSQL Developer http://garyrusso.wordpress.com http://twitter.com/garyprusso From: general-boun...@developer.marklogic.commailto:general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.commailto:boun...@developer.marklogic.com] On Behalf Of David Ennis Sent: Thursday, October 16, 2014 5:02 PM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003? HI. I believe that with the conversion licence, you can do what you want with: xdmp:excel-convert Barring that, you could always run openoffice as a headless server for conversion purposes. Kind Regards, David Ennis Kind Regards, David Ennis David Ennis Content Engineer [HintTech] http://www.hinttech.com/ Mastering the value of content creative | technology | content Delftechpark 37i 2628 XJ Delft The Netherlands T: +31 88 268 25 00 M: +31 63 091 72 80 [http://www.hinttech.com]http://www.hinttech.com [http://www.hinttech.com/signature/Twitter_HintTech.png] https://twitter.com/HintTech [http://www.hinttech.com/signature/Facebook_HintTech.png] http://www.facebook.com/HintTech [http://www.hinttech.com/signature/Linkedin_HintTech.png] http://www.linkedin.com/company/HintTech On 16 October 2014 20:00, Gary Russo garyru...@hotmail.commailto:garyru...@hotmail.com wrote: I need to extract worksheet metadata called defined name from Excel 97/2003 formatted spreadsheets. The ISYS xdmp:document-filter() API is limiting because it only extracts the text. It does not extract any worksheet metadata. Does anyone know of a workaround for this? My only thought is to upload the Excel 97/2003 xls file and then convert it on the server to an Excel 2010 xlsx format. Once it's in an Excel 2010 format, I can easily extract the defined name metadata. This is what it looks like in Excel 2010 files. definedNames
Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003?
If it's feasible to do your metadata extraction upstream of MarkLogic (i.e., before insertion) you might take a look at Apache Tika. It's designed for this sort of thing. You could also setup it up in a simple web service callable from MarkLogic. POST the spreadsheet to it and have it return the metadata in whatever form you like. --- Ron Hitchens {r...@overstory.co.uk} +44 7879 358212 On Oct 17, 2014, at 3:35 PM, Gary Russo garyru...@hotmail.com wrote: Hello Dennis, Thanks for the info. Yes, I tried xdmp:excel-convert() but this does not get the worksheet metadata either. The metadata that I need to retrieve from the older excel format is the “Named Fields”. Users create them using the Excel “Named Box” feature as shown here. = http://spreadsheets.about.com/od/exceltips/qt/81225namebox.htm It looks like my only option is to use the Apache POI Java API to extract the named fields or use it to convert xls-to-xlsx on-the-fly. =https://poi.apache.org/apidocs I know there’s a hidden way to use MarkLogic’s underlying JVM. It would be great if I could use it to call the Apache POI code. But that’s a question for another day. Thanks again, Gary Russo Gary Russo Enterprise NoSQL Developer http://garyrusso.wordpress.com http://twitter.com/garyprusso From: general-boun...@developer.marklogic.com [mailto:general-boun...@developer.marklogic.com] On Behalf Of David Ennis Sent: Thursday, October 16, 2014 5:02 PM To: MarkLogic Developer Discussion Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003? HI. I believe that with the conversion licence, you can do what you want with: xdmp:excel-convert Barring that, you could always run openoffice as a headless server for conversion purposes. Kind Regards, David Ennis Kind Regards, David Ennis David Ennis Content Engineer Mastering the value of content creative | technology | content Delftechpark 37i 2628 XJ Delft The Netherlands T: +31 88 268 25 00 M: +31 63 091 72 80 On 16 October 2014 20:00, Gary Russo garyru...@hotmail.com wrote: I need to extract worksheet metadata called “defined name” from Excel 97/2003 formatted spreadsheets. The ISYS xdmp:document-filter() API is limiting because it only extracts the text. It does not extract any worksheet metadata. Does anyone know of a workaround for this? My only thought is to upload the “Excel 97/2003” xls file and then convert it on the server to an “Excel 2010” xlsx format. Once it’s in an Excel 2010 format, I can easily extract the “defined name” metadata. This is what it looks like in “Excel 2010” files. definedNames definedName name=LastYrRevenue!$B$6:$B$15/definedName definedName name=ThisYrRevenue!$C$6:$C$15/definedName definedName name=VarianceRevenue!$D$6:$D$15/definedName /definedNames Thanks, Gary Russo Gary Russo Enterprise NoSQL Developer Phone: 212-404-8639 Skype: garyprusso http://garyrusso.wordpress.com ___ General mailing list General@developer.marklogic.com http://developer.marklogic.com/mailman/listinfo/general ___ General mailing list General@developer.marklogic.com http://developer.marklogic.com/mailman/listinfo/general ___ General mailing list General@developer.marklogic.com http://developer.marklogic.com/mailman/listinfo/general
[MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003?
I need to extract worksheet metadata called defined name from Excel 97/2003 formatted spreadsheets. The ISYS xdmp:document-filter() API is limiting because it only extracts the text. It does not extract any worksheet metadata. Does anyone know of a workaround for this? My only thought is to upload the Excel 97/2003 xls file and then convert it on the server to an Excel 2010 xlsx format. Once it's in an Excel 2010 format, I can easily extract the defined name metadata. This is what it looks like in Excel 2010 files. definedNames definedName name=LastYrRevenue!$B$6:$B$15/definedName definedName name=ThisYrRevenue!$C$6:$C$15/definedName definedName name=VarianceRevenue!$D$6:$D$15/definedName /definedNames Thanks, Gary Russo Gary Russo Enterprise NoSQL Developer Phone: 212-404-8639 Skype: garyprusso http://garyrusso.wordpress.com ___ General mailing list General@developer.marklogic.com http://developer.marklogic.com/mailman/listinfo/general
Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003?
HI. I believe that with the conversion licence, you can do what you want with: xdmp:excel-convert Barring that, you could always run openoffice as a headless server for conversion purposes. Kind Regards, David Ennis Kind Regards, David Ennis David Ennis *Content Engineer* [image: HintTech] http://www.hinttech.com/ Mastering the value of content creative | technology | content Delftechpark 37i 2628 XJ Delft The Netherlands T: +31 88 268 25 00 M: +31 63 091 72 80 [image: http://www.hinttech.com] http://www.hinttech.com https://twitter.com/HintTech http://www.facebook.com/HintTech http://www.linkedin.com/company/HintTech On 16 October 2014 20:00, Gary Russo garyru...@hotmail.com wrote: I need to extract worksheet metadata called “defined name” from Excel 97/2003 formatted spreadsheets. The ISYS xdmp:document-filter() API is limiting because it only extracts the text. It does not extract any worksheet metadata. Does anyone know of a workaround for this? My only thought is to upload the “Excel 97/2003” xls file and then convert it on the server to an “Excel 2010” xlsx format. Once it’s in an Excel 2010 format, I can easily extract the “defined name” metadata. This is what it looks like in “Excel 2010” files. definedNames definedName name=LastYrRevenue!$B$6:$B$15/definedName definedName name=ThisYrRevenue!$C$6:$C$15/definedName definedName name=VarianceRevenue!$D$6:$D$15/definedName /definedNames Thanks, Gary Russo *Gary Russo* *Enterprise NoSQL Developer* *Phone: 212-404-8639 212-404-8639* *Skype: garyprusso* *http://garyrusso.wordpress.com http://garyrusso.wordpress.com* ___ General mailing list General@developer.marklogic.com http://developer.marklogic.com/mailman/listinfo/general ___ General mailing list General@developer.marklogic.com http://developer.marklogic.com/mailman/listinfo/general