Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003?

2014-10-20 Thread Pete Aven
In case it's useful, Microsoft also provides the Microsoft Office Planning 
Manager and Office Compatibility Packs free, which will allow you to bulk 
convert older Office formats ( 2003 and earlier ) to the new OOXML formats.

https://www.microsoft.com/en-us/download/details.aspx?id=21888#filelist
https://www.microsoft.com/en-us/download/details.aspx?id=3

Pete

From: general-boun...@developer.marklogic.com 
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Gary Russo
Sent: Monday, October 20, 2014 10:44 AM
To: 'MarkLogic Developer Discussion'
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet 
metadata from an Excel 97/2003?

Hello Ron,

Yes, it is feasible to do the metadata extraction upstream of MarkLogic.

It complicates things a little bit but it will be ok.

Apache Tika looks like a nice solution.

My client is a Microsoft shop and they use a product called Aspose to 
convert/extract data from spreadsheets.

The majority of spreadsheet formats that I need to ingest use the older 97/2003 
format. I can use the Aspose API to covert the older format to OOXML on the fly.

It's unfortunate that the MarkLogic xdmp:document-filter() API is not able to 
extract the "defined name" metadata from the "97/2003" file format.

I consider it to be a bug in the MarkLogic API because other Excel Spreadsheet 
extraction APIs (e.g., Aspose, Tika, Apache POI) can extract this data from the 
older file format.

Anyway, thanks for the info.


-  Gary R



From: general-boun...@developer.marklogic.com 
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Ron Hitchens
Sent: Friday, October 17, 2014 11:52 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet 
metadata from an Excel 97/2003?


   If it's feasible to do your metadata extraction upstream of MarkLogic (i.e., 
before insertion) you might take a look at Apache Tika.  It's designed for this 
sort of thing.

   You could also setup it up in a simple web service callable from MarkLogic.  
POST the spreadsheet to it and have it return the metadata in whatever form you 
like.

---
Ron Hitchens {r...@overstory.co.uk}  +44 7879 
358212

On Oct 17, 2014, at 3:35 PM, Gary Russo 
mailto:garyru...@hotmail.com>> wrote:

Hello Dennis,

Thanks for the info.

Yes, I tried xdmp:excel-convert() but this does not get the worksheet metadata 
either.

The metadata that I need to retrieve from the older excel format is the "Named 
Fields".

Users create them using the Excel "Named Box" feature as shown here. => 
http://spreadsheets.about.com/od/exceltips/qt/81225namebox.htm

It looks like my only option is to use the Apache POI Java API to extract the 
named fields or use it to convert xls-to-xlsx on-the-fly. 
=>https://poi.apache.org/apidocs

I know there's a hidden way to use MarkLogic's underlying JVM.

It would be great if I could use it to call the Apache POI code.

But that's a question for another day.

Thanks again,

Gary Russo


Gary Russo
Enterprise NoSQL Developer
http://garyrusso.wordpress.com
http://twitter.com/garyprusso



From: 
general-boun...@developer.marklogic.com
 
[mailto:general-boun...@developer.marklogic.com]
 On Behalf Of David Ennis
Sent: Thursday, October 16, 2014 5:02 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet 
metadata from an Excel 97/2003?

HI.

I believe that with the conversion licence, you can do what you want with: 
xdmp:excel-convert

Barring that, you could always run openoffice as a headless server for 
conversion purposes.

Kind Regards,
David Ennis





Kind Regards,
David Ennis


David Ennis
Content Engineer

[HintTech] 
Mastering the value of content
creative | technology | content

Delftechpark 37i
2628 XJ Delft
The Netherlands
T: +31 88 268 25 00
M: +31 63 091 72 80

[http://www.hinttech.com] 
[http://www.hinttech.com/signature/Twitter_HintTech.png] 
  
[http://www.hinttech.com/signature/Facebook_HintTech.png] 
  
[http://www.hinttech.com/signature/Linkedin_HintTech.png] 


On 16 October 2014 20:00, Gary Russo 
mailto:garyru...@hotmail.com>> wrote:
I need to extract worksheet metadata called "defined name" from Excel 97/2003 
formatted spreadsheets.

The ISYS xdmp:document-filter() API is limiting because it only extracts the 
text.

It does not extract any worksheet metadata.

Does anyone know of a workaround for this?

My only thought is to upload the "Excel 97/2003" xls file and then convert it 
on the server to an "Excel 2010" xlsx format.

Once it's in an Excel 2010 format, I can easily extract the "defined name" 
metadata.

This is what it looks like in "Excel 2010" files.

  
Revenue!$B

Re: [MarkLogic Dev General] Is there a way to extract worksheet metadata from an Excel 97/2003?

2014-10-20 Thread Gary Russo
Hello Ron,

 

Yes, it is feasible to do the metadata extraction upstream of MarkLogic.

 

It complicates things a little bit but it will be ok.

 

Apache Tika looks like a nice solution.

 

My client is a Microsoft shop and they use a product called Aspose to
convert/extract data from spreadsheets.

 

The majority of spreadsheet formats that I need to ingest use the older
97/2003 format. I can use the Aspose API to covert the older format to OOXML
on the fly.

 

It's unfortunate that the MarkLogic xdmp:document-filter() API is not able
to extract the "defined name" metadata from the "97/2003" file format.

 

I consider it to be a bug in the MarkLogic API because other Excel
Spreadsheet extraction APIs (e.g., Aspose, Tika, Apache POI) can extract
this data from the older file format. 

 

Anyway, thanks for the info.

 

-  Gary R

 

 

 

From: general-boun...@developer.marklogic.com
[mailto:general-boun...@developer.marklogic.com] On Behalf Of Ron Hitchens
Sent: Friday, October 17, 2014 11:52 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet
metadata from an Excel 97/2003?

 

 

   If it's feasible to do your metadata extraction upstream of MarkLogic
(i.e., before insertion) you might take a look at Apache Tika.  It's
designed for this sort of thing.

 

   You could also setup it up in a simple web service callable from
MarkLogic.  POST the spreadsheet to it and have it return the metadata in
whatever form you like.


---

Ron Hitchens {r...@overstory.co.uk}  +44 7879 358212

 

On Oct 17, 2014, at 3:35 PM, Gary Russo  wrote:





Hello Dennis,

 

Thanks for the info.

 

Yes, I tried xdmp:excel-convert() but this does not get the worksheet
metadata either.

 

The metadata that I need to retrieve from the older excel format is the
"Named Fields".

 

Users create them using the Excel "Named Box" feature as shown here. =>

http://spreadsheets.about.com/od/exceltips/qt/81225namebox.htm

 

It looks like my only option is to use the Apache POI Java API to extract
the named fields or use it to convert xls-to-xlsx on-the-fly. =>
 https://poi.apache.org/apidocs

 

I know there's a hidden way to use MarkLogic's underlying JVM.

 

It would be great if I could use it to call the Apache POI code.

 

But that's a question for another day.

 

Thanks again,

 

Gary Russo

 

 

Gary Russo

Enterprise NoSQL Developer

http://garyrusso.wordpress.com

  http://twitter.com/garyprusso

 

 

 

From:  
general-boun...@developer.marklogic.com [mailto:general-
 boun...@developer.marklogic.com] On
Behalf Of David Ennis
Sent: Thursday, October 16, 2014 5:02 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] Is there a way to extract worksheet
metadata from an Excel 97/2003?

 

HI.

 

I believe that with the conversion licence, you can do what you want with:
xdmp:excel-convert

 

Barring that, you could always run openoffice as a headless server for
conversion purposes.

 

Kind Regards,

David Ennis

 

 




 

 

Kind Regards,

David Ennis

 

 

David Ennis
Content Engineer

  HintTech 
Mastering the value of content
creative | technology | content

Delftechpark 37i
2628 XJ Delft
The Netherlands
T: +31 88 268 25 00
M: +31 63 091 72 80 

  http://www.hinttech.com
   
 

 

On 16 October 2014 20:00, Gary Russo < 
garyru...@hotmail.com> wrote:

I need to extract worksheet metadata called "defined name" from Excel
97/2003 formatted spreadsheets.

 

The ISYS xdmp:document-filter() API is limiting because it only extracts the
text.

 

It does not extract any worksheet metadata.

 

Does anyone know of a workaround for this?

 

My only thought is to upload the "Excel 97/2003" xls file and then convert
it on the server to an "Excel 2010" xlsx format.

 

Once it's in an Excel 2010 format, I can easily extract the "defined name"
metadata.

 

This is what it looks like in "Excel 2010" files.

 

  
Revenue!$B$6:$B$15
Revenue!$C$6:$C$15
Revenue!$D$6:$D$15
  

 

 

Thanks,

Gary Russo

 

 

Gary Russo

Enterprise NoSQL Developer

Phone:   212-404-8639

Skype: garyprusso

  http://garyrusso.wordpress.com

 


___
General mailing list
  General@developer.marklogic.com
 
http://developer.marklogic.com/mailman/listinfo/general

 

___
General mailing list
  General

Re: [MarkLogic Dev General] Element Range Query with DateTime and Durations

2014-10-20 Thread Wanczowski, Andrew
The business rules are actually contained in multiple documents. They contain 
other constraints besides embargo information. The 2-stage query is the 
approach I was thinking of as well. Perhaps getting all  co-occurrences from 
the rules documents then filtering on the article documents would work.

From: Dave Cassel mailto:dave.cas...@marklogic.com>>
Date: Monday, October 20, 2014 9:22 AM
To: Andrew Wanczowski 
mailto:andrew_wanczow...@condenast.com>>, John 
Snelson mailto:john.snel...@marklogic.com>>, 
"general@developer.marklogic.com" 
mailto:general@developer.marklogic.com>>
Subject: Re: [MarkLogic Dev General] Element Range Query with DateTime and 
Durations

Drew, if I follow correctly, you're looking at a 2-stage query, where your 
first stage finds the criteria and the second stage applies those criteria to 
the real search. Since you have range indexes set up on the relevant data, make 
sure you're using that to gather the criteria. In other words, rather than:

fn:doc("/some/business/rules.xml")/path/to/date-and-duration

which requires loading the document, you can probably get that same data from 
your range indexes:

let $date as xs:date := cts:values($date-ref, (), "limit=1", 
$query-to-find-biz-rules)
let $duration as xs:string := cts:values($dur-ref, (), "limit=1", 
$query-to-find-biz-rules)

Sounds like it would be just one document, so it might not make a big 
difference, but it's one less that would need to be read from disk.

--
Dave Cassel
Developer Community Manager
MarkLogic Corporation
Cell:  +1-484-798-8720


From: , Andrew 
mailto:andrew_wanczow...@condenast.com>>
Date: Friday, October 17, 2014 at 1:38 PM
To: Dave Cassel mailto:dave.cas...@marklogic.com>>, 
John Snelson mailto:john.snel...@marklogic.com>>, 
"general@developer.marklogic.com" 
mailto:general@developer.marklogic.com>>
Subject: Re: [MarkLogic Dev General] Element Range Query with DateTime and 
Durations

Hi John and Dave,

In testing the provided example worked well:

xquery version "1.0-ml";

declare namespace html = "http://www.w3.org/1999/xhtml";;

let $results :=
  cts:search(
fn:doc(),
cts:element-range-query(xs:QName("publishedDate"), "<=", fn:current-date() 
- xs:dayTimeDuration("P90D"))
  )[1 to 10]

let $onSaleDates := (
  xs:date("2014-07-18"), (: July 18 published date :)
  xs:date("2014-07-19"), (: July 19 published date :)
  xs:date("2014-07-20")  (: July 20 published date :)
)

let $testCases :=
  for $onSaleDate in $onSaleDates
  return
  (
   $onSaleDate,
   ($onSaleDate <= fn:current-date() - xs:dayTimeDuration("P90D"))
  )

return $testCases


I am trying to solve a larger problem of filtering over larger datasets (5M+ 
documents)  where the durations are variables  that come from one set of 
documents (business rules) and filter search of other documents (articles). 
Basically it is to filter out content with various embargo durations.  The 
"business rules"only  state duration after published date not the actual  dates 
the embargo ends.

Would you say there is anything to watch out with from a performance 
standpoint? I have range indexes set up on all the fields that require 
calculations.

Thanks
Drew

From: Dave Cassel mailto:dave.cas...@marklogic.com>>
Date: Friday, October 17, 2014 12:09 PM
To: Andrew Wanczowski 
mailto:andrew_wanczow...@condenast.com>>
Subject: Re: [MarkLogic Dev General] Element Range Query with DateTime and 
Durations

Drew, I wasn't sure how familiar you are with durations, so in case John's 
answer didn't give you what you need --

cts:element-range-query(
  xs:QName("date"),
  ">",
  fn:current-date() - xs:dayTimeDuration("P90D")
)


--
Dave Cassel
Developer Community Manager
MarkLogic Corporation
Cell:  +1-484-798-8720


From: , Andrew 
mailto:andrew_wanczow...@condenast.com>>
Reply-To: MarkLogic Developer Discussion 
mailto:general@developer.marklogic.com>>
Date: Friday, October 17, 2014 at 6:42 AM
To: "general@developer.marklogic.com" 
mailto:general@developer.marklogic.com>>
Subject: Re: [MarkLogic Dev General] Element Range Query with DateTime and 
Durations

Thanks John. I'll give that a shot.

On 10/17/14 9:35 AM, "John Snelson" 
mailto:john.snel...@marklogic.com>> wrote:

Work out a dateTime 90 days before the current dateTime, and query for
articles with a published dateTime before that dateTime.

John

On 17/10/14 14:26, Wanczowski, Andrew wrote:
Hi All,

Is possible to do a dateTime range query based on the elements value
plus or minus a xs:dayTimeDuration. For example I want to find all
articles that are 90 days passed the published date.

The documentation
(http://docs.marklogic.com/cts:element-range-query)  gives a few
examples of date queriers but they all have a supplied dateTime.

Thanks
Drew

--
John Snelson, Lead Engineerhttp://twitter.com/jpcs
MarkLo

Re: [MarkLogic Dev General] Element Range Query with DateTime and Durations

2014-10-20 Thread Dave Cassel
Drew, if I follow correctly, you're looking at a 2-stage query, where your 
first stage finds the criteria and the second stage applies those criteria to 
the real search. Since you have range indexes set up on the relevant data, make 
sure you're using that to gather the criteria. In other words, rather than:

fn:doc("/some/business/rules.xml")/path/to/date-and-duration

which requires loading the document, you can probably get that same data from 
your range indexes:

let $date as xs:date := cts:values($date-ref, (), "limit=1", 
$query-to-find-biz-rules)
let $duration as xs:string := cts:values($dur-ref, (), "limit=1", 
$query-to-find-biz-rules)

Sounds like it would be just one document, so it might not make a big 
difference, but it's one less that would need to be read from disk.

--
Dave Cassel
Developer Community Manager
MarkLogic Corporation
Cell:  +1-484-798-8720


From: , Andrew 
mailto:andrew_wanczow...@condenast.com>>
Date: Friday, October 17, 2014 at 1:38 PM
To: Dave Cassel mailto:dave.cas...@marklogic.com>>, 
John Snelson mailto:john.snel...@marklogic.com>>, 
"general@developer.marklogic.com" 
mailto:general@developer.marklogic.com>>
Subject: Re: [MarkLogic Dev General] Element Range Query with DateTime and 
Durations

Hi John and Dave,

In testing the provided example worked well:

xquery version "1.0-ml";

declare namespace html = "http://www.w3.org/1999/xhtml";;

let $results :=
  cts:search(
fn:doc(),
cts:element-range-query(xs:QName("publishedDate"), "<=", fn:current-date() 
- xs:dayTimeDuration("P90D"))
  )[1 to 10]

let $onSaleDates := (
  xs:date("2014-07-18"), (: July 18 published date :)
  xs:date("2014-07-19"), (: July 19 published date :)
  xs:date("2014-07-20")  (: July 20 published date :)
)

let $testCases :=
  for $onSaleDate in $onSaleDates
  return
  (
   $onSaleDate,
   ($onSaleDate <= fn:current-date() - xs:dayTimeDuration("P90D"))
  )

return $testCases


I am trying to solve a larger problem of filtering over larger datasets (5M+ 
documents)  where the durations are variables  that come from one set of 
documents (business rules) and filter search of other documents (articles). 
Basically it is to filter out content with various embargo durations.  The 
"business rules"only  state duration after published date not the actual  dates 
the embargo ends.

Would you say there is anything to watch out with from a performance 
standpoint? I have range indexes set up on all the fields that require 
calculations.

Thanks
Drew

From: Dave Cassel mailto:dave.cas...@marklogic.com>>
Date: Friday, October 17, 2014 12:09 PM
To: Andrew Wanczowski 
mailto:andrew_wanczow...@condenast.com>>
Subject: Re: [MarkLogic Dev General] Element Range Query with DateTime and 
Durations

Drew, I wasn't sure how familiar you are with durations, so in case John's 
answer didn't give you what you need --

cts:element-range-query(
  xs:QName("date"),
  ">",
  fn:current-date() - xs:dayTimeDuration("P90D")
)


--
Dave Cassel
Developer Community Manager
MarkLogic Corporation
Cell:  +1-484-798-8720


From: , Andrew 
mailto:andrew_wanczow...@condenast.com>>
Reply-To: MarkLogic Developer Discussion 
mailto:general@developer.marklogic.com>>
Date: Friday, October 17, 2014 at 6:42 AM
To: "general@developer.marklogic.com" 
mailto:general@developer.marklogic.com>>
Subject: Re: [MarkLogic Dev General] Element Range Query with DateTime and 
Durations

Thanks John. I'll give that a shot.

On 10/17/14 9:35 AM, "John Snelson" 
mailto:john.snel...@marklogic.com>> wrote:

Work out a dateTime 90 days before the current dateTime, and query for
articles with a published dateTime before that dateTime.

John

On 17/10/14 14:26, Wanczowski, Andrew wrote:
Hi All,

Is possible to do a dateTime range query based on the elements value
plus or minus a xs:dayTimeDuration. For example I want to find all
articles that are 90 days passed the published date.

The documentation
(http://docs.marklogic.com/cts:element-range-query)  gives a few
examples of date queriers but they all have a supplied dateTime.

Thanks
Drew

--
John Snelson, Lead Engineerhttp://twitter.com/jpcs
MarkLogic Corporation http://www.marklogic.com
___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general

___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general

___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general


Re: [MarkLogic Dev General] using xdmp:eval in REST Service

2014-10-20 Thread Kapoor, Pragya
Hi,


I tried logging at rewritter level also:


  case element(POST)
return
let $urlParams := let $paramNames:= 
xdmp:get-request-field-names()
  for $eachParam in 
$paramNames
  let $_ := 
xdmp:log(fn:concat("$eachParam",$eachParam))
  let $paramValue := 
xdmp:get-request-field($eachParam)
  let $_ := 
xdmp:log(fn:concat("$paramValue",$paramValue))
  return 
concat($eachParam,'=',$paramValue)​

logs:


2014-10-20 13:12:51.446 Info: rest-ingestion: $eachParamtransId
2014-10-20 13:12:51.446 Info: rest-ingestion: 
$paramValue39932186-9cab-44e9-8f4f-7ebf45dabf8f
2014-10-20 13:12:51.446 Info: rest-ingestion: 
module-invoker.xqy?format=xml&moduleName=add-document&requestDirectory=add&apiVersion=v1
2014-10-20 13:12:51.446 Info: rest-ingestion: $param-value  v1
2014-10-20 13:12:51.446 Info: rest-ingestion: $param-value  
39932186-9cab-44e9-8f4f-7ebf45dabf8f
2014-10-20 13:12:51.758 Info: rest-ingestion: Ingestion Service End Point 
add-document.xqy  39932186-9cab-44e9-8f4f-7ebf45dabf8f
2014-10-20 13:12:51.758 Info: rest-ingestion: 
Path/processing/39932186-9cab-44e9-8f4f-7ebf45dabf8f/validDocs/
2014-10-20 13:12:51.758 Info: rest-ingestion: 
uri/docs/BONY_4076_REPO-GMR-2000.xml

At no where I found the xml.

Moreover, The xml which I get is after calling the web service inside 
add-document.xqy where I am retrieving it from different DB which is getting 
serailzed as text. The response code in rewritter is also set as:

 (
  
xdmp:invoke(fn:concat('rest-apis/',$request-directory,'/',$module-name,'.xqy'),$param-map)

  ,
  xdmp:set-response-content-type('application/xml')

)


I am calling the webservive with one parameter in POST request and that is with 
transId variable.

So, I am unable to find the exact issue and where to use xdmp:unquote when 
there is no xml

Please help me on this.


Thanks

Pragya


From: general-boun...@developer.marklogic.com 
 on behalf of Kapoor, Pragya 

Sent: Friday, October 17, 2014 11:20 AM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] using xdmp:eval in REST Service


Hi David,


I tried logging in invoker:

let $_populate-map := for $each-param in $urlParams
let $param-value := xdmp:get-request-field($each-param)
 let $_ := 
xdmp:log(fn:concat("$param-value",$param-value))
...
logs:

2014-10-17 11:16:29.008 Info: rest-ingestion: 
module-invoker.xqy?format=xml&moduleName=add-document&requestDirectory=add&apiVersion=v1
2014-10-17 11:16:29.013 Info: rest-ingestion: $param-valuexml
2014-10-17 11:16:29.013 Info: rest-ingestion: $param-valueadd-document
2014-10-17 11:16:29.013 Info: rest-ingestion: $param-valueadd
2014-10-17 11:16:29.013 Info: rest-ingestion: $param-valuev1
2014-10-17 11:16:29.013 Info: rest-ingestion: 
$param-value39932186-9cab-44e9-8f4f-7ebf45dabf8f
2014-10-17 11:16:29.309 Info: rest-ingestion: Ingestion Service End Point 
add-document.xqy  39932186-9cab-44e9-8f4f-7ebf45dabf8f
2014-10-17 11:16:29.309 Info: rest-ingestion: 
Path/processing/39932186-9cab-44e9-8f4f-7ebf45dabf8f/validDocs/
2014-10-17 11:16:29.309 Info: rest-ingestion: 
uri/docs/BONY_4076_REPO-GMR-2000.xml

So its xml at this stage.


At rewriter:


 case element(POST)
return
let $urlParams := let $paramNames:= 
xdmp:get-request-field-names()
  for $eachParam in 
$paramNames
  let $_ := 
xdmp:log(fn:concat("$eachParam",$eachParam))
  let $paramValue := 
xdmp:get-request-field($eachParam)
  let $_ := 
xdmp:log(fn:concat("$paramValue",$paramValue))
  return 
concat($eachParam,'=',$paramValue)​

logs:



From: general-boun...@developer.marklogic.com 
 on behalf of David Ennis 

Sent: Thursday, October 16, 2014 5:00 PM
To: MarkLogic Developer Discussion
Subject: Re: [MarkLogic Dev General] using xdmp:eval in REST Service

HI.

Yes, I had a look.  The first reason I thought of the xdmp:unquote was because 
of the multipart form post. We have a system where we get XML packed inside of 
json in the body of a post and use xdmp:unquote and this seemed similiar.

I concluded that this might have been similiar looking at your code

rewriter:
...
 case element(POST)
return