Re: [MarkLogic Dev General] Merge Policy / Large Delete
Right. I knew I could force a merge. Data is constantly being added to this database. I expected it to merge automatically at some point. I rarely if ever do an update. Almost everything is insertion of new data.

On Tue, Jun 25, 2013 at 12:02 PM, Damon Feldman wrote:
> Alex,
>
> Merges are only triggered by document updates, so when you start adding or
> changing data the system will check merge policies and decide if it should
> merge or not. In the meantime you can force a merge for the entire database
> or particular forests on the admin GUI.
>
> Yours,
> Damon
>
> --
> Damon Feldman
> Sr. Principal Consultant, MarkLogic
>
> From: general-boun...@developer.marklogic.com On Behalf Of Alex Milowski
> Sent: Tuesday, June 25, 2013 2:16 PM
> To: General Mark Logic Developer Discussion
> Subject: [MarkLogic Dev General] Merge Policy / Large Delete
>
> After deleting a large amount of content, I have large forests (100+GB) on
> disk with large amounts of deleted fragments (60+ GB). I didn't notice any
> merging going on and I expected this to clear itself up eventually.
>
> After waiting for quite a while (many days), it didn't do anything by
> itself.
>
> The merge policy for the database is:
>
>   merge priority = lower, max size = 0, min size = 1024, min ratio = 2,
>   timestamp = 0
>
> The database has three forests that are roughly balanced. As such, there
> is 180GB of deleted fragments.
>
> I've gone in and manually requested the forests to merge, doing them one at
> a time.
>
> Is there a reason why the merge didn't happen all by itself?

--
--Alex Milowski
"The excellence of grammar as a guide is proportional to the paucity of the
inflexions, i.e. to the degree of analysis effected by the language
considered."

Bertrand Russell in a footnote of Principles of Mathematics
___
General mailing list
General@developer.marklogic.com
http://developer.marklogic.com/mailman/listinfo/general
[MarkLogic Dev General] Merge Policy / Large Delete
After deleting a large amount of content, I have large forests (100+GB) on disk with large amounts of deleted fragments (60+ GB). I didn't notice any merging going on and I expected this to clear itself up eventually.

After waiting for quite a while (many days), it didn't do anything by itself.

The merge policy for the database is:

  merge priority = lower, max size = 0, min size = 1024, min ratio = 2, timestamp = 0

The database has three forests that are roughly balanced. As such, there is 180GB of deleted fragments.

I've gone in and manually requested the forests to merge, doing them one at a time.

Is there a reason why the merge didn't happen all by itself?
Re: [MarkLogic Dev General] Simple Math Issue
Thanks. Yeah, it turns out that's what I've done in the past. This issue seems to keep catching me, as I expect fixed-point arithmetic and a lot of my latitude/longitude calculations end up being exact (e.g. 180 div 2.5).

Why not use a fixed-point library rather than convert to xs:double? I don't expect xs:decimal to be efficient, just exact.

On Wed, Mar 20, 2013 at 4:03 PM, Danny Sokolsky <danny.sokol...@marklogic.com> wrote:
> It does seem odd. I filed a bug on the MarkLogic side and we'll take a
> look.
>
> In the meantime, as David pointed out, casting it to a double seems to
> work around it.
>
> -Danny
>
> From: general-boun...@developer.marklogic.com On Behalf Of Alex Milowski
> Sent: Wednesday, March 20, 2013 9:53 AM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Simple Math Issue
>
> Testing other XQuery processors (e.g. Saxon 9.4 via XProc):
>
>   http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
>   element result { 180 div 2.5, 5 div 2.5 }
>
> Output: 72 2
>
> MarkLogic: 71.99601 1.999889
Re: [MarkLogic Dev General] Simple Math Issue
Testing other XQuery processors (e.g. Saxon 9.4 via XProc):

  http://www.w3.org/ns/xproc" xmlns:c="http://www.w3.org/ns/xproc-step" version="1.0">
  element result { 180 div 2.5, 5 div 2.5 }

Output: 72 2

MarkLogic: 71.99601 1.9999999889
Re: [MarkLogic Dev General] Simple Math Issue
On Wed, Mar 20, 2013 at 8:57 AM, David Lee wrote:
> A simple case to demonstrate this is an even less pretty
>
> 5 div 2.5 => 1.999889
>
> xs:double( 5 div 2.5 ) => 2

I'm just going to register my mathematical revulsion at that answer. :(
Re: [MarkLogic Dev General] Simple Math Issue
It still feels like a bug in the underlying decimal implementation. I'm having a hard time imagining a world where "180 / 2.5" is a hard computation to get right. The number of digits of precision has to be greater than 3 (or 4).

Appendix E of the XPath F&O [1] lists the number of digits of precision for xs:decimal as implementation defined. Both operands are xs:decimal, so there is no type promotion to xs:double. As such, I expect the following computation to be performed: 180 / 2.5 = 1800 / 25 = 72 [2].

[1] http://www.w3.org/TR/xpath-functions/
[2] http://en.wikipedia.org/wiki/Fixed_point_arithmetic

On Wed, Mar 20, 2013 at 7:36 AM, David Lee wrote:
> The difference here is that 2.5 is a xs:decimal not an xs:float or
> xs:double according to XQuery
> http://www.w3.org/TR/xquery/#prod-xquery-DecimalLiteral
>
> Decimal types are different precision from float or double and are not
> floating point.
> Neither of which is infinite precision.
> So math sometimes doesn't produce an exact mathematically correct result.
>
> Yes I know that doesn't make the answer any more pleasant
>
> -
> David Lee
> Lead Engineer
> MarkLogic Corporation
>
> -Original Message-
> From: general-boun...@developer.marklogic.com On Behalf Of David Sewell
> Sent: Wednesday, March 20, 2013 9:36 AM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Simple Math Issue
>
> I get the same results as Alex running version 6.0-2.1 under Linux.
>
> On Wed, 20 Mar 2013, Geert Josten wrote:
>
> > Which version of MarkLogic are you running exactly? I am running 6.0-1.1 on
> > win7 and get (72, 72, 72) as response..
> >
> > Kind regards,
> >
> > Geert
> >
> > From: general-boun...@developer.marklogic.com On Behalf Of Alex Milowski
> > Sent: Tuesday, 19 March 2013 23:09
> > To: General Mark Logic Developer Discussion
> > Subject: [MarkLogic Dev General] Simple Math Issue
> >
> > In my Mathematical mind, the following expression:
> >
> > (180 div 2.5, 180 div xs:float(2.5), 180 div xs:double(2.5))
> >
> > should yield:
> >
> > (72, 72, 72)
> >
> > as the correct answer, mathematically speaking, is 72.
> >
> > MarkLogic 6 seems to think it is:
> >
> > (71.99601, 72, 72)
> >
> > which leads me to believe this is a xs:decimal issue.
>
> --
> David Sewell, Editorial and Technical Manager
> ROTUNDA, The University of Virginia Press
Re: [MarkLogic Dev General] Simple Math Issue
Version 6.0-2.1 on Linux (Amazon, EC2, 2012.09)

On Tue, Mar 19, 2013 at 11:19 PM, Geert Josten wrote:
> Which version of MarkLogic are you running exactly? I am running 6.0-1.1
> on win7 and get (72, 72, 72) as response..
>
> Kind regards,
>
> Geert
>
> From: general-boun...@developer.marklogic.com On Behalf Of Alex Milowski
> Sent: Tuesday, 19 March 2013 23:09
> To: General Mark Logic Developer Discussion
> Subject: [MarkLogic Dev General] Simple Math Issue
>
> In my Mathematical mind, the following expression:
>
> (180 div 2.5, 180 div xs:float(2.5), 180 div xs:double(2.5))
>
> should yield:
>
> (72, 72, 72)
>
> as the correct answer, mathematically speaking, is 72.
>
> MarkLogic 6 seems to think it is:
>
> (71.99601, 72, 72)
>
> which leads me to believe this is a xs:decimal issue.
Re: [MarkLogic Dev General] cts:element-attribute-pair-geospatial-boxes and latitude boundaries
I was just wondering about what happens on the boundary. This should really be noted in the documentation for cts:element-attribute-pair-geospatial-boxes() and possibly elsewhere.

If northern boundaries are open, then the result makes sense. I know there is bad data in the database and that must map to the pole. As such, the quadrangle query using cts:element-attribute-pair-geospatial-query() doesn't specify boundaries-north-excluded and I get those 26 data points back.

Of course, what I want is for "92 latitude" to never come back for the quadrangle by an option on the query and not by changing the index.

Thanks.

On Tue, Mar 19, 2013 at 3:37 PM, Mary Holstege wrote:
> On Tue, 19 Mar 2013 15:24:01 -0700, Alex Milowski wrote:
>
> > Why is -90 latitude treated differently from 90 latitude?
>
> I don't have time to look at this in detail right now, but there
> is an important asymmetry in these bounds:
> lower bounds are closed, upper bounds are open.
>
> If this were a regular range index, you would also get
> results for everything less than the lowest bound and
> everything greater than or equal to the upper bound.
> That's true for geo as well, but harder to wrap your
> brain around.
>
> Poles are special case code everywhere, so it is
> possible there is an improperly handled edge case
> here. (If I can talk about "edge" in the context of
> a surface with no edges...)
>
> //Mary
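The closed-lower/open-upper rule Mary describes can be modelled in a few lines. This is only a sketch in Python (not XQuery, so it runs without a server) of the bucketing semantics as stated in her reply; the `bucket` helper is hypothetical and is not MarkLogic's implementation. Under that model, with latitude bounds (-90, 0, 90), a point at exactly -90 lands in the first real box, while a point at exactly 90 (or bad data mapped to the pole) lands in a separate "90 and above" bucket, which would be consistent with 5109 = 5083 + 26.

```python
from bisect import bisect_right

def bucket(value, bounds):
    """Hypothetical model of range-style bucketing.

    Bucket 0 holds values below the lowest bound.
    Bucket i (i >= 1) holds bounds[i-1] <= value < bounds[i].
    The last bucket holds values >= the highest bound.
    Lower bounds are closed, upper bounds are open.
    """
    return bisect_right(bounds, value)

lat_bounds = [-90, 0, 90]

for lat in (-91, -90, -45, 0, 89.9, 90, 92):
    print(lat, "-> bucket", bucket(lat, lat_bounds))
```

With the single bound (0), the same model has only two buckets, so the points at or above 90 are counted together with everything else at or above 0.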
[MarkLogic Dev General] cts:element-attribute-pair-geospatial-boxes and latitude boundaries
I've noticed zero-area boxes coming back from cts:element-attribute-pair-geospatial-boxes where N/S=90 when I specify 90 latitude as one of the boundaries. Certainly, (-90,0,90) is equivalent to just (0) for latitude boundaries. Somehow, cts:element-attribute-pair-geospatial-boxes() seems to figure it out for -90 but not for 90.

For example, consider:

    declare namespace s="http://weather.milowski.com/V/APRS/";

    let $dtstart := xs:dateTime("2013-03-19T18:30:00Z"),
        $dtend := xs:dateTime("2013-03-19T19:00:00Z"),
        $boxes := cts:element-attribute-pair-geospatial-boxes(
            xs:QName("s:report"), QName("","latitude"), QName("","longitude"),
            (-90,0,90), (0,180),
            ("item-frequency","gridded","descending","empties"),
            cts:and-query(
                (cts:element-attribute-range-query(xs:QName("s:report"), QName("","received"), ">=", $dtstart),
                 cts:element-attribute-range-query(xs:QName("s:report"), QName("","received"), "<=", $dtend))
            )
        )
    for $i in $boxes
    return element box {
        attribute count { cts:frequency($i) },
        attribute s { cts:box-south($i) },
        attribute w { cts:box-west($i) },
        attribute n { cts:box-north($i) },
        attribute e { cts:box-east($i) }
    }

This returns the counts for the boxes based on a specific time period. The output is:

    [box output not preserved in the archive]

If I change to (0) for latitude bounds, I get:

    [box output not preserved in the archive]

where 5109 = 5083 + 26.

Why is -90 latitude treated differently from 90 latitude?
[MarkLogic Dev General] Simple Math Issue
In my Mathematical mind, the following expression:

(180 div 2.5, 180 div xs:float(2.5), 180 div xs:double(2.5))

should yield:

(72, 72, 72)

as the correct answer, mathematically speaking, is 72.

MarkLogic 6 seems to think it is:

(71.99601, 72, 72)

which leads me to believe this is a xs:decimal issue.
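The expectation that decimal division gives exactly 72 here can be checked outside MarkLogic. The following is a minimal sketch in Python rather than XQuery, assuming Python's `decimal` and `fractions` modules as stand-ins for an exact decimal implementation; it illustrates the arithmetic only and says nothing about how MarkLogic implements xs:decimal.

```python
from decimal import Decimal
from fractions import Fraction

# Decimal operands on both sides, so no promotion to binary floating point.
q1 = Decimal("180") / Decimal("2.5")
q2 = Decimal("5") / Decimal("2.5")

# The same division done with scaled integers: 180 / 2.5 = 1800 / 25.
scaled = Fraction(1800, 25)

print(q1 == 72, q2 == 2, scaled == 72)
```

Both quotients have finite decimal expansions, so an exact decimal type with more than a handful of digits of precision has no reason to round them.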
Re: [MarkLogic Dev General] Invalid Locations and Geospatial Queries
On Sat, Jul 7, 2012 at 8:38 AM, James Fuller wrote:
> On Sat, Jul 7, 2012 at 4:57 PM, Alex Milowski wrote:
>> I've been told that invalid locations are mapped to the poles in
>> MarkLogic. For example, when I query on the quadrangle [(-75,0) ,
>> (-90,15)], I get the following stations:
>>
>> NZSP     (-90,0)
>> VE3CGR-4 (-2147483648,-2147483648)
>> W2PE-3   (-2147483648,-2147483648)
>> W8FY-4   (-2147483648,-2147483648)
>> WA9KCU   (-2147483648,-2147483648)
>> WB5AOH-2 (-2147483648,-2147483648)
>>
>> The data I get from CWOP sometimes has flaws and I don't scrub the data.
>> I'd prefer to retain the data as is and for stations that send
>> nonsense positions, I don't expect them to show up in quadrangle
>> queries. That way the stations data will always be available by ID
>> but they won't show up on a map at the poles where they certainly
>> aren't located.
>>
>> Is there some way to turn this "feature" off?
>
> if you check where you setup geospatial index (in admin ui),
>
> depending on what version of ML u are using there should be an option
> for rejecting/ignoring values ... reject means on ingestion that
> document will not be ingested, ignore is the scenario I believe you
> want.

In ML 5.0-3, for geospatial attribute pairs, besides the configuration of parent element and attribute names, all I see is an option for "range value positions".

Is there some setting elsewhere?
[MarkLogic Dev General] Invalid Locations and Geospatial Queries
I've been told that invalid locations are mapped to the poles in MarkLogic. For example, when I query on the quadrangle [(-75,0) , (-90,15)], I get the following stations:

NZSP     (-90,0)
VE3CGR-4 (-2147483648,-2147483648)
W2PE-3   (-2147483648,-2147483648)
W8FY-4   (-2147483648,-2147483648)
WA9KCU   (-2147483648,-2147483648)
WB5AOH-2 (-2147483648,-2147483648)

The data I get from CWOP sometimes has flaws and I don't scrub the data. I'd prefer to retain the data as is, and for stations that send nonsense positions, I don't expect them to show up in quadrangle queries. That way the station data will always be available by ID but they won't show up on a map at the poles, where they certainly aren't located.

Is there some way to turn this "feature" off?
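Until an index-level setting is found, one possible stopgap is to filter query results on the client side before mapping them. This is only a sketch in Python, assuming the results arrive as (id, latitude, longitude) tuples; `is_valid_position` is a hypothetical helper, and nothing here changes the stored data or the index.

```python
def is_valid_position(lat, lon):
    """True when the position lies within the valid latitude/longitude range."""
    return -90 <= lat <= 90 and -180 <= lon <= 180

# Station IDs and positions as listed in the post.
# Note that -2147483648 equals -(2**31), the minimum 32-bit signed integer.
results = [
    ("NZSP", -90, 0),
    ("VE3CGR-4", -2147483648, -2147483648),
    ("W2PE-3", -2147483648, -2147483648),
    ("W8FY-4", -2147483648, -2147483648),
    ("WA9KCU", -2147483648, -2147483648),
    ("WB5AOH-2", -2147483648, -2147483648),
]

# Keep only stations that can sensibly be drawn on a map.
mappable = [sid for sid, lat, lon in results if is_valid_position(lat, lon)]
print(mappable)
```

The stations with nonsense positions stay in the database and remain reachable by ID; they are simply skipped when drawing the quadrangle.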
Re: [MarkLogic Dev General] cts:search / geospatial queries - combining criteria
On Sat, Jun 30, 2012 at 9:05 PM, Alex Milowski wrote:
> On Sat, Jun 30, 2012 at 2:25 PM, Michael Blakeley wrote:
>> If you are familiar with RDBMS join terminology, the idea behind the change
>> I suggested was to move from a nested-loop join to an index-based value join.

After reducing the amount of data I have in my database, I was able to test different queries with and without an additional geospatial index.

I have millions of tiny weather report documents, each associated with a station-specific collection and one massive "all weather" collection. I also have a collection that stores basic metadata about each station (around 10,000 documents), including their position.

I now have two geospatial indices, one on the stations and one on the weather reports, and this seems to perform optimally for the kinds of summary reports I need to produce. Specifically, these are:

* summary of number of stations per quadrangle (hits the station geospatial index)
* summary of number of weather reports per quadrangle per time period (hits the weather report index)

The second summary report can be done without the weather report geospatial index but it can't be done sufficiently well. That's due to the nature of the report, in that each quadrangle's count must be calculated separately and I need all of them. As the quadrangle size decreases (e.g. 2.5 degrees), the number of separate counts increases (e.g. 10,368). So, that query time gets multiplied by 10,368 and so small amounts of cost get magnified.

So, this wins:

    xdmp:estimate(cts:search(
        collection("http://.../weather/")/s:report,
        cts:and-query(
            (cts:element-attribute-range-query(xs:QName("s:report"), QName("","received"), ">=", $dtstart),
             cts:element-attribute-range-query(xs:QName("s:report"), QName("","received"), "<=", $dtend),
             cts:element-attribute-pair-geospatial-query(xs:QName("s:report"), QName("","latitude"), QName("","longitude"), $quad)
            )
        )))

over this:

    sum(for $s in cts:search(
            collection("http://.../stations/"),
            cts:element-attribute-pair-geospatial-query(xs:QName("s:station"), QName("","latitude"), QName("","longitude"), $quad)
        )
        return xdmp:estimate(collection(concat("http://.../weather/",$s/s:station/@id))/s:report[@received>=$dtstart and @received<=$dtend]))

(or several other variants using cts:something-or-other) and I don't see how to turn the latter into an index-based join.

In the end, I need both indices. Go figure, more indices equals better performance. The novel bit is that the cost is only marginal in terms of disk space and in-memory size of stands.
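The 10,368 figure follows directly from the grid size: 360 / 2.5 = 144 columns and 180 / 2.5 = 72 rows. Here is a small sketch in Python (not XQuery) of that arithmetic and of enumerating the quadrangles; the function names and the (south, west, north, east) tuple order are assumptions made for illustration only.

```python
def quadrangle_count(size_deg):
    """Number of size_deg x size_deg quadrangles covering the globe."""
    cols = round(360 / size_deg)
    rows = round(180 / size_deg)
    return rows * cols

def quadrangles(size_deg):
    """Yield (south, west, north, east) for every quadrangle."""
    cols = round(360 / size_deg)
    rows = round(180 / size_deg)
    for r in range(rows):
        south = -90 + r * size_deg
        for c in range(cols):
            west = -180 + c * size_deg
            yield (south, west, south + size_deg, west + size_deg)

print(quadrangle_count(2.5))

# Any per-quadrangle query cost is multiplied by the number of quadrangles.
per_query_ms = 5  # illustrative figure only, not a measurement
print(per_query_ms * quadrangle_count(2.5) / 1000, "seconds in total")
```

This is why even a few milliseconds of extra cost per quadrangle decides which of the two queries is usable for the full report.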
Re: [MarkLogic Dev General] Listing Directories?
On Thu, Jul 5, 2012 at 2:31 PM, Will Thompson wrote:
> Alex, see Mike's post from below.

Yes, OK. I get it now. That's a bit obscure but it works quite well (and fast).

I actually tested this all and re-created a single directory just to see how the documents get reassigned by URI subsumption. It now makes a lot more sense.

Thanks!
Re: [MarkLogic Dev General] Listing Directories?
Once directories are created, I don't suppose there is any way to get rid of them without getting rid of the documents? xdmp:directory-delete() removes the contained documents (as you might expect).

I structured everything based on collection membership and not directories.
Re: [MarkLogic Dev General] Listing Directories?
On Thu, Jul 5, 2012 at 12:20 PM, Will Thompson wrote:
> To my knowledge, there is no advantage other than for WebDav. And you can
> still search URIs as if they are directories even if there are no directory
> fragments.

In that area of the database admin configuration page, there are a number of settings. If I set directory creation to manual, what else should I change? It looks to me like without directories being created automatically, I can safely ignore most of these settings.

I currently have:

  maintain last modified: true
  maintain directory last modified: false
  inherit permissions: true
  inherit collections: true
  inherit quality: true
Re: [MarkLogic Dev General] Listing Directories?
On Thu, Jul 5, 2012 at 12:09 PM, Ryan Dew wrote:
> So why is the example from the documentation so bad? The expression is fully
> searchable and will only return directory properties, so having millions of
> documents shouldn't matter. The only count that should really matter are the
> number of directories. Then if you really wanted to you could paginate over
> the results using fn:subsequence.
>
> xdmp:plan(xdmp:document-properties()/prop:properties/prop:directory)
>
> and this might be slightly more efficient:
>
> xdmp:plan(xdmp:document-properties()/prop:properties[prop:directory])

Never mind. They're both working now. I don't know why it timed out before. Probably a merge operation was in progress.

I'm trying to calculate the actual memory needed for the data set by loading about 3 weeks of data.
Re: [MarkLogic Dev General] Listing Directories?
Ah! Yes I do. That number of directories looks suspiciously close to the number of stations I have stored in my database. That would probably account for it.

I'll never use WebDAV on this database given the large number of small documents, so I can turn that off. Is there any other reason why I would want that on?

On Thu, Jul 5, 2012 at 11:49 AM, Will Thompson wrote:
> Do you have directory creation set to automatic in your database settings?
>
> -Will
>
> -Original Message-
> From: general-boun...@developer.marklogic.com On Behalf Of Alex Milowski
> Sent: Thursday, July 05, 2012 11:48 AM
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Listing Directories?
>
> On Thu, Jul 5, 2012 at 11:37 AM, Danny Sokolsky wrote:
>> You don't need the URI lexicon for the xdmp:estimate one.
>
> OK. Good to know.
>
> What I really want to know is the name and purpose of these directories since
> I don't actually create them explicitly.
Re: [MarkLogic Dev General] Listing Directories?
It times out on 23 million documents for my current server configuration. I've reduced my database size to a more reasonable size but I still need a few more GB of real memory. As such, I suspect that query induces paging and slows everything down.

On Thu, Jul 5, 2012 at 12:09 PM, Ryan Dew wrote:
> So why is the example from the documentation so bad? The expression is fully
> searchable and will only return directory properties, so having millions of
> documents shouldn't matter. The only count that should really matter are the
> number of directories. Then if you really wanted to you could paginate over
> the results using fn:subsequence.
>
> xdmp:plan(xdmp:document-properties()/prop:properties/prop:directory)
>
> and this might be slightly more efficient:
>
> xdmp:plan(xdmp:document-properties()/prop:properties[prop:directory])
>
> On Thu, Jul 5, 2012 at 12:49 PM, Will Thompson wrote:
>>
>> Do you have directory creation set to automatic in your database settings?
>>
>> -Will
>>
>> -Original Message-
>> From: general-boun...@developer.marklogic.com On Behalf Of Alex Milowski
>> Sent: Thursday, July 05, 2012 11:48 AM
>> To: MarkLogic Developer Discussion
>> Subject: Re: [MarkLogic Dev General] Listing Directories?
>>
>> On Thu, Jul 5, 2012 at 11:37 AM, Danny Sokolsky wrote:
>> > You don't need the URI lexicon for the xdmp:estimate one.
>>
>> OK. Good to know.
>>
>> What I really want to know is the name and purpose of these directories
>> since I don't actually create them explicitly.
Re: [MarkLogic Dev General] Listing Directories?
On Thu, Jul 5, 2012 at 11:37 AM, Danny Sokolsky wrote:
> You don't need the URI lexicon for the xdmp:estimate one.

OK. Good to know.

What I really want to know is the name and purpose of these directories since I don't actually create them explicitly.
Re: [MarkLogic Dev General] Listing Directories?
Hmm... no URI lexicon so neither will work. I've been reluctant to turn that on because of the large number of documents. I never access the content by URI.

On Thu, Jul 5, 2012 at 11:32 AM, Ryan Dew wrote:
> You could try something like:
>
> cts:uris((),"properties", cts:properties-query(
>   cts:element-query(xs:QName("prop:directory"),cts:and-query(()))
> ))
[MarkLogic Dev General] Listing Directories?
I'm curious as to how certain directories get created in my database. I have:

  23,135,073 documents
  9,916 directories

I don't explicitly create directories in any of my import pipelines. If I could list them (or some sampling of them), I might have a better idea of where they are coming from.

In the documentation, there is an example:

  for $x in xdmp:document-properties()/prop:properties/prop:directory
  return {xdmp:node-uri($x)}

but that is a terrible idea if you have millions of documents.

Is there a way to get the directories directly somehow?

--Alex Milowski
Re: [MarkLogic Dev General] cts:search / geospatial queries - combining criteria
On Sat, Jun 30, 2012 at 2:25 PM, Michael Blakeley wrote:
> If you are familiar with RDBMS join terminology, the idea behind the change I suggested was to move from a nested-loop join to an index-based value join.

OK. The index-based value join that I really want to do is between the id of the weather station located by the geospatial index and the id of the weather report. I have indices for both. I don't quite know how to specify that join using the query constructs of MarkLogic.

The previous version of this did something similar, but it used the geospatial index on the weather reports. It was recommended to me that I avoid having such a large geospatial index, but it hasn't paid off so far. That is, the older index-based join, which used only the geospatial index to match the weather reports in a geospatial region against the time periods I'm interested in, is outperforming the newer query.

I'm not that surprised, given that I've left the cts:search realm to use regular predicates in XQuery to match by id and then by time period. Keep in mind that both the id-based comparison and the time period predicates use the indices. I've looked at the query plans. They just aren't doing it all together as a single cts:search might do.

...and then I do the above 10368 times for separate geospatial regions (quadrangles of 2.5 degrees each).

--Alex Milowski
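One way the value join discussed above might be written, as a sketch only: pull the station ids for the quadrangle straight from a range index with cts:element-attribute-values, then issue a single estimate over the reports, constrained by the per-station collections and the received range. This assumes an attribute range index on s:station/@id (not confirmed in the thread), the existing range index on s:report/@received typed as xs:dateTime, and hypothetical values for $quad, $dtstart and $dtend. The namespace URI bound to s: is a placeholder, and the elided collection URIs are left exactly as they appear in the thread.

```xquery
xquery version "1.0-ml";

(: Placeholder namespace: the thread never shows the URI bound to s: :)
declare namespace s = "http://example.org/weather";

(: Hypothetical inputs; in the real report these would come from the grid loop :)
declare variable $quad as cts:box := cts:box(35.0, -122.5, 37.5, -120.0);
declare variable $dtend as xs:dateTime := fn:current-dateTime();
declare variable $dtstart as xs:dateTime := $dtend - xs:dayTimeDuration("PT30M");

(: Station ids are read from the range index; no station documents are fetched.
   Assumes a range index on s:station/@id. :)
let $ids :=
  cts:element-attribute-values(
    xs:QName("s:station"), fn:QName("", "id"), (), (),
    cts:and-query((
      cts:collection-query("http://.../stations/"),
      cts:element-attribute-pair-geospatial-query(
        xs:QName("s:station"),
        fn:QName("", "latitude"), fn:QName("", "longitude"),
        $quad)
    )))
return
  (: No stations in the quadrangle means no reports to count :)
  if (fn:empty($ids)) then 0
  else
    xdmp:estimate(
      cts:search(
        fn:collection("http://.../weather/")/s:report,
        cts:and-query((
          (: One collection URI per station; the query matches any of them :)
          cts:collection-query(
            for $id in $ids return fn:concat("http://.../weather/", $id)),
          cts:element-attribute-range-query(
            xs:QName("s:report"), fn:QName("", "received"), ">=", $dtstart),
          cts:element-attribute-range-query(
            xs:QName("s:report"), fn:QName("", "received"), "<=", $dtend)
        ))))
```

The point of the shape is that each quadrangle costs one lexicon call plus one estimate, rather than one estimate per station. Whether that actually beats the original single-index query would have to be measured.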
[MarkLogic Dev General] segmentation fault
While beating on my database, I succeeded in crashing the database with a segmentation fault on one of the threads. The trace is rather large. Is there some place I should send the error log?

--Alex Milowski
[MarkLogic Dev General] log file inconsistency with disk operations
As my server is overloaded and taking far too long to reindex, I've been tracking its progress via the log files. I've noticed new stands being written to disk but with no mention of them in the log file. After a very long time, I'll see a message like:

2012-06-28 09:58:57.214 Info: Saving /var/opt/MarkLogic/Forests/weather/b6ac
2012-06-28 09:59:23.060 Info: Saved 135 MB in 26 sec at 5 MB/sec to /var/opt/MarkLogic/Forests/weather/b6ac

but those arrive more than an hour later. I would expect a log file entry when the stand (or whatever) is being initiated and not for the last few seconds of whatever operation is in play.

I suspect this situation may only be detectable when your server is overloaded in some way.

--Alex Milowski
Re: [MarkLogic Dev General] cts:search / geospatial queries - combining criteria
On Wed, Jun 27, 2012 at 10:21 AM, Michael Blakeley wrote:
> I'm not sure that was a good change. Sometimes more documents are better - and MarkLogic can handle billions, albeit with the right hardware.
>
> Anyway, look into fetching the station ids using cts:element-attribute-values instead of cts:search. You might also be able to move the received date-range constraints into a cts:query term on that call, which would allow you to use cts:frequency instead of xdmp:estimate.

I'm not sure it was a great move, but having to index millions of weather reports just to get one report working better doesn't feel like a great solution either.

Just as another example, this one performs reasonably well for one quadrangle:

  for $s in cts:search(
      collection("http://.../stations/"),
      cts:element-attribute-pair-geospatial-query(xs:QName("s:station"), QName("","latitude"), QName("","longitude"), $quad)
    )
  order by $s/s:station/@id
  return
    if (not(collection(concat("http://.../weather/",$s/s:station/@id))/s:report[@received>$dtstart and @received<$dtend])[1])
    then ()
    else
      let $id := string($s/s:station/@id)
      ...

even with the order by and testing for no reports.

I've also looked at the query plans for some of these. There is an index on @received, and so the expression [@received>$dtstart and @received<$dtend] hits the index quite nicely. It only becomes "faster" if you can combine it within one query (via an and) as I had originally.

There's a twist here in that the report I'm having trouble with is for all quadrangles, where the above example is only for one quadrangle. That is, in my original question, that query runs many times (10368 times for 2.5° quadrangles). In relational terms, it is a join that I'm performing multiple times because I'm grouping by quadrangle.

--Alex Milowski
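For reference, the figure of 10368 follows from tiling the globe in 2.5° steps: 180 / 2.5 = 72 rows of latitude times 360 / 2.5 = 144 columns of longitude. A minimal sketch that enumerates those quadrangles as cts:box values follows. The variable names are illustrative and not taken from the original report code.

```xquery
xquery version "1.0-ml";

(: Quadrangle size in degrees, from the thread; kept as xs:decimal so the
   arithmetic is exact and the values promote cleanly to cts:box arguments :)
declare variable $size as xs:decimal := 2.5;

(: 72 rows x 144 columns = 10368 boxes covering the globe :)
declare variable $quads as cts:box* :=
  for $row in (0 to xs:integer(180 div $size) - 1)
  for $col in (0 to xs:integer(360 div $size) - 1)
  let $south := -90 + $row * $size
  let $west := -180 + $col * $size
  (: cts:box takes south, west, north, east :)
  return cts:box($south, $west, $south + $size, $west + $size);

fn:count($quads)
```

Adjacent boxes share edges, so a station sitting exactly on a boundary could be matched by two quadrangles, depending on the boundary options in effect for the geospatial query.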
[MarkLogic Dev General] cts:search / geospatial queries - combining criteria
I've been restructuring my indices for good (or bad) for my weather database, and I've got a basic question about optimally using a geospatial index.

In summary, I have reports for each weather station stored in their own collection by id, and each report has a location (latitude/longitude attribute pair). Previously, I had an index on that pair, but I've dropped that in favor of a summary document for each station which tracks its current position.

The rationale is that there are roughly 10,000 stations and there are tens of millions of reports (eventually to be hundreds of millions). So, an index on a smaller set of documents should be faster/better/etc., right?

One difficulty is that I need to produce a report based on quadrangles (think rectangles mapped on the surface of the earth). For each quadrangle, I want to count the number of reports contained in the database for a certain period of time (e.g. the last 30 minutes).

The core of this query used to be:

  xdmp:estimate(
    cts:search(
      collection("http://.../weather/")/s:report,
      cts:and-query(
        (cts:element-attribute-range-query(xs:QName("s:report"), QName("","received"),">=",$dtstart),
         cts:element-attribute-range-query(xs:QName("s:report"), QName("","received"),"<=",$dtend),
         cts:element-attribute-pair-geospatial-query(xs:QName("s:report"), QName("","latitude"), QName("","longitude"), $quad)
        )
      )))

but now, with the index change, it becomes:

  sum(
    for $s in cts:search(
        collection("http://.../stations/"),
        cts:element-attribute-pair-geospatial-query(xs:QName("s:station"), QName("","latitude"), QName("","longitude"), $quad)
      )
    return xdmp:estimate(collection(concat("http://.../weather/",$s/s:station/@id))/s:report[@received>=$dtstart and @received<=$dtend]))

which actually performs worse--probably due to the for loop and sum bit.

Is there some way to combine this into one cts:search() statement where I get the relevant set of ids from the stations collection via the geospatial index and then the reports by id, received time, or collection?

Keep in mind that every report belongs to its station's collection of weather reports as well as a database-wide collection of weather reports.

--Alex Milowski
Re: [MarkLogic Dev General] Database Status Troubles
On Tue, Jun 26, 2012 at 2:01 PM, Michael Blakeley wrote:
> On 26 Jun 2012, at 13:52, Alex Milowski wrote:
>
>> On Tue, Jun 26, 2012 at 1:40 PM, Michael Blakeley wrote:
>>> Are you only looking at the forests summary? The forest status screen should show full reindexing status, along with other details. Is that "reindexing" text a link? If so, click on it.
>>>
>>> If that doesn't help, maybe take a screenshot of the forest status you are looking at?
>>
>> The forest status page doesn't return. The browser just hangs out until something times out.
>
> That is decidedly abnormal. I have seen a few systems where database-status was slow and would time out during reindexing, but forest-status should not. Admittedly I don't do much with geospatial, so this might be a geospatial-specific bug.

Here's what I don't get. I dropped a large index. Shouldn't that be an easy operation?

Or is it the index I added that is causing the problem? It should only hit a very small number of documents/elements. Is it still required to touch everything?

> But my hunch is that the OS is overloaded, so I would start by checking the OS status: top and iostat, on linux or Solaris. The OS may be out of RAM and swapping, or possibly reindexing puts too much demand on the available disk I/O capacity. Either way, I suspect that your database has outgrown the hardware.

Yes, that is certainly true. I don't have enough memory. Funny bit is that the application works reasonably well for my research purposes. For production, it would need far more memory to support a load.

--Alex Milowski
Re: [MarkLogic Dev General] Database Status Troubles
On Tue, Jun 26, 2012 at 1:40 PM, Michael Blakeley wrote:
> Are you only looking at the forests summary? The forest status screen should show full reindexing status, along with other details. Is that "reindexing" text a link? If so, click on it.
>
> If that doesn't help, maybe take a screenshot of the forest status you are looking at?

The forest status page doesn't return. The browser just hangs out until something times out.

--Alex Milowski
Re: [MarkLogic Dev General] Database Status Troubles
On Tue, Jun 26, 2012 at 1:23 PM, Michael Blakeley wrote:
> Try forest-status instead. All the forests will look about the same.

Yes. That is where I see it is "open (reindexing/refragmenting)" but I can't get any more information.

> Which version of the server is this?

5.0-3

--Alex Milowski
[MarkLogic Dev General] Database Status Troubles
I went into my rather large database and dropped a geospatial attribute pair index that I don't need. That index was on probably 99% of the content in the database. I also added a geospatial attribute pair index for a very small set of content (about 10,000 small documents).

So, I dropped an index on millions of documents and added an index on around 10,000 documents.

It looks like I'm in the process of a huge reindex now. I can't tell because the database status page never returns. When it did, it gave me the error below.

Keep in mind that I rarely get any forest/merge information on this database because the status page, after a long wait, regularly gives me:

"There is currently an XDMP-OLDSTAMP: Timestamp too old for forest weather exception. Information on this page may be missing."

I know it will eventually settle down, but I can't really tell what is going on. Shouldn't there be something in the logs about re-indexing? Is there some other way to get status information?

500: Internal Server Error
XDMP-EXTIME: let $forests as xs:unsignedLong* := xdmp:database-forests($dbnode/db:database-id, fn:true()) -- Time limit exceeded
In /lib/database-status-form.xqy on line 267
In displayForm()
$dbnode = http://marklogic.com/xdmp/database" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">15944900662090440733we...
$forests = http://marklogic.com/xdmp/database" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">11780037656355209513
In /lib/database-status-form.xqy on line 1983
In databaseStatusPage(http://marklogic.com/xdmp/database" xsi:schemaLocation="http://www.w3.org/2001/XMLSchema XMLSchema.xsd http://marklogic" attributeFormDefault="unqualified" elementFormDefault="unqualified" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:sec="http://marklogic.com/xdmp/security" xmlns:admin="http://marklogic.com/xdmp/admin" xmlns="http://marklogic.com/xdmp/database"> Database configurati..., http://marklogic.com/xdmp/database database.xsd" timestamp="13407369745517960" xmlns="http://marklogic.com/xdmp/database" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">403500397113234379, ())
$schemaroot = http://marklogic.com/xdmp/database" xsi:schemaLocation="http://www.w3.org/2001/XMLSchema XMLSchema.xsd http://marklogic" attributeFormDefault="unqualified" elementFormDefault="unqualified" xmlns:xhtml="http://www.w3.org/1999/xhtml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:sec="http://marklogic.com/xdmp/security" xmlns:admin="http://marklogic.com/xdmp/admin" xmlns="http://marklogic.com/xdmp/database"> Database configurati...
$datanode = http://marklogic.com/xdmp/database database.xsd" timestamp="13407369745517960" xmlns="http://marklogic.com/xdmp/database" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">403500397113234379
$msgs = ()
In /database-status.xqy on line 16

--Alex Milowski
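On the question of another way to get status information: one possibility, sketched here and not verified against 5.0-3, is to call xdmp:forest-status directly from an ad hoc query rather than going through the admin status page. The forest name "weather" is taken from the XDMP-OLDSTAMP message in this thread. Which parts of the status document are worth reading out is an assumption, and the call may itself be slow on an overloaded host.

```xquery
xquery version "1.0-ml";

declare namespace fs = "http://marklogic.com/xdmp/status/forest";

(: Forest name taken from the error message in the thread; adjust as needed :)
let $status := xdmp:forest-status(xdmp:forest("weather"))
return (
  (: Forest name and current state as reported by the server :)
  $status/fs:forest-name/fn:string(),
  $status/fs:state/fn:string(),
  (: Number of stands currently listed for the forest :)
  fn:count($status/fs:stands/fs:stand)
)
```

Running this periodically and watching the stand count change could give a rough sense of progress even when the status page times out.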
Re: [MarkLogic Dev General] Missing Atom Media Type
On Thu, Sep 10, 2009 at 2:57 PM, Danny Sokolsky wrote:
> You can use the Admin Interface Mimetypes page to add this entry to your system.

Yeah, I know. Shouldn't it ship with known media types?

> I believe the name of the config file is mimetypes.xml (lowercase but plural).

Right. The online documentation is wrong and needs to be fixed.

--Alex Milowski
[MarkLogic Dev General] Missing Atom Media Type
I noticed the Atom media type is missing from the mimetypes.xml file. The proper entry should be:

  <mimetype>
    <name>application/atom+xml</name>
    <extensions>atom</extensions>
    <format>xml</format>
  </mimetype>

Also, the documentation for xdmp:get-request-body() refers to this file as "Mimetypes.xml" but it is actually "mimetype.xml" in the distribution.

--Alex Milowski