[REBOL] Interleaving of strings question Re:(2)
I think it is an interesting programming topic (and I did not mean my earlier comments to be absolute - it was directed at the seemingly unresolvable debates about whether or not do such and such). Most sorting is done, it seems, on some sort of scalar level, and I was interested in the idea of using each sort item as a description of something else, in this case a point on a plane. (I'm not sure why your #4 comment means my function is wrong. Unless you mean that the resulting sorted list is in order according to "its distance from a point 0,0 that defines the most 'lower-left' point of an arbitrary plane". I.E. it only works in quadrant II of that old x-y graph. Please explain, if not. Maybe what's more useful would be the function sorting in terms of distance from any point, where I think the deltas come back in.) To me it could relate to some sort of abstraction on sort (using objects, blocks whatever) that might be useful. I guess I knew I was opening myself up to a "look who's talking" attitude, but I was thinking about a real-world problem in terms of some abstraction and trying to really understand what the original poster's interleave method was getting at. Often that leads to a cool generalization that can be used elsewhere. That's why I considered it to be a programming topic. You could argue it belonged in a different list, but this is where the original post was, and I'm trying to learn REBOL, so it's where I posted. Is that the same as a thread discussing why MS is successful, or why they suck? Again, on some level you could argue it is. Up to you, I guess. --- [EMAIL PROTECTED] wrote: Hi, although your question has a little in common with Rebol, my answers are as follows: 1. It would be more precise to interleave each bit - ie. each binary digit - as the smallest unit of information 2. in that case the bit - sort yields: [0,0 2,2 1,5 5,1] 3. as you pointed out, the result of the sort doesn't depend on the representation, but on the comparison function, any suitable representation can do 4. your suggestion doesn't work, because you need close things in the terms of distance to remain close in the terms of sort. Your suggestion works only in the neighborhood of the 0,0 point Best regards Ladislav - Original Message - From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Monday, September 25, 2000 1:04 PM Subject: [REBOL] Interleaving of strings question I was thinking about the earlier postings about sorting coordinates by interleaving digits of the 2 axis values. The interleaving, it would seem, is to make the longitude as significant as the latitude, in terms of distance from other points. Ignoring the need to have the lat/long pairs stored that way for later processing, and ignoring stated expected values for the coordinates I thought: 1. Wouldn't it be more precise to interleave each digit? 2. In terms of a regular x,y system, even that would be incorrect sometimes. E.G.: [ 0,0 1,5 2,2 5,1 ] (a sorted set of simplified coordinates) isn't correct. 2,2 is closer to 0,0; 1,5 and 5,1 are the same distance from 0,0. 3. Could pairs or tuples be used? Answer: pairs didn't sort that way and I have no reason to assume sort would think of tuples as coordinates. 4. Wouldn't it be the least convoluted to have a /compare function for sort that understood that all components of the coordinate were equally significant? That way there's no manipulating the coordinates, and it should be the most precise. Would a Pythagorean type thing work, because it looks like simply summing the absolute differences between each coordinate component isn't right: 1,1 to 1,5 = 4 ((1-1)+(5-1)) 1,1 to 2,4 = 4 ((2-1)+(4-1)) yet 2,4 is closer to 1,1 than is 1,5. Until the real distance is needed, I don't think you'd have to take the square root of the sum (and how do you do that for 3-dimensions, does it become cubed and cube root - aha) (ABS(x1-x2) to the nth) + (ABS(y1-y2) to the nth) ... + (ABS(n1-n2) to the nth) Comments? Is this right? __ Do You Yahoo!? Send instant messages get email alerts with Yahoo! Messenger. http://im.yahoo.com/ __ Do You Yahoo!? Send instant messages get email alerts with Yahoo! Messenger. http://im.yahoo.com/
[REBOL] Interleaving of strings question Re:
Since we want them sorted, we're talking about their relative distance from an origin ( 0,0,... ). So it became (x1 to the nth + y1 to the nth) (x2 to the nth + y2 to the nth)... If one of you great REBOLs would show how to do this using all the niceties of series! traversal I would appreciate it; I know my version is a pretty sad procedural version: coord-sort: func [ a b /local sum1 sum2 ] [ sum1: 0 sum2: 0 dimension: length? a while [ not tail? a ] [ sum1: add sum1 power first a dimension a: next a ] while [ not tail? b ] [ sum2: add sum2 power first b dimension b: next b ] sum1 sum2 ] --- [EMAIL PROTECTED] wrote: (ABS(x1-x2) to the nth) + (ABS(y1-y2) to the nth) ... + (ABS(n1-n2) to the nth) __ Do You Yahoo!? Send instant messages get email alerts with Yahoo! Messenger. http://im.yahoo.com/
[REBOL] Interleaving of strings question Re:(2)
Hi, Grant! 1) A couple of minor tweaks to coord-sort yields (*WARNING* *UNTESTED CODE FOLLOWS* *TEST BEFORE USING*) coord-sort: func [a b /local val dim sum] [ dim: length? a sum: 0 foreach val a [sum: sum + power val dim] foreach val b [sum: sum - power val dim] sum 0 ] 2) Your generalization over dimensionality appeals to me as a mathematician, but as a "software engineer" I am compelled to point out that if you're dealing with a performance bottle- neck, the following is faster: coord-sort-2d: func [a b /local val dim sum] [ dim: length? a sum: 0 foreach val a [sum: sum + (val * val)] foreach val b [sum: sum - (val * val)] sum 0 ] 3) Putting my mathematics hat back on, I am compelled to point out that if these are really lat/lon coordinates, then both the taxicab metric (absolute deltas) and squared Euclidean metric (sum squared deltas) can give wrong answers to the general case of "which points are nearest to a specific target point". Near the poles, lattitude deltas make MUCH more difference than comparable longitude deltas. For small deltas, a GREAT improvement in accuracy results from weighting longitude delta by cos avg lat in the sum. Some additional fiddling may be required due to the singularites at the poles, but pole crossings involve large longitude deltas anyway... 4) Putting my s/w eng'ing hat back on, I should add that the performance of the whole search issue (assuming you might need to search for a "nearest point" as in the previous comment) is dominated by the number of points in the entire search space. If the space is large, with substantial dispersion of points, consider using the following speedup trick: The maximum error of the taxicab metric vs. the Euclidean metric is the square root of two (e.g., the point at (1,1) has a taxicab metric of 2, instead of the Euclidean metric of 1.141... -- it's too large by a factor of sqrt 2. Ergo... Make a prepass across the search space using the taxicab metric, retaining only those points whose distance from the target is at most 1.414... times the least distance- from-target found thusfar. Then make a second pass across only the set collected in the prepass, using the correct- but-more-expensive real metric. Hope some of this is helpful! -jn- [EMAIL PROTECTED] wrote: Since we want them sorted, we're talking about their relative distance from an origin ( 0,0,... ). So it became (x1 to the nth + y1 to the nth) (x2 to the nth + y2 to the nth)... If one of you great REBOLs would show how to do this using all the niceties of series! traversal I would appreciate it; I know my version is a pretty sad procedural version: coord-sort: func [ a b /local sum1 sum2 ] [ sum1: 0 sum2: 0 dimension: length? a while [ not tail? a ] [ sum1: add sum1 power first a dimension a: next a ] while [ not tail? b ] [ sum2: add sum2 power first b dimension b: next b ] sum1 sum2 ]
[REBOL] Interleaving of strings question Re:(3)
Yow! I guess it's a lot safer to play in the theoretical space ;) I was wondering tho' if you imagine a 3-dimensional space, does changing the power to 3 from 2 work for Pythagoras? I recently talked to some GIS guys who pointed out the inherent errors in GPS because the tectonic plates move up to 51 cm / year, or something. They got a big kick out of that, but somehow I missed the punchline. I did think, though, that this was a definitively more accurate method than the original discussion (don't know if you saw it) that was going to interleave the long digits into the latitude... --- [EMAIL PROTECTED] wrote: Hi, Grant! 1) A couple of minor tweaks to coord-sort yields (*WARNING* *UNTESTED CODE FOLLOWS* *TEST BEFORE USING*) coord-sort: func [a b /local val dim sum] [ dim: length? a sum: 0 foreach val a [sum: sum + power val dim] foreach val b [sum: sum - power val dim] sum 0 ] 2) Your generalization over dimensionality appeals to me as a mathematician, but as a "software engineer" I am compelled to point out that if you're dealing with a performance bottle- neck, the following is faster: coord-sort-2d: func [a b /local val dim sum] [ dim: length? a sum: 0 foreach val a [sum: sum + (val * val)] foreach val b [sum: sum - (val * val)] sum 0 ] 3) Putting my mathematics hat back on, I am compelled to point out that if these are really lat/lon coordinates, then both the taxicab metric (absolute deltas) and squared Euclidean metric (sum squared deltas) can give wrong answers to the general case of "which points are nearest to a specific target point". Near the poles, lattitude deltas make MUCH more difference than comparable longitude deltas. For small deltas, a GREAT improvement in accuracy results from weighting longitude delta by cos avg lat in the sum. Some additional fiddling may be required due to the singularites at the poles, but pole crossings involve large longitude deltas anyway... 4) Putting my s/w eng'ing hat back on, I should add that the performance of the whole search issue (assuming you might need to search for a "nearest point" as in the previous comment) is dominated by the number of points in the entire search space. If the space is large, with substantial dispersion of points, consider using the following speedup trick: The maximum error of the taxicab metric vs. the Euclidean metric is the square root of two (e.g., the point at (1,1) has a taxicab metric of 2, instead of the Euclidean metric of 1.141... -- it's too large by a factor of sqrt 2. Ergo... Make a prepass across the search space using the taxicab metric, retaining only those points whose distance from the target is at most 1.414... times the least distance- from-target found thusfar. Then make a second pass across only the set collected in the prepass, using the correct- but-more-expensive real metric. Hope some of this is helpful! -jn- [EMAIL PROTECTED] wrote: Since we want them sorted, we're talking about their relative distance from an origin ( 0,0,... ). So it became (x1 to the nth + y1 to the nth) (x2 to the nth + y2 to the nth)... If one of you great REBOLs would show how to do this using all the niceties of series! traversal I would appreciate it; I know my version is a pretty sad procedural version: coord-sort: func [ a b /local sum1 sum2 ] [ sum1: 0 sum2: 0 dimension: length? a while [ not tail? a ] [ sum1: add sum1 power first a dimension a: next a ] while [ not tail? b ] [ sum2: add sum2 power first b dimension b: next b ] sum1 sum2 ] __ Do You Yahoo!? Send instant messages get email alerts with Yahoo! Messenger. http://im.yahoo.com/
[REBOL] Interleaving of strings question Re:(4)
[EMAIL PROTECTED] wrote: Yow! I guess it's a lot safer to play in the theoretical space ;) I was wondering tho' if you imagine a 3-dimensional space, does changing the power to 3 from 2 work for Pythagoras? Yes, if your 3-space is Euclidean. I recently talked to some GIS guys who pointed out the inherent errors in GPS because the tectonic plates move up to 51 cm / year, or something. They got a big kick out of that, but somehow I missed the punchline. ROTFL! You should respond, "Yeah, but they more than compensated for that when they turned of SA!" I did think, though, that this was a definitively more accurate method than the original discussion (don't know if you saw it) that was going to interleave the long digits into the latitude... Well, I saw it, but didn't understand what problem was being solved (and still don't). Is the task is to: 1) take a given target location and return the point in the database that is geographically nearest; 2) take a given target location and return a list of all points from the database that are "near enough"; 3) sort a collection of points and present a human- readable display based on nearest-to-farthest; or 4) something else? In addition, I'm unclear as to whether: 1) the target is an arbitrary point or is constrained to be one of the points in the database; 2) how many points are in the database; 3) what resolution is required; and 4) the dispersion of the points in the database: 4a) how wide an area is covered, and 4b) how "clumpy" the distribution is. All of these would influence my choice of strategy. For example, if the task is (1) or (2), and the points are fairly uniformly distributed over a large range of lat/lon values, I'd consider a two-level strategy of partitioning the data into coarse-grained lat/lon cells (one-time, front-end work), then searching only the cell that should contain the target point (or nearby cells as needed for "near enough" or "empty cell" constraints). For example -- assuming two-degree cells for the sake of illustration -- to locate the point 37.4968N x 89.9935W, look first in the cell containing 36-37N x 88-89W. As needed, expand the search into the neighboring cells: .......... .......... ..38-39N36-37N34-35N.. ..90-91W90-91W90-91W.. ..38-39N36-37N34-35N.. ..88-89W88-89W88-89W.. ..38-39N36-37N34-35N.. ..86-87W86-87W86-87W.. .......... .......... This strategy lets you "zoom in" quickly to the area of relevance, then do the more complex calculations only for a small subset of the space. Another strategy would be to pre-compute the true distance-from-origin for each point (storing that value with each point), then use that as a figure of merit for limiting the search space. Points near the target will obviously have a similar distance-from-origin to the target's value. Of course, the converse is not true (90Nx0W is the same distance from 0Nx0W as 0Wx90N), but the point of this trick is just to reduce the nunber of points that must be individually evaluated. Working with real data on real computers with real memory and performance limits can DEFINITELY get hairy! Have fun! -jn-
[REBOL] Interleaving of strings question Re:
Hi, although your question has a little in common with Rebol, my answers are as follows: 1. It would be more precise to interleave each bit - ie. each binary digit - as the smallest unit of information 2. in that case the bit - sort yields: [0,0 2,2 1,5 5,1] 3. as you pointed out, the result of the sort doesn't depend on the representation, but on the comparison function, any suitable representation can do 4. your suggestion doesn't work, because you need close things in the terms of distance to remain close in the terms of sort. Your suggestion works only in the neighborhood of the 0,0 point Best regards Ladislav - Original Message - From: [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Monday, September 25, 2000 1:04 PM Subject: [REBOL] Interleaving of strings question I was thinking about the earlier postings about sorting coordinates by interleaving digits of the 2 axis values. The interleaving, it would seem, is to make the longitude as significant as the latitude, in terms of distance from other points. Ignoring the need to have the lat/long pairs stored that way for later processing, and ignoring stated expected values for the coordinates I thought: 1. Wouldn't it be more precise to interleave each digit? 2. In terms of a regular x,y system, even that would be incorrect sometimes. E.G.: [ 0,0 1,5 2,2 5,1 ] (a sorted set of simplified coordinates) isn't correct. 2,2 is closer to 0,0; 1,5 and 5,1 are the same distance from 0,0. 3. Could pairs or tuples be used? Answer: pairs didn't sort that way and I have no reason to assume sort would think of tuples as coordinates. 4. Wouldn't it be the least convoluted to have a /compare function for sort that understood that all components of the coordinate were equally significant? That way there's no manipulating the coordinates, and it should be the most precise. Would a Pythagorean type thing work, because it looks like simply summing the absolute differences between each coordinate component isn't right: 1,1 to 1,5 = 4 ((1-1)+(5-1)) 1,1 to 2,4 = 4 ((2-1)+(4-1)) yet 2,4 is closer to 1,1 than is 1,5. Until the real distance is needed, I don't think you'd have to take the square root of the sum (and how do you do that for 3-dimensions, does it become cubed and cube root - aha) (ABS(x1-x2) to the nth) + (ABS(y1-y2) to the nth) ... + (ABS(n1-n2) to the nth) Comments? Is this right? __ Do You Yahoo!? Send instant messages get email alerts with Yahoo! Messenger. http://im.yahoo.com/
[REBOL] Interleaving of strings question Re:(5) (UnRebolish commentary)
In light of the recent posts... Yes this has nothing to do with Rebol in the strictest sense. Mea Culpa. Regarding this whole interleaving debate might I submit that I never intended to seed a thread around the validity of interleaving as an optimal method of performing complex and or seamless geospatial calculations. But while I'm here and the thread is active I'll mention that I just needed some Rebol code to explore the concept on a practical and or intellectual level. The method I described is only one variation on this theme, as some of you illustratedbyrecasting it in different forms. What I described was actually a bastard child from a specific method involving binary interleaving along with some other features that I could not possibly explain off the top of my head without the original source material. I truncated the method description intentionally for the sake of being able to pose my real request for Rebol parsing methods. And yes the method does break down across boundaries and goes for a tilt in a big way at the poles. In the absence of higher level support the conceptfails in a variety of ways that make it a poor contender for afully functioningGIS. Do not attempt to do anything more serious than a napkin sketch of this method without the supervision of an adult or perhaps an attending physician unless you wish to go numerically mad with something akin to a division by zero. Thanks for the feedback, it was most enlightening and informative. Sincerely, Jim