Re: [sqlite] General R*Tree query
On Wed, Dec 18, 2013 at 1:53 AM, Roman Fleysher < roman.fleys...@einstein.yu.edu> wrote: > Perhaps this is a weird way for me to get deeper knowledge of R trees, and > because I vaguely remember that Tyco refers to a specific epoch in which > coordinates are defined, but would it be possible to search R tree using a > cone, i.e. stars within a cone of certain degree around given star? This > would require a trigonometric calculation before comparison can be made but > can be done in a single comparison. > > Or, since RA and DEC coordinates are not area preserving (nor distance) -- > i.e. angle between stars at DEC =0 is bigger than angle between stars at > DEC=80 when they are the same delta RA apart -- then maybe instead of > defining rectangular FOV in RA and DEC one should be defining rectangular > FOV in DEC, sin(RA)? Then one would not need two searches. > > The goal is to find neighbors to a given star defined roughly by some > metric? Since there's nothing magical in RA , DEC coordinates the metric > could use some other coordinates? Every [RA,DEC] pair resolves to a unit vector in Cartesian coordinate space i.e. an [X,Y,Z] triplet on the surface of a unit sphere; that would be a continuous metric without the RA=0=360 issue. I don't see why the R*Tree could not be set up with X, Y, and Z, plus magnitude limits; the set of nodes is hollow in a 3D sense so the first-level non-leaf nodes would have a lot of empty space, but I don't think that matters; I've been thinking about doing it this way for some time. For my app I already store XYZs in the outer, non-R*Tree table because all final comparisons have to be in Cartesian space anyway. But in general the search region is so small that the cosine[DEC] dependence of distance per degree of RA is effectively constant for any one search, and an [RA,DEC,Mag] tree should be "good enough" because it pares down the search space quickly from 2.5M stars in Tycho=2 to a few hundred very quickly, with the caveat that there is a special case near RA=0=360. In any event the R*Tree is not going to do the final geometric comparison, rather it reduces the number of stars that need that comparison, and that's the goal. Btw, if you want to see something cool, look at chapter 4 of Dustin Lang's thesis (www.astro.princeton.edu/~dstn/lang-thesis.pdf); the whole paper is cool, but I really enjoy that chapter. -b ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] General R*Tree query
Perhaps this is a weird way for me to get deeper knowledge of R trees, and because I vaguely remember that Tyco refers to a specific epoch in which coordinates are defined, but would it be possible to search R tree using a cone, i.e. stars within a cone of certain degree around given star? This would require a trigonometric calculation before comparison can be made but can be done in a single comparison. Or, since RA and DEC coordinates are not area preserving (nor distance) -- i.e. angle between stars at DEC =0 is bigger than angle between stars at DEC=80 when they are the same delta RA apart -- then maybe instead of defining rectangular FOV in RA and DEC one should be defining rectangular FOV in DEC, sin(RA)? Then one would not need two searches. The goal is to find neighbors to a given star defined roughly by some metric? Since there's nothing magical in RA , DEC coordinates the metric could use some other coordinates? Roman Original message From: "Brian T. Carcich" <briantcarc...@gmail.com> Date: 12/18/2013 1:24 AM (GMT-05:00) To: General Discussion of SQLite Database <sqlite-users@sqlite.org> Subject: Re: [sqlite] General R*Tree query On Tue, Dec 17, 2013 at 3:57 PM, Roman Fleysher < roman.fleys...@einstein.yu.edu> wrote: > > Since coordinate system is spherical, how do you tell that RA=23:59 and > RA=00:01 are next to each other using usual comparisons? I don't; usual comparisons won't work so I do two comparisons: I am usually looking for stars within a convex field of view (FOV), typically a frustum with a rectangular footprint, so I determine if and where RA=0=360 crosses that footprint, and break the FOV into two pieces, from 0<=RA<=loRA and hiRA<=RA<=360, so loRA becomes hira in one search and hiRA become lora in the other. There are only three cases: zero, one or two crossings. Zero crossings means I can do everything in one SELECT; one crossing means either one of the poles is in the FOV and I search RA=0 to 360; DEC=someDEC to +/-90, or the FOV touches RA=0(=360) from one side or the other, which reduces to the zero case; two crossings means the poles are not in the FOV and I can do two searches as mentioned above, from 0 up to someLowRA and from 360 down to someHighRA. There are some edge cases but that is basically it. I actually handle "two or more crossings" cases the same as two cases, even though I don't think "more" can happen with a convex FOV footprint. For any edge (segment of the great circle between two vertices) of the FOV that crosses RA=0, which is easily determined since I have the vertices in XYZ coordinates, I insert a vertex in the edge at the crossing, and then recurse with subsets of vertices split across RA=0. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] General R*Tree query
On Tue, Dec 17, 2013 at 3:57 PM, Roman Fleysher < roman.fleys...@einstein.yu.edu> wrote: > > Since coordinate system is spherical, how do you tell that RA=23:59 and > RA=00:01 are next to each other using usual comparisons? I don't; usual comparisons won't work so I do two comparisons: I am usually looking for stars within a convex field of view (FOV), typically a frustum with a rectangular footprint, so I determine if and where RA=0=360 crosses that footprint, and break the FOV into two pieces, from 0<=RA<=loRA and hiRA<=RA<=360, so loRA becomes hira in one search and hiRA become lora in the other. There are only three cases: zero, one or two crossings. Zero crossings means I can do everything in one SELECT; one crossing means either one of the poles is in the FOV and I search RA=0 to 360; DEC=someDEC to +/-90, or the FOV touches RA=0(=360) from one side or the other, which reduces to the zero case; two crossings means the poles are not in the FOV and I can do two searches as mentioned above, from 0 up to someLowRA and from 360 down to someHighRA. There are some edge cases but that is basically it. I actually handle "two or more crossings" cases the same as two cases, even though I don't think "more" can happen with a convex FOV footprint. For any edge (segment of the great circle between two vertices) of the FOV that crosses RA=0, which is easily determined since I have the vertices in XYZ coordinates, I insert a vertex in the edge at the crossing, and then recurse with subsets of vertices split across RA=0. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] General R*Tree query
On Tue, Dec 17, 2013 at 3:51 PM, Dan Kennedywrote: > On 12/18/2013 12:49 AM, Brian T. Carcich wrote: > >> [...] > > Points are fine. [...] > Is it working now? How many more stars do you have data for? Excellent, thanks for the info! I forgot to mention that we do perform searches using magnitude. Yes it is working now; I do the normal SQLite3 R*Tree INNER JOIN to get to the index table (tyc2index) from the indexrtree table (tyc2indexrtree) regions overlapping the user-supplied RA,DEC limits (hira = High RA limit; lodec = Low DEC limit; etc), and then do another INNER JOIN ON the index table start and end offsets with the offsets in the main catalog table (tyc2catalog_uvs), so it all happens in one call. The beauty is that all the work is done up front when I load the data from the star catalog, and then the SELECT does the rest. Also, the approach should work for any catalog that has RA,DEC and Magnitude, which almost all catalogs do. I think the SELECT is in the Githup repo ... yeah, here it is: SELECT tyc2catalog_uvs.offset ,tyc2catalog_uvs.x ,tyc2catalog_uvs.y > ,tyc2catalog_uvs.z ,tyc2catalog_uvs.mag > FROM tyc2indexrtree INNER JOIN tyc2index > ON tyc2indexrtree.offset=tyc2index.offset INNER JOIN tyc2catalog_uvs > ON tyc2index.catalogstart<=tyc2catalog_uvs.offset >AND tyc2index.catalogend>tyc2catalog_uvs.offset >AND tyc2catalog_uvs.mag AND tyc2indexrtree.hira>? > AND tyc2indexrtree.lora AND tyc2indexrtree.hidec>? > AND tyc2indexrtree.lodec ORDER BY tyc2catalog_uvs.mag asc; ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] General R*Tree query
Since coordinate system is spherical, how do you tell that RA=23:59 and RA=00:01 are next to each other using usual comparisons? Roman From: sqlite-users-boun...@sqlite.org [sqlite-users-boun...@sqlite.org] on behalf of Dan Kennedy [danielk1...@gmail.com] Sent: Tuesday, December 17, 2013 3:51 PM To: sqlite-users@sqlite.org Subject: Re: [sqlite] General R*Tree query On 12/18/2013 12:49 AM, Brian T. Carcich wrote: > I'm working on an SQLite solution to get at star catalogs; they are usually > searched via Right Ascension (RA), Declination (DEC), and magnitude (mag). > RA,DEC is a spherical coordinate system to specify a star position on-sky; > magnitude is related to star brightness. > > What I have so far is here: > > https://github.com/drbitboy/Tycho2_SQLite_RTree > > > I started with the Tycho-2 star catalog. It comprises 2.5 million stars in > a flat ASCII, fixed-width catalog file (actually two files but ignore that > for now), and an index file (map) of ~10k small RA-DEC regions, with an > average of ~250 stars in each region. The regions do not overlap, and all > the stars in any one region are in contiguous lines in the catalog file. > > The index file does not implement any grouping or sorting by magnitude. > Each index region refers to > > A) a contiguous region on-sky with defined by a min-max RA pair and a > min-max DEC pair. > > B) a contiguous range of the lines (stars) in the flat file that are > within that region. > > So the data in the index file are a reasonable starting point for an R*Tree > in SQLite3. I put the index file data into the virtual table using the RA > and DEC limits for each region as the two min-max pairs of columns in the > table, and the index table, referenced by the primary key of the virtual > table, contains the starting and ending+1 indices (offsets actually) of the > stars in the flat catalog file for each region. > > So I use the R*Tree module to get a fast lookup into the index table, > returning index regions that overlap an input RA and DEC min-max pair, then > step through the catalog lines for each of those regions. > > Here's my question: is there any advantage to skipping the index step and > putting the star catalog data into the virtual table itself? One advantage > is that I could include the magnitude in the rtree table. > > The reason I ask is that rtree table uses min-max pairs, but each star is a > point so the min and max are equal for each star. Would that break any > implicit R*Tree rules or negate any efficiencies? Points are fine. So long as (max>=min) for all dimensions. R-tree will return SQLITE_CONSTRAINT if you try to insert a record for which this is not the case. I guess in theory storing each individual star in the r-tree might be more efficient. Difficult to say if it would be significant though. The r-tree and other virtual table code is not as optimized as the core, so even if it seems better in theory it might not be in practice. Adding the magnitude to the r-tree structure as an extra dimension that is never queried sounds like it might make the r-tree structure less efficient though. R-tree will make some effort to place records with similar magnitudes on the same node, which won't help the query but will presumably cause some reduction in the localization that does matter. Is it working now? How many more stars do you have data for? Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
Re: [sqlite] General R*Tree query
On 12/18/2013 12:49 AM, Brian T. Carcich wrote: I'm working on an SQLite solution to get at star catalogs; they are usually searched via Right Ascension (RA), Declination (DEC), and magnitude (mag). RA,DEC is a spherical coordinate system to specify a star position on-sky; magnitude is related to star brightness. What I have so far is here: https://github.com/drbitboy/Tycho2_SQLite_RTree I started with the Tycho-2 star catalog. It comprises 2.5 million stars in a flat ASCII, fixed-width catalog file (actually two files but ignore that for now), and an index file (map) of ~10k small RA-DEC regions, with an average of ~250 stars in each region. The regions do not overlap, and all the stars in any one region are in contiguous lines in the catalog file. The index file does not implement any grouping or sorting by magnitude. Each index region refers to A) a contiguous region on-sky with defined by a min-max RA pair and a min-max DEC pair. B) a contiguous range of the lines (stars) in the flat file that are within that region. So the data in the index file are a reasonable starting point for an R*Tree in SQLite3. I put the index file data into the virtual table using the RA and DEC limits for each region as the two min-max pairs of columns in the table, and the index table, referenced by the primary key of the virtual table, contains the starting and ending+1 indices (offsets actually) of the stars in the flat catalog file for each region. So I use the R*Tree module to get a fast lookup into the index table, returning index regions that overlap an input RA and DEC min-max pair, then step through the catalog lines for each of those regions. Here's my question: is there any advantage to skipping the index step and putting the star catalog data into the virtual table itself? One advantage is that I could include the magnitude in the rtree table. The reason I ask is that rtree table uses min-max pairs, but each star is a point so the min and max are equal for each star. Would that break any implicit R*Tree rules or negate any efficiencies? Points are fine. So long as (max>=min) for all dimensions. R-tree will return SQLITE_CONSTRAINT if you try to insert a record for which this is not the case. I guess in theory storing each individual star in the r-tree might be more efficient. Difficult to say if it would be significant though. The r-tree and other virtual table code is not as optimized as the core, so even if it seems better in theory it might not be in practice. Adding the magnitude to the r-tree structure as an extra dimension that is never queried sounds like it might make the r-tree structure less efficient though. R-tree will make some effort to place records with similar magnitudes on the same node, which won't help the query but will presumably cause some reduction in the localization that does matter. Is it working now? How many more stars do you have data for? Dan. ___ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users