It's been a while since I've played with great circle distances, but you are right that you want that to be simple to understand.
Thanks, -- Raul On Tue, Jan 6, 2015 at 2:45 PM, Vijay Lulla <[email protected]> wrote: > Actually I used (32$2)#:_17998321 . I agree about the point of simplicity > but there are other reasons for using a simpler approach too. I didn't > consider using padding/fills because calculations involving > latitudes/longitudes have peculiarities just like date/time > representations. For e.g., calculating distance between points in > east/west and north/south hemispheres ( > http://workshops.boundlessgeo.com/postgis-intro/geography.html). This > comes up often in GIS related calculations, especially when every mapping > webservice is using lat/lon to store geographic data. And hence, I try to > find the simplest approach (human centric) even if it might not be the most > efficient (machine/task centric). > > On Tue, Jan 6, 2015 at 1:43 PM, Raul Miller <[email protected]> wrote: > >> The &.|. in ($!.1)&.|. is so the 1s you are padding with appear on the >> left, rather than on the right. >> >> Another way of accomplishing that would be like this: >> _32 {.!.1 #:-17998321 >> 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1 >> >> But note that there's a problem here (with both of these approaches): >> 32 ($!.1)&.|. 1 >> 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 >> >> ...padding on the left with 1s is only valid for negative numbers. >> >> Meanwhile, >> #: 256 #. |. a.i. 2 ic _17998321 >> 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1 >> >> ... so the |. which appears in that expression works the same way on >> probably any intel machine and would probably be relevant in most >> contexts (though I've not tested on ARM processor, and I'm not even >> sure if current versions of J run on ARM... and since ARM is >> "bi-endian" I'm not sure what "native" would mean in that context - >> but I guess it would make sense for ARM handling to mimic intel >> handling). >> >> But there's an even simpler approach: >> (32#2)#:_17998321 >> 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1 >> >> Simplicity is a virtue? >> >> Thanks, >> >> -- >> Raul >> >> On Tue, Jan 6, 2015 at 12:44 PM, 'Pascal Jasmin' via Programming >> <[email protected]> wrote: >> > I'm not sure this is the same problem, unless you are trying to encode >> within a range of lattitudes that include dense civilization, and >> longitudes that exclude oceans. >> > >> > but the 32bit binary rep of -17998321 >> > >> > 32 ($!.1)&.|. #: -17998321 >> > 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1 >> > >> > though the "memcopy" version is simpler. >> > >> > #: 256 #. |. a.i. 2 ic _17998321 >> > 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1 >> > >> > -&4294967296^:(>&2147483648) 256 #. |. a.i. 2 ic _17998321 >> > _17998321 >> > -&4294967296^:(>&2147483648) 256 #. |. a.i. 2 ic 17998321 >> > 17998321 >> > >> > >> > I'm not sure if the |. step is only for non macs. >> > >> > >> > >> > ----- Original Message ----- >> > From: Vijay Lulla <[email protected]> >> > To: [email protected] >> > Cc: >> > Sent: Tuesday, January 6, 2015 11:38 AM >> > Subject: Re: [Jprogramming] variable integer coding (challenge) >> > >> > I have followed this thread with interest. Much of this discussion is >> > relevant to >> > >> https://developers.google.com/maps/documentation/utilities/polylinealgorithm >> > . Like Joe, I also enjoyed playing at bits level and was impressed by how >> > easy J makes such explorations. Unfortunately, I couldn't generate all >> the >> > encoded values in the example table listed on the page. Maybe others in >> > this group might enjoy exploring this algorithm. >> > >> > Thanks for the great discussion. >> > >> > On Wed, Dec 31, 2014 at 3:15 PM, Joe Bogner <[email protected]> wrote: >> > >> >> Thanks. I couldn't follow the examples so I worked through it on my >> >> own. Sharing it here for anyone else who's following. I switched to 9 >> >> bits instead of 19 to simplify the example >> >> >> >> NB. 300 needs 9 bits to be expressed >> >> $ #: 300 >> >> 9 >> >> >> >> NB. Show 4 9 bit numbers >> >> $ #: 300 301 302 303 >> >> 4 9 >> >> >> >> Writing those 4 numbers would take 16 bytes[1] since J ints are 4 >> >> bytes as well as C >> >> >> >> We can compress it down to 5 bytes if we pack all the bits together: >> >> >> >> ] packed=. #. _8[\ , #: 300 301 302 303 >> >> 150 75 101 210 240 >> >> >> >> $ packed >> >> 5 >> >> >> >> Those 5 bytes could be written to a file and re-read >> >> >> >> If I knew in advance that those numbers were 9 bits, it can be reversed >> >> >> >> #. _9[\, #: packed >> >> 300 301 302 303 0 >> >> >> >> We could also pack it into 8 bytes (two 32 bit integers), but I don't >> >> see an advantage of that over packing into a single byte (8 bit) >> >> >> >> ] packed=. #. _32[\ , #: 300 301 302 303 >> >> 2.52152e9 4.02653e9 >> >> >> >> #. _9[\, #: packed >> >> 300 301 302 303 0 0 0 0 >> >> >> >> >> >> This may be elementary for some but I haven't worked at this level in >> >> years so I enjoyed playing with it >> >> >> >> >> >> >> >> >> >> [1] - http://www.jsoftware.com/help/learning/27.htm >> >> >> >> On Wed, Dec 31, 2014 at 11:51 AM, 'Pascal Jasmin' via Programming >> >> <[email protected]> wrote: >> >> > Also, if you have only 4 19 bit values, you can pack them up tighter >> >> > >> >> > _2 (16&#.)\ #. |: ?. 4 19 $ 2 >> >> > 109 59 161 30 136 80 64 129 56 1 >> >> > >> >> > Non multiples of 8 would just generally leave some trailing 0s on >> >> decode, which could simply be understood as invalid. >> >> > >> >> > >> >> > ----- Original Message ----- >> >> > From: 'Pascal Jasmin' via Programming <[email protected]> >> >> > To: "[email protected]" <[email protected]> >> >> > Cc: >> >> > Sent: Wednesday, December 31, 2014 11:40 AM >> >> > Subject: Re: [Jprogramming] variable integer coding (challenge) >> >> > >> >> > base64 is essentially taking 3 bytes and storing them as 4. The >> inverse >> >> can take 4 6bit values and pack them into 3 bytes. The J code for >> base64 >> >> is pretty generic to see how to do it for any base. 7 bit values could >> be >> >> packed 8 at a time in 7 bytes. >> >> > >> >> > There is a different way to pack 8 x bit values into x bytes. Each >> byte >> >> would store the i'th bit of the 8 values. This may be even faster than >> >> base64 type coding. >> >> > >> >> > #. |: ?. 8 19 $ 2 >> >> > 104 214 59 183 174 23 31 225 134 149 66 31 85 19 133 1 51 148 28 >> >> > >> >> > >> >> > ----- Original Message ----- >> >> > From: Joe Bogner <[email protected]> >> >> > To: [email protected] >> >> > Cc: >> >> > Sent: Wednesday, December 31, 2014 10:02 AM >> >> > Subject: Re: [Jprogramming] variable integer coding (challenge) >> >> > >> >> > Can 19 bits be stored in memory or a file on modern computers without >> >> > taking up 3 bytes (24 bits)? The only idea that came to mind would >> some >> >> bit >> >> > packing scheme with multiple items. That seems complicated unless >> there >> >> was >> >> > a need to store billions of these numbers >> >> > >> >> > On Dec 31, 2014 9:28 AM, "Joe Bogner" <[email protected]> wrote: >> >> > >> >> >> Thanks. I missed the point of the variable breakup point but >> exploring >> >> >> different options below illustrated it for me. >> >> >> >> >> >> Changing the breakup point can encode the numbers into different bit >> >> >> lengths (other than 8) which can then be written to the file/memory. >> >> >> >> >> >> For example, 1021 encoded in 511 breakups would take 18 bits >> >> >> >> >> >> $ #: 511 toVint 1021 >> >> >> 2 9 >> >> >> >> >> >> Whereas encoding it as 255 would require 40 bits, which is more than >> a >> >> >> 32-bit int. This is where I was failing to see the point of encoding >> >> >> larger numbers >> >> >> >> >> >> NB. 5 bytes, 40 bits >> >> >> $ #: 255 toVint 1021 >> >> >> 5 8 >> >> >> >> >> >> Without trying the binary representation, I was expecting the breakup >> >> >> point to take the same space. For example, below I would have thought >> >> >> that I'd need to write a >> >> >> >> >> >> NB. 6 bytes >> >> >> 255 toVint 511 toVint 1021 >> >> >> 255 255 1 255 255 0 >> >> >> >> >> >> >> >> >> If I understand this correctly, I can see how variable ints could >> >> >> shave off a few bits if the range of numbers is known in advance, >> >> >> which is often the case. >> >> >> >> >> >> >> >> >> Regarding packed strings I've been thinking about it recently as >> well. >> >> >> I would like to see how it could be used to pack a table of strings >> >> >> (rows and columns). >> >> >> >> >> >> The following string could be a packed representation of a 2x2 table >> >> >> >> >> >> FrankUSATomGermany >> >> >> >> >> >> The binary format could be: >> >> >> >> >> >> [# of rows] [# of columns] [row1-col1 length] [row1-col2 length].... >> >> >> >> >> >> The header for the string could be >> >> >> 2 2 5 3 3 7 >> >> >> >> >> >> Or maybe if you wanted to support a variable number of columns >> (unsure >> >> >> of the usecase of this) >> >> >> >> >> >> [# of rows] [row-1 # of columns] [row1-col1 length] [row1-col2 >> length] >> >> >> 2 2 5 3 2 3 7 >> >> >> >> >> >> I don't know well it'd perform relative to boxed tables or inverted >> >> >> tables. I would imagine the memory would be reduced for storage but >> >> >> I'd have to test to see how much memory would be used on different >> >> >> operations. I'd be interested in the performance of operations like >> >> >> searching (e.g. find all the rows with a country of Germany) >> >> >> >> >> >> It might also be a safer way of storing user-input text without >> >> >> worrying about escaping delimiters. >> >> >> >> >> >> On Tue, Dec 30, 2014 at 1:18 PM, 'Pascal Jasmin' via Programming >> >> >> <[email protected]> wrote: >> >> >> > I also provided some adverbs >> >> >> > >> >> >> > toVint =: 1 : '[: (({."1 ,. 1:) #&, m ,. {:"1) (0, m) #: ]' >> >> >> > fromVint=: 1 : 'm&~: +/;.2 ]' >> >> >> > >> >> >> > these let you set the breakup point to what you wish >> >> >> > >> >> >> > 15 toVint 33 12 17 >> >> >> > 15 15 3 12 15 2 >> >> >> > 511 toVint 1233 >> >> >> > 511 511 211 >> >> >> > >> >> >> > 511 fromVint 511 toVint 1233 444 >> >> >> > 1233 444 >> >> >> > It can provide simple compression. If you have a list of many >> strings >> >> >> where almost all of them will fit into a 6bit alphabet, you can still >> >> >> encode a full 7 bit or 8bit alphabet. On average, you'd save space >> >> using >> >> >> 6bit alphabet, depending on your context. >> >> >> > >> >> >> > The variable string functions I showed, still let you do a >> substring >> >> >> search on the packed data. Part of it is disk and memory compression >> >> >> (don't expand until you need to, and a string takes much less memory >> >> than a >> >> >> list of numbers < 256), but part of it is not being forced in design >> to >> >> >> determine what the impossible cutoff number is. >> >> >> > >> >> >> > A big advantage related to J is having binary data records. If you >> >> >> encode the length of the binary data, you no longer have to box them >> if >> >> a >> >> >> trailing 0 or space would be difficult to know whether it is a fill >> or >> >> >> intentional. >> >> >> > >> >> >> > Beyond binary data, there is dumb stuff such as storing addresses. >> >> Most >> >> >> people have just 1, Most fit within 255 bytes. Usually no one cares >> >> about >> >> >> address info except when they do or already have the client record. >> Do >> >> you >> >> >> really want a separate table with 1 to many join with a possible >> 10000 >> >> >> character fixed size per address just because you're unsure if it >> will >> >> ever >> >> >> be possible for you to store an address to an African village as a >> set >> >> of >> >> >> driving instructions and then a description of the house relative to >> 3 >> >> >> other houses and landmarks just in case one of the other houses gets >> >> >> painted. >> >> >> > >> >> >> > Another neat feature about the coding method is that if you have a >> lot >> >> >> of inefficient codes, ie. storing 1020 as 255 255 255 255 0, it at >> least >> >> >> compresses well. >> >> >> > >> >> >> > A minor thing is that the encoding still lets you do search by < >> or >. >> >> >> If you are storing the number of orders per customer. The first byte >> >> tells >> >> >> you whether it is > 254 or less than any other number below 255 >> >> (probably >> >> >> enough info to select customers of interest). Another minor thing is >> >> that >> >> >> you don't have to use the encoding codes until there is an >> overflow. It >> >> >> may be easier to just switch functions to decode variable data, then >> to >> >> >> reshape terrabytes of data. >> >> >> > ________________________________ >> >> >> > From: Joe Bogner <[email protected]> >> >> >> > To: [email protected] >> >> >> > Sent: Tuesday, December 30, 2014 10:45 AM >> >> >> > Subject: Re: [Jprogramming] variable integer coding (challenge) >> >> >> > >> >> >> > >> >> >> > Pascal, >> >> >> > >> >> >> > Thanks for sharing. Can you also share a potential use case? >> >> >> > >> >> >> > I had to remind myself about the size of integers (4 bytes) on >> 32-bit. >> >> >> > >> >> >> > Will this only save space on numbers less than 1019? >> >> >> > >> >> >> > NB. 4 bytes >> >> >> > toVint2 1019 >> >> >> > 255 255 255 254 >> >> >> > >> >> >> > NB. 5 bytes >> >> >> > toVint2 1020 >> >> >> > 255 255 255 255 0 >> >> >> > >> >> >> > Are you looking for the savings in terms of disk space (saving it >> to a >> >> >> > file) or working memory? >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > On Mon, Dec 29, 2014 at 12:12 PM, 'Pascal Jasmin' via Programming >> >> >> > <[email protected]> wrote: >> >> >> >> A common design decision is how large to make a field. For many >> >> >> numbers, we think that a byte will be large enough to hold practical >> >> >> values, but one day we will find out that it is not. This is >> >> essentially >> >> >> the Y2k problem, or year 2032 issue. >> >> >> >> >> >> >> >> A simple variable integer coding (tied to byte ranges) >> >> >> >> >> >> >> >> toVint =: [: ; (255&| ,~ 255 #~ <.@%&255) each >> >> >> >> >> >> >> >> toVint 566 44 >> >> >> >> 255 255 56 44 >> >> >> >> >> >> >> >> It now takes 3 bytes to store 566, and one byte to store 44. >> >> >> >> >> >> >> >> the challenge is to write fromVint such that >> >> >> >> >> >> >> >> >> >> >> >> 566 44 -: fromVint 255 255 56 44 >> >> >> >> >> >> >> >> as an extra challenge, is there a way to write toVint or fromVint >> >> such >> >> >> that it is faster than using intermediate boxing? >> >> >> >> >> >> ---------------------------------------------------------------------- >> >> >> >> For information about J forums see >> >> http://www.jsoftware.com/forums.htm >> > >> >> > >> >> >> > >> ---------------------------------------------------------------------- >> >> >> > For information about J forums see >> >> http://www.jsoftware.com/forums.htm >> >> >> > >> ---------------------------------------------------------------------- >> >> >> > For information about J forums see >> >> http://www.jsoftware.com/forums.htm >> >> >> >> >> > ---------------------------------------------------------------------- >> >> > For information about J forums see >> http://www.jsoftware.com/forums.htm >> >> > ---------------------------------------------------------------------- >> >> > For information about J forums see >> http://www.jsoftware.com/forums.htm >> >> > ---------------------------------------------------------------------- >> >> > For information about J forums see >> http://www.jsoftware.com/forums.htm >> >> ---------------------------------------------------------------------- >> >> For information about J forums see http://www.jsoftware.com/forums.htm >> >> >> > ---------------------------------------------------------------------- >> > For information about J forums see http://www.jsoftware.com/forums.htm >> > ---------------------------------------------------------------------- >> > For information about J forums see http://www.jsoftware.com/forums.htm >> ---------------------------------------------------------------------- >> For information about J forums see http://www.jsoftware.com/forums.htm >> > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
