Actually I used (32$2)#:_17998321 . I agree about the point of simplicity but there are other reasons for using a simpler approach too. I didn't consider using padding/fills because calculations involving latitudes/longitudes have peculiarities just like date/time representations. For e.g., calculating distance between points in east/west and north/south hemispheres ( http://workshops.boundlessgeo.com/postgis-intro/geography.html). This comes up often in GIS related calculations, especially when every mapping webservice is using lat/lon to store geographic data. And hence, I try to find the simplest approach (human centric) even if it might not be the most efficient (machine/task centric).
On Tue, Jan 6, 2015 at 1:43 PM, Raul Miller <[email protected]> wrote: > The &.|. in ($!.1)&.|. is so the 1s you are padding with appear on the > left, rather than on the right. > > Another way of accomplishing that would be like this: > _32 {.!.1 #:-17998321 > 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1 > > But note that there's a problem here (with both of these approaches): > 32 ($!.1)&.|. 1 > 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 > > ...padding on the left with 1s is only valid for negative numbers. > > Meanwhile, > #: 256 #. |. a.i. 2 ic _17998321 > 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1 > > ... so the |. which appears in that expression works the same way on > probably any intel machine and would probably be relevant in most > contexts (though I've not tested on ARM processor, and I'm not even > sure if current versions of J run on ARM... and since ARM is > "bi-endian" I'm not sure what "native" would mean in that context - > but I guess it would make sense for ARM handling to mimic intel > handling). > > But there's an even simpler approach: > (32#2)#:_17998321 > 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1 > > Simplicity is a virtue? > > Thanks, > > -- > Raul > > On Tue, Jan 6, 2015 at 12:44 PM, 'Pascal Jasmin' via Programming > <[email protected]> wrote: > > I'm not sure this is the same problem, unless you are trying to encode > within a range of lattitudes that include dense civilization, and > longitudes that exclude oceans. > > > > but the 32bit binary rep of -17998321 > > > > 32 ($!.1)&.|. #: -17998321 > > 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1 > > > > though the "memcopy" version is simpler. > > > > #: 256 #. |. a.i. 2 ic _17998321 > > 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1 > > > > -&4294967296^:(>&2147483648) 256 #. |. a.i. 2 ic _17998321 > > _17998321 > > -&4294967296^:(>&2147483648) 256 #. |. a.i. 2 ic 17998321 > > 17998321 > > > > > > I'm not sure if the |. step is only for non macs. > > > > > > > > ----- Original Message ----- > > From: Vijay Lulla <[email protected]> > > To: [email protected] > > Cc: > > Sent: Tuesday, January 6, 2015 11:38 AM > > Subject: Re: [Jprogramming] variable integer coding (challenge) > > > > I have followed this thread with interest. Much of this discussion is > > relevant to > > > https://developers.google.com/maps/documentation/utilities/polylinealgorithm > > . Like Joe, I also enjoyed playing at bits level and was impressed by how > > easy J makes such explorations. Unfortunately, I couldn't generate all > the > > encoded values in the example table listed on the page. Maybe others in > > this group might enjoy exploring this algorithm. > > > > Thanks for the great discussion. > > > > On Wed, Dec 31, 2014 at 3:15 PM, Joe Bogner <[email protected]> wrote: > > > >> Thanks. I couldn't follow the examples so I worked through it on my > >> own. Sharing it here for anyone else who's following. I switched to 9 > >> bits instead of 19 to simplify the example > >> > >> NB. 300 needs 9 bits to be expressed > >> $ #: 300 > >> 9 > >> > >> NB. Show 4 9 bit numbers > >> $ #: 300 301 302 303 > >> 4 9 > >> > >> Writing those 4 numbers would take 16 bytes[1] since J ints are 4 > >> bytes as well as C > >> > >> We can compress it down to 5 bytes if we pack all the bits together: > >> > >> ] packed=. #. _8[\ , #: 300 301 302 303 > >> 150 75 101 210 240 > >> > >> $ packed > >> 5 > >> > >> Those 5 bytes could be written to a file and re-read > >> > >> If I knew in advance that those numbers were 9 bits, it can be reversed > >> > >> #. _9[\, #: packed > >> 300 301 302 303 0 > >> > >> We could also pack it into 8 bytes (two 32 bit integers), but I don't > >> see an advantage of that over packing into a single byte (8 bit) > >> > >> ] packed=. #. _32[\ , #: 300 301 302 303 > >> 2.52152e9 4.02653e9 > >> > >> #. _9[\, #: packed > >> 300 301 302 303 0 0 0 0 > >> > >> > >> This may be elementary for some but I haven't worked at this level in > >> years so I enjoyed playing with it > >> > >> > >> > >> > >> [1] - http://www.jsoftware.com/help/learning/27.htm > >> > >> On Wed, Dec 31, 2014 at 11:51 AM, 'Pascal Jasmin' via Programming > >> <[email protected]> wrote: > >> > Also, if you have only 4 19 bit values, you can pack them up tighter > >> > > >> > _2 (16&#.)\ #. |: ?. 4 19 $ 2 > >> > 109 59 161 30 136 80 64 129 56 1 > >> > > >> > Non multiples of 8 would just generally leave some trailing 0s on > >> decode, which could simply be understood as invalid. > >> > > >> > > >> > ----- Original Message ----- > >> > From: 'Pascal Jasmin' via Programming <[email protected]> > >> > To: "[email protected]" <[email protected]> > >> > Cc: > >> > Sent: Wednesday, December 31, 2014 11:40 AM > >> > Subject: Re: [Jprogramming] variable integer coding (challenge) > >> > > >> > base64 is essentially taking 3 bytes and storing them as 4. The > inverse > >> can take 4 6bit values and pack them into 3 bytes. The J code for > base64 > >> is pretty generic to see how to do it for any base. 7 bit values could > be > >> packed 8 at a time in 7 bytes. > >> > > >> > There is a different way to pack 8 x bit values into x bytes. Each > byte > >> would store the i'th bit of the 8 values. This may be even faster than > >> base64 type coding. > >> > > >> > #. |: ?. 8 19 $ 2 > >> > 104 214 59 183 174 23 31 225 134 149 66 31 85 19 133 1 51 148 28 > >> > > >> > > >> > ----- Original Message ----- > >> > From: Joe Bogner <[email protected]> > >> > To: [email protected] > >> > Cc: > >> > Sent: Wednesday, December 31, 2014 10:02 AM > >> > Subject: Re: [Jprogramming] variable integer coding (challenge) > >> > > >> > Can 19 bits be stored in memory or a file on modern computers without > >> > taking up 3 bytes (24 bits)? The only idea that came to mind would > some > >> bit > >> > packing scheme with multiple items. That seems complicated unless > there > >> was > >> > a need to store billions of these numbers > >> > > >> > On Dec 31, 2014 9:28 AM, "Joe Bogner" <[email protected]> wrote: > >> > > >> >> Thanks. I missed the point of the variable breakup point but > exploring > >> >> different options below illustrated it for me. > >> >> > >> >> Changing the breakup point can encode the numbers into different bit > >> >> lengths (other than 8) which can then be written to the file/memory. > >> >> > >> >> For example, 1021 encoded in 511 breakups would take 18 bits > >> >> > >> >> $ #: 511 toVint 1021 > >> >> 2 9 > >> >> > >> >> Whereas encoding it as 255 would require 40 bits, which is more than > a > >> >> 32-bit int. This is where I was failing to see the point of encoding > >> >> larger numbers > >> >> > >> >> NB. 5 bytes, 40 bits > >> >> $ #: 255 toVint 1021 > >> >> 5 8 > >> >> > >> >> Without trying the binary representation, I was expecting the breakup > >> >> point to take the same space. For example, below I would have thought > >> >> that I'd need to write a > >> >> > >> >> NB. 6 bytes > >> >> 255 toVint 511 toVint 1021 > >> >> 255 255 1 255 255 0 > >> >> > >> >> > >> >> If I understand this correctly, I can see how variable ints could > >> >> shave off a few bits if the range of numbers is known in advance, > >> >> which is often the case. > >> >> > >> >> > >> >> Regarding packed strings I've been thinking about it recently as > well. > >> >> I would like to see how it could be used to pack a table of strings > >> >> (rows and columns). > >> >> > >> >> The following string could be a packed representation of a 2x2 table > >> >> > >> >> FrankUSATomGermany > >> >> > >> >> The binary format could be: > >> >> > >> >> [# of rows] [# of columns] [row1-col1 length] [row1-col2 length].... > >> >> > >> >> The header for the string could be > >> >> 2 2 5 3 3 7 > >> >> > >> >> Or maybe if you wanted to support a variable number of columns > (unsure > >> >> of the usecase of this) > >> >> > >> >> [# of rows] [row-1 # of columns] [row1-col1 length] [row1-col2 > length] > >> >> 2 2 5 3 2 3 7 > >> >> > >> >> I don't know well it'd perform relative to boxed tables or inverted > >> >> tables. I would imagine the memory would be reduced for storage but > >> >> I'd have to test to see how much memory would be used on different > >> >> operations. I'd be interested in the performance of operations like > >> >> searching (e.g. find all the rows with a country of Germany) > >> >> > >> >> It might also be a safer way of storing user-input text without > >> >> worrying about escaping delimiters. > >> >> > >> >> On Tue, Dec 30, 2014 at 1:18 PM, 'Pascal Jasmin' via Programming > >> >> <[email protected]> wrote: > >> >> > I also provided some adverbs > >> >> > > >> >> > toVint =: 1 : '[: (({."1 ,. 1:) #&, m ,. {:"1) (0, m) #: ]' > >> >> > fromVint=: 1 : 'm&~: +/;.2 ]' > >> >> > > >> >> > these let you set the breakup point to what you wish > >> >> > > >> >> > 15 toVint 33 12 17 > >> >> > 15 15 3 12 15 2 > >> >> > 511 toVint 1233 > >> >> > 511 511 211 > >> >> > > >> >> > 511 fromVint 511 toVint 1233 444 > >> >> > 1233 444 > >> >> > It can provide simple compression. If you have a list of many > strings > >> >> where almost all of them will fit into a 6bit alphabet, you can still > >> >> encode a full 7 bit or 8bit alphabet. On average, you'd save space > >> using > >> >> 6bit alphabet, depending on your context. > >> >> > > >> >> > The variable string functions I showed, still let you do a > substring > >> >> search on the packed data. Part of it is disk and memory compression > >> >> (don't expand until you need to, and a string takes much less memory > >> than a > >> >> list of numbers < 256), but part of it is not being forced in design > to > >> >> determine what the impossible cutoff number is. > >> >> > > >> >> > A big advantage related to J is having binary data records. If you > >> >> encode the length of the binary data, you no longer have to box them > if > >> a > >> >> trailing 0 or space would be difficult to know whether it is a fill > or > >> >> intentional. > >> >> > > >> >> > Beyond binary data, there is dumb stuff such as storing addresses. > >> Most > >> >> people have just 1, Most fit within 255 bytes. Usually no one cares > >> about > >> >> address info except when they do or already have the client record. > Do > >> you > >> >> really want a separate table with 1 to many join with a possible > 10000 > >> >> character fixed size per address just because you're unsure if it > will > >> ever > >> >> be possible for you to store an address to an African village as a > set > >> of > >> >> driving instructions and then a description of the house relative to > 3 > >> >> other houses and landmarks just in case one of the other houses gets > >> >> painted. > >> >> > > >> >> > Another neat feature about the coding method is that if you have a > lot > >> >> of inefficient codes, ie. storing 1020 as 255 255 255 255 0, it at > least > >> >> compresses well. > >> >> > > >> >> > A minor thing is that the encoding still lets you do search by < > or >. > >> >> If you are storing the number of orders per customer. The first byte > >> tells > >> >> you whether it is > 254 or less than any other number below 255 > >> (probably > >> >> enough info to select customers of interest). Another minor thing is > >> that > >> >> you don't have to use the encoding codes until there is an > overflow. It > >> >> may be easier to just switch functions to decode variable data, then > to > >> >> reshape terrabytes of data. > >> >> > ________________________________ > >> >> > From: Joe Bogner <[email protected]> > >> >> > To: [email protected] > >> >> > Sent: Tuesday, December 30, 2014 10:45 AM > >> >> > Subject: Re: [Jprogramming] variable integer coding (challenge) > >> >> > > >> >> > > >> >> > Pascal, > >> >> > > >> >> > Thanks for sharing. Can you also share a potential use case? > >> >> > > >> >> > I had to remind myself about the size of integers (4 bytes) on > 32-bit. > >> >> > > >> >> > Will this only save space on numbers less than 1019? > >> >> > > >> >> > NB. 4 bytes > >> >> > toVint2 1019 > >> >> > 255 255 255 254 > >> >> > > >> >> > NB. 5 bytes > >> >> > toVint2 1020 > >> >> > 255 255 255 255 0 > >> >> > > >> >> > Are you looking for the savings in terms of disk space (saving it > to a > >> >> > file) or working memory? > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > On Mon, Dec 29, 2014 at 12:12 PM, 'Pascal Jasmin' via Programming > >> >> > <[email protected]> wrote: > >> >> >> A common design decision is how large to make a field. For many > >> >> numbers, we think that a byte will be large enough to hold practical > >> >> values, but one day we will find out that it is not. This is > >> essentially > >> >> the Y2k problem, or year 2032 issue. > >> >> >> > >> >> >> A simple variable integer coding (tied to byte ranges) > >> >> >> > >> >> >> toVint =: [: ; (255&| ,~ 255 #~ <.@%&255) each > >> >> >> > >> >> >> toVint 566 44 > >> >> >> 255 255 56 44 > >> >> >> > >> >> >> It now takes 3 bytes to store 566, and one byte to store 44. > >> >> >> > >> >> >> the challenge is to write fromVint such that > >> >> >> > >> >> >> > >> >> >> 566 44 -: fromVint 255 255 56 44 > >> >> >> > >> >> >> as an extra challenge, is there a way to write toVint or fromVint > >> such > >> >> that it is faster than using intermediate boxing? > >> >> >> > >> ---------------------------------------------------------------------- > >> >> >> For information about J forums see > >> http://www.jsoftware.com/forums.htm > > > >> > > >> >> > > ---------------------------------------------------------------------- > >> >> > For information about J forums see > >> http://www.jsoftware.com/forums.htm > >> >> > > ---------------------------------------------------------------------- > >> >> > For information about J forums see > >> http://www.jsoftware.com/forums.htm > >> >> > >> > ---------------------------------------------------------------------- > >> > For information about J forums see > http://www.jsoftware.com/forums.htm > >> > ---------------------------------------------------------------------- > >> > For information about J forums see > http://www.jsoftware.com/forums.htm > >> > ---------------------------------------------------------------------- > >> > For information about J forums see > http://www.jsoftware.com/forums.htm > >> ---------------------------------------------------------------------- > >> For information about J forums see http://www.jsoftware.com/forums.htm > >> > > ---------------------------------------------------------------------- > > For information about J forums see http://www.jsoftware.com/forums.htm > > ---------------------------------------------------------------------- > > For information about J forums see http://www.jsoftware.com/forums.htm > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm > ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
