Re: [Jprogramming] variable integer coding (challenge)

Vijay Lulla Tue, 06 Jan 2015 11:45:22 -0800

Actually I used (32$2)#:_17998321 . I agree about the point of simplicity
but there are other reasons for using a simpler approach too.  I didn't
consider using padding/fills because calculations involving
latitudes/longitudes have peculiarities just like date/time
representations.  For e.g., calculating distance between points in
east/west and north/south hemispheres (
http://workshops.boundlessgeo.com/postgis-intro/geography.html).  This
comes up often in GIS related calculations, especially when every mapping
webservice is using lat/lon to store geographic data. And hence, I try to
find the simplest approach (human centric) even if it might not be the most
efficient (machine/task centric).


On Tue, Jan 6, 2015 at 1:43 PM, Raul Miller <[email protected]> wrote:

> The &.|. in ($!.1)&.|. is so the 1s you are padding with appear on the
> left, rather than on the right.
>
> Another way of accomplishing that would be like this:
>    _32 {.!.1 #:-17998321
> 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1
>
> But note that there's a problem here (with both of these approaches):
>    32 ($!.1)&.|. 1
> 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>
> ...padding on the left with 1s is only valid for negative numbers.
>
> Meanwhile,
>    #: 256 #. |. a.i. 2 ic _17998321
> 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1
>
> ... so the |. which appears in that expression works the same way on
> probably any intel machine and would probably be relevant in most
> contexts (though I've not tested on ARM processor, and I'm not even
> sure if current versions of J run on ARM... and since ARM is
> "bi-endian" I'm not sure what "native" would mean in that context -
> but I guess it would make sense for ARM handling to mimic intel
> handling).
>
> But there's an even simpler approach:
>    (32#2)#:_17998321
> 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1
>
> Simplicity is a virtue?
>
> Thanks,
>
> --
> Raul
>
> On Tue, Jan 6, 2015 at 12:44 PM, 'Pascal Jasmin' via Programming
> <[email protected]> wrote:
> > I'm not sure this is the same problem, unless you are trying to encode
> within a range of lattitudes that include dense civilization, and
> longitudes that exclude oceans.
> >
> > but the 32bit binary rep of -17998321
> >
> > 32 ($!.1)&.|. #: -17998321
> > 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1
> >
> > though the "memcopy" version is simpler.
> >
> >   #: 256 #. |. a.i. 2 ic _17998321
> > 1 1 1 1 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0 1 1 1 1 0 0 0 0 0 1 1 1 1
> >
> >   -&4294967296^:(>&2147483648) 256 #. |. a.i. 2 ic _17998321
> > _17998321
> >   -&4294967296^:(>&2147483648) 256 #. |. a.i. 2 ic 17998321
> > 17998321
> >
> >
> >  I'm not sure if the |. step is only for non macs.
> >
> >
> >
> > ----- Original Message -----
> > From: Vijay Lulla <[email protected]>
> > To: [email protected]
> > Cc:
> > Sent: Tuesday, January 6, 2015 11:38 AM
> > Subject: Re: [Jprogramming] variable integer coding (challenge)
> >
> > I have followed this thread with interest.  Much of this discussion is
> > relevant to
> >
> https://developers.google.com/maps/documentation/utilities/polylinealgorithm
> > . Like Joe, I also enjoyed playing at bits level and was impressed by how
> > easy J makes such explorations.  Unfortunately, I couldn't generate all
> the
> > encoded values in the example table listed on the page.  Maybe others in
> > this group might enjoy exploring this algorithm.
> >
> > Thanks for the great discussion.
> >
> > On Wed, Dec 31, 2014 at 3:15 PM, Joe Bogner <[email protected]> wrote:
> >
> >> Thanks. I couldn't follow the examples so I worked through it on my
> >> own. Sharing it here for anyone else who's following. I switched to 9
> >> bits instead of 19 to simplify the example
> >>
> >>    NB. 300 needs 9 bits to be expressed
> >>    $ #: 300
> >> 9
> >>
> >>    NB. Show 4 9 bit numbers
> >>    $ #: 300 301 302 303
> >> 4 9
> >>
> >> Writing those 4 numbers would take 16 bytes[1] since J ints are 4
> >> bytes as well as C
> >>
> >> We can compress it down to 5 bytes if we pack all the bits together:
> >>
> >>    ] packed=. #. _8[\ , #: 300 301 302 303
> >> 150 75 101 210 240
> >>
> >>    $ packed
> >> 5
> >>
> >> Those 5 bytes could be written to a file and re-read
> >>
> >> If I knew in advance that those numbers were 9 bits, it can be reversed
> >>
> >>    #.  _9[\, #: packed
> >> 300 301 302 303 0
> >>
> >> We could also pack it into 8 bytes (two 32 bit integers), but I don't
> >> see an advantage of that over packing into a single byte (8 bit)
> >>
> >>    ] packed=. #. _32[\ , #: 300 301 302 303
> >> 2.52152e9 4.02653e9
> >>
> >>    #.  _9[\, #: packed
> >> 300 301 302 303 0 0 0 0
> >>
> >>
> >> This may be elementary for some but I haven't worked at this level in
> >> years so I enjoyed playing with it
> >>
> >>
> >>
> >>
> >> [1] - http://www.jsoftware.com/help/learning/27.htm
> >>
> >> On Wed, Dec 31, 2014 at 11:51 AM, 'Pascal Jasmin' via Programming
> >> <[email protected]> wrote:
> >> > Also, if you have only 4 19 bit values, you can pack them up tighter
> >> >
> >> > _2 (16&#.)\ #. |: ?. 4 19 $ 2
> >> > 109 59 161 30 136 80 64 129 56 1
> >> >
> >> > Non multiples of 8 would just generally leave some trailing 0s on
> >> decode, which could simply be understood as invalid.
> >> >
> >> >
> >> > ----- Original Message -----
> >> > From: 'Pascal Jasmin' via Programming <[email protected]>
> >> > To: "[email protected]" <[email protected]>
> >> > Cc:
> >> > Sent: Wednesday, December 31, 2014 11:40 AM
> >> > Subject: Re: [Jprogramming] variable integer coding (challenge)
> >> >
> >> > base64 is essentially taking 3 bytes and storing them as 4.  The
> inverse
> >> can take 4 6bit values and pack them into 3 bytes.  The J code for
> base64
> >> is pretty generic to see how to do it for any base.  7 bit values could
> be
> >> packed 8 at a time in 7 bytes.
> >> >
> >> > There is a different way to pack 8 x bit values into x bytes.  Each
> byte
> >> would store the i'th bit of the 8 values.  This may be even faster than
> >> base64 type coding.
> >> >
> >> >   #. |: ?. 8 19 $ 2
> >> > 104 214 59 183 174 23 31 225 134 149 66 31 85 19 133 1 51 148 28
> >> >
> >> >
> >> > ----- Original Message -----
> >> > From: Joe Bogner <[email protected]>
> >> > To: [email protected]
> >> > Cc:
> >> > Sent: Wednesday, December 31, 2014 10:02 AM
> >> > Subject: Re: [Jprogramming] variable integer coding (challenge)
> >> >
> >> > Can 19 bits be stored in memory or a file on modern computers without
> >> > taking up 3 bytes (24 bits)? The only idea that came to mind would
> some
> >> bit
> >> > packing scheme with multiple items. That seems complicated unless
> there
> >> was
> >> > a need to store billions of these numbers
> >> >
> >> > On Dec 31, 2014 9:28 AM, "Joe Bogner" <[email protected]> wrote:
> >> >
> >> >> Thanks. I missed the point of the variable breakup point but
> exploring
> >> >> different options below illustrated it for me.
> >> >>
> >> >> Changing the breakup point can encode the numbers into different bit
> >> >> lengths (other than 8) which can then be written to the file/memory.
> >> >>
> >> >> For example, 1021 encoded in 511 breakups would take 18 bits
> >> >>
> >> >>    $ #: 511 toVint 1021
> >> >> 2 9
> >> >>
> >> >> Whereas encoding it as 255 would require 40 bits, which is more than
> a
> >> >> 32-bit int. This is where I was failing to see the point of encoding
> >> >> larger numbers
> >> >>
> >> >> NB. 5 bytes, 40 bits
> >> >>     $ #: 255 toVint 1021
> >> >> 5 8
> >> >>
> >> >> Without trying the binary representation, I was expecting the breakup
> >> >> point to take the same space. For example, below I would have thought
> >> >> that I'd need to write a
> >> >>
> >> >> NB. 6 bytes
> >> >>    255 toVint 511 toVint 1021
> >> >> 255 255 1 255 255 0
> >> >>
> >> >>
> >> >> If I understand this correctly, I can see how variable ints could
> >> >> shave off a few bits if the range of numbers is known in advance,
> >> >> which is often the case.
> >> >>
> >> >>
> >> >> Regarding packed strings I've been thinking about it recently as
> well.
> >> >> I would like to see how it could be used to pack a table of strings
> >> >> (rows and columns).
> >> >>
> >> >> The following string could be a packed representation of a 2x2 table
> >> >>
> >> >> FrankUSATomGermany
> >> >>
> >> >> The binary format could be:
> >> >>
> >> >> [# of rows] [# of columns] [row1-col1 length] [row1-col2 length]....
> >> >>
> >> >> The header for the string could be
> >> >> 2 2 5 3 3 7
> >> >>
> >> >> Or maybe if you wanted to support a variable number of columns
> (unsure
> >> >> of the usecase of this)
> >> >>
> >> >> [# of rows] [row-1 # of columns] [row1-col1 length] [row1-col2
> length]
> >> >> 2 2 5 3 2 3 7
> >> >>
> >> >> I don't know well it'd perform relative to boxed tables or inverted
> >> >> tables. I would imagine the memory would be reduced for storage but
> >> >> I'd have to test to see how much memory would be used on different
> >> >> operations. I'd be interested in the performance of operations like
> >> >> searching (e.g. find all the rows with a country of Germany)
> >> >>
> >> >> It might also be a safer way of storing user-input text without
> >> >> worrying about escaping delimiters.
> >> >>
> >> >> On Tue, Dec 30, 2014 at 1:18 PM, 'Pascal Jasmin' via Programming
> >> >> <[email protected]> wrote:
> >> >> > I also provided some adverbs
> >> >> >
> >> >> > toVint =: 1 : '[: (({."1 ,. 1:) #&, m ,. {:"1) (0, m) #: ]'
> >> >> > fromVint=: 1 : 'm&~: +/;.2 ]'
> >> >> >
> >> >> > these let you set the breakup point to what you wish
> >> >> >
> >> >> >   15 toVint 33 12 17
> >> >> > 15 15 3 12 15 2
> >> >> >   511 toVint 1233
> >> >> > 511 511 211
> >> >> >
> >> >> >   511 fromVint 511 toVint 1233 444
> >> >> > 1233 444
> >> >> > It can provide simple compression.  If you have a list of many
> strings
> >> >> where almost all of them will fit into a 6bit alphabet, you can still
> >> >> encode a full 7 bit or 8bit alphabet.  On average, you'd save space
> >> using
> >> >> 6bit alphabet, depending on your context.
> >> >> >
> >> >> > The variable string functions I showed, still let you do a
> substring
> >> >> search on the packed data.  Part of it is disk and memory compression
> >> >> (don't expand until you need to, and a string takes much less memory
> >> than a
> >> >> list of numbers < 256), but part of it is not being forced in design
> to
> >> >> determine what the impossible cutoff number is.
> >> >> >
> >> >> > A big advantage related to J is having binary data records.  If you
> >> >> encode the length of the binary data, you no longer have to box them
> if
> >> a
> >> >> trailing 0 or space would be difficult to know whether it is a fill
> or
> >> >> intentional.
> >> >> >
> >> >> > Beyond binary data, there is dumb stuff such as storing addresses.
> >> Most
> >> >> people have just 1,  Most fit within 255 bytes.  Usually no one cares
> >> about
> >> >> address info except when they do or already have the client record.
> Do
> >> you
> >> >> really want a separate table with 1 to many join with a possible
> 10000
> >> >> character fixed size per address just because you're unsure if it
> will
> >> ever
> >> >> be possible for you to store an address to an African village as a
> set
> >> of
> >> >> driving instructions and then a description of the house relative to
> 3
> >> >> other houses and landmarks just in case one of the other houses gets
> >> >> painted.
> >> >> >
> >> >> > Another neat feature about the coding method is that if you have a
> lot
> >> >> of inefficient codes, ie. storing 1020 as 255 255 255 255 0, it at
> least
> >> >> compresses well.
> >> >> >
> >> >> > A minor thing is that the encoding still lets you do search by <
> or >.
> >> >> If you are storing the number of orders per customer.  The first byte
> >> tells
> >> >> you whether it is > 254 or less than any other number below 255
> >> (probably
> >> >> enough info to select customers of interest).  Another minor thing is
> >> that
> >> >> you don't have to use the encoding codes until there is an
> overflow.  It
> >> >> may be easier to just switch functions to decode variable data, then
> to
> >> >> reshape terrabytes of data.
> >> >> > ________________________________
> >> >> > From: Joe Bogner <[email protected]>
> >> >> > To: [email protected]
> >> >> > Sent: Tuesday, December 30, 2014 10:45 AM
> >> >> > Subject: Re: [Jprogramming] variable integer coding (challenge)
> >> >> >
> >> >> >
> >> >> > Pascal,
> >> >> >
> >> >> > Thanks for sharing. Can you also share a potential use case?
> >> >> >
> >> >> > I had to remind myself about the size of integers (4 bytes) on
> 32-bit.
> >> >> >
> >> >> > Will this only save space on numbers less than 1019?
> >> >> >
> >> >> > NB. 4 bytes
> >> >> > toVint2 1019
> >> >> > 255 255 255 254
> >> >> >
> >> >> > NB. 5 bytes
> >> >> > toVint2 1020
> >> >> > 255 255 255 255 0
> >> >> >
> >> >> > Are you looking for the savings in terms of disk space (saving it
> to a
> >> >> > file) or working memory?
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Mon, Dec 29, 2014 at 12:12 PM, 'Pascal Jasmin' via Programming
> >> >> > <[email protected]> wrote:
> >> >> >> A common design decision is how large to make a field.  For many
> >> >> numbers, we think that a byte will be large enough to hold practical
> >> >> values, but one day we will find out that it is not.  This is
> >> essentially
> >> >> the Y2k problem, or year 2032 issue.
> >> >> >>
> >> >> >> A simple variable integer coding (tied to byte ranges)
> >> >> >>
> >> >> >> toVint =: [: ; (255&| ,~ 255 #~ <.@%&255) each
> >> >> >>
> >> >> >>   toVint 566 44
> >> >> >> 255 255 56 44
> >> >> >>
> >> >> >> It now takes 3 bytes to store 566, and one byte to store 44.
> >> >> >>
> >> >> >> the challenge is to write fromVint such that
> >> >> >>
> >> >> >>
> >> >> >> 566 44 -: fromVint 255 255 56 44
> >> >> >>
> >> >> >> as an extra challenge, is there a way to write toVint or fromVint
> >> such
> >> >> that it is faster than using intermediate boxing?
> >> >> >>
> >> ----------------------------------------------------------------------
> >> >> >> For information about J forums see
> >> http://www.jsoftware.com/forums.htm
> >
> >> >
> >> >> >
> ----------------------------------------------------------------------
> >> >> > For information about J forums see
> >> http://www.jsoftware.com/forums.htm
> >> >> >
> ----------------------------------------------------------------------
> >> >> > For information about J forums see
> >> http://www.jsoftware.com/forums.htm
> >> >>
> >> > ----------------------------------------------------------------------
> >> > For information about J forums see
> http://www.jsoftware.com/forums.htm
> >> > ----------------------------------------------------------------------
> >> > For information about J forums see
> http://www.jsoftware.com/forums.htm
> >> > ----------------------------------------------------------------------
> >> > For information about J forums see
> http://www.jsoftware.com/forums.htm
> >> ----------------------------------------------------------------------
> >> For information about J forums see http://www.jsoftware.com/forums.htm
> >>
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> > ----------------------------------------------------------------------
> > For information about J forums see http://www.jsoftware.com/forums.htm
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Re: [Jprogramming] variable integer coding (challenge)

Reply via email to