The answer to that is contextual. But on intel architectures, it's often going to be: no.
Thanks, -- Raul On Wed, Dec 31, 2014 at 10:02 AM, Joe Bogner <[email protected]> wrote: > Can 19 bits be stored in memory or a file on modern computers without > taking up 3 bytes (24 bits)? The only idea that came to mind would some bit > packing scheme with multiple items. That seems complicated unless there was > a need to store billions of these numbers > On Dec 31, 2014 9:28 AM, "Joe Bogner" <[email protected]> wrote: > >> Thanks. I missed the point of the variable breakup point but exploring >> different options below illustrated it for me. >> >> Changing the breakup point can encode the numbers into different bit >> lengths (other than 8) which can then be written to the file/memory. >> >> For example, 1021 encoded in 511 breakups would take 18 bits >> >> $ #: 511 toVint 1021 >> 2 9 >> >> Whereas encoding it as 255 would require 40 bits, which is more than a >> 32-bit int. This is where I was failing to see the point of encoding >> larger numbers >> >> NB. 5 bytes, 40 bits >> $ #: 255 toVint 1021 >> 5 8 >> >> Without trying the binary representation, I was expecting the breakup >> point to take the same space. For example, below I would have thought >> that I'd need to write a >> >> NB. 6 bytes >> 255 toVint 511 toVint 1021 >> 255 255 1 255 255 0 >> >> >> If I understand this correctly, I can see how variable ints could >> shave off a few bits if the range of numbers is known in advance, >> which is often the case. >> >> >> Regarding packed strings I've been thinking about it recently as well. >> I would like to see how it could be used to pack a table of strings >> (rows and columns). >> >> The following string could be a packed representation of a 2x2 table >> >> FrankUSATomGermany >> >> The binary format could be: >> >> [# of rows] [# of columns] [row1-col1 length] [row1-col2 length].... >> >> The header for the string could be >> 2 2 5 3 3 7 >> >> Or maybe if you wanted to support a variable number of columns (unsure >> of the usecase of this) >> >> [# of rows] [row-1 # of columns] [row1-col1 length] [row1-col2 length] >> 2 2 5 3 2 3 7 >> >> I don't know well it'd perform relative to boxed tables or inverted >> tables. I would imagine the memory would be reduced for storage but >> I'd have to test to see how much memory would be used on different >> operations. I'd be interested in the performance of operations like >> searching (e.g. find all the rows with a country of Germany) >> >> It might also be a safer way of storing user-input text without >> worrying about escaping delimiters. >> >> On Tue, Dec 30, 2014 at 1:18 PM, 'Pascal Jasmin' via Programming >> <[email protected]> wrote: >> > I also provided some adverbs >> > >> > toVint =: 1 : '[: (({."1 ,. 1:) #&, m ,. {:"1) (0, m) #: ]' >> > fromVint=: 1 : 'm&~: +/;.2 ]' >> > >> > these let you set the breakup point to what you wish >> > >> > 15 toVint 33 12 17 >> > 15 15 3 12 15 2 >> > 511 toVint 1233 >> > 511 511 211 >> > >> > 511 fromVint 511 toVint 1233 444 >> > 1233 444 >> > It can provide simple compression. If you have a list of many strings >> where almost all of them will fit into a 6bit alphabet, you can still >> encode a full 7 bit or 8bit alphabet. On average, you'd save space using >> 6bit alphabet, depending on your context. >> > >> > The variable string functions I showed, still let you do a substring >> search on the packed data. Part of it is disk and memory compression >> (don't expand until you need to, and a string takes much less memory than a >> list of numbers < 256), but part of it is not being forced in design to >> determine what the impossible cutoff number is. >> > >> > A big advantage related to J is having binary data records. If you >> encode the length of the binary data, you no longer have to box them if a >> trailing 0 or space would be difficult to know whether it is a fill or >> intentional. >> > >> > Beyond binary data, there is dumb stuff such as storing addresses. Most >> people have just 1, Most fit within 255 bytes. Usually no one cares about >> address info except when they do or already have the client record. Do you >> really want a separate table with 1 to many join with a possible 10000 >> character fixed size per address just because you're unsure if it will ever >> be possible for you to store an address to an African village as a set of >> driving instructions and then a description of the house relative to 3 >> other houses and landmarks just in case one of the other houses gets >> painted. >> > >> > Another neat feature about the coding method is that if you have a lot >> of inefficient codes, ie. storing 1020 as 255 255 255 255 0, it at least >> compresses well. >> > >> > A minor thing is that the encoding still lets you do search by < or >. >> If you are storing the number of orders per customer. The first byte tells >> you whether it is > 254 or less than any other number below 255 (probably >> enough info to select customers of interest). Another minor thing is that >> you don't have to use the encoding codes until there is an overflow. It >> may be easier to just switch functions to decode variable data, then to >> reshape terrabytes of data. >> > ________________________________ >> > From: Joe Bogner <[email protected]> >> > To: [email protected] >> > Sent: Tuesday, December 30, 2014 10:45 AM >> > Subject: Re: [Jprogramming] variable integer coding (challenge) >> > >> > >> > Pascal, >> > >> > Thanks for sharing. Can you also share a potential use case? >> > >> > I had to remind myself about the size of integers (4 bytes) on 32-bit. >> > >> > Will this only save space on numbers less than 1019? >> > >> > NB. 4 bytes >> > toVint2 1019 >> > 255 255 255 254 >> > >> > NB. 5 bytes >> > toVint2 1020 >> > 255 255 255 255 0 >> > >> > Are you looking for the savings in terms of disk space (saving it to a >> > file) or working memory? >> > >> > >> > >> > >> > >> > >> > On Mon, Dec 29, 2014 at 12:12 PM, 'Pascal Jasmin' via Programming >> > <[email protected]> wrote: >> >> A common design decision is how large to make a field. For many >> numbers, we think that a byte will be large enough to hold practical >> values, but one day we will find out that it is not. This is essentially >> the Y2k problem, or year 2032 issue. >> >> >> >> A simple variable integer coding (tied to byte ranges) >> >> >> >> toVint =: [: ; (255&| ,~ 255 #~ <.@%&255) each >> >> >> >> toVint 566 44 >> >> 255 255 56 44 >> >> >> >> It now takes 3 bytes to store 566, and one byte to store 44. >> >> >> >> the challenge is to write fromVint such that >> >> >> >> >> >> 566 44 -: fromVint 255 255 56 44 >> >> >> >> as an extra challenge, is there a way to write toVint or fromVint such >> that it is faster than using intermediate boxing? >> >> ---------------------------------------------------------------------- >> >> For information about J forums see http://www.jsoftware.com/forums.htm >> > ---------------------------------------------------------------------- >> > For information about J forums see http://www.jsoftware.com/forums.htm >> > ---------------------------------------------------------------------- >> > For information about J forums see http://www.jsoftware.com/forums.htm >> > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
