bug#41518: Bug in od?
On May 29 2020, Yuan Cao wrote:
> It just feels strange because the order does not reflect the order of the
> characters in the file.

But that's not true. It reflects exactly how 2-byte numbers are stored in memory on your system. If you want to make a connection with characters, you need to think about UCS-2 characters.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
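To make the UCS-2 remark concrete, here is a small Python sketch. Python is only an illustration language chosen here; nothing in the thread uses it. It assumes the file starts with the ten digit bytes "1234567890" and decodes them as UTF-16LE, which is identical to little-endian UCS-2 for these code points because no surrogates occur. Each resulting character's code point matches one of the 16-bit words that od -x printed on the little-endian machine in the report.

```python
# The ten digit bytes from the report's test file (newline left out so
# the length is even and every byte belongs to a full 16-bit unit).
data = b"1234567890"

# Read the bytes as little-endian 16-bit code units: each pair of
# bytes becomes one character whose code point is (second << 8) | first.
units = [f"{ord(ch):04x}" for ch in data.decode("utf-16-le")]

print(" ".join(units))  # 3231 3433 3635 3837 3039
```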
bug#41518: Bug in od?
Yuan Cao wrote:
> > https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#The-_0027od-_002dx_0027-command-prints-bytes-in-the-wrong-order_002e
>
> Thanks for pointing me to this documentation.
>
> It just feels strange because the order does not reflect the order of the
> characters in the file.

It feels strange in the environment *today*. But in the 1970s, when od was written, it was perfectly natural on the PDP-11 to print the native machine word in the *native word order* of the PDP-11. At that time most software ran only on its native architecture, and the idea of being portable to other systems was not yet common.

The PDP-11 is a 16-bit word machine. Therefore the 2-byte integers you are seeing, and the order in which they are printed, are exactly what was printed on the PDP-11. That has remained unchanged to the present day, because it can't change without breaking all historical use.

For anyone using od today, the best replacement for -x is -tx1, which prints bytes in a portable order. Whenever you think to type -x, use -tx1 instead. This avoids breaking historical use and produces the output that you want.

> I think it might have been useful to get the "by word" value of the file if
> you are working with a binary file historically. One might have stored some
> data as a list of shorts. Then, we can easily view the data using "od -x
> data_file_name".
>
> Since memory is so cheap now, people are probably just using chars
> for text, and 4 byte ints or 8 byte ints where they used to use 2 byte ints
> (shorts) before. In this case, the "by word" order does not seem to me to
> be as useful and violates the principle of least astonishment needlessly.

But changing the meaning of a command's options is a hard problem and cannot be done without breaking a lot of existing use. The better way is not to try.
The options to head and tail changed an eon ago, and yet just in the last week I ran across a posting where that option change bit someone. Since there is no need for any breaking change, it is better not to make one. Simply use the correct options for what you want: -tx1 in this case.

> It might be interesting to change the option to print values by double word
> or quadword instead or add another option to let the users choose to print
> by double word or quadword if they want.

A size of 16 bits was a good value in years past. 32 bits has been a good size for some years. Now 64 bits is the most common size. The only way to win is not to play. Better to state the size explicitly. And IMNHO the best size is 1, regardless of architecture.

  od -Ax -tx1z -v

Each of those options has been added over the years, and each changes the behavior of the program. Each would be a breaking change if it were made the default. Best to ask for what you want explicitly.

I strongly recommend https://www.ietf.org/rfc/ien/ien137.txt as required reading.

Bob
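Why an explicit size matters can be sketched in a few lines of Python (an illustration only; the helper name hex_units is hypothetical). It groups the same eight bytes into 1-, 2-, 4- and 8-byte unsigned integers, roughly what od's -t x1, -t x2, -t x4 and -t x8 do, and reads each group both little-endian and big-endian. Zero-padding a trailing partial unit is an assumption of this sketch. Only the 1-byte grouping gives the same answer under both byte orders.

```python
import struct

# struct format codes for unsigned integers of 1, 2, 4 and 8 bytes.
CODES = {1: "B", 2: "H", 4: "I", 8: "Q"}

def hex_units(data: bytes, size: int, order: str) -> list[str]:
    """Split data into size-byte unsigned integers and format them as hex.

    order is a struct byte-order prefix: "<" little-endian, ">" big-endian.
    A trailing partial unit is zero-padded (an assumption of this sketch).
    """
    data = data + b"\x00" * (-len(data) % size)
    count = len(data) // size
    values = struct.unpack(f"{order}{count}{CODES[size]}", data)
    return [f"{v:0{2 * size}x}" for v in values]

data = b"12345678"
for size in (1, 2, 4, 8):
    little = " ".join(hex_units(data, size, "<"))
    big = " ".join(hex_units(data, size, ">"))
    print(f"size {size}: {little} | {big}")
```

With size 1 both columns read 31 32 33 34 35 36 37 38; with any larger size the two columns differ, which is the portability problem Bob describes.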
bug#41518: Bug in od?
On Fri, May 29, 2020 at 1:20 AM Bob Proulx wrote:
> A little more information.
>
> Pádraig Brady wrote:
> > Yuan Cao wrote:
> > > I recently came across the following behavior.
> > >
> > > When using "--traditional x2" or "-x" option, it seems the order of hex
> > > code output for the characters is pairwise reversed (if that's the correct
> > > way of describing it).
>
> ‘-x’
>      Output as hexadecimal two-byte units. Equivalent to ‘-t x2’.
>
> Outputs 16-bit integers in the *native byte order* of the machine.
> Which may be either big-endian or little-endian depending on the
> machine. Not portable. Depends upon the machine it is run upon.
>
> > If you want to hexdump independently of endianness you can:
> >
> >   od -Ax -tx1z -v
>
> The -tx1 option above is portable because it outputs 1-byte units
> instead of 2-byte units which is independent of endianness.
>
> This is the FAQ entry for this topic.
>
> https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#The-_0027od-_002dx_0027-command-prints-bytes-in-the-wrong-order_002e
>
> Bob

Thanks for pointing me to this documentation.

It just feels strange because the order does not reflect the order of the characters in the file.

I think it might have been useful to get the "by word" value of the file if you were working with a binary file historically. One might have stored some data as a list of shorts. Then, one could easily view the data using "od -x data_file_name".

Since memory is so cheap now, people are probably just using chars for text, and 4-byte or 8-byte ints where they used to use 2-byte ints (shorts) before. In this case, the "by word" order does not seem to me to be as useful, and it violates the principle of least astonishment needlessly.

It might be interesting to change the option to print values by double word or quadword instead, or to add another option that lets users choose to print by double word or quadword if they want.

Best Regards,
Yuan
bug#41518: Bug in od?
A little more information.

Pádraig Brady wrote:
> Yuan Cao wrote:
> > I recently came across the following behavior.
> >
> > When using "--traditional x2" or "-x" option, it seems the order of hex
> > code output for the characters is pairwise reversed (if that's the correct
> > way of describing it).

‘-x’
     Output as hexadecimal two-byte units. Equivalent to ‘-t x2’.

This outputs 16-bit integers in the *native byte order* of the machine, which may be either big-endian or little-endian. It is not portable: the result depends on the machine it is run on.

> If you want to hexdump independently of endianness you can:
>
>   od -Ax -tx1z -v

The -tx1 option above is portable because it outputs 1-byte units instead of 2-byte units, and a single byte has no byte order, so the output is independent of endianness.

This is the FAQ entry for this topic:

https://www.gnu.org/software/coreutils/faq/coreutils-faq.html#The-_0027od-_002dx_0027-command-prints-bytes-in-the-wrong-order_002e

Bob
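The "native byte order" point can be checked with a short Python sketch (Python chosen only for illustration). It reads the first two bytes of the test file, "1" and "2", as one unsigned 16-bit integer three ways: in the host's native order (struct's "=" prefix, analogous to what od -x does), explicitly little-endian, and explicitly big-endian. Printing the bytes one at a time, as -tx1 does, needs no byte order at all.

```python
import struct
import sys

pair = b"12"  # bytes 0x31, 0x32 exactly as they sit in the file

native = struct.unpack("=H", pair)[0]  # host byte order, like od -x
little = struct.unpack("<H", pair)[0]  # always 0x3231
big = struct.unpack(">H", pair)[0]     # always 0x3132

# The native reading agrees with whichever order this host uses.
expected = little if sys.byteorder == "little" else big
assert native == expected
print(sys.byteorder, f"{native:04x}")  # 3231 on little-endian hosts such as x86

# One byte per unit involves no byte order, so this is the same everywhere.
print(" ".join(f"{b:02x}" for b in pair))  # 31 32
```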
bug#41518: Bug in od?
tag 41518 notabug
close 41518
stop

response below...

On 25/05/2020 04:05, Yuan Cao wrote:
> Hello,
>
> I recently came across the following behavior.
>
> When using "--traditional x2" or "-x" option, it seems the order of hex
> code output for the characters is pairwise reversed (if that's the correct
> way of describing it).
>
> For example, using "od -cx" on a test file that contains "1234567890\n",
> you get the following output:
>
> 000   1   2   3   4   5   6   7   8   9   0  \n
>        3231    3433    3635    3837    3039    000a
> 013
>
> It seems like it should be the following instead:
>
> 000   1   2   3   4   5   6   7   8   9   0  \n
>        3132    3334    3536    3738    3930    0a00
> 013
>
> The version involved is od in GNU coreutils 8.28.

That's because you're on a little-endian machine. If you want to reorder as per a big-endian machine you can:

  od --endian=big -cx your_file

If you want to hexdump independently of endianness you can:

  od -Ax -tx1z -v

cheers,
Pádraig
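Both hex lines from the report can be reproduced from the raw bytes with a minimal Python sketch (illustration only; the helper hex_words is hypothetical and only approximates od -t x2). It assumes the file holds the eleven bytes shown on the od -c line. Reading each byte pair little-endian gives what od -x printed on the reporter's machine; reading it big-endian gives the line the reporter expected, which is what od --endian=big -x produces. The odd trailing newline is padded with a zero byte, matching the final 000a / 0a00 word.

```python
import struct

def hex_words(data: bytes, order: str) -> str:
    """Format data as 16-bit hex words, roughly like od -t x2.

    order is a struct byte-order prefix: "<" little-endian, ">" big-endian.
    An odd trailing byte is padded with a zero byte.
    """
    if len(data) % 2:
        data += b"\x00"
    words = struct.unpack(f"{order}{len(data) // 2}H", data)
    return " ".join(f"{w:04x}" for w in words)

data = b"1234567890\n"
print(hex_words(data, "<"))  # 3231 3433 3635 3837 3039 000a  (od -x on x86)
print(hex_words(data, ">"))  # 3132 3334 3536 3738 3930 0a00  (od --endian=big -x)
```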
bug#41518: Bug in od?
Hello,

I recently came across the following behavior.

When using "--traditional x2" or "-x" option, it seems the order of hex code output for the characters is pairwise reversed (if that's the correct way of describing it).

For example, using "od -cx" on a test file that contains "1234567890\n", you get the following output:

000   1   2   3   4   5   6   7   8   9   0  \n
       3231    3433    3635    3837    3039    000a
013

It seems like it should be the following instead:

000   1   2   3   4   5   6   7   8   9   0  \n
       3132    3334    3536    3738    3930    0a00
013

The version involved is od in GNU coreutils 8.28.

Best Regards,
Yuan
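One detail of the output above that is easy to misread is the final "013". By default od prints offsets in octal (address radix -A o), so the last offset is the file length in bytes. A tiny Python check (illustration only) confirms that octal 13 is 11, the number of characters on the od -c line: the ten digits plus the newline.

```python
data = b"1234567890\n"  # the 11 characters listed on the od -c line

# od's default offsets are octal, so the closing 013 is the byte count.
final_offset = int("013", 8)
print(final_offset, len(data))  # 11 11
```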