Re: mysterious setting of B_DIRECT?
On Thu, Apr 25, 2024 at 8:51 PM Rick Macklem wrote:
>
> On Thu, Apr 25, 2024 at 8:09 PM Konstantin Belousov wrote:
> >
> > On Thu, Apr 25, 2024 at 07:49:23PM -0700, Rick Macklem wrote:
> > > Hi,
> > >
> > > This week I have been doing active testing as a part of an IETF
> > > bakeathon for NFSv4. During the week I had a NFSv4 client
> > > crash. On the surface, it is straightforward, in that it called
> > > ncl_doio_directwrite() and the field called b_caller1 was NULL.
> > >
> > > Now, here's the weird part...
> > > ncl_doio_directwrite() should never be called because B_DIRECT
> > > should never be set. (The only place B_DIRECT gets set in the code
> > > is never currently executed.)
> > Do you mean the place in nfs_directio_write()? And the fact that
> > IO_SYNC is always set.
> Yes.
>
> >
> > >
> > > I have a patch that clears out the "never to be executed" code and
> > > this seems to avoid the patch, since with the patch,
> > > ncl_doio_directwrite()
> > > no longer exists.
> > >
> > > What I cannot figure out is how B_DIRECT got set?
> > > I can note that UFS was under heavy load when the client crashed,
> > > but I cannot see how a UFS "struct buf" would become a NFS "struct buf"
> > > without b_flags being set to 0.
> >
> > There are also vfs_bio_brelse()/vfs_bio_setflags() functions which can
> > set B_DIRECT. On the other hand, they are not used by nfs client.
> Yes, again.
>
> >
> > What was the overall state of the buffer with the B_DIRECT flag? Which
> > vnode it was assigned to?
> Unfortunately I was in a hurry and didn't get more info.
> And, since I have never seen this crash before, I doubt I'll be able
> to reproduce it.
Oh, and I will put the cleanup patch on phabricator.
I didn't see the crash again during a few days of testing with the patch.
This makes sense, since it gets rid of ncl_doio_directwrite().

>
> Thanks, rick
Re: mysterious setting of B_DIRECT?
On Thu, Apr 25, 2024 at 8:09 PM Konstantin Belousov wrote:
>
> On Thu, Apr 25, 2024 at 07:49:23PM -0700, Rick Macklem wrote:
> > Hi,
> >
> > This week I have been doing active testing as a part of an IETF
> > bakeathon for NFSv4. During the week I had a NFSv4 client
> > crash. On the surface, it is straightforward, in that it called
> > ncl_doio_directwrite() and the field called b_caller1 was NULL.
> >
> > Now, here's the weird part...
> > ncl_doio_directwrite() should never be called because B_DIRECT
> > should never be set. (The only place B_DIRECT gets set in the code
> > is never currently executed.)
> Do you mean the place in nfs_directio_write()? And the fact that
> IO_SYNC is always set.
Yes.

>
> >
> > I have a patch that clears out the "never to be executed" code and
> > this seems to avoid the patch, since with the patch, ncl_doio_directwrite()
> > no longer exists.
> >
> > What I cannot figure out is how B_DIRECT got set?
> > I can note that UFS was under heavy load when the client crashed,
> > but I cannot see how a UFS "struct buf" would become a NFS "struct buf"
> > without b_flags being set to 0.
>
> There are also vfs_bio_brelse()/vfs_bio_setflags() functions which can
> set B_DIRECT. On the other hand, they are not used by nfs client.
Yes, again.

>
> What was the overall state of the buffer with the B_DIRECT flag? Which
> vnode it was assigned to?
Unfortunately I was in a hurry and didn't get more info.
And, since I have never seen this crash before, I doubt I'll be able
to reproduce it.

Thanks, rick
Re: mysterious setting of B_DIRECT?
On Thu, Apr 25, 2024 at 07:49:23PM -0700, Rick Macklem wrote:
> Hi,
>
> This week I have been doing active testing as a part of an IETF
> bakeathon for NFSv4. During the week I had a NFSv4 client
> crash. On the surface, it is straightforward, in that it called
> ncl_doio_directwrite() and the field called b_caller1 was NULL.
>
> Now, here's the weird part...
> ncl_doio_directwrite() should never be called because B_DIRECT
> should never be set. (The only place B_DIRECT gets set in the code
> is never currently executed.)
Do you mean the place in nfs_directio_write()? And the fact that
IO_SYNC is always set?

>
> I have a patch that clears out the "never to be executed" code and
> this seems to avoid the patch, since with the patch, ncl_doio_directwrite()
> no longer exists.
>
> What I cannot figure out is how B_DIRECT got set?
> I can note that UFS was under heavy load when the client crashed,
> but I cannot see how a UFS "struct buf" would become a NFS "struct buf"
> without b_flags being set to 0.

There are also the vfs_bio_brelse()/vfs_bio_setflags() functions, which can
set B_DIRECT. On the other hand, they are not used by the NFS client.

What was the overall state of the buffer with the B_DIRECT flag? Which
vnode was it assigned to?
mysterious setting of B_DIRECT?
Hi,

This week I have been doing active testing as part of an IETF bakeathon for NFSv4. During the week I had an NFSv4 client crash. On the surface, it is straightforward, in that it called ncl_doio_directwrite() and the field called b_caller1 was NULL.

Now, here's the weird part...
ncl_doio_directwrite() should never be called because B_DIRECT should never be set. (The only place B_DIRECT gets set in the code is never currently executed.)

I have a patch that clears out the "never to be executed" code and this seems to avoid the crash, since with the patch, ncl_doio_directwrite() no longer exists.

What I cannot figure out is how B_DIRECT got set. I can note that UFS was under heavy load when the client crashed, but I cannot see how a UFS "struct buf" would become an NFS "struct buf" without b_flags being set to 0.

Anyone have any ideas?

rick
Re: serial/ulscom: response timeout using pySerial/esptool.py
Can you isolate out the extraneous stuff and loop tx and rx on a CP2101 board and send bytes through?

I did a bunch of development on an esp8266 board in the last few weeks and had no issues, but I’ve no idea if it were the same usb serial chip. I’ll have a dig around and see if I have something matching.

On Thu, Apr 25, 2024, at 20:17, FreeBSD User wrote:
> Hello,
>
> Host: 15.0-CURRENT FreeBSD 15.0-CURRENT #36 main-n269703-54c3aa02e926:
> Thu Apr 25 18:48:56
> CEST 2024 amd64 or 14-STABLE recently compiled (dmesg/uname not at
> hand).
>
> Hardware: oldish Z77Pro 4 based Asrock mainboard, a Lenovo T560
> notebook, Fujitsu Esprimo Q5XX
> (simple desktop, Pentium Gold) or an oldish Fujitsu Celsius 7XX
> workstation, 6 core Haswell
> XEON.
>
> Phenomenon: a couple of weeks now I try to connect to several Xtensa
> ESP32 dev boards
> (ESP32-WROOM32 with CP2101 or CP2104 UART) via comms/py-esptool
> (doesn't matter whether it is
> tho port's py39-esptool 4.5 or the latest py-esptool 4.7.0, doesn't
> matter whether pkg package
> or self compiled on CURRENT and 14-STABLE, on all hardware platforms
> same result).
>
> Attaching the ESP devel module via Micro USB cable (several type,
> differnt vendors tried ...)
> show
>
> dmesg:
> [...]
> ugen0.4: at usbus0
> uslcom0 on uhub3
> uslcom0: rev 1.10/1.00, addr 4>
> on usbus0
> [...]
>
> When trying to connect to the ESP32 via below shown command (--trace
> not every time issued), I
> get no connection:
>
> [ohartmann]: esptool.py --trace --chip esp32 --baud 115200 --port
> /dev/cuaU1 flash_id
> esptool.py v4.7.0
> Loaded custom configuration from /pool/home/ohartmann/esptool.cfg
> Serial port /dev/cuaU1
> Connecting...TRACE +0.000 command op=0x08 data len=36 wait_response=1
> timeout=0.100 data=
> 07071220 | ...
> |
> |
> TRACE +0.000 Write 46 bytes:
> c824 000707122055 | ...$ UUU
> |
> 55c0 | U.
> TRACE +0.102 No serial data received.
> TRACE +0.052 command op=0x08 data len=36 wait_response=1 timeout=0.100
> data=
> 07071220 | ...
> |
> |
> TRACE +0.000 Write 46 bytes:
> c824 000707122055 | ...$ UUU
> |
> 55c0 | U.
> TRACE +0.107 No serial data received.
> TRACE +0.054 command op=0x08 data len=36 wait_response=1 timeout=0.100
> data=
> 07071220 | ...
> |
> |
> TRACE +0.000 Write 46 bytes:
> c824 000707122055 | ...$ UUU
> |
> 55c0 | U.
> TRACE +0.107 No serial data received.
> TRACE +0.054 command op=0x08 data len=36 wait_response=1 timeout=0.100
> data=
> 07071220 | ...
> |
> |
> TRACE +0.000 Write 46 bytes:
> c824 000707122055 | ...$ UUU
> |
> 55c0 | U.
>
>
> A serial exception error occurred: device reports readiness to read but
> returned no data
> (device disconnected or multiple access on port?) Note: This error
> originates from pySerial.
> It is likely not a problem with esptool, but with the hardware
> connection or drivers. For
> troubleshooting steps visit:
> https://docs.espressif.com/projects/esptool/en/latest/troubleshooting.html
> [...]
>
>
> Whatever baud rate issued, in most cases on all tested OS versions and
> almost all hardware
> platforms except the Fujistu Celsius 7XX (2015 model) I do not get any
> connection! And it get
> more weird: To avoid out-of-sync-software I recompiled everything via
> "portmaster -df
> comms/py-pyserial comms/py-esptool" and after that, everything was
> fine, the connection was
> made, I got results out of the chip. Seconds later same problems.
>
> I exchanged cablings, exchanged the ESP32 model and vendor. Invariants
> are 14-STABLE, daily
> compiled, CURRENT. daily compiled. On my private box (old Z77 based
> IvyBridge ASRock crap), a
> couple of Lenovo T560 running 14-STABLE and several Fujitsu Esprimo
> Q5XX boxes there is always
> this weird error message, but in very rare cases I get connection.
>
> Only exception: the Fujsitus Celsius 7XX workstation (14-STABLE, last
> complied today noon). No
> matter what ESP32, no mat
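The loopback test suggested at the top of this reply (TX wired to RX on the adapter, then send bytes through) could be sketched roughly as follows. This is only an illustration: `loopback_check` and `run_on_port` are hypothetical helper names, the device path and baud rate are taken from the original report, and the read timeout is an assumption. It needs pySerial (comms/py-pyserial) only when run against real hardware.

```python
import os


def loopback_check(port, payload, max_reads=10):
    """Write payload to port and read it back.

    `port` is any object with pySerial-like write()/read() methods.
    Returns (ok, received_bytes).
    """
    port.write(payload)
    received = bytearray()
    for _ in range(max_reads):
        chunk = port.read(len(payload) - len(received))
        if not chunk:  # read timed out with no data
            break
        received.extend(chunk)
        if len(received) >= len(payload):
            break
    return bytes(received) == payload, bytes(received)


def run_on_port(device="/dev/cuaU1", baud=115200, rounds=100):
    """Repeat the loopback check on a real port; returns the failure count."""
    import serial  # pySerial, imported lazily so the helper above stays testable

    failures = 0
    with serial.Serial(device, baud, timeout=0.5) as port:
        port.reset_input_buffer()
        for _ in range(rounds):
            ok, _ = loopback_check(port, os.urandom(64))
            failures += 0 if ok else 1
    return failures
```

If `run_on_port()` reports failures with nothing but a jumper wire on the adapter, the ESP32 and esptool are out of the picture and the problem sits between pySerial, the driver and the USB host controller.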
Re: serial/ulscom: response timeout using pySerial/esptool.py
CP2102 are pretty good ones and never let me down :-)

Is your UART connection to the ESP32 working correctly? Can you see the boot message and whatever happens next in a terminal (cu / minicom)? Are the RX and TX pins not swapped? Is the power supply okay? Are the boards generic devkits or custom hardware?

ESP32, in addition to RX and TX, needs two control lines, Reset and Boot, that switch the chip to bootloader / flashing mode. Most USB-to-UART use RTS/CTS lines for that. Are you sure these lines are available on your board and connected to the target correctly?

Do you have Reset and Boot buttons on the board so you could trigger the bootloader by hand? (Hold Boot, press and release Reset; the device will then be in bootloader upload mode. Retry esptool flashing now.) You can also play with the buttons with an active terminal attached (i.e. minicom) to see if they work as expected.

The minicom serial terminal is pretty cool, as it allows you to watch UART behavior on adapter (un)plug. In minicom you can also enable/disable hardware flow control lines (Ctrl+A O -> Serial Port Setup -> (F) Hardware Flow Control). You can switch that easily and watch the target behavior. If this is the problem, you may want to use stty(1) to enable/disable hardware flow control on the port.

Can you try with another board? ESP32 has fuses that may permanently disable and/or mess up some hardware features.

--
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
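The "hold Boot, press and release Reset" sequence described above can also be driven from software, as a rough sketch. It assumes the usual devkit wiring, in which the adapter's DTR and RTS outputs pull IO0 (Boot) and EN (Reset) low through a transistor pair; this wiring is an assumption about the board, not something confirmed in this thread. `enter_bootloader` is a hypothetical helper, the delays are guesses, and `port` is any object exposing pySerial-style `dtr` and `rts` attributes.

```python
import time


def enter_bootloader(port, hold=0.1, release=0.05, sleep=time.sleep):
    """Pulse DTR/RTS so that IO0 is low at the moment EN is released.

    Assumes asserting DTR pulls IO0 low and asserting RTS pulls EN low.
    """
    port.dtr = False  # IO0 high
    port.rts = True   # EN low: chip held in reset
    sleep(hold)
    port.dtr = True   # IO0 low: bootloader selected
    port.rts = False  # EN high: chip leaves reset and samples IO0
    sleep(release)
    port.dtr = False  # IO0 high again, chip stays in the bootloader
```

With a real adapter this would be called on a `serial.Serial` instance opened on the port in question. Reading from the port right afterwards should show the ROM boot banner if the control lines actually reach the chip.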
serial/ulscom: response timeout using pySerial/esptool.py
Hello,

Host: 15.0-CURRENT FreeBSD 15.0-CURRENT #36 main-n269703-54c3aa02e926: Thu Apr 25 18:48:56 CEST 2024 amd64 or 14-STABLE recently compiled (dmesg/uname not at hand).

Hardware: oldish Z77Pro 4 based Asrock mainboard, a Lenovo T560 notebook, Fujitsu Esprimo Q5XX (simple desktop, Pentium Gold) or an oldish Fujitsu Celsius 7XX workstation, 6 core Haswell XEON.

Phenomenon: for a couple of weeks now I have been trying to connect to several Xtensa ESP32 dev boards (ESP32-WROOM32 with CP2101 or CP2104 UART) via comms/py-esptool. It doesn't matter whether it is the port's py39-esptool 4.5 or the latest py-esptool 4.7.0, nor whether it is the pkg package or self compiled on CURRENT and 14-STABLE; the result is the same on all hardware platforms.

Attaching the ESP devel module via Micro USB cable (several types from different vendors tried ...) shows

dmesg:
[...]
ugen0.4: at usbus0
uslcom0 on uhub3
uslcom0: on usbus0
[...]

When trying to connect to the ESP32 via the command shown below (--trace not issued every time), I get no connection:

[ohartmann]: esptool.py --trace --chip esp32 --baud 115200 --port /dev/cuaU1 flash_id
esptool.py v4.7.0
Loaded custom configuration from /pool/home/ohartmann/esptool.cfg
Serial port /dev/cuaU1
Connecting...TRACE +0.000 command op=0x08 data len=36 wait_response=1 timeout=0.100 data=
07071220 | ...
|
|
TRACE +0.000 Write 46 bytes:
c824 000707122055 | ...$ UUU
|
55c0 | U.
TRACE +0.102 No serial data received.
TRACE +0.052 command op=0x08 data len=36 wait_response=1 timeout=0.100 data=
07071220 | ...
|
|
TRACE +0.000 Write 46 bytes:
c824 000707122055 | ...$ UUU
|
55c0 | U.
TRACE +0.107 No serial data received.
TRACE +0.054 command op=0x08 data len=36 wait_response=1 timeout=0.100 data=
07071220 | ...
|
|
TRACE +0.000 Write 46 bytes:
c824 000707122055 | ...$ UUU
|
55c0 | U.
TRACE +0.107 No serial data received.
TRACE +0.054 command op=0x08 data len=36 wait_response=1 timeout=0.100 data=
07071220 | ...
|
|
TRACE +0.000 Write 46 bytes:
c824 000707122055 | ...$ UUU
|
55c0 | U.

A serial exception error occurred: device reports readiness to read but returned no data (device disconnected or multiple access on port?)
Note: This error originates from pySerial. It is likely not a problem with esptool, but with the hardware connection or drivers.
For troubleshooting steps visit: https://docs.espressif.com/projects/esptool/en/latest/troubleshooting.html
[...]

Whatever baud rate I use, in most cases, on all tested OS versions and almost all hardware platforms except the Fujitsu Celsius 7XX (2015 model), I do not get any connection! And it gets weirder: to avoid out-of-sync software I recompiled everything via "portmaster -df comms/py-pyserial comms/py-esptool", and after that everything was fine: the connection was made and I got results out of the chip. Seconds later, the same problems.

I exchanged cables and exchanged the ESP32 model and vendor. The invariants are 14-STABLE, compiled daily, and CURRENT, compiled daily. On my private box (old Z77 based IvyBridge ASRock crap), a couple of Lenovo T560 running 14-STABLE and several Fujitsu Esprimo Q5XX boxes there is always this weird error message, but in very rare cases I get a connection.

The only exception: the Fujitsu Celsius 7XX workstation (14-STABLE, last compiled today at noon). No matter which ESP32, which vendor or which cabling is used, the connection is established at any baud rate, at any time. Not one single failure as shown above in any session (I checked dozens of times)!

Now I'm out of ideas, and I suspect the CP210X uslcom serial driver has trouble with most onboard serial chipsets.

Can anyone help me track down this issue? Is there anything I could have missed? It drives me nuts ...

Thanks in advance,

Oliver

--
O. Hartmann
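For reading the trace above: "op=0x08 data len=36" followed by "Write 46 bytes" is consistent with esptool's sync request. As far as the published Espressif serial protocol goes, that is an 8-byte header plus 36 bytes of data, wrapped in SLIP framing (one 0xC0 delimiter at each end). The sketch below rebuilds such a frame so that the byte count can be checked. The function names are hypothetical, and the zero checksum field is an assumption about how the sync command is sent.

```python
import struct

SLIP_END = 0xC0  # frame delimiter
SLIP_ESC = 0xDB  # escape byte


def slip_encode(payload):
    """Wrap payload in SLIP framing, escaping delimiter and escape bytes."""
    out = bytearray([SLIP_END])
    for byte in payload:
        if byte == SLIP_END:
            out += bytes([SLIP_ESC, 0xDC])
        elif byte == SLIP_ESC:
            out += bytes([SLIP_ESC, 0xDD])
        else:
            out.append(byte)
    out.append(SLIP_END)
    return bytes(out)


def sync_frame():
    """Build the sync request: op 0x08 with 36 bytes of data."""
    data = b"\x07\x07\x12\x20" + b"\x55" * 32  # 4 + 32 = 36 bytes
    # Header: direction (0 = request), opcode, data length, checksum field.
    header = struct.pack("<BBHI", 0x00, 0x08, len(data), 0)
    return slip_encode(header + data)


frame = sync_frame()
print(len(frame), frame[:12].hex())
```

The length works out as 1 + 8 + 36 + 1 = 46 bytes, matching the "Write 46 bytes" lines. So the host side is repeatedly sending a well-formed sync request, and what is missing is any reply from the chip reaching pySerial.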