Re: [gentoo-user] extracting text, numbers from screencasts
On 07/05/2016 16:31, hw wrote:
> Helmut Jarausch wrote:
>> On 04/08/2016 03:26:53 PM, hw wrote:
>>>
>>> Hi,
>>>
>>> what would be the best approach to extract data
>>> from a screencast?
>>>
>>> The task is to acquire some data from the display of
>>> a GUI program used interactively by a user. There are
>>> a couple 'fields' (as in "designated areas of the display")
>>> in which the relevant data is being displayed while the
>>> program is being used. The acquired data needs to be
>>> entered into a mysql database, preferably as soon as
>>> possible. (The program needs windoze, and the sources
>>> are unavailable :( )
>>>
>>>
>>> The idea is to make a screen recording and postprocess
>>> the recording with some sort of OCR software. This might
>>> require using ffmpeg (or the like) to create a single
>>> image from each frame of the recording; then treat each
>>> image with an OCR software to get the interesting data
>>> which can then be entered into the database.
>>>
>>> Data to extract is mostly numbers. The relevant fields
>>> can be expected to be either filled or empty. The FPS rate
>>> of the recording can be kept reasonably low, like 1 FPS,
>>> or perhaps even less, depending on how frequently the relevant
>>> fields change.
>>>
>>> Using tesseract comes to mind, but after reading that
>>>
>>> "Tesseract's output will be very poor quality if the input
>>> images are not preprocessed to suit it: Images (especially
>>> screenshots) must be scaled up such that the text x-height
>>> is at least 20 pixels,[12] any rotation or skew must be
>>> corrected or no text will be recognized, low-frequency
>>> changes in brightness must be high-pass filtered, or
>>> Tesseract's binarization stage will destroy much of the
>>> page, and dark borders must be manually removed, or they
>>> will be misinterpreted as characters."[1]
>>>
>>> I'm even more doubtful that this would produce usable
>>> results with sufficient reliability.
>>>
>>> So what might be the best way to get text/numbers out of
>>> what a program displays?
>>>
>>>
>>> [1]: https://en.wikipedia.org/wiki/Tesseract_(software)
>>>
>>
>> I can't help with Gentoo.
>> Try to find an old (free) version of FineReader which runs under wine.
>> If you do it only occasionally, transfer the image to an Android phone
>> where there are good and cheap OCR apps, even FineReader.
>
> It would be too much video to process. Besides, phones are
> ok for making phone calls and entirely incompatible with
> computers, which makes them useless for anything else but
> making phone calls.

Huh? da fuck you talkin' 'bout?

My trusty collection of Android devices would be very surprised to hear they now don't have real CPUs, wifi chips, RAM and storage. Or can't run a web browser, do email, instant chat, play x264 video with less CPU load than my 8 core laptop, share with smb on the network, do bluetooth, video calls or any of the other bazillion things computers have always done with each other.

How odd. I really thought my Android phones could do all of that. I must have imagined it, which means my delusions are worse than I thought and maybe I need different and more pills from the nice lady who's my GP.

--
Alan McKinnon
alan.mckin...@gmail.com
Re: how to share a directory tree with files in it with multiple users (Re: [gentoo-user] local shared directory)
On 07/05/2016 17:12, hw wrote:
> Michael Orlitzky wrote:
>> On 04/23/2016 10:42 AM, hw wrote:
>>>
>>> Has it become entirely impossible to share a directory tree and the
>>> files in it with multiple users when Linux is involved? This should be
>>> a very simple thing to accomplish.
>>>
>>
>> It was never possible. It's ridiculous, but there it is. The UNIX
>> permissions model is too simple. ACLs were bolted on top, but most tools
>> retain legacy behavior with respect to group masks that breaks default
>> ACLs. You're seeing that same problem with your Samba share.
>>
>> Filesystem permissions are one thing that Windows got right. There's
>> ongoing work to bring that model to Linux,
>>
>>   https://en.wikipedia.org/wiki/Richacls
>>
>> but they're going to make the same mistake again[0] and allow the group
>> bits to act as a mask. That means mkdir, tar, cp, 7z -- anything that
>> tries to mess with group bits -- isn't going to work. They'll be DOA
>> just like POSIX ACLs were.
>>
>> I think you can manage this with incron and POSIX ACLs. Instead of
>> running "chmod g+w", use sys-apps/apply-default-acl to reset the
>> permissions to the defaults that you set.
>>
>> I wrote apply-default-acl to solve exactly this problem. You just need
>> to figure out a way to run it whenever things get screwed up. Which
>> means, whenever a file or directory is created.
>>
>>
>> [0] http://www.bestbits.at/richacl/man/richacl.7.txt
>>
>>   Changing the file mode permission bits:
>>
>>     When changing the file mode permission bits with chmod(1), the
>>     owner, group, and other file permission bits are set to the
>>     permission bits in the new mode... In addition, the masked and
>>     write_through ACL flags are set. This has the effect of limiting the
>>     permissions granted by the ACL to the file mode permission bits...
>>
>>
>
> Hm, I'm confused. Is it not possible to somehow force
> samba to set a user and a group as owners of a file or
> of a directory which is being created on a share?
>
> If that was possible, couldn't I mount that share with
> the uid and gid of the owner and group samba enforces,
> which would then allow multiple local users to access
> the files and directories on that share as one?

Now you've added a whole new wrinkle that was never mentioned before: samba.

Yes, samba can enforce the permissions you want on file system objects in shares it controls. To be accurate, it runs as root and presents the perms you want to the user, but only when accessing the files via samba.

Look at these options in smb.conf:

  create mask = 664
  force create mode = 664
  security mask = 664
  force security mode = 664
  directory mask = 2775
  force directory mode = 2775
  directory security mask = 2775
  force directory security mode = 2775

With this you can achieve what you want, but you have to ensure that samba is the only way the users can access the files.

I'm assuming you completely and correctly understand umask.

--
Alan McKinnon
alan.mckin...@gmail.com
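As the smb.conf documentation describes the mask/force pairs, a mask clears every permission bit it does not contain and the matching force option then ORs its bits back in. The Python sketch below applies that rule to the create and directory values Alan lists (it does not model the security-mask variants) and, for contrast, shows what a plain local umask does. The function names are illustrative, not Samba code, and "requested" stands for whatever mode Samba would otherwise have used.

```python
# How Samba's "create mask"/"force create mode" (and the directory
# equivalents) combine with the mode Samba would otherwise use for a new
# object: clear every bit not in the mask, then OR in the forced bits.
# Values are octal, as in smb.conf.


def samba_effective_mode(requested: int, mask: int, force: int) -> int:
    return (requested & mask) | force


def apply_umask(mode: int, umask: int) -> int:
    """Local (non-Samba) creation: the process umask clears bits instead."""
    return mode & ~umask


CREATE_MASK = FORCE_CREATE_MODE = 0o664
DIRECTORY_MASK = FORCE_DIRECTORY_MODE = 0o2775

if __name__ == "__main__":
    for req in (0o600, 0o644, 0o777):
        eff = samba_effective_mode(req, CREATE_MASK, FORCE_CREATE_MODE)
        print(f"file {req:04o} -> {eff:04o}")        # always 0664
    for req in (0o700, 0o755):
        eff = samba_effective_mode(req, DIRECTORY_MASK, FORCE_DIRECTORY_MODE)
        print(f"dir  {req:04o} -> {eff:04o}")        # always 2775
    print(f"local 0666, umask 022 -> {apply_umask(0o666, 0o022):04o}")  # 0644
```

Because mask and force mode are set to the same value here, every file ends up 0664 and every directory 2775 no matter what was requested, which is why Samba must be the only access path: a local process creating a file goes through the umask instead.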
Re: how to share a directory tree with files in it with multiple users (Re: [gentoo-user] local shared directory)
Michael Orlitzky wrote:
> On 04/23/2016 10:42 AM, hw wrote:
>>
>> Has it become entirely impossible to share a directory tree and the
>> files in it with multiple users when Linux is involved? This should be
>> a very simple thing to accomplish.
>>
>
> It was never possible. It's ridiculous, but there it is. The UNIX
> permissions model is too simple. ACLs were bolted on top, but most tools
> retain legacy behavior with respect to group masks that breaks default
> ACLs. You're seeing that same problem with your Samba share.
>
> Filesystem permissions are one thing that Windows got right. There's
> ongoing work to bring that model to Linux,
>
>   https://en.wikipedia.org/wiki/Richacls
>
> but they're going to make the same mistake again[0] and allow the group
> bits to act as a mask. That means mkdir, tar, cp, 7z -- anything that
> tries to mess with group bits -- isn't going to work. They'll be DOA
> just like POSIX ACLs were.
>
> I think you can manage this with incron and POSIX ACLs. Instead of
> running "chmod g+w", use sys-apps/apply-default-acl to reset the
> permissions to the defaults that you set.
>
> I wrote apply-default-acl to solve exactly this problem. You just need
> to figure out a way to run it whenever things get screwed up. Which
> means, whenever a file or directory is created.
>
>
> [0] http://www.bestbits.at/richacl/man/richacl.7.txt
>
>   Changing the file mode permission bits:
>
>     When changing the file mode permission bits with chmod(1), the
>     owner, group, and other file permission bits are set to the
>     permission bits in the new mode... In addition, the masked and
>     write_through ACL flags are set. This has the effect of limiting the
>     permissions granted by the ACL to the file mode permission bits...
>
>

Hm, I'm confused. Is it not possible to somehow force samba to set a user and a group as owners of a file or of a directory which is being created on a share?

If that was possible, couldn't I mount that share with the uid and gid of the owner and group samba enforces, which would then allow multiple local users to access the files and directories on that share as one?
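Michael's complaint that "the group bits act as a mask" can be made concrete with the inheritance rule from acl(5): a file created in a directory that has a default ACL inherits that ACL, but each permission class is then intersected with the mode argument of the creating call, and with an extended ACL the group class is the mask entry, which caps every named entry. The Python model below is a minimal sketch under those rules; the `Acl` type and helper names are hypothetical, permissions are 3-bit ints, and inheritance of the default ACL onto new subdirectories is not modelled.

```python
# Minimal model of POSIX default-ACL inheritance, after acl(5).
# Hypothetical representation: each permission is a 3-bit int (r=4, w=2, x=1);
# only the entries needed for the example are modelled.
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Acl:
    user_obj: int
    group_obj: int
    other: int
    mask: int
    named_groups: dict = field(default_factory=dict)


def inherit(default: Acl, mode: int) -> Acl:
    """Access ACL of a new file created in a directory whose default ACL is
    `default`, when open()/mkdir() was called with `mode`. Each class is
    intersected with the matching bits of `mode`; the mask entry stands for
    the group class. The umask is not applied when a default ACL exists."""
    return replace(
        default,
        user_obj=default.user_obj & (mode >> 6) & 7,
        mask=default.mask & (mode >> 3) & 7,
        other=default.other & mode & 7,
    )


def effective(acl: Acl, group: str) -> int:
    """Permissions a member of a named group actually gets: entry AND mask."""
    return acl.named_groups.get(group, 0) & acl.mask


def chmod_group(acl: Acl, bits: int) -> Acl:
    """On a file with an extended ACL, chmod's group bits rewrite the mask."""
    return replace(acl, mask=bits & 7)


if __name__ == "__main__":
    default = Acl(user_obj=6, group_obj=6, other=4, mask=6,
                  named_groups={"staff": 6})
    copied = inherit(default, 0o644)   # e.g. a file extracted with mode 0644
    print(effective(copied, "staff"))                   # 4 (read only)
    print(effective(chmod_group(copied, 6), "staff"))   # 6 (read/write)
```

So even though the default ACL grants the (hypothetical) group "staff" rw-, a tool that creates the file with mode 0644 leaves the mask at r--, and the group can only read it until something runs "chmod g+w" or re-applies the defaults, which is the job Michael describes for apply-default-acl.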
Re: [gentoo-user] extracting text, numbers from screencasts
Urs Schütz wrote:
> On 04/08/16 11:30, Helmut Jarausch wrote:
>> On 04/08/2016 03:26:53 PM, hw wrote:
>>> Hi,
>>>
>>> what would be the best approach to extract data
>>> from a screencast?
>>>
>>> The task is to acquire some data from the display of
>>> a GUI program used interactively by a user. There are
>>> a couple 'fields' (as in "designated areas of the display")
>>> in which the relevant data is being displayed while the
>>> program is being used. The acquired data needs to be
>>> entered into a mysql database, preferably as soon as
>>> possible. (The program needs windoze, and the sources
>>> are unavailable :( )
>>>
>>> The idea is to make a screen recording and postprocess
>>> the recording with some sort of OCR software. This might
>>> require using ffmpeg (or the like) to create a single
>>> image from each frame of the recording; then treat each
>>> image with an OCR software to get the interesting data
>>> which can then be entered into the database.
>>>
>>> Data to extract is mostly numbers. The relevant fields
>>> can be expected to be either filled or empty. The FPS rate
>>> of the recording can be kept reasonably low, like 1 FPS,
>>> or perhaps even less, depending on how frequently the relevant
>>> fields change.
>>>
>>> Using tesseract comes to mind, but after reading that
>>>
>>> "Tesseract's output will be very poor quality if the input
>>> images are not preprocessed to suit it: Images (especially
>>> screenshots) must be scaled up such that the text x-height
>>> is at least 20 pixels,[12] any rotation or skew must be
>>> corrected or no text will be recognized, low-frequency
>>> changes in brightness must be high-pass filtered, or
>>> Tesseract's binarization stage will destroy much of the
>>> page, and dark borders must be manually removed, or they
>>> will be misinterpreted as characters."[1]
>>>
>>> I'm even more doubtful that this would produce usable
>>> results with sufficient reliability.
>>>
>>> So what might be the best way to get text/numbers out of
>>> what a program displays?
>>>
>>> [1]: https://en.wikipedia.org/wiki/Tesseract_(software)
>>
>> I can't help with Gentoo.
>> Try to find an old (free) version of FineReader which runs under wine.
>> If you do it only occasionally, transfer the image to an Android phone
>> where there are good and cheap OCR apps, even FineReader.
>
> I had some surprisingly good experience with tesseract in digitizing
> photographed pages of an old book recently. So I gave it a try today
> with a cropped screenshot of thunderbird.
>
> $ convert scrsht.png -type Grayscale -filter point -resize 300% -normalize upscaled.png
> $ tesseract -l eng upscaled.png out
> $ less out.txt
>
> convert is from media-gfx/imagemagick-6.9.0.3
> tesseract is app-text/tesseract-3.04.00-r2
>
> Here are my findings:
> Any graphical elements sized similar to a character appear as strange letters.
> Recognition of serif fonts was better than sans-serif fonts, even at smaller font size.
> Text which can be spell-checked was nearly perfectly recognized.
> Gentoo-specific words like "GLSA" and "NVMe" were not correctly recognized.
> Selected text (white on blue background) was poorly recognized.
> Dates were not recognized correctly.
> Times were correctly read.
> "convert" time for an initial screenshot size of 956 x 639 pixels was 0.4 seconds.
> "tesseract" time was a little more than 6s on an Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz, without OpenCL.
>
> The image conversion and tesseract OCR could easily be scripted.

Considering the amount of video, 6s per frame would be too long. The application is time-critical, such that I have a window of about 10s to extract and process the data from at least 8 video streams. Recording at only 1 FPS and taking 8 seconds to extract and process the data of each frame would require 640s of work per 10s window, and I don't have about 70 CPUs available to do the work.

To make things worse, it's an ongoing process, i.e. dividing it into 10s windows is too artificial to keep things running as smoothly as they should.

> In short I would say that the following steps would help with tesseract:
> Avoid GUI with a lot of graphics.
> Try to screenshot just the relevant areas.
> Increase GUI font size.
> Configure GUI to use a well known serif font, or train tesseract for the specific font used.
> Configure GUI to use high contrasts, avoid colors which get converted to gray.
> Tesseract time could be improved by enabling OpenCL.
>
> I would be interested to hear about your findings with numerical data,
> and which approach finally works for you.

Thank you very much for giving me a better idea of what I'm looking at!

Considering it, I have resorted to using AutoHotkey, which has the ability to actually read data from GUI elements. It can also make requests to web servers. With that, things become a hell of a lot simpler than trying to process video streams, for I can simply read the data and send it over to the web server, which puts it into the database where it needs to end up anyway.

Unfortunately, the application the data is being read from has a bad habit of renaming the GUI elements I need to read. This makes things difficult again. AutoHotkey is a really nice tool, though
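hw's 640 s and "about 70 CPUs" figures correspond to 8 streams each sampled at 1 frame per second: frames per window = streams x FPS x window length, OCR work = frames x seconds per frame, and CPUs needed = work / window length. The Python sketch below just does that arithmetic; it assumes one frame keeps one CPU core busy for its whole OCR time, with no batching and no parallelism inside tesseract.

```python
# Back-of-the-envelope check of the OCR workload discussed above.
import math


def ocr_load(streams: int, fps: float, window_s: float, secs_per_frame: float):
    """Return (CPU-seconds of OCR work per window, CPUs needed to keep up)."""
    frames = streams * fps * window_s          # frames arriving per window
    cpu_seconds = frames * secs_per_frame      # single-core work per window
    return cpu_seconds, math.ceil(cpu_seconds / window_s)


if __name__ == "__main__":
    print(ocr_load(8, 1, 10, 8))     # (640, 64): hw's figures
    print(ocr_load(8, 1, 10, 6.4))   # Urs's ~0.4 s convert + ~6 s tesseract
    print(ocr_load(8, 10, 10, 8))    # sampling at 10 FPS instead
```

Even with Urs's faster per-frame time the load stays around 50 cores, and sampling at 10 FPS would multiply it by ten, which supports hw's decision to read the values from the GUI elements instead.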
[gentoo-user] Re: Will installing grub-2.02 break my grub-0.97 setup?
On 2016-05-06, James wrote:
> Grant Edwards gmail.com> writes:
>
>> I'd like to install winusb, and it appears to depend on grub-2:
>> $ sudo emerge -av winusb
>
> Ok, so I've never used winusb, so excuse me for asking a few dumb
> questions here. Even after reading a bit and searching around, I
> have these dumb questions. I did not find sufficient reading
> materials to 'turn the light on' as to when and why and how this
> winusb is used.
>
> 1. So winusb can put a Windows (vista-->8) image on a usbstick that will
>    boot most x86 or x86-64 hardware with the appropriate windows binary?
>    The hardware can then be installed with the windows image?

That's my understanding. [I haven't actually done it yet.]

Many of the machines I use no longer have (working) optical drives. When doing OS installs I almost always use USB flash drives. I've been doing Linux installs that way for yonks. Most Linux OS distro .iso images are already "hybrid" so they boot as-is from a block storage device. In my experience, those that aren't can be fixed up with a simple "isohybrid" command. Now I want to stop burning Windows DVDs.

> 2. winusb can be used as a live_windows on a linux system where
>    changes are retained on the usb stick?

No, I don't think so.

> 3. winusb can be used to install windows in a VM?

Presumably -- if you can boot the VM from a USB storage device.

> 4. winusb can be used to install windows in a container?

I don't know enough about containers to posit an answer.

--
Grant
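Whether an .iso is already "hybrid" can be guessed before writing it to a stick. The Python sketch below is a heuristic, not a spec-level check, and rests on an assumption: a plain ISO 9660 image leaves its first 32 KiB (the system area) unused, while isohybrid and similar tools put an MBR there, which ends with the 0x55 0xAA boot signature at byte offsets 510-511. A negative result does not prove the image won't boot from USB.

```python
# Heuristic check for a "hybrid" ISO image, i.e. one that also carries an
# MBR and can be written directly to a USB stick. Assumption: plain ISO 9660
# images leave the system area unused, while isohybrid (and similar tools)
# put an MBR there, ending with the 0x55 0xAA signature at offsets 510-511.
import sys


def is_hybrid_sector(sector0: bytes) -> bool:
    """True if the first 512 bytes end with the MBR boot signature."""
    return len(sector0) >= 512 and sector0[510:512] == b"\x55\xaa"


def looks_hybrid(path: str) -> bool:
    with open(path, "rb") as f:
        return is_hybrid_sector(f.read(512))


if __name__ == "__main__":
    for iso in sys.argv[1:]:
        print(iso, "hybrid" if looks_hybrid(iso) else "probably not hybrid")
```

Saved as, say, hybrid_check.py (a hypothetical name), it would be run as "python3 hybrid_check.py some.iso" before deciding whether "isohybrid" is needed.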
[gentoo-user] Re: Will installing grub-2.02 break my grub-0.97 setup?
On 2016-05-06, Neil Bothwick wrote:
> On Fri, 6 May 2016 16:21:28 + (UTC), Grant Edwards wrote:
>
>> >> Thanks. That's good to know -- I'll definitely set things up so I'm
>> >> not running winusb as root.
>> >
>> > Well, you could always reinstall grub-0 before rebooting, to make
>> > sure.
>>
>> I just created a SystemRescueCd bootable USB flash drive that can be
>> used to re-install grub-0 in the MBR if something does go wrong.
>> But, running winusb as a non-privileged user should prevent any
>> collateral damage to the MBR.
>
> It should also prevent winusb writing to the MBR of the USB stick,
> which sort of defeats the point.

Nope. I have my system configured so that my USB flash drives are writable for users in the group "usb" -- of which I am one.

--
Grant
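Grant's answer relies on the classic UNIX permission check: writing the stick's MBR only needs write access to its block device node, and a non-root user gets that through the group bits if the node's group is one of theirs. How his device nodes end up in group "usb" isn't stated. The Python sketch below models the check while ignoring ACLs, root and capabilities; the uid, gid 85 and mode in the example are hypothetical.

```python
# Classic UNIX permission-class check for write access, ignoring ACLs,
# root and capabilities. Exactly one class is consulted: owner, else group,
# else other.
import os
import stat


def may_write(uid: int, gids: set, st_uid: int, st_gid: int, st_mode: int) -> bool:
    if uid == st_uid:
        return bool(st_mode & stat.S_IWUSR)
    if st_gid in gids:
        return bool(st_mode & stat.S_IWGRP)
    return bool(st_mode & stat.S_IWOTH)


def may_write_path(path: str) -> bool:
    """Apply the check above to a real file or device node for this process."""
    st = os.stat(path)
    gids = set(os.getgroups()) | {os.getegid()}
    return may_write(os.geteuid(), gids, st.st_uid, st.st_gid, st.st_mode)


if __name__ == "__main__":
    # Hypothetical: /dev/sdb owned by root, group "usb" (gid 85), mode 0660,
    # and a user (uid 1000) whose groups include 85.
    print(may_write(1000, {100, 85}, 0, 85, stat.S_IFBLK | 0o660))  # True
    print(may_write(1000, {100}, 0, 85, stat.S_IFBLK | 0o660))      # False
```

Note the owner-first rule: an owner whose own bits lack write is refused even if the group bits would allow it. On a live system, os.access(path, os.W_OK) asks the kernel directly.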
Re: [gentoo-user] extracting text, numbers from screencasts
Helmut Jarausch wrote:
> On 04/08/2016 03:26:53 PM, hw wrote:
>> Hi,
>>
>> what would be the best approach to extract data
>> from a screencast?
>>
>> The task is to acquire some data from the display of
>> a GUI program used interactively by a user. There are
>> a couple 'fields' (as in "designated areas of the display")
>> in which the relevant data is being displayed while the
>> program is being used. The acquired data needs to be
>> entered into a mysql database, preferably as soon as
>> possible. (The program needs windoze, and the sources
>> are unavailable :( )
>>
>> The idea is to make a screen recording and postprocess
>> the recording with some sort of OCR software. This might
>> require using ffmpeg (or the like) to create a single
>> image from each frame of the recording; then treat each
>> image with an OCR software to get the interesting data
>> which can then be entered into the database.
>>
>> Data to extract is mostly numbers. The relevant fields
>> can be expected to be either filled or empty. The FPS rate
>> of the recording can be kept reasonably low, like 1 FPS,
>> or perhaps even less, depending on how frequently the relevant
>> fields change.
>>
>> Using tesseract comes to mind, but after reading that
>>
>> "Tesseract's output will be very poor quality if the input
>> images are not preprocessed to suit it: Images (especially
>> screenshots) must be scaled up such that the text x-height
>> is at least 20 pixels,[12] any rotation or skew must be
>> corrected or no text will be recognized, low-frequency
>> changes in brightness must be high-pass filtered, or
>> Tesseract's binarization stage will destroy much of the
>> page, and dark borders must be manually removed, or they
>> will be misinterpreted as characters."[1]
>>
>> I'm even more doubtful that this would produce usable
>> results with sufficient reliability.
>>
>> So what might be the best way to get text/numbers out of
>> what a program displays?
>>
>> [1]: https://en.wikipedia.org/wiki/Tesseract_(software)
>
> I can't help with Gentoo.
> Try to find an old (free) version of FineReader which runs under wine.
> If you do it only occasionally, transfer the image to an Android phone
> where there are good and cheap OCR apps, even FineReader.

It would be too much video to process. Besides, phones are ok for making phone calls and entirely incompatible with computers, which makes them useless for anything else but making phone calls.
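For reference, hw's original plan (ffmpeg splits the recording into images, OCR reads each image) can be scripted roughly as below, reusing the convert and tesseract invocations Urs reported in this thread. This is a Python sketch under stated assumptions: the file names, the optional crop geometry (ffmpeg's crop=w:h:x:y filter) and the number regex are hypothetical, ffmpeg, ImageMagick and tesseract must be installed, and the final MySQL insert is left out.

```python
# Sketch of an ffmpeg -> ImageMagick -> tesseract pipeline.
# File names, crop geometry and the number regex are hypothetical.
from __future__ import annotations

import re
import subprocess
from pathlib import Path


def extract_frames_cmd(video: str, outdir: str, fps: float = 1,
                       crop: str | None = None) -> list[str]:
    """ffmpeg: keep `fps` frames per second, optionally crop to one field."""
    vf = f"fps={fps}" + (f",crop={crop}" if crop else "")
    return ["ffmpeg", "-y", "-i", video, "-vf", vf, f"{outdir}/frame_%05d.png"]


def preprocess_cmd(src: str, dst: str) -> list[str]:
    """Grayscale, 300% upscale and normalize (the convert call Urs used)."""
    return ["convert", src, "-type", "Grayscale", "-filter", "point",
            "-resize", "300%", "-normalize", dst]


def ocr_cmd(img: str, outbase: str) -> list[str]:
    """tesseract writes its result to <outbase>.txt."""
    return ["tesseract", "-l", "eng", img, outbase]


NUMBER = re.compile(r"-?\d+(?:[.,]\d+)?")


def numbers_in(text: str) -> list[str]:
    return NUMBER.findall(text)


def run(video: str, workdir: str) -> dict[str, list[str]]:
    """Run the three tools; return the numbers found in each frame."""
    Path(workdir).mkdir(parents=True, exist_ok=True)
    subprocess.run(extract_frames_cmd(video, workdir), check=True)
    results = {}
    for frame in sorted(Path(workdir).glob("frame_*.png")):
        up = frame.with_name("up_" + frame.name)   # upscaled copy
        subprocess.run(preprocess_cmd(str(frame), str(up)), check=True)
        outbase = str(frame.with_suffix(""))
        subprocess.run(ocr_cmd(str(up), outbase), check=True)
        results[frame.name] = numbers_in(Path(outbase + ".txt").read_text())
    return results
```

Cropping each frame to just the relevant field, as Urs suggests, also shrinks the image tesseract has to process, although it does not change the per-stream arithmetic hw objects to.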
Re: [gentoo-user] Calm
On 4/18/2016 3:32 PM, Marc Joliet wrote:
> On Saturday 16 April 2016 14:48:51 Alan Mackenzie wrote:
>> Hello, Gentoo.
>>
>> I'm just saying hello to confirm I'm still here. For many months now,
>> Gentoo has simply worked for me, without problems. I sync my system
>> several times a week, and emerge just works.
>>
>> The last bit of excitement I had was in early 2015 when I was trying to
>> sort out the mess in my xfce4 system after gnome-3 had been made stable.
>> In the end, I gave up and reinstalled Gentoo, which this time took me
>> only a week.
>>
>> Admittedly, there's very little which is cutting edge on my system: the
>> box is 6½ years old, it boots with lilo on an old-fashioned BIOS, my
>> filesystems are ext3 (or in one case, ext2) on spinning rust. The only
>> remotely adventurous things I've got are RAID-1 (via the kernel) and
>> lvm2.
>>
>> So a big thanks to all the developers who've brought about this happy
>> state of affairs!
>
> I concur!

The first three entries in this thread and the last few are what keep guys like me encouraged. Loved reading about all your success stories. Thanks for sharing! :)

-maffblaster