I am currently trying to install ws-export (https://github.com/wikimedia/ws-export) and I’m having trouble with Composer. Would anyone know anything about this?

> composer install --no-dev
> Your lock file does not contain a compatible set of packages. Please run composer update.

> composer update
> Your requirements could not be resolved to an installable set of packages.
>
>   Problem 1
    - Root composer.json requires PHP extension ext-dom * but it is missing from your system. Install or enable PHP's dom extension.
  Problem 2
    - Root composer.json requires PHP extension ext-intl * but it is missing from your system. Install or enable PHP's intl extension.
  Problem 3
    - Root composer.json requires PHP extension ext-sqlite3 * but it is missing from your system. Install or enable PHP's sqlite3 extension.
  Problem 4
    - Root composer.json requires PHP extension ext-zip * but it is missing from your system. Install or enable PHP's zip extension.
  Problem 5
    - symfony/framework-bundle[v5.4.0, ..., v5.4.12] require ext-xml * -> it is missing from your system. Install or enable PHP's xml extension.
    - Root composer.json requires symfony/framework-bundle 5.4.* -> satisfiable by symfony/framework-bundle[v5.4.0, ..., v5.4.12].

To enable extensions, verify that they are enabled in your .ini files:
    - /etc/php/7.4/cli/php.ini
    - /etc/php/7.4/cli/conf.d/10-opcache.ini
    - /etc/php/7.4/cli/conf.d/10-pdo.ini
    - /etc/php/7.4/cli/conf.d/20-calendar.ini
    - /etc/php/7.4/cli/conf.d/20-ctype.ini
    - /etc/php/7.4/cli/conf.d/20-exif.ini
    - /etc/php/7.4/cli/conf.d/20-ffi.ini
    - /etc/php/7.4/cli/conf.d/20-fileinfo.ini
    - /etc/php/7.4/cli/conf.d/20-ftp.ini
    - /etc/php/7.4/cli/conf.d/20-gettext.ini
    - /etc/php/7.4/cli/conf.d/20-iconv.ini
    - /etc/php/7.4/cli/conf.d/20-json.ini
    - /etc/php/7.4/cli/conf.d/20-phar.ini
    - /etc/php/7.4/cli/conf.d/20-posix.ini
    - /etc/php/7.4/cli/conf.d/20-readline.ini
    - /etc/php/7.4/cli/conf.d/20-shmop.ini
    - /etc/php/7.4/cli/conf.d/20-sockets.ini
    - /etc/php/7.4/cli/conf.d/20-sysvmsg.ini
    - /etc/php/7.4/cli/conf.d/20-sysvsem.ini
    - /etc/php/7.4/cli/conf.d/20-sysvshm.ini
    - /etc/php/7.4/cli/conf.d/20-tokenizer.ini
You can also run `php --ini` in a terminal to see which files are used by PHP in CLI mode.
Alternatively, you can run Composer with `--ignore-platform-req=ext-dom --ignore-platform-req=ext-intl --ignore-platform-req=ext-sqlite3 --ignore-platform-req=ext-zip --ignore-platform-req=ext-xml` to temporarily ignore these required extensions.

Do I need to install these 5 extensions? Is that really the solution? Shouldn’t they be installed automatically?

Thank you,
Julius

On Tue 20. Sep 2022 at 17:41, Julius Hamilton <juliushamilton...@gmail.com> wrote:

> Thank you very much.
>
> > Did you look at the wikitext of that page?
>
> I did now, I see that the text displayed is not actually present in the wikitext / source text. I am seeing these ".djvu include" lines:
>
> <pages index="A simplified grammar of the Swedish language.djvu" include=7 />
>
> What is this? Is it a common format for a Wikisource book?
>
> > prop=extracts works, but I would say it's a poor fit for many (most?) wikisource pages.
>
> Why? Because it just pulls out sentences from the wikitext?
> What is different about the functioning of prop=revisions, for example?
>
> > Plaintext as in wikitext or in parsed html converted to plaintext?
>
> Whatever you think is preferable, the point is to have some clean, readable text. If the parsed HTML has any awkward formatting issues, I might prefer the wikitext, or vice versa. Whichever is easier to work with. Technically since wikitext is a markup format it might be easier to pull out from specific fields you are seeking? I don't know.
>
> > You could use something like this to fetch every page
>
> Thanks. I tried replacing the title with a different, more normal book and it didn't seem to work.
>
> https://en.wikisource.org/w/api.php?generator=allpages&action=query&prop=revisions&rvprop=content&rvslots=main&gapprefix=Moby-Dick_(1851)_US_edition
>
> I guess it's the same problem, "revisions" also pulls out wikitext but Wikisource wikitext pulls in its text from separate files?
>
> So would the "parse" action of the API be the tool of choice?
>
> > the WS Export tool can do that
>
> Thanks very much, will give that a shot next.
>
> Thank you,
> Julius
>
> On Tue, Sep 20, 2022 at 2:14 AM Sam Wilson <s...@samwilson.id.au> wrote:
>
>>> How can I get the full plaintext from an entire book on Wikisource with the API?
>>
>> Plaintext as in wikitext or in parsed html converted to plaintext?
>>
>> If it's the latter, the WS Export tool can do that:
>> https://ws-export.wmcloud.org/?format=txt
>>
>> _______________________________________________
>> Mediawiki-api mailing list -- mediawiki-api@lists.wikimedia.org
>> To unsubscribe send an email to mediawiki-api-le...@lists.wikimedia.org