> I need to install these 5 extensions? Is that really the solution? Shouldn’t
> they be automatically installed?
Yes. They are likely already provided alongside PHP, but they may not be installed or activated. For example, on Debian and its derivatives they are packaged as php-dom, php-intl...

More conveniently, you might also just use the public Wsexport instance: https://ws-export.wmcloud.org/ It's not suitable if you want to export tens of thousands of pages, but for small workloads it should be fine.

Thomas

On Tue, 20 Sep 2022 at 18:43, Julius Hamilton <[email protected]> wrote:
>
> I am currently trying to install ws-export (
> https://github.com/wikimedia/ws-export) and I’m having trouble with
> “compose”, would anyone know anything about this?
>
> composer install --no-dev
>
> Your lock file does not contain a compatible set of packages. Please run
> composer update.
>
> composer update
>
> Your requirements could not be resolved to an installable set of packages.
>
> Problem 1
> - Root composer.json requires PHP extension ext-dom * but it is missing
> from your system. Install or enable PHP's dom extension.
> Problem 2
> - Root composer.json requires PHP extension ext-intl * but it is missing
> from your system. Install or enable PHP's intl extension.
> Problem 3
> - Root composer.json requires PHP extension ext-sqlite3 * but it is
> missing from your system. Install or enable PHP's sqlite3 extension.
> Problem 4
> - Root composer.json requires PHP extension ext-zip * but it is missing
> from your system. Install or enable PHP's zip extension.
> Problem 5
> - symfony/framework-bundle[v5.4.0, ..., v5.4.12] require ext-xml * -> it
> is missing from your system. Install or enable PHP's xml extension.
> - Root composer.json requires symfony/framework-bundle 5.4.* ->
> satisfiable by symfony/framework-bundle[v5.4.0, ..., v5.4.12].
>
> To enable extensions, verify that they are enabled in your .ini files:
> - /etc/php/7.4/cli/php.ini
> - /etc/php/7.4/cli/conf.d/10-opcache.ini
> - /etc/php/7.4/cli/conf.d/10-pdo.ini
> - /etc/php/7.4/cli/conf.d/20-calendar.ini
> - /etc/php/7.4/cli/conf.d/20-ctype.ini
> - /etc/php/7.4/cli/conf.d/20-exif.ini
> - /etc/php/7.4/cli/conf.d/20-ffi.ini
> - /etc/php/7.4/cli/conf.d/20-fileinfo.ini
> - /etc/php/7.4/cli/conf.d/20-ftp.ini
> - /etc/php/7.4/cli/conf.d/20-gettext.ini
> - /etc/php/7.4/cli/conf.d/20-iconv.ini
> - /etc/php/7.4/cli/conf.d/20-json.ini
> - /etc/php/7.4/cli/conf.d/20-phar.ini
> - /etc/php/7.4/cli/conf.d/20-posix.ini
> - /etc/php/7.4/cli/conf.d/20-readline.ini
> - /etc/php/7.4/cli/conf.d/20-shmop.ini
> - /etc/php/7.4/cli/conf.d/20-sockets.ini
> - /etc/php/7.4/cli/conf.d/20-sysvmsg.ini
> - /etc/php/7.4/cli/conf.d/20-sysvsem.ini
> - /etc/php/7.4/cli/conf.d/20-sysvshm.ini
> - /etc/php/7.4/cli/conf.d/20-tokenizer.ini
> You can also run `php --ini` in a terminal to see which files are used by PHP
> in CLI mode.
> Alternatively, you can run Composer with `--ignore-platform-req=ext-dom
> --ignore-platform-req=ext-intl --ignore-platform-req=ext-sqlite3
> --ignore-platform-req=ext-zip --ignore-platform-req=ext-xml` to temporarily
> ignore these required extensions.
>
> I need to install these 5 extensions? Is that really the solution? Shouldn’t
> they be automatically installed?
>
> Thank you,
> Julius
>
> On Tue 20. Sep 2022 at 17:41, Julius Hamilton <[email protected]>
> wrote:
>>
>> Thank you very much.
>>
>> > Did you look at the wikitext of that page?
>>
>> I did now, I see that the text displayed is not actually present in the
>> wikitext / source text. I am seeing these ".djvu include" lines:
>>
>> <pages index="A simplified grammar of the Swedish language.djvu" include=7 />
>>
>> What is this? Is it a common format for a Wikisource book?
>>
>> > prop=extracts works, but I would say it's a poor fit for many (most?)
>> > wikisource pages.
>>
>> Why?
>> Because it just pulls out sentences from the wikitext? What is
>> different about the functioning of prop=revisions, for example?
>>
>> > Plaintext as in wikitext or in parsed html converted to plaintext?
>>
>> Whatever you think is preferable, the point is to have some clean, readable
>> text. If the parsed HTML has any awkward formatting issues, I might prefer
>> the wikitext, or vice versa. Whichever is easier to work with. Technically
>> since wikitext is a markup format it might be easier to pull out from
>> specific fields you are seeking? I don't know.
>>
>> > You could use something like this to fetch every page
>>
>> Thanks. I tried replacing the title with a different, more normal book and
>> it didn't seem to work.
>>
>> https://en.wikisource.org/w/api.php?generator=allpages&action=query&prop=revisions&rvprop=content&rvslots=main&gapprefix=Moby-Dick_(1851)_US_edition
>>
>> I guess it's the same problem, "revisions" also pulls out wikitext but
>> Wikisource wikitext pulls in its text from separate files?
>>
>> So would the "parse" action of the API be the tool of choice?
>>
>> > the WS Export tool can do that
>>
>> Thanks very much, will give that a shot next.
>>
>> Thank you,
>>
>> Julius
>>
>> On Tue, Sep 20, 2022 at 2:14 AM Sam Wilson <[email protected]> wrote:
>>>
>>>> How can I get the full plaintext from an entire book on Wikisource with
>>>> the API?
>>>
>>> Plaintext as in wikitext or in parsed html converted to plaintext?
>>>
>>> If it's the latter, the WS Export tool can do that:
>>> https://ws-export.wmcloud.org/?format=txt
>>>
>>> _______________________________________________
>>> Mediawiki-api mailing list -- [email protected]
>>> To unsubscribe send an email to [email protected]
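
A note relating to the reply at the top of this message: the following is a minimal Python sketch, not part of the original thread, that pulls the missing extension names out of the Composer output quoted above and turns them into candidate Debian package names. The `php-<name>` naming pattern is an assumption extrapolated from the php-dom and php-intl examples in the reply, and the function names are hypothetical. The script only prints a command for review; the package names should be checked with the package manager before anything is installed.

```python
import re
from typing import List

# Excerpt of the Composer output quoted in this thread.
COMPOSER_OUTPUT = """\
Problem 1
- Root composer.json requires PHP extension ext-dom * but it is missing
Problem 2
- Root composer.json requires PHP extension ext-intl * but it is missing
Problem 3
- Root composer.json requires PHP extension ext-sqlite3 * but it is missing
Problem 4
- Root composer.json requires PHP extension ext-zip * but it is missing
Problem 5
- symfony/framework-bundle[v5.4.0, ..., v5.4.12] require ext-xml * -> it is missing
"""


def missing_extensions(output: str) -> List[str]:
    """Return extension names mentioned as ext-<name>, in order, without duplicates."""
    names: List[str] = []
    for name in re.findall(r"\bext-([a-z0-9_]+)", output):
        if name not in names:
            names.append(name)
    return names


def debian_package_candidates(extensions: List[str]) -> List[str]:
    """Guess package names using the php-<name> pattern (an assumption, see above)."""
    return ["php-" + ext for ext in extensions]


if __name__ == "__main__":
    packages = debian_package_candidates(missing_extensions(COMPOSER_OUTPUT))
    # Print the command instead of running it, so it can be reviewed first.
    print("sudo apt install " + " ".join(packages))
```

After installing, running `composer update` again (as Composer itself suggested in the quoted output) should show whether any extension is still reported as missing.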
