> I need to install these 5 extensions? Is that really the solution? Shouldn’t 
> they be automatically installed?

Yes. They are probably already provided alongside PHP but may not be
installed or activated. For example, on Debian and its derivatives they are
packaged as php-dom, php-intl...
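
To see which of the five extensions are actually missing before installing
anything, something like the following rough sketch might help. It only
parses the output of `php -m` (the CLI command that lists loaded modules);
the function names are made up for illustration, and the list of required
extensions is simply copied from the Composer error quoted below.

```python
import subprocess

# Extensions named in the Composer error quoted in this thread.
REQUIRED = ["dom", "intl", "sqlite3", "zip", "xml"]


def parse_php_modules(php_m_output):
    """Return the lowercased module names found in `php -m` output."""
    modules = set()
    for line in php_m_output.splitlines():
        name = line.strip()
        # Skip blank lines and section headers such as "[PHP Modules]".
        if not name or name.startswith("["):
            continue
        modules.add(name.lower())
    return modules


def missing_extensions(php_m_output, required=REQUIRED):
    """List the required extensions that `php -m` did not report."""
    loaded = parse_php_modules(php_m_output)
    return [ext for ext in required if ext.lower() not in loaded]


def check_local_php():
    """Run `php -m` with the php binary on PATH and report what is missing.

    Assumes the CLI binary is the same PHP that Composer uses.
    """
    result = subprocess.run(
        ["php", "-m"], capture_output=True, text=True, check=True
    )
    return missing_extensions(result.stdout)
```

Calling `check_local_php()` returns the names still missing; an empty list
would mean Composer should no longer complain about these extensions.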

More conveniently, you could also just use the public Wsexport
instance: https://ws-export.wmcloud.org/
It's not suitable if you want to export tens of thousands of pages, but
for small workloads it should be fine.
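
As a minimal sketch of scripting against the public instance: the
`format=txt` parameter appears in the URL Sam gave earlier in this thread,
while the `lang` and `page` parameter names are my assumption about how the
tool's web form passes the book, so please check them against the form
before relying on this. The function names are only illustrative.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://ws-export.wmcloud.org/"


def build_export_url(page, lang="en", fmt="txt"):
    """Build an export URL for one book.

    "format" is taken from this thread; "lang" and "page" are assumed
    parameter names and may need adjusting.
    """
    query = urlencode({"lang": lang, "page": page, "format": fmt})
    return f"{BASE_URL}?{query}"


def fetch_book_text(page, lang="en"):
    """Download the plain-text export of a single book (one HTTP request)."""
    with urlopen(build_export_url(page, lang), timeout=120) as response:
        return response.read().decode("utf-8")
```

For example, `fetch_book_text("Moby-Dick (1851) US edition")` would request
the book mentioned below. Please keep this to a handful of books so the
shared instance is not overloaded.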

Thomas

On Tue, 20 Sept 2022 at 18:43, Julius Hamilton
<[email protected]> wrote:
>
> I am currently trying to install ws-export (
> https://github.com/wikimedia/ws-export) and I’m having trouble with 
> “compose”, would anyone know anything about this?
>
> > composer install --no-dev
>
> > Your lock file does not contain a compatible set of packages. Please run 
> > composer update.
>
> > composer update
>
> > Your requirements could not be resolved to an installable set of packages.
>
> > Problem 1
>     - Root composer.json requires PHP extension ext-dom * but it is missing 
> from your system. Install or enable PHP's dom extension.
>   Problem 2
>     - Root composer.json requires PHP extension ext-intl * but it is missing 
> from your system. Install or enable PHP's intl extension.
>   Problem 3
>     - Root composer.json requires PHP extension ext-sqlite3 * but it is 
> missing from your system. Install or enable PHP's sqlite3 extension.
>   Problem 4
>     - Root composer.json requires PHP extension ext-zip * but it is missing 
> from your system. Install or enable PHP's zip extension.
>   Problem 5
>     - symfony/framework-bundle[v5.4.0, ..., v5.4.12] require ext-xml * -> it 
> is missing from your system. Install or enable PHP's xml extension.
>     - Root composer.json requires symfony/framework-bundle 5.4.* -> 
> satisfiable by symfony/framework-bundle[v5.4.0, ..., v5.4.12].
>
> To enable extensions, verify that they are enabled in your .ini files:
>     - /etc/php/7.4/cli/php.ini
>     - /etc/php/7.4/cli/conf.d/10-opcache.ini
>     - /etc/php/7.4/cli/conf.d/10-pdo.ini
>     - /etc/php/7.4/cli/conf.d/20-calendar.ini
>     - /etc/php/7.4/cli/conf.d/20-ctype.ini
>     - /etc/php/7.4/cli/conf.d/20-exif.ini
>     - /etc/php/7.4/cli/conf.d/20-ffi.ini
>     - /etc/php/7.4/cli/conf.d/20-fileinfo.ini
>     - /etc/php/7.4/cli/conf.d/20-ftp.ini
>     - /etc/php/7.4/cli/conf.d/20-gettext.ini
>     - /etc/php/7.4/cli/conf.d/20-iconv.ini
>     - /etc/php/7.4/cli/conf.d/20-json.ini
>     - /etc/php/7.4/cli/conf.d/20-phar.ini
>     - /etc/php/7.4/cli/conf.d/20-posix.ini
>     - /etc/php/7.4/cli/conf.d/20-readline.ini
>     - /etc/php/7.4/cli/conf.d/20-shmop.ini
>     - /etc/php/7.4/cli/conf.d/20-sockets.ini
>     - /etc/php/7.4/cli/conf.d/20-sysvmsg.ini
>     - /etc/php/7.4/cli/conf.d/20-sysvsem.ini
>     - /etc/php/7.4/cli/conf.d/20-sysvshm.ini
>     - /etc/php/7.4/cli/conf.d/20-tokenizer.ini
> You can also run `php --ini` in a terminal to see which files are used by PHP 
> in CLI mode.
> Alternatively, you can run Composer with `--ignore-platform-req=ext-dom 
> --ignore-platform-req=ext-intl --ignore-platform-req=ext-sqlite3 
> --ignore-platform-req=ext-zip --ignore-platform-req=ext-xml` to temporarily 
> ignore these required extensions.
>
>
>
> I need to install these 5 extensions? Is that really the solution? Shouldn’t 
> they be automatically installed?
>
> Thank you,
> Julius
>
> On Tue 20. Sep 2022 at 17:41, Julius Hamilton <[email protected]> 
> wrote:
>>
>> Thank you very much.
>>
>> > Did you look at the wikitext of that page?
>>
>> I did now, I see that the text displayed is not actually present in the 
>> wikitext / source text. I am seeing these ".djvu include" lines:
>>
>> <pages index="A simplified grammar of the Swedish language.djvu" include=7 />
>>
>> What is this? Is it a common format for a Wikisource book?
>>
>> > prop=extracts works, but I would say it's a poor fit for many (most?) 
>> > wikisource pages.
>>
>> Why? Because it just pulls out sentences from the wikitext? What is 
>> different about the functioning of prop=revisions, for example?
>>
>> > Plaintext as in wikitext or in parsed html converted to plaintext?
>>
>> Whatever you think is preferable, the point is to have some clean, readable 
>> text. If the parsed HTML has any awkward formatting issues, I might prefer 
>> the wikitext, or vice versa. Whichever is easier to work with. Technically 
>> since wikitext is a markup format it might be easier to pull out from 
>> specific fields you are seeking? I don't know.
>>
>> > You could use something like this to fetch every page
>>
>> Thanks. I tried replacing the title with a different, more normal book and 
>> it didn't seem to work.
>>
>> https://en.wikisource.org/w/api.php?generator=allpages&action=query&prop=revisions&rvprop=content&rvslots=main&gapprefix=Moby-Dick_(1851)_US_edition
>>
>>
>> I guess it's the same problem, "revisions" also pulls out wikitext but 
>> Wikisource wikitext pulls in its text from separate files?
>>
>>
>> So would the "parse" action of the API be the tool of choice?
>>
>>
>> >  the WS Export tool can do that
>>
>>
>> Thanks very much, will give that a shot next.
>>
>>
>> Thank you,
>>
>> Julius
>>
>>
>>
>>
>>
>>
>> On Tue, Sep 20, 2022 at 2:14 AM Sam Wilson <[email protected]> wrote:
>>>
>>>
>>>
>>>
>>>> How can I get the full plaintext from an entire book on Wikisource with 
>>>> the API?
>>>
>>>
>>> Plaintext as in wikitext or in parsed html converted to plaintext?
>>>
>>>
>>>
>>> If it's the latter, the WS Export tool can do that: 
>>> https://ws-export.wmcloud.org/?format=txt
>>>
>>>
>>> _______________________________________________
>>> Mediawiki-api mailing list -- [email protected]
>>> To unsubscribe send an email to [email protected]