Re: [basex-talk] get and extract .gz files from web

Marc van Grootel Wed, 27 Jan 2016 01:11:26 -0800

Hi Andy,

Nice use of syntax (though you have to lose the semicolon, of course).


Visually, I like the arrow operator a lot. It reads like a pipeline:

    "https://wiki.mozilla.org/images/f/ff/Example.json.gz"
    => fetch:binary()
    => archive:extract-text()

I also think that this could be a bug, or at least a worthwhile
improvement, since the docs say gzip archives can be created. Christian,
do you think we should file an issue for this?
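
For an issue report, a minimal reproduction could look something like
the sketch below. It assumes the example URL from Andy's message is
still reachable and that the Fetch and Archive modules behave as
described in this thread; the variable names are only illustrative.

```xquery
(: Sketch only: $url and $gz are illustrative names, not part of any API. :)
let $url := "https://wiki.mozilla.org/images/f/ff/Example.json.gz"
let $gz  := fetch:binary($url)
return (
  (: reported in this thread to yield an entry element without a name :)
  archive:entries($gz),
  (: reported in this thread to return the decompressed text :)
  archive:extract-text($gz)
)
```

Running both calls against the same binary makes the mismatch easy to
see in one result.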

--Marc

On Tue, Jan 26, 2016 at 9:51 PM, Andy Bunce <bunce.a...@gmail.com> wrote:
> Hi Marco,
>
> I get the same. This works:
>
> ````
> "https://wiki.mozilla.org/images/f/ff/Example.json.gz";
> !fetch:binary(.)
> !archive:extract-text(.)
> ````
>
> But this returns empty:
>
> ````
> "https://wiki.mozilla.org/images/f/ff/Example.json.gz";
> !fetch:binary(.)
> !archive:entries(.)
>
> <archive:entry xmlns:archive="http://basex.org/modules/archive"/>
> ````
>
> Expecting to see "example.json"
>
> Could this be a bug?
>
> /Andy
>
>
>
> On 26 January 2016 at 18:51, Maximilian Gärber <mgaer...@arcor.de> wrote:
>>
>> Hi,
>>
>> I think this should work, I use it for OData requests from IIS.
>>
>> Need to dig through the source... but I used one of the extract-binary
>> functions.
>>
>> Regards, Max
>>
>> On 26.01.2016 at 16:04, "Marc van Grootel"
>> <marc.van.groo...@gmail.com> wrote:
>>>
>>> Well, shelling out wasn't so hard. Even on Windows, with Cygwin tools,
>>> it's simply:
>>>
>>>     proc:execute('gunzip', $path-to-gzipped-file)
>>>
>>> It worked quite transparently: it extracts the file and removes the
>>> .gz file. A pure XQuery solution would be nice, but for now
>>> I'm okay.
>>>
>>> Cheers,
>>>
>>> On Tue, Jan 26, 2016 at 3:13 PM, Marc van Grootel
>>> <marc.van.groo...@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > I hoped that I could use archive module to also extract gzipped files.
>>> > I need to fetch/sync large XML from a web service that has the option
>>> > of getting files with gzip encoding (to be nice to the web server).
>>> >
>>> > My first attempt was to explicitly get the .gz file via the URL and
>>> > then treat it like an archive binary (extracting it with the recipe
>>> > from the archive module page). The entries XML I get is empty, so I
>>> > suppose that I cannot read .gz files.
>>> >
>>> > Second attempt was to specify Accept-Encoding = gzip which indeed
>>> > delivers the XML as a binary. But I probably run into the same issue
>>> > when trying to extract.
>>> >
>>> > Is there a way to do the extraction of .gz encoded files without
>>> > having to shell out to some kind of unzipper?
>>> >
>>> > Cheers,
>>> > --Marc
>>>
>>>
>>>
>>> --
>>> --Marc
>
>



-- 
--Marc
